Adapting Win32 Enumeration APIs to STL Iterator Concepts

Courtesy of Windows Developer Network (March 2003)

By Matthew Wilson

Operating system APIs that allow access to collections of entities are common; for example, the UNIX opendir()/readdir() functions. The Win32 API has many functions for enumerating the elements in collections, expressing one of a number of common models — including call-backs (i.e., EnumChildWindows()), get-first/get-next (i.e., FindFirst/NextFile()), and get-nth (i.e., RegEnumValue()) — which are quite different in semantics.

The C++ community is gradually moving towards an STL-compliant common coding form, in terms of containers (sequence and associative), iterators, functionals (or functors, or function objects), algorithms, and adaptors. The great advantage with this is that disparate entities can be manipulated in a common fashion, dramatically reducing developer effort, and at the same time improving robustness, maintainability, and reuse.

One of the reasons that the use of STL techniques is not as widespread as it might be is that, apart from the classes and functions provided by the standard library itself (list, vector, for_each, and so on), there is a paucity of available STL-compliant libraries, particularly for operating system APIs. This is in part due to the complexity involved in translating one programming model to another, particularly when talking about collections and sequences. This article describes the practical application of STL sequence concepts to two Win32 enumeration models, creating lightweight sequence classes that wrap the APIs and provide iterators that can be used for manipulating the enumerated entities by algorithms in an STL-standard fashion.

The classes presented form part of the WinSTL libraries. WinSTL [1] is a subproject of STLSoft [2], a public domain, open-source organization that applies STL concepts to a variety of technologies and operating systems.

The first class presented, winstl::findfile_sequence, demonstrates an adaptation of the get-first/get-next enumeration model, as used by the FindFirstFile() API for the enumeration of filesystem entities, to the STL Input Iterator concept. As described later, the get-first/get-next model may only be straightforwardly adapted to the Input Iterator concept because the handles they provide are not usually cloneable (i.e., copying the handle value does not do a deep copy of the enumeration state).

The second class presented is the winstl::reg_key_sequence, which is an adaptation of the get-nth model as used in the RegEnumKeyEx() API for the numeration of registry keys. As we shall see, get-nth enumeration models are readily adaptable to both the Input Iterator concept and the Forward Iterator concept and, with a little effort, the Bidirectional and Random Access Iterator concepts. This is because iteration state is represented as a numeric index, so iterators may be copied, iteration may be forked, and sequence-subsets iterated.

Also discussed in this article (and supplied in the archive) are a couple of example programs that demonstrate the ease of use of these sequence classes. One is a simple GUI program that recursively searches a registry key and places the items in a tree, and the other is an STL reworking of the classic WHEREIS program.

Sequence Concept

The STL defines a Sequence [3] as a "variable-sized container whose elements are arranged in a strict linear order, [that] supports insertion and removal of elements," and stipulates a number of characteristics, properties, and operations that a sequence should support.

These stipulations are vital for the development of the actual STL (container) class libraries, since they provide guarantees that users of such libraries rely on when selecting one type of container over another (for instance, one may prefer a list for constant time insertion, or a vector to be able to pass the contents as an array). Unfortunately, when adapting different enumeration models, over whose implementation we likely have no control, many of the constraints of the Sequence concept may not be achievable. Most of this nonconformance is moot when the adapted sequence is read only, as are those presented in this article, but I shall nevertheless refer to the classes presented as sequences with a small "s," in respect of their lack of strict conformance. (To further ease the burden of understanding in this nomenclatural haze, I shall refer to the "iteration" of the sequence classes and their iterators, as opposed to the "enumeration" of the underlying API collections.)

The important characteristics of adapted sequences are that they provide a common and well-understood syntax, notably in the form of their iterators, and that they provide correct and well-defined semantics. As this article will demonstrate, the first of these challenges is usually relatively straightforward, and the second one usually isn't.

Input Iterator Concept

All Sequences provide iterators, which support one or more of the iterator concepts [4, 5]. When dealing with read-only Sequences, the four models of concern are Input, Forward, Bidirectional, and Random Access, each of which is a superset of the previous one. There are a number of good sources for definitions of the iterator concepts [5, 4], so I will give a brief explanation in so far as the characteristics of the concepts pertain to implementing sequences.

An Input Iterator [6] has many of the qualities — equality-comparable [7], dereferenceable, incrementable — commonly associated with iterators in general. The important characteristic missing from an Input Iterator is that of being cloneable (copying with deep enumeration semantics, in particular the enumeration state), which means that a range defined by Input Iterators may support only single-pass algorithms. The salient phrases from the SGI web site [6] are "After executing ++i, it is not required that copies of the old value of i be dereferenceable or that they be in the domain of operator==()" and "It is not guaranteed that it is possible to pass through the same input iterator twice." Matt Austern [5] also provides an excellent theoretical exposition of this subject, but I want to give a simple practical example, since this is about the most misunderstood aspect of the differences between the iterator concepts.

Consider the case where we want to identify files residing on a case-sensitive filesystem that would clash as duplicates on case-insensitive filesystems. We could use the unixstl::readdir_sequence (UNIXSTL [8] is a peer project with WinSTL) class, which is similar to the WinSTL findfile_sequence class described below, to locate them, as in Listing 1.

For most iterator concepts, the code would have the expected behavior, and successfully identify and trace all case-insensitive name clashes. However, because unixstl:: readdir_sequence::const_iterator is an Input Iterator, the code will compile without problems, and will also execute and find any duplicates of the first file in the sequence, but will never execute the outer loop more than once. This is because when the begin_o iterator is copied to the begin_i iterator, the state of the iteration will also be passed over to it. Therefore, the inner loop will be affecting (either directly or indirectly) begin_o as well as begin_i. Depending on the precise nature of the sequence implementation, a subsequent increment of begin_o will either crash/hang the program or will modify begin_o to be equal to end_o. In either case, the intended behavior will not be seen.

This is the characteristic restriction of Input Iterator sequences to single-pass iteration. In most cases this may be all that is required, but multiple passes can be more subtle than you might think, since iteration loops are also contained within many algorithms, such as for_each, sort, and transform. For example, the code in Listing 2 is intended to count the number of matching files, using the count_if algorithm and the not1, ptr_fun, and bind1st function adapters (which effectively allow the free function, strcmp_no_case(), along with the value of *begin_o, to act as an invariant against which to test the inner-loop items). Nevertheless, this would fail in exactly the same way as in the previous version; despite the fact that there is no obvious copying and iterating, the count_if() algorithm does both, hence the effective multipass.

Interestingly, the very fact that algorithms need to be (singly) applicable to Input Iterators require that they have copy constructors, due to the fact that algorithms take their iterator parameters by value. This forces them to have copyable syntax, despite this being semantically more like a move than a copy. (Move semantics are actually quite a lot more complex than one might imagine, as described in [9].)

It is possible to write iterator classes that do not have copy constructors at all, and this would prevent the problem of multipass algorithms for Input Iterators, protecting against the problem by causing a compile error. It would, however, also prevent their use with algorithms. But algorithms are an essential and useful part of the STL, and especially so when used with other types of iterators that support inner-loop scenarios as the one just mentioned. Conversely, if algorithm iterator arguments were by reference to help their use with input-iterators, then such inner-loops would have the undesirable side effect of altering outer-loop state. On balance, therefore, algorithms take iterators by value, all other iterator concepts are correctly and usefully supported, and we just need to be mindful of the multipass caveat for Input Iterator types.

Forward Iterator Concept

The Forward Iterator [10, 5] concept essentially describes a mechanism for traversing a linear sequence of values. (SGI's site [10] describes it thus: "It is possible to use Forward Iterators in multipass algorithms. Forward Iterators do not, however, allow stepping backwards through a sequence, but only, as the name suggests, forward.") The Forward Iterator concept is considered to be a refinement of both Input Iterator and Output Iterator [11] concepts. For our purposes (since we are only implementing sequences that are essentially constant), we need focus only on the capability over and above Input Iterators of supporting multipass algorithms.

It comes in two types: mutable and constant. Mutable forward iterators allow modification of the values in the sequence. Constant forward iterators (usually defined as the member type const_iterator of the providing sequence) allow only read-only access to the values in the sequence. All the sequences we consider in this article provide constant iterators only. (Of course, the terminology is not the best. They could be better named as mutating-iterators and nonmutating-iterators, since the iterators themselves are not mutable or constant, rather the values they refer to.)

In terms of implementation, the ability to support multipass algorithms may or may not be a major technical challenge, depending on the particular API for which we are providing iterable access. For example, the Win32 FindFile API does not contain any facility for cloning a search handle. If you copy the search handle, then the underlying enumeration state is affected in the same way whether you call FindNextFile() on the original handle or on the copy. We shall see how this issue restricts the implementation described later.

Bidirectional Iterator Concept

Conceptually, the Bidirectional Iterator [12] concept is identical to that of the Forward Iterator concept, with the additional ability to iterate backwards as well as forwards. In practice, however, providing the decrement operators adds an overhead, although in most cases this is the same or less work than was required to implement the forward iteration.

Bidirectional containers (and our sequences) provide, in addition to the begin() and end() methods for forward iteration, the methods rbegin() and rend(), which provide reverse iterators that may be used in iterator backwards through the range of values contained in the sequence. Scott Meyers provides an illuminating examination of the relationship of forward and reverse iterators, which I highly recommend, in Effective STL [13].

The mechanics of reverse iteration is such that it is built on the ability of a container's iterators to iterate in both the forward and reverse directions, hence bidirectional. Thus, in addition to the ++() operator (or operators, if providing both pre and post versions), the — () operator(s) must be implemented. Then the rbegin() and rend() methods are implemented in terms of end() and begin() and a reversing type (usually std::reverse_iterator), as can be seen in Listing 3.

It is important to note that, unlike the Input and Forward Iterator concepts, the end iterator value (that returned by end()) must be decrementable in Bidirectional Iterators, because it is by decrementing the underlying end() value that rbegin() increments from its starting position. This can have a bearing on the choice of implementation for an enumeration adapting implementation, as we shall see in the implementation of the reg_key_sequence iterator class.

Random Access Iterator Concept

The Random Access Iterator concept represents the most powerful of all the iterator concepts, and includes the semantics of pointers. Iterators supporting the Random Access Iterator concept have all the abilities expressed in the concepts already described, along with the ability to engage in pointer, or rather iterator, arithmetic. For example, one can now write statements such as:

X::iterator   it = x.begin();
  size_t        cElems = x.end() - it;

  it += cElems - 1;
  ( — it)[1];

Because of this, whereas iterators supporting all other Iterator concepts must support the Equality Comparable [7] concept, those supporting the Random Access concept must also be LessThan Comparable [14]. This is due to the requirement for testing whether one iterator has exceeded another one, not just whether it has equaled it.

It is frequently the case that when a sequence (or any STL Container) provides Random Access Iterators, it can also provide indexing operators (i.e., operator [](int index)). The best example of a Random Access sequence is that of a vector, with which items may be access by the usual begin()-end() iteration, or by the index operator (i.e., v[12] = "13th element").

As I mentioned, the Random Access Iterator concept incorporates the semantics of pointer types. This is an important point, since it is a challenge for most new (and some not so new) to the STL to realize that pointers are actually supported by only this most sophisticated iterator model, even though they have the simplest implementation (nothing to implement!) and the most straightforward semantics.

Steps in Implementing Sequences

A common usage scenario for STL-compliant sequences is:

Declare the sequence, instantiating with sequence class-specific information.
Iterate through the items from the beginning iteration point — obtained from begin() — incrementing it until the end point — obtained from end() — is reached, or the iteration is stopped for another reason.
For each item in the sequence, process it, obtaining it by dereferencing the iterator.

This can best be seen by a simple example (Listing 4), using the WinSTL findfile_sequence class.

This declares an ANSI version of the findfile_sequence template, and instantiates it with parameters requesting a search for all files in the directory C:\. It then iterates through all the items and prints out the names of all the files found. Sequences are that simple to use; the complexity is in implementing them in a correct, robust, and efficient way.

When implementing STL-compliant sequences, there are a number of decisions to be made.

The most important decision is whether your sequence is going to be a read-only view on the collection it is representing, or whether it will allow modification of the collection. If the collection is to be modifiable, then you also need to decide whether modifications will apply to the items in the collection, or to the collection contents itself (i.e., removing, adding, or sorting elements). Nonmodifiable collections are the most commonly needed type and are by far the easiest to implement, and all the sequences described in this article are of this type.
Nonmodifiable sequences only need to provide const begin() and end() methods, each returning instances of const_iterator. (Modifiable sequences would provide non-const begin() and end() methods, and also erase(), insert(), and all the others described in Sequence Concept [3].)
The next decision is to determine what kind of iteration concept your sequence will provide. This is dependent not only on which model the underlying API facilitates, but also on how much effort you are willing to expend in order to leverage that support, or supplement it. For example, you may decide that even though the API will support the Random Access Iterator concept [15], you will be quite happy with the Bidirectional model. Conversely, you might need a minimum of Forward Iterator concept, and decide to provide that despite the extra work, even when the underlying API supports only Input Iteration.
Another important factor is determining how the items of the enumeration will be represented as the value_type of the sequence class and the reference type of the iterator. Dereferencing iterators (applying the indirection operator — operator *()) can return values, references, or pointers to class/built-in types, and the selection of which will depend on whether you have a value to point/refer to, and also on efficiency and API stability. For example, if you were implementing a sequence that represented child windows, then the value-type could be HWND. For more sophisticated types, you need to carefully consider the various options. Occasionally, the value_type returned can be by reference (i.e., string *, string const &), where the actual underlying item can be referenced (usually via pointer), but it is much more common to need to return the value_type by value (i.e., string). Care must be taken, therefore, to provide correct copy construction/assignment semantics for your value type. A further, more complex, option is to use a proxy type as the value_type, and defer expensive operations until the value_type instances are actually used.
Once you have decided what your value type is, you then need to work out the best way of implementing your iterator. There are a number of factors involved here (see the "Deriving User-Defined Iterator Types" sidebar).
1. First, you need to determine whether your iterator will support the member-selection operator — operator ->() — or will support only the indirection operator — operator *(). This is a very important distinction, but the decision is rarely difficult once you get into the design of your class. Because the dereferencing operator must return an instance of a type to which the dereferencing operator can also be (re)applied, this constrains your using it to sequences from which one can obtain a pointer or reference to the instance. Since there usually isn't a pointer available to the underlying value residing within the API implementation, you may need to synthesize a copy in order to provide something to which the dereferencing operator of your iterator can point. However, most times you will want to implement as efficient a sequence as possible over the enumeration API you are using, so you might prefer to provide only the indirection operator. This should not significantly reduce the appeal of your API, since common idiom, and all the standard library algorithms, access iterators via the indirection operator only. The only likely complaints will come from users of your code who prefer typing:
```
  it->do_something();
```
  to
```
  (*it).do_something();
```
  and who claim, correctly in a few cases, that the former syntax is more efficient. However, as we have seen, most sequence containers that you implement over enumeration APIs will not have access to anything for which this argument could be put, since an instance of the value_type does not exist to be pointed to and so is synthesized by operator *(). (This is in contrast to standard containers, which contain elements directly, and therefore can give out pointers and references to these elements.) Hence, constraining your iterator syntax to the latter form will, overall, lead to writing algorithms that are more widely applicable.
2. Second, there is the issue of where the enumeration state is actually stored: Is it entirely in the iterator instance, or does the iterator store merely a part of this state and use a reference to the container in order to access the rest?
3. Closely related to this is whether the actual API enumeration starts in the container when it creates the begin iterator in its begin() method, or whether the iterator itself starts the enumeration when it is used.
4. Another important factor to consider is whether to provide post-increment/decrement operators in addition to the pre-increment/decrement ones. Whilst most STL algorithms use the pre-form, there are a few that need to use the post-form, and although some compilers will adapt the pre-form and use a temporary, other will fail to compile. (For example, Borland C++ 5.5 reports "Overloaded prefix 'operator ++' used as a postfix operator in function for_each.") For maximal compatibility, therefore, it is usually worth the extra effort.
5. Next, you must work out what your end condition is and, therefore, what end() should be implemented to return. Most often it is implemented to return a default-constructed instance of the iterator type, but sometimes it is necessary to construct the iterator instance with a marker value (perhaps a NULL pointer, or a negative index), known as a sentinel.
6. For Input and Forward iterators, this sentinel can be an arbitrary unique value. However, for Bidirectional and Random Access Iterators it is necessary that the decrement operator ( — ()) be callable on the end() value, the results of which must be a meaningful step in the reverse iteration sequence.
7. Finally, when it comes to comparing iterators, what matters is that the Equality and/or LessThan comparisons are always valid and meaningful. In particular, it is important not that the return from calls to end() and a one-past-the-end iterator are the same, but that they are equality-comparable; in other words, that there is an accessible and applicable operator ==() that returns True.
Normally, the use of containers and their modifications of the container contents are based on an explicit relationship between modification and its effects on iteration state. This is not the case with API collection wrappers; especially those based on OS & shared state. The sequence and its iterators must be coded such that they are robust in the face of such changes. The client code must neither crash nor go into an infinite loop. This can lead to taking some extra measures in the copy (or move [9]) construction and also in moving to and from the end() value.

findfile_sequence

Now that we've discussed the issues involved lets look at some actual implementations. The winstl::findfile_sequence class provides the ability to enumerate files and/or directories from a given search specification (i.e., "xyz*.p??"), in a given directory. The sequence is layered on top of the Win32 FindFile API, which comprises the three functions:

HANDLE FindFirstFile(
    LPCTSTR lpszSearchSpec, 
    WIN32_FIND_DATA *lpFindData);
BOOL FindNextFile(
    HANDLE hFind, 
    WIN32_FIND_DATA lpFindData);
BOOL FindClose(HANDLE hFind);

FindFirstFile() starts an enumeration and creates a search handle to the caller, or returns INVALID_HANDLE_VALUE to indicate failure. FindNextFile() moves to the next element in the collection, or returns FALSE to indicate that the enumeration is complete. FindClose() closes a valid search handle and must be called, otherwise kernel resources are leaked until the process terminates. A typical program for the use of these functions is shown in Listing 5. Note that the API returns both files and directories, including the current directory "." and (for nonroot directories) the parent directory "..".

The findfile_sequence class wraps this functionality in the form of an Input Iterator-providing sequence. In addition, it contains logic for filtering out files, directories, and the "dots" (. and ..), such that you can request any permutation of these types at construction, relieving client code of the logic you see in Listing 5, as can be seen in Listing 4.

The class is actually a template class, winstl::basic_findfile_sequence, which supports different parameterizations for use with ANSI and Unicode. The full implementation of this and all the supporting classes (winstl::basic_findfile_sequence_value and winstl::basic_findfile_sequence_const_iterator) is available in the archive and online [1], but simplified and truncated versions are shown in Listings 6 and 7. We focus on the ANSI parameterization, findfile_sequence_a, in the article and sample programs, but the principles described apply equally to any parameterization. The implementation addresses the decisions outlined above in the following ways:

The sequence is read only.
Only the Input Iterator concept is supported. (In actual fact, the full implementation of this class library also provides the more powerful but, in this case, less efficient forward iteration via its begin_fwd() and begin_end() methods. You can select this as the default via a preprocessor symbol definition.)
The value_type is the winstl::basic_findfile_sequence_value class, which is a simple class containing a fixed-sized (based on _MAX_PATH) character buffer, the search handle, and a back pointer to the sequence class (in order to obtain the full path). (Note that because filesystem entities have finite name lengths, all the members of the findfile_sequence_value class are plain members, avoiding expensive heap allocations.) The findfile_sequence_value class is also the iterator reference type, since the iterator does not store an instance to which a reference could be given. The original implementation, which used the raw WIN32_FIND_DATA as the value_type, did in fact return (WIN32_FIND_DATA const &) as the iterator reference, which was slightly more efficient. However, this structure contains filenames only (no path) so the usability of the enhanced functionality provided in the current version was deemed to be worth the slight cost in efficiency (though the path is synthesized only when the value_type instance is retrieved via the iterator indirection operator) and complexity.
1. Using the findfile_sequence_value class also means that the original implementation's dereferencing operator (operator ->()) can no longer be supported.
2. The iteration state is comprised of the search handle only, on which subsequent calls to FindNextFile() are made. When the enumeration is complete (i.e., when FindNextFile() returns FALSE), the iterator closes the handle with FindClose().
3. The iteration is commenced in the container (in the _begin() method called by begin()), which calls FindFirstFile(). If this call succeeds, then a valid iterator is returned; otherwise, the end() iterator value is returned.
4. The iterator class provides both pre- and post-increment operators, the latter being a simple implementation (see Listing 7) using a temporary and a call to the former. The post-forms are less efficient, since they incur the additional cost of creation and destruction of the temporaries.
5. The end() condition is implemented simply by setting the iterator's handle to the well-known INVALID_HANDLE_VALUE pseudo-value.
6. Since we are only supporting the Input Iterator concept, we do not provide the — () operator(s).
7. Iterator comparison is affected by determining that either both iterator's handles are set to INVALID_HANDLE_VALUE, or that neither are, in which case a test against filename is conducted. This supports testing end() iterator values, and has the nice side effect of allowing safe and meaningful testing of iterators from one (begin(), end()) [5] range with those of another, in effect emulating multiple passes by separate enumerations on the same sequence class. If we did not want to support this capacity, then the comparison could simply be based on handle equality, and since all handle values are distinct but constant for any enumeration, the only positively equating comparison would be between end() values, which is correct behavior for Input Iterators.
The sequence and iterator classes are robust in the face of changes to the underlying file-system contents, since the end() value is based on INVALID_HANDLE_VALUE. This prevents any iteration sequence from going uncontrollably "off the end," even when the filesystem is being actively changed during an iteration. Furthermore, since the iterator copy constructor is implemented as a move-constructor [9], which sets the copied iterator to the end() value, the infinite looping mentioned in the Input Iterator Concept section is avoided, and multipass iteration sequences fail gracefully. (Without the use of the move-constructor technique, the sample programs shown in the Input Iterator Concept section using the findfile_sequence class actually hang inside KERNEL32.DLL.)

reg_key_sequence

The API function used for enumerating registry keys is RegEnumKeyEx(), which has the following signature:

LONG RegEnumKeyEx(
  HKEY hKey,		// handle to key to enumerate
  DWORD dwIndex,		// subkey index
  LPTSTR lpName,		// subkey name
  LPDWORD lpcName,		// size of subkey buffer
  LPDWORD lpReserved,		// reserved
  LPTSTR lpClass,		// class string buffer
  LPDWORD lpcClass,		// size of class string buffer
  PFILETIME lpftLastWriteTime	// last write time
);

The important thing to notice is the second parameter, dwIndex, which is the index to the requested item. To enumerate through a series, you call it with an index of 0 and continue with incrementing values until the return value is non-0. (The function returns ERROR_NO_MORE_ITEMS when the index is greater than or equal to the number of subkeys available.) A sequence iterator class would, of course, use the index for storing enumeration sequence state, and you do, therefore, get Forward Iterator support for free. (Indeed, it would be very hard to devise an implementation that did not provide Forward Iterator behavior.)

The full implementation of the winstl::basic_reg_key_sequence template class (and its supporting winstl::basic_reg_key_sequence_ const_iterator and winstl::basic_reg_key_sequence_value classes) is available in the archive and online, but a simplified and truncated version is shown in Listings 3 and 8. (We focus on the ANSI parameterization, reg_key_sequence_a.) The implementation addresses the decisions outlined above in the following ways:

The sequence is read only.
The Bidirectional Iterator concept is supported. Since the enumeration state is represented by the index, it was a simple matter to implement a Bidirectional Iterator by adding operator — (), implemented simply by decrementing the index. The support is completed by supplying rbegin() and rend() returning instances of the sequence member type const_iterator. (This is implemented by using a class, stlsoft::reverse_bidirectional_iterator_base, to insulate the implementation from compiler specifics). It would be eminently feasible to provide Random Access iteration, which allows pointer arithmetic to be applied to iterators and the index operator — operator []() — to be applied to iterators and containers. However, no users have yet expressed a requirement for such capabilities, so it simply has not been done yet. (If anyone wants to take this up as an exercise for the reader...)
The value_type chosen is the winstl::basic_reg_key template class, which is a simple class that wraps a registry key and provides a few methods for querying the attributes of a key, including its name. The name is of string class type, since despite the fact that registry entity name lengths are limited to 255 characters on Windows 9x (95, 98, Me), other Win32 operating systems allow an unlimited representation. Because the registry API is accessed by an index, we do not have access to any underlying value state, so the reference type for the iterator is also reg_key; i.e., the iterator returns the result of operator *() by value. The reg_key class stores a duplicate copy of the key because it is reasonable for values to exist outside the lifetime scope of the sequence (something that cannot be said for the iterators).
The iterator is implemented as a separate class, which has a friendship relationship with the sequence class. (This is one of the few valid uses of the powerful C++ friend keyword, since the sequence and iterator classes are closely related and not meaningful independently of each other.)
1. The registry enumeration model does not readily support the dereferencing operator (operator ->()).
2. The iteration state is comprised of the registry key (against which to query for the next item name) and the index (representing the iteration state). The iterator class maintains a thin-copy, not a duplicate, of the registry key because the iterators are not valid outside the lifetime of the sequence from which they are obtained. Although the RegEnumKeyEx() takes an unsigned 32-bit integer for the index, the iterator's index type is signed, in order that the sentinel value can be negative. (It is unlikely that any registry key will have in excess of 2 billion subkeys, so there is little likelihood of this causing a problem!) No back pointer is maintained on the iterator's sequence class, nor is the key duplicated (it is simply copied), because STL iterators are only valid within the scope of their providing-sequence/container's lifetime. Note that for the sake of coherency, the iterator also contains the subkey name. While it is true that getting the name at the time of enumeration is less efficient in the cases where not every iteration state has its value taken (i.e., when doing a distance() or advance() [13]), in most cases this is not relevant. Moreover, since the registry may change due to action in another thread or process, it is arguably more correct to reflect the real state at iteration time rather than indirection time.
3. The enumeration is begun in the container's begin() method, which returns end() if RegEnumKeyEx() fails.
4. The iterator class provides both pre- and post-increment and decrement operators, with the post-forms implemented in terms of temporaries and the pre-forms. As usual, the pre-forms are more efficient.
5. Ordinarily, the iterator returned by a container's end() function contains a value that is linearly one off the end of the sequence, hence for pointer iterators end() == begin() + size(). This supports tests for equality as well as allowing for decrementing the end iterator back into the sequence proper. However, API enumeration changes would cause such an implementation to have potentially serious problems. If the end condition was implemented by the index value being equal to the number of subkeys of the registry key, this would be very fragile, since changes to the underlying sequence could lead to the end() method returning an iterator instance that fails to equate to one that had been iterated "off-the-end." Therefore, the iterator class has a static method returning a negative index (a sentinel value) and when an attempt to access the next element fails (i.e., end has been reached) the iterator index is set to this value. The iterator operator ==() simply checks for equality between the index members (though it does contain an assert verifying that the two iterators are operating on the same registry key).
6. In order to support the Bidirectional concept, the — () operator looks for the sentinel value, and then sets the actual index to the current number of subkeys just prior to its being decremented and RegEnumKeyEx() being called. This scheme is robust even in the case where the sequence has no elements, because a failure of the RegEnumKeyEx() function simply results in the iterator taking the sentinel; i.e., the end() iterator value.
7. The comparison of the iterators simply compares the index members of the two iterators. There is an assert that the iterators being compared are in respect of the same registry key, but this is not a run-time feature because this would be an unnecessary test in correctly written code. It should be noted that the STL iteration model generally does not assume any kind of bounds checking, for performance reasons, and relies on an iterator to be checked against the end() value. I have followed this policy in the implementations of the classes in the WinSTL library.
The sequence and iterator classes are robust in the face of changes to the underlying registry contents, since the use of the sentinel prevents the end() value comparisons from erroneously comparing an off-the-end index (say 17), with another obtained subsequent to a couple of subkeys being deleted (say 15). In both cases the actual index will be the sentinel value, and comparison will be correct. It is also more efficient to use the sentinel than to query for the number of subkeys, particularly if doing so every time the end() value is returned in order to catch underlying changes to collection. The only caveat is that the value obtained from the size() method prior to an iteration and the number of elements iterated through can differ under conditions of underlying changes.

Using the Sequences

Using the sequences is as simple or as complex as using any other STL sequence. In Listing 4 we saw a simple example. A more complex example is seen in Listing 9, which shows part of a replacement for the common WHEREIS utility. This program uses the winstl::findfile_sequence along with another WinSTL class, the searchpath_sequence (which provides an ordered collection of search paths for the calling process), such that a set of search sequences, such as "*.cpp;*.d;*.xml", can be applied to either a given directory, or to the system search paths. The full source is in the archive.

Also included in the archive is the source to REGTREE (Listing 10, Figure 1), a simple registry enumerator program, that demonstrates a standard enumeration using for_each, as well as illustrating a combination of algorithms and sequences to perform a recursive enumeration of the registry.

Summary

This article has described some of the ideas represented in the STL Sequence and Iterator concepts. It has also highlighted some issues that are important when implementing your own STL-compliant sequences and iterators, and has illustrated these issues by presenting two enumeration API- adapting sequence classes, findfile_sequence, and reg_key_sequence, and their supporting classes.

I've endeavored to show that the implementation of correct, robust, and efficient sequences and iterators, though involved, is not as complex as might be expected. Hopefully, I've demonstrated that using such classes is straightforward indeed, especially so given that they can be used with standard algorithms and functionals, as shown in the example programs included with this article.

All the WinSTL classes, and supporting STLSoft code, are available online ([1]), along with an increasing number of other classes, algorithms, and functionals. Furthermore, the libraries are open to expansion for anyone who wishes to submit their own classes.

References

[1] http://winstl.org/.
[return to text]

[2] http://stlsoft.org/.
[return to text]

[3] http://www.sgi.com/tech/stl/Sequence.html.
[return to text]

[4] http://www.sgi.com/tech/stl/Iterators.html.
[return to text]

[5] Generic Programming and the STL: Using and Extending the C++ Standard Template Library, Matthew Austern, Addison-Wesley, 1998.
[return to text]

[6] http://www.sgi.com/tech/stl/InputIterator.html.
[return to text]

[7] http://www.sgi.com/tech/stl/EqualityComparable.html.
[return to text]

[8] http://unixstl.org/.
[return to text]

[9] "Move Constructors," Matthew Wilson, Synesis Technical Report, http://www.synesis.com.au/resources/articles/cpp/movectors.pdf.
[return to text]

[10] http://www.sgi.com/tech/stl/ForwardIterator.html.
[return to text]

[11] http://www.sgi.com/tech/stl/OutputIterator.html.
[return to text]

[12] http://www.sgi.com/tech/stl/BidirectionalIterator.html.
[return to text]

[13] Effective STL, 50 Specific Ways to Improve Your Use of the Standard Template Library, Scott Meyers, Addison-Wesley, 2001.
[return to text]

[14] http://www.sgi.com/tech/stl/LessThanComparable.html.
[return to text]

[15] http://www.sgi.com/tech/stl/RandomAccessIterator.html.
[return to text]

[16] The C++ Programming Language, Third Edition, Bjarne Stroustrup, Addison-Wesley, 1997.

Matthew Wilson holds a degree in Information Technology and a Ph.D. in Electrical Engineering, and is a software development consultant for Synesis Software. Matthew's work interests are in writing bulletproof real-time, GUI, and software-analysis software in C, C++, and Java. He has been working with C++ for over 10 years and is currently bringing STLSoft.org and its offshoots into the public domain. Matthew can be contacted via matthew@synesis.com.au or at http://stlsoft.org/.