Courtesy of Windows Developer Network (March 2003)
Operating system APIs that allow access to collections of entities are common; for example, the UNIX opendir()/readdir() functions. The Win32 API has many functions for enumerating the elements in collections, expressing one of a number of common models including call-backs (i.e., EnumChildWindows()), get-first/get-next (i.e., FindFirst/NextFile()), and get-nth (i.e., RegEnumValue()) which are quite different in semantics.
The C++ community is gradually moving towards an STL-compliant common coding form, in terms of containers (sequence and associative), iterators, functionals (or functors, or function objects), algorithms, and adaptors. The great advantage with this is that disparate entities can be manipulated in a common fashion, dramatically reducing developer effort, and at the same time improving robustness, maintainability, and reuse.
One of the reasons that the use of STL techniques is not as widespread as it might be is that, apart from the classes and functions provided by the standard library itself (list, vector, for_each, and so on), there is a paucity of available STL-compliant libraries, particularly for operating system APIs. This is in part due to the complexity involved in translating one programming model to another, particularly when talking about collections and sequences. This article describes the practical application of STL sequence concepts to two Win32 enumeration models, creating lightweight sequence classes that wrap the APIs and provide iterators that can be used for manipulating the enumerated entities by algorithms in an STL-standard fashion.
The classes presented form part of the WinSTL libraries. WinSTL [1] is a subproject of STLSoft [2], a public domain, open-source organization that applies STL concepts to a variety of technologies and operating systems.
The first class presented, winstl::findfile_sequence, demonstrates an adaptation of the get-first/get-next enumeration model, as used by the FindFirstFile() API for the enumeration of filesystem entities, to the STL Input Iterator concept. As described later, the get-first/get-next model may only be straightforwardly adapted to the Input Iterator concept because the handles they provide are not usually cloneable (i.e., copying the handle value does not do a deep copy of the enumeration state).
The second class presented is the winstl::reg_key_sequence, which is an adaptation of the get-nth model as used in the RegEnumKeyEx() API for the numeration of registry keys. As we shall see, get-nth enumeration models are readily adaptable to both the Input Iterator concept and the Forward Iterator concept and, with a little effort, the Bidirectional and Random Access Iterator concepts. This is because iteration state is represented as a numeric index, so iterators may be copied, iteration may be forked, and sequence-subsets iterated.
Also discussed in this article (and supplied in the archive) are a couple of example programs that demonstrate the ease of use of these sequence classes. One is a simple GUI program that recursively searches a registry key and places the items in a tree, and the other is an STL reworking of the classic WHEREIS program.
The STL defines a Sequence [3] as a "variable-sized container whose elements are arranged in a strict linear order, [that] supports insertion and removal of elements," and stipulates a number of characteristics, properties, and operations that a sequence should support.
These stipulations are vital for the development of the actual STL (container) class libraries, since they provide guarantees that users of such libraries rely on when selecting one type of container over another (for instance, one may prefer a list for constant time insertion, or a vector to be able to pass the contents as an array). Unfortunately, when adapting different enumeration models, over whose implementation we likely have no control, many of the constraints of the Sequence concept may not be achievable. Most of this nonconformance is moot when the adapted sequence is read only, as are those presented in this article, but I shall nevertheless refer to the classes presented as sequences with a small "s," in respect of their lack of strict conformance. (To further ease the burden of understanding in this nomenclatural haze, I shall refer to the "iteration" of the sequence classes and their iterators, as opposed to the "enumeration" of the underlying API collections.)
The important characteristics of adapted sequences are that they provide a common and well-understood syntax, notably in the form of their iterators, and that they provide correct and well-defined semantics. As this article will demonstrate, the first of these challenges is usually relatively straightforward, and the second one usually isn't.
All Sequences provide iterators, which support one or more of the iterator concepts [4, 5]. When dealing with read-only Sequences, the four models of concern are Input, Forward, Bidirectional, and Random Access, each of which is a superset of the previous one. There are a number of good sources for definitions of the iterator concepts [5, 4], so I will give a brief explanation in so far as the characteristics of the concepts pertain to implementing sequences.
An Input Iterator [6] has many of the qualities equality-comparable [7], dereferenceable, incrementable commonly associated with iterators in general. The important characteristic missing from an Input Iterator is that of being cloneable (copying with deep enumeration semantics, in particular the enumeration state), which means that a range defined by Input Iterators may support only single-pass algorithms. The salient phrases from the SGI web site [6] are "After executing ++i, it is not required that copies of the old value of i be dereferenceable or that they be in the domain of operator==()" and "It is not guaranteed that it is possible to pass through the same input iterator twice." Matt Austern [5] also provides an excellent theoretical exposition of this subject, but I want to give a simple practical example, since this is about the most misunderstood aspect of the differences between the iterator concepts.
Consider the case where we want to identify files residing on a case-sensitive filesystem that would clash as duplicates on case-insensitive filesystems. We could use the unixstl::readdir_sequence (UNIXSTL [8] is a peer project with WinSTL) class, which is similar to the WinSTL findfile_sequence class described below, to locate them, as in Listing 1.
For most iterator concepts, the code would have the expected behavior, and successfully identify and trace all case-insensitive name clashes. However, because unixstl:: readdir_sequence::const_iterator is an Input Iterator, the code will compile without problems, and will also execute and find any duplicates of the first file in the sequence, but will never execute the outer loop more than once. This is because when the begin_o iterator is copied to the begin_i iterator, the state of the iteration will also be passed over to it. Therefore, the inner loop will be affecting (either directly or indirectly) begin_o as well as begin_i. Depending on the precise nature of the sequence implementation, a subsequent increment of begin_o will either crash/hang the program or will modify begin_o to be equal to end_o. In either case, the intended behavior will not be seen.
This is the characteristic restriction of Input Iterator sequences to single-pass iteration. In most cases this may be all that is required, but multiple passes can be more subtle than you might think, since iteration loops are also contained within many algorithms, such as for_each, sort, and transform. For example, the code in Listing 2 is intended to count the number of matching files, using the count_if algorithm and the not1, ptr_fun, and bind1st function adapters (which effectively allow the free function, strcmp_no_case(), along with the value of *begin_o, to act as an invariant against which to test the inner-loop items). Nevertheless, this would fail in exactly the same way as in the previous version; despite the fact that there is no obvious copying and iterating, the count_if() algorithm does both, hence the effective multipass.
Interestingly, the very fact that algorithms need to be (singly) applicable to Input Iterators require that they have copy constructors, due to the fact that algorithms take their iterator parameters by value. This forces them to have copyable syntax, despite this being semantically more like a move than a copy. (Move semantics are actually quite a lot more complex than one might imagine, as described in [9].)
It is possible to write iterator classes that do not have copy constructors at all, and this would prevent the problem of multipass algorithms for Input Iterators, protecting against the problem by causing a compile error. It would, however, also prevent their use with algorithms. But algorithms are an essential and useful part of the STL, and especially so when used with other types of iterators that support inner-loop scenarios as the one just mentioned. Conversely, if algorithm iterator arguments were by reference to help their use with input-iterators, then such inner-loops would have the undesirable side effect of altering outer-loop state. On balance, therefore, algorithms take iterators by value, all other iterator concepts are correctly and usefully supported, and we just need to be mindful of the multipass caveat for Input Iterator types.
The Forward Iterator [10, 5] concept essentially describes a mechanism for traversing a linear sequence of values. (SGI's site [10] describes it thus: "It is possible to use Forward Iterators in multipass algorithms. Forward Iterators do not, however, allow stepping backwards through a sequence, but only, as the name suggests, forward.") The Forward Iterator concept is considered to be a refinement of both Input Iterator and Output Iterator [11] concepts. For our purposes (since we are only implementing sequences that are essentially constant), we need focus only on the capability over and above Input Iterators of supporting multipass algorithms.
It comes in two types: mutable and constant. Mutable forward iterators allow modification of the values in the sequence. Constant forward iterators (usually defined as the member type const_iterator of the providing sequence) allow only read-only access to the values in the sequence. All the sequences we consider in this article provide constant iterators only. (Of course, the terminology is not the best. They could be better named as mutating-iterators and nonmutating-iterators, since the iterators themselves are not mutable or constant, rather the values they refer to.)
In terms of implementation, the ability to support multipass algorithms may or may not be a major technical challenge, depending on the particular API for which we are providing iterable access. For example, the Win32 FindFile API does not contain any facility for cloning a search handle. If you copy the search handle, then the underlying enumeration state is affected in the same way whether you call FindNextFile() on the original handle or on the copy. We shall see how this issue restricts the implementation described later.
Conceptually, the Bidirectional Iterator [12] concept is identical to that of the Forward Iterator concept, with the additional ability to iterate backwards as well as forwards. In practice, however, providing the decrement operators adds an overhead, although in most cases this is the same or less work than was required to implement the forward iteration.
Bidirectional containers (and our sequences) provide, in addition to the begin() and end() methods for forward iteration, the methods rbegin() and rend(), which provide reverse iterators that may be used in iterator backwards through the range of values contained in the sequence. Scott Meyers provides an illuminating examination of the relationship of forward and reverse iterators, which I highly recommend, in Effective STL [13].
The mechanics of reverse iteration is such that it is built on the ability of a container's iterators to iterate in both the forward and reverse directions, hence bidirectional. Thus, in addition to the ++() operator (or operators, if providing both pre and post versions), the () operator(s) must be implemented. Then the rbegin() and rend() methods are implemented in terms of end() and begin() and a reversing type (usually std::reverse_iterator), as can be seen in Listing 3.
It is important to note that, unlike the Input and Forward Iterator concepts, the end iterator value (that returned by end()) must be decrementable in Bidirectional Iterators, because it is by decrementing the underlying end() value that rbegin() increments from its starting position. This can have a bearing on the choice of implementation for an enumeration adapting implementation, as we shall see in the implementation of the reg_key_sequence iterator class.
The Random Access Iterator concept represents the most powerful of all the iterator concepts, and includes the semantics of pointers. Iterators supporting the Random Access Iterator concept have all the abilities expressed in the concepts already described, along with the ability to engage in pointer, or rather iterator, arithmetic. For example, one can now write statements such as:
X::iterator it = x.begin(); size_t cElems = x.end() - it; it += cElems - 1; ( it)[1];
Because of this, whereas iterators supporting all other Iterator concepts must support the Equality Comparable [7] concept, those supporting the Random Access concept must also be LessThan Comparable [14]. This is due to the requirement for testing whether one iterator has exceeded another one, not just whether it has equaled it.
It is frequently the case that when a sequence (or any STL Container) provides Random Access Iterators, it can also provide indexing operators (i.e., operator [](int index)). The best example of a Random Access sequence is that of a vector, with which items may be access by the usual begin()-end() iteration, or by the index operator (i.e., v[12] = "13th element").
As I mentioned, the Random Access Iterator concept incorporates the semantics of pointer types. This is an important point, since it is a challenge for most new (and some not so new) to the STL to realize that pointers are actually supported by only this most sophisticated iterator model, even though they have the simplest implementation (nothing to implement!) and the most straightforward semantics.
A common usage scenario for STL-compliant sequences is:
This can best be seen by a simple example (Listing 4), using the WinSTL findfile_sequence class.
This declares an ANSI version of the findfile_sequence template, and instantiates it with parameters requesting a search for all files in the directory C:\. It then iterates through all the items and prints out the names of all the files found. Sequences are that simple to use; the complexity is in implementing them in a correct, robust, and efficient way.
When implementing STL-compliant sequences, there are a number of decisions to be made.
Nonmodifiable sequences only need to provide const begin() and end() methods, each returning instances of const_iterator. (Modifiable sequences would provide non-const begin() and end() methods, and also erase(), insert(), and all the others described in Sequence Concept [3].)
it->do_something();
to
(*it).do_something();
and who claim, correctly in a few cases, that the former syntax is more efficient. However, as we have seen, most sequence containers that you implement over enumeration APIs will not have access to anything for which this argument could be put, since an instance of the value_type does not exist to be pointed to and so is synthesized by operator *(). (This is in contrast to standard containers, which contain elements directly, and therefore can give out pointers and references to these elements.) Hence, constraining your iterator syntax to the latter form will, overall, lead to writing algorithms that are more widely applicable.
Now that we've discussed the issues involved lets look at some actual implementations. The winstl::findfile_sequence class provides the ability to enumerate files and/or directories from a given search specification (i.e., "xyz*.p??"), in a given directory. The sequence is layered on top of the Win32 FindFile API, which comprises the three functions:
HANDLE FindFirstFile(
LPCTSTR lpszSearchSpec,
WIN32_FIND_DATA *lpFindData);
BOOL FindNextFile(
HANDLE hFind,
WIN32_FIND_DATA lpFindData);
BOOL FindClose(HANDLE hFind);
FindFirstFile() starts an enumeration and creates a search handle to the caller, or returns INVALID_HANDLE_VALUE to indicate failure. FindNextFile() moves to the next element in the collection, or returns FALSE to indicate that the enumeration is complete. FindClose() closes a valid search handle and must be called, otherwise kernel resources are leaked until the process terminates. A typical program for the use of these functions is shown in Listing 5. Note that the API returns both files and directories, including the current directory "." and (for nonroot directories) the parent directory "..".
The findfile_sequence class wraps this functionality in the form of an Input Iterator-providing sequence. In addition, it contains logic for filtering out files, directories, and the "dots" (. and ..), such that you can request any permutation of these types at construction, relieving client code of the logic you see in Listing 5, as can be seen in Listing 4.
The class is actually a template class, winstl::basic_findfile_sequence, which supports different parameterizations for use with ANSI and Unicode. The full implementation of this and all the supporting classes (winstl::basic_findfile_sequence_value and winstl::basic_findfile_sequence_const_iterator) is available in the archive and online [1], but simplified and truncated versions are shown in Listings 6 and 7. We focus on the ANSI parameterization, findfile_sequence_a, in the article and sample programs, but the principles described apply equally to any parameterization. The implementation addresses the decisions outlined above in the following ways:
The API function used for enumerating registry keys is RegEnumKeyEx(), which has the following signature:
LONG RegEnumKeyEx( HKEY hKey, // handle to key to enumerate DWORD dwIndex, // subkey index LPTSTR lpName, // subkey name LPDWORD lpcName, // size of subkey buffer LPDWORD lpReserved, // reserved LPTSTR lpClass, // class string buffer LPDWORD lpcClass, // size of class string buffer PFILETIME lpftLastWriteTime // last write time );
The important thing to notice is the second parameter, dwIndex, which is the index to the requested item. To enumerate through a series, you call it with an index of 0 and continue with incrementing values until the return value is non-0. (The function returns ERROR_NO_MORE_ITEMS when the index is greater than or equal to the number of subkeys available.) A sequence iterator class would, of course, use the index for storing enumeration sequence state, and you do, therefore, get Forward Iterator support for free. (Indeed, it would be very hard to devise an implementation that did not provide Forward Iterator behavior.)
The full implementation of the winstl::basic_reg_key_sequence template class (and its supporting winstl::basic_reg_key_sequence_ const_iterator and winstl::basic_reg_key_sequence_value classes) is available in the archive and online, but a simplified and truncated version is shown in Listings 3 and 8. (We focus on the ANSI parameterization, reg_key_sequence_a.) The implementation addresses the decisions outlined above in the following ways:
The registry enumeration model does not readily support the dereferencing operator (operator ->()).
Using the sequences is as simple or as complex as using any other STL sequence. In Listing 4 we saw a simple example. A more complex example is seen in Listing 9, which shows part of a replacement for the common WHEREIS utility. This program uses the winstl::findfile_sequence along with another WinSTL class, the searchpath_sequence (which provides an ordered collection of search paths for the calling process), such that a set of search sequences, such as "*.cpp;*.d;*.xml", can be applied to either a given directory, or to the system search paths. The full source is in the archive.
Also included in the archive is the source to REGTREE (Listing 10, Figure 1), a simple registry enumerator program, that demonstrates a standard enumeration using for_each, as well as illustrating a combination of algorithms and sequences to perform a recursive enumeration of the registry.
This article has described some of the ideas represented in the STL Sequence and Iterator concepts. It has also highlighted some issues that are important when implementing your own STL-compliant sequences and iterators, and has illustrated these issues by presenting two enumeration API- adapting sequence classes, findfile_sequence, and reg_key_sequence, and their supporting classes.
I've endeavored to show that the implementation of correct, robust, and efficient sequences and iterators, though involved, is not as complex as might be expected. Hopefully, I've demonstrated that using such classes is straightforward indeed, especially so given that they can be used with standard algorithms and functionals, as shown in the example programs included with this article.
All the WinSTL classes, and supporting STLSoft code, are available online ([1]), along with an increasing number of other classes, algorithms, and functionals. Furthermore, the libraries are open to expansion for anyone who wishes to submit their own classes.
[1] http://winstl.org/.
[return to text]
[2] http://stlsoft.org/.
[return to text]
[3] http://www.sgi.com/tech/stl/Sequence.html.
[return to text]
[4] http://www.sgi.com/tech/stl/Iterators.html.
[return to text]
[5] Generic Programming and the STL: Using and Extending
the C++ Standard Template Library, Matthew Austern, Addison-Wesley, 1998.
[return to text]
[6] http://www.sgi.com/tech/stl/InputIterator.html.
[return to text]
[7] http://www.sgi.com/tech/stl/EqualityComparable.html.
[return to text]
[8] http://unixstl.org/.
[return to text]
[9] "Move Constructors," Matthew Wilson, Synesis Technical
Report, http://www.synesis.com.au/resources/articles/cpp/movectors.pdf.
[return to text]
[10] http://www.sgi.com/tech/stl/ForwardIterator.html.
[return to text]
[11] http://www.sgi.com/tech/stl/OutputIterator.html.
[return to text]
[12] http://www.sgi.com/tech/stl/BidirectionalIterator.html.
[return to text]
[13] Effective STL, 50 Specific Ways to Improve Your Use
of the Standard Template Library, Scott Meyers, Addison-Wesley, 2001.
[return to text]
[14] http://www.sgi.com/tech/stl/LessThanComparable.html.
[return to text]
[15] http://www.sgi.com/tech/stl/RandomAccessIterator.html.
[return to text]
[16] The C++ Programming Language, Third Edition, Bjarne Stroustrup, Addison-Wesley, 1997.