Courtesy of Dr. Dobb's Journal (March 2004)
Manipulation of collections is a common operation in software engineering. Indeed, it is hard to conceive of a meaningful program that does not deal with at least one collectionargv in C, lists in Perl and Python, vector and list in C++, and strings in pretty much any language.
Enumeration is the process of sequentially visiting each element in a collection of elements. This works great for arrays. The most common (or, at least, most widely recognized) form of collection is the array, and the usual way of visiting each element of an array is with the for statement. Regardless of language, it is common to see something like Listing One.
But there are many other collection types, including linked lists, trees, associative arrays (maps), hash tables, sets, bags, and deques, to name a few. When you move beyond arrays, the syntactic commonality in the manipulation of collections is reduced, both between collection types in a given language and between the same collection types across languages. For example, in C, you can enumerate an array with Listing Two, whereas enumerating a list (Listing Three) looks quite different. This variation not only makes it difficult to change to a different containment model, but also places a greater burden of learning on you. Some languages, such as Perl and Python, have built-in support for enumerating collection types in a generalized form. Perl has the foreach statement (Listing Four), while Python has its for statement (Listing Five).
For C++, the Standard Template Library (STL) has done an amazing job of homogenizing the syntax of manipulation of disparate collection types, and has done so according to the philosophy of C++by adding library elements rather than the specification of new language features. STL supports generic programming by defining the concepts of Containers, Iterators, Algorithms, Function Objects, and Adaptors. Iterators navigate over ranges of elements. Containers yield Iterators, via their begin() and end() methods, that define a range of elements that may be enumerated. Algorithms operate on ranges of elements defined by Iterators. Function Objects (also called "Functionals" and "Functors") are types that can act as functions, with or without maintaining state on a per-instance basis, and may be applied to elements within a range by algorithms.
Hence, it is common to see code such as:
SomeContainerType cont;
for_each(cont.begin(), cont.end(), f);
where SomeContainerType can be any type that satisfies the STL Container concept requirements. In fact, it does not even have to do that: All that is required of SomeContainerType is that it has begin() and end() member functions, whose return values are (or can act as) Iterators. SomeContainerType could be a list, vector, or any other STL container type, or it could be a user-defined collection type, such as those provided in the STLSoft libraries (http://stlsoft.org/), an open-source organization that provides freely available STL-like extensions.
Similarly, f, which could be either a function or an instance of a class providing the function call operator operator () can do almost anything, so long as it is compile-compatible with the types manipulated by the Iterators provided by SomeContainerType. For example, if SomeContainerType is std::vector<std::string>, f would manipulate std::string references. Alternatively, if SomeContainerType is WinSTL's winstl::treeview_child_sequence (http://winstl.org/), f would manipulate HTREEITEM values. In either case, the enumeration algorithm for_each is simply a generic template definition, whose instantiating types are selected by the compiler at compile time. This is the basis of STL's generic programming, and it is powerful, flexible, and extensible.
There can be little doubt that STL saved C++ from obsolescence and, in our opinion, has placed it at the forefront of current language technology. However, D is another language that can make this claim. D looks like C and C++, but eliminates features that make programs difficult to write, debug, test, and maintain (see "The D Programming Language," by Walter Bright, DDJ, February 2002).
To support a generalized form of programming requires an enumeration construct that can iterate over any collection. D (available at http://www.digitalmars.com/d/) implements its collection enumeration in a novel wayin the guise of the foreach statement. Using foreach to enumerate the contents of an array looks like Listing Six. The body of the foreach statement works equivalently to other loop statements (in C-family languages); the break, continue, goto, and return statements within the body all have the usual meaning.
There is no longer an obvious loop index variable. Indeed, the idea of a loop index variable makes no sense for many kinds of collections. foreach works just as well on associative arrays; see Listing Seven.
Where the novel implementation of D's foreach really comes into its own is in its support for enumeration of any user-defined collection type. Other languages that support enumeration of user-defined collection types in a syntactically standard form usually require the implementation of specific enumerator types, whose instances are returned by the enumerated object.
For example, a COM Automation collection is required to implement the _NewEnum method, which returns an instance implementing the IEnumVARIANT interface. .NET requires that the collection type contains a method GetEnumerator() that returns an instance implementing the IEnumerator interface. In either case, a separate enumerator object must be created to act as a translating intermediary and to maintain the current state of the enumeration. Such intermediaries generally impact on efficiency since the additional level(s) of indirection have a speed cost; the intermediary itself is usually heap allocated and (in the case of COM, at least) requires timely deallocation.
In C++, intermediaries are represented by Iterators, which can be raw pointers or user-defined types. Such types can often be efficient in implementation, and are usually entirely (including member variables) representable on the stack, which means that they can achieve high efficiency in execution, though this is not always the case. However, even where this does hold true, the syntax of enumeration of STL Containers can leave a little to be desired. If you have a suitable function or Function Object or one that is Adaptable, then for_each may be employed. However, where the manipulation of the enumerated items is more complex, you either have to write a customand usually one-offFunction Object, or hand-code the enumeration (as in Listing Eight).
In D, none of these issues are relevant. The actual translation of a foreach statement by the compiler depends on the type of collection being enumerated. If it is an array, then the compiler generates a loop somewhat equivalent to the for loop described at the start of the article. However, if the collection is a user-defined type, then the compiler translates the foreach statement into a loop construct and the body of the statement into a delegate (a callback function). (Associative arrays are treated in much the same way as a user-defined collection.) All that is required of the collection type (which can be a heap-based class or stack-based struct) is that it defines an opApply() method (Listing Nine). This method is called in place of the foreach, passing the loop-body delegate, and the enumeration is conducted by the collection, which knows best how to do that.
The opApply() method handles all the details of traversing the data structure, including maintaining any state. The return value from the delegate dg() is used to communicate with the foreach control code. All values other than 0 are reserved by the implementation and represent the language of interaction between the compiler-generated delegate and the compiler-generated foreach handler. This is why the opApply() function must preserve it and return it if it is not zero. Authors of opApply() methods have a responsibility, therefore, to return 0 or delegate result, and not any other value.
This means that D's notion of a collection is extremely flexible. It can be a bonafide container, such as a list, or it can be a type that is not really a collection at all, similar to the STL notion of an input stream being a read-once collection. Furthermore, it is also simple to wrap operating system or framework collection APIs, and present them as first-class enumerable D types. This is the case with the D standard library's Win32 registry module (std.windows.registry); see the accompanying text box entitled "std.windows.registry."
The registry module consists of several cooperating typesRegistry, Key, Value, KeyNameSequence, KeySequence, ValueNameSequence, and ValueSequencethat together provide access to, and manipulation of, the registry on Win32 systems. A given registry Key provides access to its child keys and values in the form of instances of KeySequence and ValueSequence, returned via its Keys and Values properties. Listing Ten shows the implementation of the KeySequence class. (The _Reg_* functions are internal functions that map the raw Win32 API to D-friendly signatures, but that otherwise have the semantics of their mapped equivalents.)
There are several points to note:
Again, the interesting aspect of this technique is that there doesn't need to be a separate enumerator object to maintain the state from one enumeration point to the next. All state is held by the opApply() function. Since D also supports nested functions (a powerful and much missed feature in C/C++), this represents an extremely powerful mechanism. For example, if the collection needs an arbitrarily large amount of state, the implementation of the enumeration steps can be entirely represented within the opApply() method, and is, therefore, simplified considerably; see Listing Eleven. The state required to visit each TreeEntry is neatly handled on the stack by the recursive nested function traverse(). Summing all the data on the Tree looks like Listing Twelve. If you were to do that via another technique, you would be in for some rather complex code. Naturally, sum could be abstracted in a template, as in Listing Thirteen.
We've seen how the foreach statement provides a uniform structure around which different kinds of enumerations can be supported. No complicated state-preserving iterator or enumerator classes are needed, even for arbitrarily complex element traversal algorithms. Naturally, all this (and more) can be achieved in C++ using the concepts of the STL. (One of us spends a lot of time doing just that!) However, it is clearly significantly easier to make an arbitrary collection enumerable using D's foreach/apply than it is to write an equivalent in C++. If you're not convinced, you're welcome to download the WinSTL registry classes (http://winstl.org/downloads.html) and the source for the std.windows.registry module (http://www.digitalmars.com/d/) and compare them for complexity and readability; they had the same author.
Debate in the D community is in the early stages regarding an equivalent to the STL for D. Unless and until such a library is implemented, foreach will be the primary enumeration mechanism. Only time will tell whether foreach wins out over STL-like techniques in D's generic containment libraries. What is certain is that for the enumeration of nongeneric container (or container-like) types (such as registry keys), foreach will continue to be a first choice mechanism.
DDJ
T ar[]; // Declare the array
for(int i = 0; i < ar.length; ++i)
{
ar[i]; // Do something with the array element
}
T ar[100]; // Declare the array
int i;
for(i = 0; i < sizeof(ar) / sizeof(ar[0]); ++i)
{
ar[i]; // Do something with the array element
}
struct Link
{
int value;
Link *next;
} *g_head;
struct Link *l;
for(l = g_head; NULL != l; l = l->next)
{
l->value; // Do something with the list element
}
@allfiles = ...;
foreach $file (@allfiles)
{
print "File: " . $file . "\n";
}
languages = {}
for language in languages.keys():
print language
T ar[]; // Declare the array
foreach(T t; ar)
{
ar[i]; // Do something with the array element
}
uint[char[]] aa; // associative array of uints indexed by a string
...
foreach (uint v; aa)
{
v; // Do something with the value
}
foreach (char[] k; aa.keys)
{
k; // Do something with the key
}
foreach (uint v, char[] k; aa)
{
k; // Do something with the key
v; // Do something with the value
}
SomeContainerType cont;
SomeContainerType::iterator begin = cont.begin();
SomeContainerType::iterator end = cont.end();
for(; begin != end; ++begin)
{
// Do something(s) with the elements
}
struct IntegerRange
{
this(int low, int high)
{
m_from = low;
m_to = high;
}
int opApply(int delegate(int value) dg)
{
int res;
for(int i = m_from; i < m_to; ++i)
{
if(0 != (res = dg(i)))
{
break;
}
}
return res;
}
}
public class KeySequence
{
private:
this(Key key)
{
m_key = key;
}
invariant
{
assert(null !== m_key);
}
public:
uint Count()
{
return m_key.SubKeyCount();
}
Key GetKey(uint index)
{
... // Does something similar to the opApply()
// method to get hold of the requisite key
}
Key opIndex(uint index) // Provides indexing operator []
{
return GetKey(index);
}
public:
int apply(int delegate(Key key) dg)
{
int result = 0;
HKEY hkey = m_key.m_hkey;
DWORD cSubKeys;
DWORD cchSubKeyMaxLen;
LONG res = _Reg_GetNumSubKeys(hkey, cSubKeys, cchSubKeyMaxLen);
char[] sName = new char[1 + cchSubKeyMaxLen];
for(DWORD index = 0; 0 == result; ++index)
{
DWORD cchName = 1 + cchSubKeyMaxLen;
LONG res = _Reg_EnumKey(hkey, index, sName, cchName);
if(ERROR_NO_MORE_ITEMS == res)
{
// Enumeration complete
}
else if(ERROR_SUCCESS == res)
{
try
{
Key key = m_key.GetKey(sName[0 .. cchName]);
result = dg(key);
}
catch(RegistryException x)
{
if(x.Error == ERROR_ACCESS_DENIED)
{
// Skip inaccessible keys; they are
// accessible via the KeyNameSequence
continue;
}
throw x;
}
}
else
{
throw new RegistryException("Enumeration incomplete", res);
break;
}
}
return result;
}
private:
Key m_key;
}
struct TreeEntry
{
TreeEntry *m_left;
TreeEntry *m_right;
char[] m_data;
}
struct Tree
{
TreeEntry *m_root;
int opApply(int delegate(inout char[] value) dg)
{
int traverse(TreeEntry* te)
{
int result;
while(null != te)
{
result = dg(te.m_data);
if(0 != result)
{
return result;
}
result = traverse(te.m_left);
if(0 != result)
{
return result;
}
te = te.m_right;
}
return 0;
}
return traverse(m_root);
}
}
int sumTree(Tree tree)
{
int sum = 0;
foreach(int value; tree)
{
sum += value;
}
return sum;
}
template sum(R, V)
{
R sumit(T t)
{
R sum;
foreach (int value; t)
{
sum += value;
}
return sum;
}
}
DDJ