Wednesday, 28 February 2007

Evolution of C# enumerators

Introduction

I needed a very simple tree data structure the other day. Unfortunately C#/.NET doesn’t provide one so I implemented a simple one. The need was to create a hierarchy of folders from a flat data structure where each node contained a unique id and its parent id. The list wasn’t particularly ordered, i.e. not by depth or breadth however parent nodes always preceded child nodes so that when a node was added to the tree the parent was guaranteed to already be there. This wasn’t the important part of the exercise. The main thing I wanted to accomplish was to perform a depth first search displaying the node’s contents indented to reflect the depth that it occurred in the tree.

As this was written in C# and .NET collections are expected to support the IEnumerable interface I didn’t just want to implement a basic tree walker but do it in such a way that it conforms to the .NET paradigm. As it’s a while since I’ve implemented a tree and this is the first time I’ve used C# 2.0 enumerators I ended up going through 3 iterations of enumerator until I had what I thought was proper .NET style enumerator. This article describes this evolution. Please note that thread safety is not considered and little attention is paid to error handling or invalid conditions.

The following code shows the minimal implementation of the tree bar any enumeration code.

class Node
{
  public static Node MakeRoot() 
  {
    return new Node(0); 
  }
  private Node(int id) { id_ = id; }
  public bool Add(int id, int parentId)
  {
    if (parentId == id_)
    {
      children_.Add(new Node(id));
      return true;
    }
    else
    {
      foreach (Node child in children_)
        if (child.Add(id, parentId) == true)
          return true;
      return false;
    }
  }

  public int Id { get { return id_; } }
  private int id_ = -1;
  ArrayList children_ = new ArrayList();
}
The main program just loads a simple tree.

Node root = Node.MakeRoot();

root.Add(1, 0);
root.Add(2, 0);
root.Add(3, 0);
root.Add(11, 1);
root.Add(12, 1);
root.Add(13, 1);
root.Add(121, 12);
root.Add(122, 12);
root.Add(1211, 121);
root.Add(12111, 1211);

C Approach

Performing a depth first search is not difficult. The easiest way to implement this is to use recursion. The following is a quick implementation that proved the tree was functioning correctly.

Note that this is a member of the Node class.

public void Walk(int level)
{
  for (int i = 0; i < level; ++i)
    Console.Write("\t");

  Console.WriteLine(id_);

  foreach (Node node in children_)
    node.Walk(level + 1);
}
This implementation is how a tree would have been walked in the era of C programming. In the OO and .NET world it has a number of problems:
  1. Breaks encapsulation as the application code, i.e. the display code is mixed with that of the data structure. Using straight C this code could be fixed by having the method take a function pointer as an argument which would be invoked for each node to display the node. In C++ this could be improved by using a functor or by implementing the whole thing as an STL compatible iterator.
  2. The method runs until completion, i.e. it is not possible to obtain an item from the tree and perform some action on it then retrieve the next or abandon the walk altogether. However, the method could be extended to provide a mechanism to terminate early depending on the return value of the invoked method.
  3. .NET 2.0 collections are expected to implement the IEnumerable interface which means they must provide a IEnumertor GetEnumerator().  This allows the collection to be used by any .NET entity that recognizes the standard enumeration interfaces in particular the foreach keyword which is recognized as the best mechanism for enumerating a collection.

C# 1.0 Approach

These issues can be addressed by implementing the tree walker as an enumerator. The main difficulty in doing this is that an enumerator is an iterative operation which means a separate call is made to advance the iterations whereas the tree walker is called once and only returns when the traversal is complete.

An enumerator for data structures that are suited to recursive traversal can be implemented using various techniques. Firstly on the initial call the enumerator could walk the tree recursively and build up a linear data structure that is ideally suited to iteration, i.e. create a list of depth first traversal whose members reference the actual nodes of the tree then iterate over members of the list.

The method I choose was to mimic recursion by maintaining context in the enumerator that upon the following call would take the enumerator back to where it left off. The enumerator is implemented in a separate class which can hold the state and as per the iterator pattern allow multiple instances to co-exist. The following code shows the implementation but note that even though not explicitly shown this is a nested class so has access to Node’s private members and is not visible to the caller which works only against the IEnumerator interface. It is this interface that requires the implementation of MoveNext(), Current & Reset().

class WalkNode : IEnumerator
{
  public WalkNode(Node root)
  {
    root_ = root;
  }
  bool IEnumerator.MoveNext()
  {
    if (current_ == null)
    {
      current_ = root_;            
      activeIts_.Push(root_.children_.GetEnumerator());
      return true;
    }
    else
    {
      while (activeIts_.Count > 0)
      {
        IEnumerator it = (IEnumerator)activeIts_.Pop();
        if (it.MoveNext() == true)
        {
          current_ = (Node)it.Current;
          activeIts_.Push(it);
          activeIts_.Push(current_.children_.GetEnumerator());
          return true;
        }
      }

      return false;
    }
  }
  object IEnumerator.Current
  {
    get 
    { 
      // Need to handle invalid current_
      return new NodeAndLevel(current_, activeIts_.Count - 1); 
    }
  }
  void IEnumerator.Reset() { current_ = null; activeIts_.Clear(); }
  Node root_;
  Node current_;
  Stack activeIts_ = new Stack();
}
The interesting stuff happens in MoveNext(). This is called to advance the enumerator and must be called to set the enumerator to the first item before actually using its value, i.e. calling the Current property.

On the first call the current node is set to the root node. This means that if the Current property is called then the root node will be returned. Secondly it sets the enumerator up for the subsequent call by obtaining an enumerator to the root node’s children and pushing it onto a stack stored in the enumerator. It is this stack that effectively mimics the recursion as it persists the enumerator for the current node and those in the branch above across each call and so maintains a reference to the next node to visit and which one to return to when there are no more children.

On the next (non-root node) and all subsequent calls a test is made to see if the stack contains any items. If it doesn’t then this means there are no more nodes to visit and the tree has been fully traversed. When called the second time the stack will contain the enumerator for the root node’s children. Due to the nature of the Node class a node will always be able to provide a enumerator even if doesn’t have any children but the first call to MoveNext() on it will return false indicating the end of the children, i.e. none. If the root has no children then this will be case as the enumerator (for its children) obtained from the stack will return false, the stack count will be zero so the loop will terminate and the main MoveNext() will return false indicating an end of the traversal. This will also be case when all the children of the root have been traversed.

The other case is that the root does have children in which case the internal call to MoveNext(), i.e. using the enumerator from the stack will return true in which case the member variable current_ used to implement the Current property will be set to the child node obtained from the Current property of the internal enumerator. As this is a depth first search the next node to be traversed should be the first child of this node so the current enumerator is pushed onto the stack along with the newly obtained internal enumerator for the current node’s children so on the next call its children will be enumerated. True is then returned.

In the case of the bottom of a branch being reached the internal call to MoveNext() from the just popped enumerator will return false. As the stack still has the parent enumerators on it the loop will not terminate unlike the case of a root node without children or having traversed its last. This then causes the current node’s parent’s enumerator to become the current enumerator allowing the next child along to be traversed if there is one. This continues until the stack has been popped so that current internal enumerator is that of the root node. If this contains more children the each of these nodes is traversed deeply until no more children are found causing the loop to terminate as the stack is empty and finally returning false indicating the end of the traversal.

The downside of the mimicking approach is the number of live internal enumerators that are kept alive on the stack whilst traversing deeper children. If the deepest point of the tree was 100 nodes deep then this would require 100 active internal enumerators to be on the stack. The deepest enumerator, i.e. number 100 would be for the leaf node and will yield no children.

Rather than returning the reference to a node an instance of a wrapper type is returned which allows the additional level information to be supplied. The definition of this is shown below.

class NodeAndLevel
{
  public NodeAndLevel(Node node, int level)
  {
    node_ = node;
    level_ = level;
  }

  public Node Contents { get { return node_; } }
  public int Level { get { return level_; } }

  readonly Node node_;
  readonly int level_;
} 
The other modification is that Node implements the IEnumerable interface which means it must implement the GetEnumerator() method which it does. This creates an instance of the WalkNode enumerator that we have just discussed as can be seen below.

IEnumerator IEnumerable.GetEnumerator()
{
  return new WalkNode(this);
}

C# 2.0 Approach

Version 2.0 of C# added language support to make the implementation of enumerators simpler. This allows the implementation of this particular enumerator to be reduced and simplified dramatically.

class Node : IEnumerable
{
  IEnumerator IEnumerable.GetEnumerator()
  {
    return GetEnumeratorWithLevel(0);
  }
  private IEnumerator GetEnumeratorWithLevel(int level)
  {
    yield return new NodeAndLevel(this, level);
    foreach (Node node in children_)
    {
      using (IEnumerator it = node.GetEnumeratorWithLevel(level + 1))
      {
        while (it.MoveNext())
          yield return it.Current;
      }
    }
  }
  System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
  {
    return this.GetEnumeratorWithLevel(0);
  }

… the rest is the same as Node at the beginning

The entire enumerator code has now been reduced to the implementation of GetEnumerator(), well almost. The helper GetEnumeratorWithLevel() is also required so an additional argument can be passed. As GetEnumerator() is implementing the interface IEnumerable<> as inherited by Node its signature cannot vary hence why it calls the helper method. As IEnumerable<> derives from the C# 1.0 interface IEnumerable as used in the previous example the weakly typed IEnumerable.GetEnumerator() must also be implemented but as can be seen this is done by calling the strongly typed version.

As generics is available the tree code has switched to using generics. This is a good thing as the caller of the enumerator is dealing with strongly typed objects that don’t require a runtime cast any more.

The enumerator class hasn’t really gone away rather the compiler has implemented it for us as a nested class just as we had to do by hand in the previous version. Again, just like the C# 1.0 implementation which used a stack to maintain the recursive state the generated code also maintains state. The key to allowing the compiler to do this is the yield return statements. This informs the compiler where the enumerator should return and that the subsequent call to MoveNext() should begin immediately after it.

In this example, the enumerator is initially called with the root node. This then yields (returns) immediately, creating a new instance of NodeLevel which will be accessible via the generated Current property.

The next call causes the enumerator to create an internal enumerator (as before in the C# 1.0 implementation) of the current node’s children, i.e. from the children_ list. This will then call MoveNext() on this internal enumerator. If the root node has no children then it will terminate and false will be returned indicating that the traversal is complete. If there are children then the yield will cause a return true and the current node will become that child. As the enumerator was created by calling GetEnumeratorWithLevel() this is effectively a recursive call but with the yield causing the return with the generated code maintaining the state including storing the references to all the internal enumerators.

Even though the generated code is hard to read and full of gotos using Reflector allows the generated code to be seen. As this implements IEnumerator<> and the derived IEnumerator Current is implemented for both and Reset for the latter. In addition, as IEnumerator<> is also derived from IDisposable Dispose is also created. This is why the apparent recursive call to GetEnumeratorWithLevel() wraps the resulting call with a using statement so that Dispose() is called.

The generated code still has to do everything the hand-coded version did including keeping open any active enumerators. As it’s generated it might be faster too but I haven’t tested this. However, it has freed us from having to write & maintain this code.

How to prevent Visual Studio 2005 designer from running your code

I created a control that inherited from System.Windows.Forms.TreeView. As it derived from the a control class Visual Studio 2005 (VS05) kindly automatically added to the Project Components section of the Toolbox (sidebar). From here I could happily drag'n'drop my control on to a form, which I did! However, my derived control upon construction performed a recursive enumeration of all the folders on my C: drive to populate the tree view. This was fine when the program was executing but it appears that in the design view VS05 also runs the constructor. This meant the recurisve and I did I say fairly time consuming enumeation occured in the design view effectively grinding all my work to a halt.

Anyway, I figured there must be a way to turn this off so; most likely using an Attribute. However, I could find nothing. Following some googling I came across some code that could be used detect from where your code was being run or to be more accurate was it being be run from within VS05.

So, just in case you have the same problem here it is:


if (this.Site == null this.Site.DesignMode == false)
; // This will only executee code when not running in VS05, i.e. normal execution