A Day In The Lyf

…the lyf so short, the craft so longe to lerne

True Multiple Dispatch

When I first started learning about Design Patterns some years ago, I found myself having to revisit the Visitor pattern over and over again. Something about it just didn’t click with me.

Several years later, I now have experience using the Visitor pattern in a few applications. But the idea of multiple dispatch still seems a bit foreign to me. The authors of Design Patterns claimed that Visitors allowed you to imitate multiple dispatch in languages that didn’t natively support it. That was a rather mind-blowing statement to me – how can a language natively support something as strange as multiple dispatch?

Lately, I’ve been learning a little bit about Lisp, and Common Lisp (CL) does natively support multiple dispatch. It turns out that understanding CL’s approach to object-orientation is useful for more than just a better understanding of the Visitor pattern.

The Fundamental Unit of Behavior

Most programmers in mainstream OO languages will tell you that the class is the fundamental unit of behavior. There are subtleties – for example, in Smalltalk and Ruby, classes are actually objects, which arguably makes the object the fundamental unit, but both of those languages still use classes to organize behavior. Languages like JavaScript don’t use classes at all, but achieve similar results through adding behavior to an object’s prototype.

Common Lisp is class-based, but in CL, the class simply holds the data, like a C struct. When it comes to behavior, the method reigns supreme.

Putting methods into classes, which would have made CL look more like other object-oriented languages, never worked out very well. The message passing paradigm popularized by Smalltalk made methods and traditional Lisp functions semantically different, meaning much of the power that stems from Lisp’s functional roots was lost. So the authors of the Common Lisp object system (CLOS) unified methods with functions, giving birth to the generic function.

Generic functions are like abstract methods in other languages. They define an interface for behavior, but defer the implementation of that behavior. But – and this is an awfully big but – generic functions are not attached to any class.

(defgeneric area (shape)
   (:documentation "Returns the area of the given shape."))

In typical functional style, the area function takes the object as a parameter, instead of sending the object an area message as you would in traditional OO languages. So far, this isn’t too different from Python, which requires you to pass self to all methods.

The actual implementation of a generic function is defined by methods. But methods don’t belong to classes; they belong to the generic function. Each method for a generic function is a specialization of that function for specific classes. For example:

(defmethod area ((shape square))
   (* (width shape) (width shape)))

(defmethod area ((shape circle))
   (* pi (radius shape) (radius shape)))

In a traditional OO language, you would have the Shape class define an abstract Area method, which is implemented in both the Square and Circle subclasses. In contrast, when the area generic function is called in CL, it compares the class of the actual argument to the specializers declared by each of its methods. If it finds a matching method, then its code is executed (and, although it’s outside the scope of this article, if it finds multiple matching methods, they are ordered by specificity and combined according to certain rules).
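For readers more at home in Python, functools.singledispatch gives a rough feel for the same idea: area is a free-standing function, and each implementation is registered against a class instead of living inside it. This is only a sketch for comparison. The Square and Circle classes are made-up stand-ins, and singledispatch looks at the first argument only, so it is far weaker than a CLOS generic function.

```python
import math
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class Square:          # hypothetical class, holds data only
    width: float


@dataclass
class Circle:          # hypothetical class, holds data only
    radius: float


@singledispatch
def area(shape):
    """Plays the role of defgeneric: declares the interface, no behavior."""
    raise NotImplementedError(f"no area method for {type(shape).__name__}")


@area.register
def _(shape: Square):
    # Plays the role of (defmethod area ((shape square)) ...)
    return shape.width * shape.width


@area.register
def _(shape: Circle):
    # Plays the role of (defmethod area ((shape circle)) ...)
    return math.pi * shape.radius * shape.radius
```

Neither class knows anything about area; the implementations belong to the function, which is the point the Lisp code is making.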

Multimethods

The area example above is simple because it takes only one argument (equivalent to taking no arguments in message passing languages (except Python (of course!))).

It is obviously possible to have generic functions that operate on multiple arguments. And – and this is an awfully big and – methods can specialize on each of those arguments, not just the first one. Methods that specialize on multiple parameters are called multimethods (the following silly example was adapted from here):

(defgeneric test (x y)
   (:documentation "Returns the type of the two arguments."))

(defmethod test ((x number) (y number))
   '(num num))

(defmethod test ((i integer) (y number))
   '(int num))

(defmethod test ((x number) (j integer))
   '(num int))

(defmethod test ((i integer) (j integer))
   '(int int))

(test 1 1)      => (int int)
(test 1 1/2)    => (int num)
(test 1/2 1)    => (num int)
(test 1/2 1/2)  => (num num)

Specializing on more than one parameter doesn’t help with the combinatorial explosion, but it is occasionally useful. And it’s not something you can do in a message passing language natively, which is why you need support from design patterns. The great thing about native support for multiple dispatch is that you can do it without any of the plumbing code that design patterns require (no confusing Visit and Accept methods). There is a common complaint against design patterns that says they exist simply to cover up language limitations. At least in the case of the poor Visitor, that certainly seems to be the case.

Written by Brandon Byars

February 14, 2008 at 10:57 am

Posted in Design Patterns, Languages, Lisp

Understanding Syntactic Macros

Paul Graham finally persuaded me to pick up a little Lisp. I’d already learned some Scheme, but the books I read were mainly academic in nature (here and here), and didn’t show off the power that Paul keeps talking about. So I switched over to Practical Common Lisp hoping that would give me a better view.

It’s very obvious that a number of programming concepts that have been added in various languages to “revolutionize” the language landscape are nothing more than rediscoveries of Lisp concepts. But the One Big Thing that Lisp has and no other language does is its macro system. I don’t think I’d be doing Lisp a disservice by saying that macros are at the heart of why Lisp programmers claim their language is the most powerful one on the planet.

Lisp macros take some getting used to. I found it useful to compare them to related concepts from other languages.

Preprocessor Substitution

It is perhaps unfortunate that the word “macro” is used to describe superficially similar concepts in both C and Lisp. In actuality, the two concepts are as different as night and day.

C’s macro system is nothing more than a preprocessor textual substitution. It really was never intended as a tool for abstraction, although some rather miserable attempts were made to treat them as such (remember MFC message maps?). Rather it was added as a performance hack, like a low-level inline function. There are some good uses of C macros, but in general they’re overused.

Let’s pretend, though, that we wish to go ahead and abstract looping into a C macro. The idiomatic C loop looks something like this:

int i;
for (i=0; i < n; i++) {
    /* loop body */
}

There’s an awful lot of duplication in doing that again and again, so let’s try and move it into a macro:

/* The \ character at the end of a line is a continuation character */
#define LOOP(n) \
    int i; \
    for (i=0; i < (n); i++)

Now the looping code might look something like this:

LOOP(n) {
    /* loop body */
}

The obvious problem with this macro is that you can only use it once in any given lexical scope, and only if you haven’t already declared the variable i. C macros don’t define their own scope as a function would, which is one large reason they struggle to act as good abstractions.

But, to continue the fantasy, let’s pretend that LOOP works just fine for us. So good, in fact, that we use it often enough to notice the following recurring pattern:

LOOP(10) {
    printf("%d", i);
}

Beautiful code, without question, but because we see it so often, we’d like to abstract it out further. Something like this:

#define LOOP10 \
    LOOP(10) { \
        printf("%d", i); \
    }

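Except that this only gets us so far. The preprocessor does rescan a macro’s expansion, so LOOP10 will happily expand LOOP(10) for us, but a macro can never expand into itself recursively, and its expansion is never treated as a preprocessor directive, so one macro can’t define another.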

Code Generation

So the C macro system is a simple code generation scheme. We could, in theory, expand the code generation capability to allow multiple passes through a file, which would allow one C-like-macro to expand into another C-like-macro, which is then expanded into real code in a subsequent pass.

However, without some static analysis, we still won’t be able to generate unique variable names within any given lexical scope. Actually, even with static analysis, we won’t be able to do so. The problem is that any run-time code generation, like using Reflection.Emit in C#, or instance_variable_set in Ruby, thwarts the static analysis.

To make matters worse, a code generator sits outside the surrounding code. To make it work, you have to write the instructions in a pseudo-language, different from the surrounding language. C macros, for example, have rules different from normal C code. It’s common to uppercase them, which helps make it obvious that this clip of code you’re looking at isn’t standard C code.

Metaprogramming

Reflection in powerful languages like Ruby can give you some of the same benefits of macros, but they are very different concepts. Metaprogramming is a run-time concept. Macros are like metaprogramming at compile time (or interpretation-time—Lisp lets you decide if you want to compile the code first or not). What’s the difference?

Run-time metaprogramming uses dynamic data types to remove duplication. The ActiveRecord class in Ruby on Rails is a wonderful example. Unlike traditional object-relational mappers, you don’t have to specify the mapping from columns to properties using ActiveRecord. In fact, you don’t even have to specify the properties at all!

At run-time, an ActiveRecord class queries the database table whose name matches the name of the class, according to a well-known (and configurable) naming scheme. It then automatically adds property getters and setters for each column in the table. So you could have the following class definition:

class Order < ActiveRecord::Base
end

And you could use it like this:

order = Order.find(1)
puts order.id
puts order.ship_to_name

Notice that we never had to define id or ship_to_name. Rails used metaprogramming to define them for us. Some languages, like Ruby and Smalltalk, also let you intercept messages sent to objects for which no method could be found to bind to. Rails uses this feature to also give the appearance of adding special find methods:

Order.find_by_ship_to_name
Order.find_by_city_and_state

Metaprogramming is essentially code-generation at run-time, which provides extreme flexibility at the cost of some performance that static code-generation might be able to provide. But – and here’s what separates metaprogramming from syntactic macros – metaprogramming magic is bound by the rules of the language. You cannot change the language at run-time.
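The message-interception trick has a close cousin in Python, where __getattr__ is called only when normal attribute lookup fails. The sketch below is not how Rails implements its finders; Table is a hypothetical in-memory stand-in for a database table, and the find_by_..._and_... naming is borrowed from the examples above purely for illustration.

```python
class Table:
    """Hypothetical in-memory table: a list of row dictionaries."""

    def __init__(self, rows):
        self.rows = rows

    def __getattr__(self, name):
        # Reached only when no real attribute called `name` exists,
        # similar in spirit to Ruby's method_missing.
        prefix = "find_by_"
        if not name.startswith(prefix):
            raise AttributeError(name)
        columns = name[len(prefix):].split("_and_")

        def finder(*values):
            if len(values) != len(columns):
                raise TypeError(f"{name} expects {len(columns)} argument(s)")
            wanted = dict(zip(columns, values))
            return [
                row for row in self.rows
                if all(row.get(col) == val for col, val in wanted.items())
            ]

        return finder


orders = Table([
    {"id": 1, "ship_to_name": "Ann", "city": "Austin", "state": "TX"},
    {"id": 2, "ship_to_name": "Bob", "city": "Dallas", "state": "TX"},
])
```

Neither find_by_ship_to_name nor find_by_city_and_state is defined anywhere, yet orders.find_by_city_and_state("Austin", "TX") works. It is all done at run-time, and all within the ordinary rules of the language.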

Syntactic Macros

So how do Lisp macros differ from static code-generation or metaprogramming?

To answer that question, we first need to understand something about Lisp source code. Lisp is one of the few languages that is truly homoiconic, or self-representing. Take a look at the following Lisp lists:

(2 3)
(> 2 3)
(if (> 2 3) True False)

The first list is just data – a collection of atoms. The second list, if interpreted, returns T (true) or NIL (false), depending on the result of calling > with the given arguments. The third list returns the atom True (not the same as T) or False, depending on the answer to the predicate.

But here’s the mind-stretching part: all three of these lists are just data. if and > are nothing more than atoms in a list. Lisp just has special processing rules that say, if the first atom in a list binds to a function name, apply that function with the rest of the list. The fact that Lisp has those processing rules doesn’t change the fact that its code is nothing more than a list of data.
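One way to convince yourself of this is to write the processing rules down. Below is a toy evaluator in Python that treats nested lists as programs. It is a sketch only: strings stand in for Lisp atoms, FUNCTIONS is a made-up table of two operators, and real Lisp evaluation rules (symbol lookup, quoting, and so on) are ignored.

```python
import operator

# Hypothetical table binding atom names to functions.
FUNCTIONS = {">": operator.gt, "+": operator.add}


def evaluate(form):
    """Evaluate a nested-list 'program'. Atoms evaluate to themselves."""
    if not isinstance(form, list) or not form:
        return form
    head, *rest = form
    if head == "if":
        # Special processing rule: evaluate the test, then only one branch.
        test, then, otherwise = rest
        return evaluate(then if evaluate(test) else otherwise)
    if isinstance(head, str) and head in FUNCTIONS:
        # General rule: apply the named function to the evaluated arguments.
        return FUNCTIONS[head](*[evaluate(arg) for arg in rest])
    raise NameError(f"{head!r} does not name a function")


program = ["if", [">", 2, 3], "True", "False"]
```

The program variable is an ordinary list that you can index, slice, or rebuild; it only becomes code when it is handed to evaluate.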

Let’s look at a more involved if statement:

(if (hungry Spot)
    (progn
      (bark Spot)
      (wag-tail Spot)))

If Spot is hungry, he barks and wags his tail. Typically, an if statement allows only one clause if the predicate is true (and, optionally, one if it’s false). progn is a special Lisp form that lets you group multiple clauses together. It’s like the curly braces in C:

if (hungry(spot)) {
    bark(spot);
    wagTail(spot);
}

Without those braces, wagTail would happen unconditionally. Notice that Lisp provides the same functionality without any special syntax. progn is just another Lisp atom in a list, which, like any atom, is interpreted as a function (or something like a function) if it happens to be the first element of a list. Also notice that the if expression shown above is the source code for a full program – and that source code is just a list of atoms!

Maybe I shouldn’t get too carried away with that. In his book, Peter Seibel says too many people get so distracted by Lisp’s list-heritage that they miss some of the more practical aspects of the language, and in fact, lists probably aren’t used much in day-to-day usage. But the fact that there is no representational difference between Lisp data and Lisp source code means that a Lisp program can interpret its source code like any other data. Understanding that single point is the key to understanding macros.

Seibel introduces macros with a silly example, but one that’s useful for showing the code-as-data mentality. Here’s hello, world in Lisp:

(format t "hello, world")

The t atom (like true in C# and Java) tells format to print to standard output. Now imagine that we wrote the following macro:

(defmacro backwards (expr) (reverse expr))

backwards is the name of the macro, which takes an expression (represented as a list), and reverses it. Here’s hello, world again, this time using the macro:

(backwards ("hello, world" t format))

When the Lisp compiler sees that line of code, it looks at the first atom in the list (backwards), and notices that it names a macro. It passes the unevaluated list ("hello, world" t format) to the macro, which rearranges the list to (format t "hello, world"). The resulting list replaces the macro expression, and it is what will be evaluated at run-time. The Lisp environment will see that its first atom (format) is a function, and evaluate it, passing it the rest of the arguments.

So Lisp macros provide compile-time code generation, but in doing so, you have access not just to the code-generation instructions – which are just Lisp data – but also to the entire Lisp language and environment. And the resulting code doesn’t suffer a run-time performance penalty—the call to reverse above happens at compile time; the run-time environment sees only the expanded macro.

This really opens your eyes to new ways of removing duplication. Take the (if ... (progn ...)) example listed above. While progn is nice in that it gives you functionality similar to C’s braces without any special syntax, it can get a bit bulky to have to type it all the time. Common Lisp provides a when macro to make it easier; you’d use it like this:

(when (hungry Spot)
   (bark Spot)
   (wag-tail Spot))

But when doesn’t have to be baked into the language. Given if and progn, we could compile-time generate it with the following macro:

(defmacro when (condition &rest body)
   `(if ,condition (progn ,@body)))

OK, there’s some funky Lisp syntax in there (more than I want to go into), but underneath the covers, it’s still just lists and atoms. The &rest atom gathers up any remaining parameters (after the condition parameter has been satisfied), and puts them into a single list which you can access through the variable body. The if list is quoted (with a backquote) because we don’t want the macro to evaluate the if expression – remember it will try to if the first atom matches a function (or special form, or macro). Instead, we want to return the unevaluated list that will be evaluated later, at run-time. The comma and comma-at-sign are ways of splicing the parameters into the list, similar to string interpolation in Ruby.

So, in our when example with Spot the dog, (hungry Spot) will be the condition parameter, and ((bark Spot) (wag-tail Spot)) will be the body parameter. Once when has received those unevaluated arguments, it will return a list (source code) that looks just like the (if...(progn...)) that we wrote by hand.
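Stripped of the backquote syntax, a macro is just a function from one list to another that runs before evaluation. Here is a Python sketch of that idea, using nested lists of strings in place of Lisp forms. The MACROS table and the macroexpand function are hypothetical names for this illustration, not a description of how any Lisp implementation works internally.

```python
def expand_when(form):
    """Rewrite ["when", cond, *body] into ["if", cond, ["progn", *body]]."""
    _, condition, *body = form
    return ["if", condition, ["progn", *body]]


def expand_backwards(form):
    """Rewrite ["backwards", expr] into expr reversed."""
    _, expr = form
    return list(reversed(expr))


# Hypothetical table of macro names to expander functions.
MACROS = {"when": expand_when, "backwards": expand_backwards}


def macroexpand(form):
    """Expand every macro call in a form, including nested ones."""
    if not isinstance(form, list) or not form:
        return form
    head = form[0]
    if isinstance(head, str) and head in MACROS:
        # The expansion is expanded again, so macros may produce macro calls.
        return macroexpand(MACROS[head](form))
    return [macroexpand(part) for part in form]


source = ["when", ["hungry", "Spot"], ["bark", "Spot"], ["wag-tail", "Spot"]]
expanded = macroexpand(source)
```

The expanders never run hungry or bark; they only rearrange the lists that mention them, which is why the real thing costs nothing at run-time.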

Lisp also has a handy way of dealing with the variable naming problem we saw with our C macro, where we hard-coded the variable name i. You can use the function gensym to create a variable name that is guaranteed to be unique within its lexical scope. You can assign the result of gensym to another variable, and use that other variable in your macro. It will be replaced with the actual variable name when the macro expands. (Nifty, no?)

Finally, Lisp solves the multiple-pass code-generation problem. Macros can expand into macros, which can themselves expand into macros, for as long as you want. In fact, macros are so integral to the language, that this seems to be pretty common. Macros can call other macros or other functions. Some macros can become so large and complex that you might have an entire package dedicated to supporting functions for them.

The great thing about macros is that they let you actually change the language (to an extent), something that metaprogramming cannot accomplish. Let’s say you don’t care for Lisp’s clunky looping mechanisms. You’ve seen list comprehensions in Python or Erlang and want to bring some of that elegance to Lisp.

Too late – somebody else already did. The loop macro is incredibly powerful and complex. Some old-school Lispers hate it, because it really does change the language, which, for Lispers, is a way of saying it’s not using as many parentheses as it should. The following example is taken from Practical Common Lisp:

(loop for i in *random*
   counting (evenp i) into evens
   counting (oddp i) into odds
   summing i into total
   maximizing i into max
   minimizing i into min
   finally (return (list min max total evens odds)))

The result of that expression is a list containing the minimum, maximum, and sum of the numbers in the *random* global variable, as well as a count of the even numbers and a count of the odd numbers.

Try doing that in C.
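If the loop keywords are hard to read, this plain Python function computes the same five values in a single pass. It is only meant to pin down what the loop form returns; the function name summarize is made up, and the behavior on an empty list (None for the minimum and maximum) is an assumption of this sketch.

```python
def summarize(numbers):
    """Return [min, max, total, evens, odds] for a list of integers."""
    evens = odds = total = 0
    smallest = largest = None
    for i in numbers:
        if i % 2 == 0:
            evens += 1        # counting (evenp i) into evens
        else:
            odds += 1         # counting (oddp i) into odds
        total += i            # summing i into total
        largest = i if largest is None else max(largest, i)
        smallest = i if smallest is None else min(smallest, i)
    return [smallest, largest, total, evens, odds]
```

The hand-written version needs five accumulators and their bookkeeping; the loop macro generates the equivalent code from a declarative description.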

Written by Brandon Byars

February 12, 2008 at 2:54 pm

Posted in Languages, Lisp

Managing Config Files

There’s a discussion on the altdotnet Yahoo group about managing configuration files. How do you manage updating multiple configuration files to change the appropriate values when deploying to a different environment?

The solution I hit on was to create a custom MSBuild task. When called from our build script, it looks something like this:

<ItemGroup>
    <ConfigFiles Include="$(DeployDir)/**/*.exe.config"/>
    <ConfigFiles Include="$(DeployDir)/**/*.dll.config"/>
    <ConfigFiles Include="$(DeployDir)/**/web.config"/>
</ItemGroup>

<ItemGroup>
    <HibernateFiles Include="$(DeployDir)/**/hibernate.cfg.xml"/>
</ItemGroup>

<ItemGroup>
    <Log4NetFiles Include="$(DeployDir)/**/log4net.config"/>
</ItemGroup>

<Target Name="UpdateConfig">
    <UpdateConfig
        ConfigFiles="@(ConfigFiles)"
        ConfigMappingFile="$(MSBuildProjectDirectory)\config\config.xml"
        Environment="$(Environment)" />
    <UpdateConfig
        ConfigFiles="@(HibernateFiles)"
        ConfigMappingFile="$(MSBuildProjectDirectory)\config\hibernate_config.xml"
        Environment="$(Environment)"
        NamespaceUri="urn:nhibernate-configuration-2.2"
        NamespacePrefix="hbm" />
    <UpdateConfig
        ConfigFiles="@(Log4NetFiles)"
        ConfigMappingFile="$(MSBuildProjectDirectory)\config\log4net_config.xml"
        Environment="$(Environment)" />
</Target>

Notice that each call to UpdateConfig takes the list of config files that will be changed and a config mapping file. That mapping file is what is read to update the config files given the environment. Here’s an example of what the mapping file looks like:


<configOptions>
    <add xpath="configuration/appSettings/add[@key='dbserver']">
        <staging>
            <add key="dbserver" value="stagingServer"/>
        </staging>
        <production>
            <add key="dbserver" value="productionServer"/>
        </production>
    </add>
</configOptions>

Each config file is scanned looking for each XPath expression in the mapping file. On each match, the entire node (and all its child nodes) in the original config file is replaced with the node under the appropriate environment tag in the mapping file. It’s a bit verbose, but simple enough, and it supports as many environments as you want to have.
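The replacement rule is easy to try out in isolation. The following Python sketch applies the same idea with the standard library's xml.etree.ElementTree. It is not the MSBuild task: the merge function and the sample documents are made up for illustration, namespaces are ignored, and ElementTree only understands a small subset of XPath (enough for the expression in the mapping file above).

```python
import copy
import xml.etree.ElementTree as ET

CONFIG = """<configuration>
  <appSettings>
    <add key="dbserver" value="localhost"/>
  </appSettings>
</configuration>"""

MAPPING = """<configOptions>
  <add xpath="configuration/appSettings/add[@key='dbserver']">
    <staging><add key="dbserver" value="stagingServer"/></staging>
    <production><add key="dbserver" value="productionServer"/></production>
  </add>
</configOptions>"""


def merge(config_xml, mapping_xml, environment):
    """Return config_xml with each mapped node replaced for `environment`."""
    config_root = ET.fromstring(config_xml)
    mapping_root = ET.fromstring(mapping_xml)

    # ElementTree paths are relative to an element's children, so wrap the
    # root in a holder element to let paths start with the root's own name.
    holder = ET.Element("holder")
    holder.append(config_root)

    for option in mapping_root.findall("add"):
        env_node = option.find(environment)
        if env_node is None or len(env_node) == 0:
            raise ValueError("Missing node for " + environment)
        replacement = env_node[0]

        target = holder.find(option.get("xpath"))
        if target is None:
            continue  # this config file has no such setting

        # Replace attributes, text, and children wholesale.
        target.attrib.clear()
        target.attrib.update(replacement.attrib)
        target.text = replacement.text
        target[:] = [copy.deepcopy(child) for child in replacement]

    return ET.tostring(config_root, encoding="unicode")
```

Calling merge(CONFIG, MAPPING, "production") yields a document whose dbserver setting points at productionServer, while asking for an environment the mapping does not define raises an error.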

The MSBuild task itself is fairly simple, delegating most of its work to a separate object called XmlMerger:

private void MergeChanges()
{
    foreach (ITaskItem item in ConfigFiles)
    {
        string configFile = item.ItemSpec;
        XmlDocument configFileDoc = LoadXmlDocument(configFile);
        XmlDocument configMappingDoc = LoadXmlDocument(configMappingFile);

        XmlMerger merger = new XmlMerger(configFileDoc, configMappingDoc);
        if (!string.IsNullOrEmpty(NamespaceUri) && !string.IsNullOrEmpty(NamespacePrefix))
            merger.AddNamespace(NamespacePrefix, NamespaceUri);

        merger.Merge(environment.ToLower());
        configFileDoc.Save(configFile);
    }
}

XmlMerger just finds the nodes that need updating and replaces them from the mapping file. Notice that it also accepts namespace information (see the NHibernate example in the build script snippet above), which is occasionally needed:

public class XmlMerger
{
    private readonly XmlDocument configFile;
    private readonly XmlDocument configMapping;
    private readonly XmlNamespaceManager namespaces;

    public XmlMerger(XmlDocument configFile, XmlDocument configMapping)
    {
        this.configFile = configFile;
        this.configMapping = configMapping;
        namespaces = new XmlNamespaceManager(configFile.NameTable);
    }

    public void AddNamespace(string prefix, string uri)
    {
        namespaces.AddNamespace(prefix, uri);
    }

    public void Merge(string environment)
    {
        foreach (XmlNode mappingNode in configMapping.SelectNodes("/configOptions/add"))
        {
            string xpath = mappingNode.Attributes["xpath"].Value;
            XmlNode replacementNode = FindNode(mappingNode, environment).FirstChild;
            XmlNode nodeToReplace = configFile.SelectSingleNode(xpath, namespaces);
            if (nodeToReplace != null)
                ReplaceNode(nodeToReplace, replacementNode);
        }
    }

    private void ReplaceNode(XmlNode nodeToReplace, XmlNode replacementNode)
    {
        nodeToReplace.InnerXml = replacementNode.InnerXml;

        // Remove attributes that are not present on replacementNode.  There's probably
        // a cleaner solution, but I didn't see it.
        for (int i = nodeToReplace.Attributes.Count - 1; i >= 0; i--)
        {
            if (replacementNode.Attributes[nodeToReplace.Attributes[i].Name] == null)
                nodeToReplace.Attributes.RemoveAt(i);
        }

        foreach (XmlAttribute attribute in replacementNode.Attributes)
        {
            if (nodeToReplace.Attributes[attribute.Name] == null)
                nodeToReplace.Attributes.Append(configFile.CreateAttribute(attribute.Name));

            nodeToReplace.Attributes[attribute.Name].Value = attribute.Value;
        }
    }

    private XmlNode FindNode(XmlNode node, string xpath)
    {
        XmlNode result = node.SelectSingleNode(xpath);
        if (result == null)
            throw new ApplicationException("Missing node for " + xpath);
        return result;
    }
}

That's it. Now the whole process is hands-free, so long as you remember to update the mapping file when needed. The config files we put into Subversion are set to work in the development environment (everything is localhost), so anybody can check out our code and start working without having to tweak a bunch of settings first. The deployment process calls our build script, which ensures that the appropriate config values get changed.

Written by Brandon Byars

January 10, 2008 at 9:39 pm

Posted in .NET, Configuration Management

Using Closures to Implement Undo

While it seems to be fairly common knowledge in the functional programming world, I don’t think most object-oriented developers realize that closures and objects can be used to implement each other. Ken Dickey showed how it can be done rather easily in Scheme, complete with multiple inheritance and dynamic dispatch.

That’s not to say, of course, that all OO programmers should drop their object hats and run over to the world of functional programming. There is room for multiple paradigms.

Take the well-known Command pattern, often advertised as having two advantages over a more traditional API:

  1. Commands can be easily decorated, giving you some measure of aspect-oriented programming. CruiseControl.NET uses a Command-pattern dispatch for the web interface, and decorates each command with error-handling, etc, providing a nice separation of concerns.
  2. Commands can give you easy undo functionality. Rails migrations are a good example.

Recently, I had to retrofit Undo onto an existing legacy (and ugly) codebase, and I was able to do it quite elegantly with closures instead of commands.

What are closures?

Briefly (since better descriptions lie elsewhere), a closure is a procedure that “remembers” its bindings to free variables, where free variables are those variables that lie outside the procedure itself. The name comes from LISP, where the procedure (or “lambda”, as LISPers call them) was said to “close over” its lexical environment. In C# terms, a closure is simply an anonymous delegate with a reference to a free variable, as in:

string mark = "i wuz here";
DoSomething(delegate { Console.WriteLine(mark); });

Notice that the anonymous delegate references the variable mark. When the delegate is actually called, it will be within a lexical scope that does not include mark. To make that work, the compiler wraps the closure in a class that remembers both the code to execute and any variable bindings (remember – objects and closures can be interchanged).
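Python makes the "remembered bindings" visible, which helps demystify what the C# compiler is doing behind the scenes. This is a small illustrative sketch; make_printer is a made-up name, and Python stores captured variables in cell objects where C# generates a class.

```python
def make_printer():
    mark = "i wuz here"      # a free variable from printer's point of view

    def printer():
        return mark          # resolved through the closure, not a global

    return printer


printer = make_printer()     # make_printer has returned; mark lives on

# The function object carries its captured bindings around with it.
captured_names = printer.__code__.co_freevars
captured_values = [cell.cell_contents for cell in printer.__closure__]
```

By the time printer is called, the scope that declared mark is long gone, yet the value is still reachable because it travels with the function.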

As always, Wikipedia has a nice write-up. A C#-specific description can be found here.

What does a closure-based Undo look like?

The legacy code I needed to update maintained the entire object state serialized in XML. This was terrible for a number of reasons, but it did have the advantage of making undo easy in principle: just swap the current XML back for the XML as it stood before the last API call. I wanted something like this:

public delegate void Action();

public void AddItem(OrderItemStruct itemInfo)
{
    string originalXml = orderXml;
    Action todo = delegate
    {
        OrderApi.AddOrderItem(currentSession, ref itemInfo,
            ref orderXml, out errorCode, out errorMessage);
    };
    Action undo = delegate { orderXml = originalXml; };
    processor.Do(todo, undo);
}

In actual practice, the undo part of that could be wrapped up in some boilerplate code:

public void AddItem(OrderItemStruct itemInfo)
{
    CallApiMethod(delegate
    {
        OrderApi.AddOrderItem(currentSession, ref itemInfo,
            ref orderXml, out errorCode, out errorMessage);
    });
}

private void CallApiMethod(Action method)
{
    string originalXml = orderXml;
    processor.Do(method, delegate { orderXml = originalXml; });
    // error handling, etc…
}

Notice that the undo procedure is referencing originalXml. That variable will be saved with the closure, making for a rather lightweight syntax, even with the static typing.

Getting Started

Implementing a single undo is really quite easy. Here’s a simple test fixture for it:

[Test]
public void SingleUndo()
{
    CommandProcessor processor = new CommandProcessor(5);
    int testValue = 0;
    processor.Do(delegate { testValue++; },
        delegate { testValue--; });

    processor.Undo();

    Assert.AreEqual(0, testValue);
}

…and the code to make it work:

public delegate void Action();

public class CommandProcessor
{
    private CircularBuffer undoBuffer;

    public CommandProcessor(int capacity)
    {
        undoBuffer = new CircularBuffer(capacity);
    }

    public void Do(Action doAction, Action undoAction)
    {
        doAction();
        undoBuffer.Add(undoAction);
    }

    public void Undo()
    {
        if (!undoBuffer.IsEmpty)
        {
            Action action = undoBuffer.Pop();
            action();
        }
    }
}

I won’t go into how CircularBuffer works, but it’s such a simple data structure that you can figure it out.
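For the curious, here is one plausible shape for such a buffer, sketched in Python. The original C# class is not shown, so this is a guess at its behavior based only on how it is used above: a bounded stack that silently forgets the oldest entry once it is full. The method names mirror Add, Pop, Clear, and IsEmpty.

```python
from collections import deque


class CircularBuffer:
    """Bounded LIFO buffer: when full, adding drops the oldest item."""

    def __init__(self, capacity):
        # A deque with maxlen discards from the opposite end on append.
        self._items = deque(maxlen=capacity)

    def add(self, item):
        self._items.append(item)

    def pop(self):
        """Remove and return the most recently added item."""
        return self._items.pop()

    def clear(self):
        self._items.clear()

    @property
    def is_empty(self):
        return not self._items
```

With a capacity of 5, the sixth action pushed onto the undo buffer simply pushes the first one out, which is usually what you want from an undo history.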

Naturally, with undo, we’ll want redo:

[Test]
public void SingleRedo()
{
    CommandProcessor processor = new CommandProcessor(5);
    int testValue = 0;
    processor.Do(delegate { testValue++; }, delegate { testValue--; });
    processor.Undo();

    processor.Redo();

    Assert.AreEqual(1, testValue);
}

Conceptually, this should be fairly easy:

public void Undo()
{
    PopAndDo(undoBuffer);
}

public void Redo()
{
    PopAndDo(redoBuffer);
}

private void PopAndDo(CircularBuffer buffer)
{
    if (!buffer.IsEmpty)
    {
        Action action = buffer.Pop();
        action();
    }
}

However, we’re not actually adding anything to the redo buffer yet. What we need to do is rather interesting—we don’t want to add to the redo buffer until Undo is called. Closures to the rescue:

public void Do(Action doAction, Action undoAction)
{
    doAction();
    undoBuffer.Add(delegate
    {
        undoAction();
        redoBuffer.Add(doAction);
    });
}

But let’s say I undo, redo, and then want to undo and redo again. That won’t work as written, and making it work is starting to get pretty ugly:

public void Do(Action doAction, Action undoAction)
{
    doAction();
    undoBuffer.Add(delegate
    {
        undoAction();
        redoBuffer.Add(delegate
        {
            doAction();
            undoBuffer.Add(delegate
            {
                undoAction();
                redoBuffer.Add(doAction);
            });
        });
    });
}

It’s becoming apparent that what we really want is infinite recursion, lazily-evaluated. How ‘bout a closure?

public void Do(Action doAction, Action undoAction)
{
    doAction();
    undoBuffer.Add(DecoratedAction(undoAction, undoBuffer, doAction, redoBuffer));
}

private Action DecoratedAction(Action undoAction, CircularBuffer undoBuffer,
        Action redoAction, CircularBuffer redoBuffer)
{
    return delegate
    {
        undoAction();
        redoBuffer.Add(DecoratedAction(
            redoAction, redoBuffer, undoAction, undoBuffer));
    };
}

Now we see how easy it is to decorate closures—remember that the ability to decorate commands is an oft-quoted advantage of them. However, closures provide a more lightweight approach to programming than commands.

The elegance of this approach is hard to deny. All it takes is getting over the conceptual hump that functions are just data. Think about it—we just added a function that took two functions as arguments and returned another function.
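The same trick translates almost word for word into any language with closures. Here is a Python sketch of the processor, offered as a translation exercise and not as the original implementation: it uses plain bounded deques in place of CircularBuffer, and the parameter names in _decorated are my own.

```python
from collections import deque


class CommandProcessor:
    def __init__(self, capacity):
        self.undo_buffer = deque(maxlen=capacity)
        self.redo_buffer = deque(maxlen=capacity)

    def do(self, do_action, undo_action):
        do_action()
        self.undo_buffer.append(self._decorated(
            undo_action, self.undo_buffer, do_action, self.redo_buffer))

    def _decorated(self, action, own_buffer, inverse, other_buffer):
        """Wrap `action` so that running it queues its inverse elsewhere."""
        def run():
            action()
            # Lazily build the mirror-image closure: the inverse goes into
            # the other buffer, and will queue `action` again when it runs.
            other_buffer.append(self._decorated(
                inverse, other_buffer, action, own_buffer))
        return run

    def undo(self):
        self._pop_and_do(self.undo_buffer)

    def redo(self):
        self._pop_and_do(self.redo_buffer)

    @staticmethod
    def _pop_and_do(buffer):
        if buffer:
            buffer.pop()()
```

Each call to run creates exactly one new closure, so the "infinite" chain of undo and redo only ever exists one link at a time.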

What also was apparent to me is how much TDD helped me get to this point. It may not be obvious from the few snippets I’ve shown here, but building up to the DecoratedAction abstraction was a very satisfying experience.

For reference, here’s the full CommandProcessor class. The bits I haven’t shown, CanUndo and CanRedo, along with an event that fires when either one changes, are there so that we know when to enable or disable a menu option in a UI.

public class CommandProcessor
{
    public event EventHandler UndoAbilityChanged;

    private CircularBuffer undoBuffer;
    private CircularBuffer redoBuffer;

    public CommandProcessor(int capacity)
    {
        undoBuffer = new CircularBuffer(capacity);
        redoBuffer = new CircularBuffer(capacity);
    }

    public void Do(Action doAction, Action undoAction)
    {
        FireEventIfChanged(delegate
        {
            doAction();

            // Redo only makes sense if we’re redoing a clean undo stack.
            // Once they do something else, redo would corrupt the state.
            redoBuffer.Clear();

            undoBuffer.Add(DecoratedAction(
                undoAction, undoBuffer, doAction, redoBuffer));
        });
    }

    private Action DecoratedAction(Action undoAction, CircularBuffer undoBuffer,
        Action redoAction, CircularBuffer redoBuffer)
    {
        return delegate
        {
            undoAction();
            redoBuffer.Add(DecoratedAction(
                redoAction, redoBuffer, undoAction, undoBuffer));
        };
    }

    public void Undo()
    {
        FireEventIfChanged(delegate { PopAndDo(undoBuffer); });
    }

    public void Redo()
    {
        FireEventIfChanged(delegate { PopAndDo(redoBuffer); });
    }

    public void Clear()
    {
        undoBuffer.Clear();
        redoBuffer.Clear();
    }

    public bool CanUndo
    {
        get { return !undoBuffer.IsEmpty; }
    }

    public bool CanRedo
    {
        get { return !redoBuffer.IsEmpty; }
    }

    private void PopAndDo(CircularBuffer buffer)
    {
        if (!buffer.IsEmpty)
        {
            Action action = buffer.Pop();
            action();
        }
    }

    private void FireEventIfChanged(Action action)
    {
        bool originalCanUndo = CanUndo;
        bool originalCanRedo = CanRedo;

        action();

        if (originalCanUndo != CanUndo || originalCanRedo != CanRedo)
            OnUndoAbilityChanged(EventArgs.Empty);
    }

    protected void OnUndoAbilityChanged(EventArgs e)
    {
        EventUtils.FireEvent(this, e, UndoAbilityChanged);
    }
}

Written by Brandon Byars

November 5, 2007 at 11:26 pm

Posted in .NET, Design Patterns, TDD

Configuring ActiveRecord to work with SQL Server 2005

As much as possible, I like a zero-install configuration. In other words, I want to simply check out a codebase, run an automated build process, and start working. Ideally, I’d like to be able to do that on a clean machine.

It doesn’t always work, of course. For instance, even though most of our production code is written in .NET, we use Ruby extensively for automation. Installing Ruby is one of those dependencies that we live with. But installing Ruby isn’t enough; we also need Rails (or at least ActiveRecord) for some data management scripts we have, Watir, and some fixes for ActiveRecord to work with SQL Server.

All of that can be done fairly easily by somebody who knows what they’re doing, but new developers often don’t know what they’re doing, and I strive to be dispensable. We wrote a script that configured Ruby and the necessary libraries to work for our development environment.

First, we needed to install the necessary gems. This is quite easy to do on the command line, but it took me a little digging before I figured out how to do it in code:

needed_gems = ["rails"]

require "rubygems"
Gem.manage_gems

needed_gems.each do |gem|
  puts "Installing gem #{gem}..."
  Gem::GemRunner.new.run ["install", gem, "--include-dependencies"]
end

SQL Server requires an ADO adapter that doesn’t ship with Ruby. You can read about it here. All that’s needed is to download the ADO.rb file (which we have versioned next to our setup script) and copy it to the right place, creating the directory if needed:

setup_dir = File.dirname(__FILE__)
ruby_dir = "C:/ruby/lib/ruby"
ruby_version = 1.8

# The ADO adapter needed for using dbi (the standard db access)
# with SQL Server, which does not come with ruby by default.
# See http://wiki.rubyonrails.org/rails/pages/HowtoConnectToMicrosoftSQLServer.

puts "creating ADO adapter..."
ado_dir = "#{ruby_dir}/site_ruby/#{ruby_version}/DBD/ADO"
system "if not exist #{ado_dir} mkdir #{ado_dir}"
system "copy /Y #{setup_dir}/ADO.rb #{ado_dir}"
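If shelling out to cmd is not desirable, the same step can be sketched with Ruby's standard FileUtils library. This is an alternative under assumptions, not what our script does; install_adapter is a hypothetical helper name.

```ruby
require "fileutils"

# Hypothetical helper: copy a versioned adapter file into a target directory,
# creating the directory (and any missing parents) first.
# Returns the path of the copied file.
def install_adapter(source_file, target_dir)
  FileUtils.mkdir_p(target_dir)
  FileUtils.cp(source_file, target_dir)
  File.join(target_dir, File.basename(source_file))
end

# Usage with the variables from the script above would look like:
#   install_adapter(File.join(setup_dir, "ADO.rb"), ado_dir)
```

mkdir_p is a no-op when the directory already exists, which covers the "if not exist" check.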

Finally, we use SQL Server 2005, and we want to use Windows Authentication for all of our Ruby scripts. Neither SQL Server 2005 nor Windows Authentication is supported by rails out of the box. The problem, described on the SQL Server wiki for rails, is the connection string rails builds. At first, we were taking the suggestions of some of the comments on the wiki and changing the sqlserver_adapter.rb file that ships with rails. This obviously isn’t ideal, so now we monkey-patch it in our code that accesses the database:

module DBI
  # We have to monkey patch this because the SQL Server adapter that comes
  # with rails (active_record\connection_adapters\sqlserver_adapter.rb)
  # doesn't work with SQL Server 2005 or with Integrated Security yet.
  class << self
    alias_method :original_connect, :connect

    def connect(driver_url, username, password)
      # Change to SQL 2005
      driver_url.sub! "SQLOLEDB", "SQLNCLI"

      # Change to Windows Authentication
      driver_url.sub! /User Id=[^;]*;Password=[^;]*;/, "Integrated Security=SSPI;"

      original_connect(driver_url, username, password)
    end
  end
end

And that’s it. You still can’t check out the codebase and start working on a clean machine, but it’s not bad. Install Ruby, run setup.rb. All database code has been patched to deal with our environment.
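To see what the two substitutions do without a database at hand, they can be pulled into a small pure function. This is only a sketch: patch_driver_url is a hypothetical name, and the sample string is illustrative rather than the exact string rails builds.

```ruby
# Hypothetical helper that applies the same two substitutions to a copy of
# the connection string, so the effect can be checked in isolation.
def patch_driver_url(driver_url)
  driver_url
    .sub("SQLOLEDB", "SQLNCLI")
    .sub(/User Id=[^;]*;Password=[^;]*;/, "Integrated Security=SSPI;")
end

# Illustrative input only; the real string comes from the adapter.
sample = "DBI:ADO:Provider=SQLOLEDB;Data Source=localhost;" \
         "Initial Catalog=db;User Id=sa;Password=secret;"
puts patch_driver_url(sample)
```

Unlike the sub! calls in the monkey patch, this version leaves the caller's string untouched.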

Written by Brandon Byars

October 23, 2007 at 7:13 pm

Auto-merging fixes

Paul Gross recently blogged about a rake task to automatically merge changes to the trunk if the change was made in a branch. This seemed like a useful feature, even though we don’t use rake.

Fixing production bugs and merging is no fun, but why not take some of the pain out of the process? Depending on the developer and their mood, we either fix the bug in the trunk and merge to the branch, or fix it in the branch and merge to the trunk. Where it really gets ugly is when we have to merge two release branches back, because we make our release branch a few days before actually pushing it to production. Any urgent bug fix requiring a merge during that time has to be merged both to the new release branch as well as to the release branch currently in production.

Using Paul’s code as a base, I wrote automerge.rb, which, by default, either merges to the trunk (if you’re in a branch), or merges to the latest release branch (if you’re in the trunk). Alternatively, you can pass a release number, and automerge.rb will merge to that release branch. In all cases, you have to have the working copy checked out on your machine, and, if you’re on Windows, you need to make sure to put patch in your path.

The script assumes that your directory structure looks something like the following:

/trunk
/branches
  /Release-2.8
  /Release-2.9

The Release-major-minor format is just our branch naming standard; it’s easy to change.

if !ARGV[0].nil?
  dest_dir = "branches/Release-#{ARGV[0]}"
  replace_pattern = /(trunk|branches).*$/i
elsif %x[svn info].include? "branches"
  dest_dir = "trunk"
  replace_pattern = /branches.*$/i
elsif %x[svn info].include? "trunk"
  # Escape the dot so it only matches a literal "." between major and minor
  pattern = /^Release-(\d+)\.(\d+)\/$/

  branches = %x[svn list http://devtools01:8080/svn/lazarus/branches]
  releases = branches.split.find_all {|branch| branch =~ pattern}

  # sort them in reverse order by major, minor
  releases.sort! do |first, second|
    first_major, first_minor = pattern.match(first)[1..2]
    second_major, second_minor = pattern.match(second)[1..2]

    if first_major != second_major
      second_major.to_i <=> first_major.to_i
    else
      second_minor.to_i <=> first_minor.to_i
    end
  end

  dest_dir = "branches/#{releases[0]}"
  replace_pattern = /trunk/
end

puts "Merging changes into #{dest_dir}.  Don't forget to check these in."
dest_path = Dir.pwd.gsub(replace_pattern, dest_dir)
puts dest_path
system "svn diff | patch -p0 -d #{dest_path}"

The only tricky part here is figuring out the latest release branch, which is done by using the svn list command followed by a custom sort.
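The custom sort can also be written with sort_by on an array of integers, since Ruby compares arrays element by element. A minimal sketch, assuming branch names in the Release-major.minor format; RELEASE_PATTERN and sort_releases are hypothetical names.

```ruby
# Matches "Release-2.9" with or without the trailing slash svn list prints.
RELEASE_PATTERN = /^Release-(\d+)\.(\d+)\/?$/

# Hypothetical helper: returns release branch names, newest first.
# Names that do not match the pattern are dropped.
def sort_releases(branch_names)
  releases = branch_names.grep(RELEASE_PATTERN)
  releases.sort_by { |name| name.match(RELEASE_PATTERN)[1..2].map(&:to_i) }.reverse
end

puts sort_releases(["Release-2.8/", "Release-2.10/", "scratch/", "Release-2.9/"])
```

Converting to integers matters: compared as strings, "2.10" would sort before "2.9".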

Written by Brandon Byars

October 22, 2007 at 8:06 pm

Posted in Configuration Management, Ruby

C# Enum Generation

Ayende recently asked on the ALT.NET mailing list about the various methods developers use to provide lookup values, with the question framed as one between lookup tables and enums. My own preference is to use both, but keep it DRY with code generation.

To demonstrate the idea, I wrote a Ruby script that generates a C# enum file from some metadata. I much prefer Ruby to pure .NET solutions like CodeSmith—I find it easier and more powerful (I do think CodeSmith is excellent if there is no Ruby expertise on the team, however). The full source for this example can be grabbed here.

The idea is simple. I want a straightforward and extensible way to provide metadata for lookup values, following the Ruby Way of convention over configuration. XML is very popular in the .NET world, but the Ruby world views it as overly verbose, and prefers lighter markup languages like YAML. For my purposes, I decided not to mess with markup at all (although I’m still considering switching to YAML—the hash of hashes approach describes what I want well). Here’s some example metadata:

enums = {
  'OrderType' => {},
  'MethodOfPayment' => {:table => 'PaymentMethod',},
  'StateProvince' => {:table => 'StateProvinces',
                      :name_column => 'Abbreviation',
                      :id_column => 'StateProvinceId',
                      :transformer => lambda {|value| value.upcase},
                      :filter => lambda {|value| !value.empty?}}
}

That list, which is valid Ruby code, describes three enums, which will be named OrderType, MethodOfPayment, and StateProvince. The intention is that, where you followed your database standards, you should usually be able to get by without adding any extra metadata, as shown in the OrderType example. The code generator will get the ids and enum names from the OrderType table (expecting the columns to be named OrderTypeId and Description) and create the enum from those values. As StateProvince shows, the table name and two column names can be overridden.

More interestingly, you can both transform and filter the enum names by passing lambdas (which are like anonymous delegates in C#). The ‘StateProvince’ example above will strip any illegal characters from each name and upper case it, and then filter out any states whose cleaned-up name is an empty string.
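As a rough illustration of how a transformer and a filter lambda combine, here is a small sketch that runs both over some made-up rows. The helper name apply_lambdas and the sample data are assumptions, not part of the generator.

```ruby
# Hypothetical helper: apply the transformer to every name, then keep only
# the entries the filter accepts. rows maps ids to names.
def apply_lambdas(rows, transformer, filter)
  transformed = rows.map { |id, name| [id, transformer.call(name)] }
  transformed.select { |_id, name| filter.call(name) }.to_h
end

# Made-up rows, keyed by id, as they might come back from a lookup table.
rows = { 1 => "tx", 2 => "", 3 => "on" }

transformer = lambda { |value| value.upcase }
filter      = lambda { |value| !value.empty? }

p apply_lambdas(rows, transformer, filter)
```

Because the lambdas are ordinary values, the metadata hash can carry them right next to the table and column names.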

We use a pre-build event in our project to build the enum file. However, if you simply overwrite the file every time you build, you may slow down the build process considerably. MSBuild (used by Visual Studio) evidently sees that the timestamp has been updated, so it rebuilds the project, forcing a rebuild of all downstream dependent projects. A better solution is to only overwrite the file if there are changes:

require File.dirname(__FILE__) + '/enum_generator'

gen = EnumGenerator.new('localhost', 'database-name')
source = gen.generate_all('Namespace', enums)

filename = File.join(File.dirname(__FILE__), 'Enums.cs')
if Dir[filename].empty? || source != IO.read(filename)
  File.open(filename, 'w') {|file| file << source}
end
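The write-only-if-changed check is easy to pull out into a reusable helper. A minimal sketch, with write_if_changed as a hypothetical name:

```ruby
# Hypothetical helper: write content only when it differs from what is
# already on disk, so the file's timestamp is left alone otherwise.
# Returns true if the file was (re)written, false if it was left untouched.
def write_if_changed(filename, content)
  return false if File.exist?(filename) && File.read(filename) == content
  File.open(filename, "w") { |file| file << content }
  true
end
```

The return value makes it simple to log whether a build step actually regenerated anything.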

I define the basic templates straight in the EnumGenerator class, but allow them to be swapped out. In theory, the default name column and the default lambda for generating the id column name given the table name (or enum name) could be handled the same way. Below is the EnumGenerator code:

class EnumGenerator
  FILE_TEMPLATE = <<EOT
//------------------------------------------------------------------------------
// <auto-generated>
//     This code was generated by a tool from <%= catalog %> on <%= server %>.
//
//     Changes to this file may cause incorrect behavior and will be lost if
//     the code is regenerated.
// </auto-generated>
//------------------------------------------------------------------------------

namespace <%= namespace %>
{
    <%= enums %>
}
EOT

  ENUM_TEMPLATE = <<EOT
public enum <%= enum_name %>
{
<% values.keys.sort.each_with_index do |id, i| -%>
    <%= values[id] %> = <%= id %><%= ',' unless i == values.length - 1 %>
<% end -%>
}

EOT

  # Change the templates by calling these setters
  attr_accessor :enum_template, :file_template

  attr_reader :server, :catalog

  def initialize(server, catalog)
    @server, @catalog = server, catalog
    @enum_template, @file_template = ENUM_TEMPLATE, FILE_TEMPLATE
  end
end

The code generation uses erb, the standard Ruby templating language:

def transform(template, template_binding)
  erb = ERB.new(template, nil, '-')
  erb.result template_binding
end

template_binding describes the variables available to use in the template in much the same way that Castle Monorail’s PropertyBag describes the variables available to the views. The difference is that, because Ruby is dynamic, you don’t have to explicitly add values to the binding. The rest of the code is shown below:

def generate(enum_name, attributes)
  table = attributes[:table] || enum_name
  filter = attributes[:filter] || lambda {|value| true}
  values = enum_values(table, attributes)
  values.delete_if {|key, value| !filter.call(value)}
  transform enum_template, binding
end

def generate_all(namespace, metadata)
  enums = ''
  metadata.keys.sort.each {|enum_name| enums << generate(enum_name, metadata[enum_name])}
  enums = enums.gsub(/\n/m, "\n\t").strip
  transform file_template, binding
end

private
def enum_values(table, attributes)
  sql = get_sql table, attributes
  @dbh ||= DBI.connect("DBI:ADO:Provider=SQLNCLI;server=#{server};database=#{catalog};Integrated Security=SSPI")
  sth = @dbh.execute sql
  values = {}
  sth.each {|row| values[row['Id']] = clean(row['Name'], attributes[:transformer])}
  sth.finish

  values
end

def get_sql(table, attributes)
  id_column = attributes[:id_column] || "#{table}Id"
  name_column = attributes[:name_column] || "Description"
  "SELECT #{id_column} AS Id, #{name_column} AS Name FROM #{table} ORDER BY Id"
end

def clean(enum_value, transformer=nil)
  enum_value = '_' + enum_value if enum_value =~ /^\d/
  enum_value = enum_value.gsub /[^\w]/, ''
  transformer ||= lambda {|value| value}
  transformer.call enum_value
end
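To see the binding at work without a database, here is a stripped-down sketch that renders one enum from a hard-coded hash. It assumes a reasonably recent Ruby, where ERB takes trim_mode as a keyword argument rather than the positional form used above; SKETCH_TEMPLATE and render_enum are hypothetical names.

```ruby
require "erb"

SKETCH_TEMPLATE = <<EOT
public enum <%= enum_name %>
{
<% values.keys.sort.each_with_index do |id, i| -%>
    <%= values[id] %> = <%= id %><%= ',' unless i == values.length - 1 %>
<% end -%>
}
EOT

# The template sees enum_name and values simply because they are local
# variables in scope when binding is called; nothing is registered by hand.
def render_enum(enum_name, values)
  ERB.new(SKETCH_TEMPLATE, trim_mode: "-").result(binding)
end

puts render_enum("OrderType", { 2 => "Phone", 1 => "Web" })
```

The "-" trim mode is what lets the `-%>` tags swallow their trailing newlines, so the loop lines leave no blank lines in the output.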

Caveat Emptor: I wrote this code from scratch today; it is not the same code we currently use in production. I think it’s better, but if you find a problem with it please let me know.

Written by Brandon Byars

October 21, 2007 at 9:54 pm

Posted in .NET, Code Generation, Ruby

On Being Dispensable

A question I like to give when interviewing prospective developers is, “what separates good software developers from bad ones?” I’ve heard different answers, ranging from “passion” to “experience.” I actually consider “experience” to be a pretty poor answer, as experience seems to correlate pretty weakly with skill.

I’ve worked with many bad developers, and I’ve been a bad developer. When I started out, I was just another Mort itching for the next Big Thing from Redmond. However, the one quality I did pick up very early was a sense of modesty about development. In my first job, I said “it’s impossible”—once. Thirty seconds later, I was shown that “it” was, in fact, not impossible, and oh, by the way, here’s how you do it. I haven’t said anything in software development is impossible since; it’s just that some things are harder than others.

I also make a habit of not saying “that makes no sense” when troubleshooting, as I’ve found out all too often that it really does make sense, once you change your way of thinking about the problem. “I don’t understand” is a much more accurate way of describing such problems. Ron Jeffries occasionally quotes Mary Doria Russell in his email signatures, and while I don’t know who Mary Doria Russell is, I certainly agree with her observation: Wisdom begins when we discover the difference between “That makes no sense” and “I don’t understand.”

But by itself, modesty does not a good developer make. And I can’t look back over my career and say that, the better I’ve gotten, the more modest I’ve become. I’ve been thinking about that quite a bit lately—what can I correlate to skill in my own development?

Dispensability

It was thinking about that question that reminded me of a quote by Jerry Weinberg in a book I read a few years back: “If a programmer is indispensable, get rid of him as quickly as possible.”

No better quote represents the distinction I’ve seen between the good programmers I’ve worked with and the bad ones. Simply put, the bad ones all seem absolutely indispensable. And the more skilled I have become, the more dispensable I have become. A few years ago, I was considered indispensable by my supervisor and peers. Now, I’m interviewing with other companies, and my company and I are considering the ramifications of me leaving soon. What we’re realizing is that the IT department should be fine without me.

Correlating dispensability with skill will likely come across as paradoxical (or just plain wrong) to people who don’t develop software, and to people who do develop software but aren’t very good at it. I say this because, when I first read the aforementioned Weinberg book, I thought it paradoxical, and wrong. After all, I was indispensable. That I also wasn’t any good didn’t occur to me until later.

No, any good developer is a dispensable developer. “That makes no sense,” you ask? Of course it does. What makes a developer dispensable?

Dispensable developers write simple, easy-to-read code. The code is cohesive and reads based on its intent. Bad developers write code that belies the fact that the only thing they’re interested in is getting something to work. The code therefore lacks cohesion and is a procedural ball of mud as the use case gets more and more involved. Bad code is written from the standpoint of “how can I make the computer do this?” Good code is written from the standpoint of “how can I make this understandable to a person?” It should be no surprise that, were the bad programmer to leave, nobody else would know what the hell to do with their code, whereas when a good programmer leaves, anybody should be able to maintain their code.

Good developers write tests for their code. Not the kind of tests that are written down as a series of steps in an Excel spreadsheet (“open the Doodle form, enter ‘blurf’ in the textbox, click the ‘Snarf’ button, and make sure that 2 appears in the status bar”). The kind of tests where, at the press of a button, you can find out whether your changes have broken anything. Automated tests, and for those good developers who are test-infected, there will be no shortage of automated tests to help you out when the good developer leaves.

Good developers get bored doing the same tasks over and over, so they automate them. We have some code we purchased from a vendor. It is not good code. It’s terrible to read, has no tests, and we still can’t really figure out how to deploy it. On the other hand, our code has automated builds and automated deployments. We have automated database creations and automated database migrations. We’re developers, and good developers don’t like doing things that the computer can do better. That makes it much nicer when good developers leave.

Good developers make sure that other people understand their code. At my current job, in the dark days of A Few Years Ago, we all “owned” pieces of the system and worked in separate offices with little communication. Predictably, the system sucked, but we could console ourselves that it sucked because the other person sucked, and besides, we all had nice big offices and felt important. When I became manager, I started introducing the idea of collective code ownership. Then we started meeting face-to-face each day for stand-up meetings to decide who was to do what. Then we started pair programming. Then we gave up our offices and moved into a shared war room environment to do all of our work. With one exception, our developers took the changes well. That one exception was our worst developer, and he’d do anything he could to keep people out of code he had written. He’s been gone for a while now, and we still have trouble working with “his” code. In many cases, we’re seeing it for the first time. The code sucks, it’s untested, and he made sure that nobody else ever understood it.

Another developer moved on last year, but he did everything he could to make sure we knew how to handle what he had worked on. He paired with others often and worked well with the team. Before he left, he gave a presentation with a projector on everything he could think of that we might need to know. While I hated to lose him much more than I hated to lose the bad developer, I have to admit that it’s been much easier to live without him than it has been to live without the bad developer.

Nowadays, we have a very good team. We all work together well, and everybody gets to see everybody’s code. We model our environment on XP, and it’s somewhat enlightening to reflect how XP tends to make team members dispensable over time. Frequent releases? Gotta have automation. Pair programming and collective code ownership? Gotta let others see your code. Simplicity and shared standards? Gotta make sure others can read what you’ve written. Test-driven development and customer tests? Gotta make sure others know if they’ve broken code you’ve written without having to read it.

Good developers are hard to find. But it’s much easier to lose a good developer than it is to lose a bad developer.

Written by Brandon Byars

September 16, 2007 at 7:25 pm

Posted in Agile

log4net Connection String Blues

We use log4net as our production logger, which has proven to be tremendously flexible. However, one problem I ran into was configuring the AdoNetAppender that logs to the database. It expects the connection string to be defined in the configuration file, which I didn’t want to do since it was already defined in our NHibernate config file.

This proved to be a relatively easy fix (found here):

private void SetConnectionStrings()
{
    Hierarchy hierarchy = LogManager.GetRepository() as Hierarchy;
    if (hierarchy == null)
        return;

    using (UnitOfWork unitOfWork = new UnitOfWork())
    {
        foreach (IAppender appender in hierarchy.GetAppenders())
        {
            AdoNetAppender dbAppender = appender as AdoNetAppender;
            if (dbAppender != null)
            {
                dbAppender.ConnectionString = unitOfWork.ConnectionString;
                dbAppender.ActivateOptions();
            }
        }
    }
}

However, the problem is that log4net whined to standard error about not having the connection string defined. The result was that any console application had its output garbled (including our tests, since some of them used the production logger).

The solution turned out to be going ahead and putting a connection string in the config file, but making it obviously invalid (e.g., “<ignore>”). Then, when the logger is configured, temporarily redirect standard error:

public void ConfigureLogger()
{
    FileInfo file = new FileInfo(ConfigUtils.GetFilePath("log4net.config"));
    TextWriter stdErr = Console.Error;
    Console.SetError(new StreamWriter(new MemoryStream()));
    XmlConfigurator.ConfigureAndWatch(file);
    ServiceRegistry.Logger = new Log4NetLogger();
    Console.SetError(stdErr);
}

Voila.
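The same trick of temporarily swapping out standard error works in our Ruby scripts too. A minimal sketch, with with_silenced_stderr as a hypothetical helper name; note that it only affects writes that go through Ruby's $stderr, not child processes.

```ruby
require "stringio"

# Hypothetical helper: run a block with $stderr temporarily swapped for an
# in-memory buffer, restoring the original even if the block raises.
# Returns whatever the block wrote to $stderr.
def with_silenced_stderr
  original = $stderr
  $stderr = StringIO.new
  yield
  $stderr.string
ensure
  $stderr = original
end

captured = with_silenced_stderr { $stderr.puts "noisy library output" }
puts "swallowed #{captured.length} characters"
```

The ensure clause is the important part: without it, an exception during configuration would leave standard error pointing at the throwaway buffer.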

Written by Brandon Byars

September 9, 2007 at 10:01 pm

Posted in .NET

Throw Out Those Utility Classes

How many times have you written an xxxUtils class, where xxx is some framework-supplied class that you can’t extend or subclass? I always seem to end up with several in any decent-sized project: StringUtils, DateUtils, DictionaryUtils, etc. In most cases, these classes are the result of language limitations. In Ruby and Smalltalk, for example, what would be the point of a StringUtils class when you could simply add methods to the String class directly? But C# and Java make String sealed (final) so you can’t even subclass it.
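For readers who haven’t seen it, this is roughly what adding a method to String looks like in Ruby. The method name truncate_to is made up for the example.

```ruby
# Reopening the built-in String class to add a helper directly,
# instead of writing a separate StringUtils module.
class String
  # Hypothetical helper: shorten the string to at most `length` characters,
  # appending a suffix when anything was cut off.
  def truncate_to(length, suffix = "...")
    return self if self.length <= length
    self[0, length] + suffix
  end
end

puts "True Multiple Dispatch".truncate_to(4)
```

Every string in the program now responds to truncate_to, which is both the appeal and the risk of open classes.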

Utility classes like these tend to suffer from logical cohesion. In spite of the friendly-sounding name, logical cohesion is actually a fairly weak form of cohesion; it’s just a loose jumble of functions that have something in common. It can in no way be considered object-oriented.

Our DictionaryUtils makes an interesting case study because it was small. It only did two things: compared two dictionaries key-by-key for equality (useful in testing), and converted the entries to a Ruby-esque string. That last method made me a little jealous of how convenient Hashes are in Ruby:

middlestate:~ bhbyars$ irb
>> {'a' => 1, 'b' => 2, 'c' => 3}
=> {"a"=>1, "b"=>2, "c"=>3}

For the non-Ruby readers, I just created a 3-element Hash in one line. The command-line interpreter spit out a string representation. Our DictionaryUtils.ConvertToText could manage that last bit, but I wanted to be able to create hashtables as easily in C# as I could in Ruby. Naturally, that meant a third method on DictionaryUtils. Or did it?

C# on Hash

DictionaryUtils.Create seemed bloviated and ugly as soon as I first wrote it, so I quickly scratched it out and started a new class:

public class Hash
{
    public static Hashtable New(params object[] keysAndValues)
    {
        if (keysAndValues.Length % 2 != 0)
            throw new ArgumentException("Hash.New requires an even number of parameters");

        Hashtable hash = new Hashtable();
        for (int i = 0; i < keysAndValues.Length; i += 2)
        {
            hash[keysAndValues[i]] = keysAndValues[i + 1];
        }
        return hash;
    }
}

This allowed me to create small loaded Hashtables in one line, which was convenient, especially for test methods (although the syntax isn’t as explicit as Ruby’s). I then decided to merge the static DictionaryUtils methods into Hash, as instance methods. First, of course, I had to make Hash an actual dictionary implementation. This was trivial:

private IDictionary proxiedHash;

public Hash(IDictionary dictionary)
{
    proxiedHash = dictionary;
}

public bool Contains(object key)
{
    return proxiedHash.Contains(key);
}

public void Add(object key, object value)
{
    proxiedHash.Add(key, value);
}

public void Clear()
{
    proxiedHash.Clear();
}

// etc…

Then I changed the return value of Hash.New to a Hash instead of a Hashtable. The last line became return new Hash(hash) instead of return hash.

Next I moved the ConvertToText method, which, as an instance method, conveniently mapped to ToString.

public override string ToString()
{
    SeparatedStringBuilder builder = new SeparatedStringBuilder(", ");
    ICollection keys = CollectionUtils.TryToSort(Keys);
    foreach (object key in keys)
    {
        builder.AppendFormat("{0} => {1}", Encode(key), Encode(this[key]));
    }
    return "{" + builder.ToString() + "}";
}

private object Encode(object value)
{
    if (value == null)
        return "<NULL>";

    IDictionary dictionary = value as IDictionary;
    if (dictionary != null)
        return new Hash(dictionary).ToString();

    if (value is string)
        return "\"" + value + "\"";

    return value;
}

The SeparatedStringBuilder class is a StringBuilder that adds a custom separator between each string. It’s very convenient whenever you’re building a comma-separated list, as above. It’s proven to be handy in a variety of situations. For example, I’ve used it to build a SQL WHERE clause by making “ AND ” the separator. It’s included with the code download at the bottom of this article.
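The idea behind the class is small enough to sketch in a few lines of Ruby. This is not the C# implementation from the download, just an illustration of the behavior described; the method names are assumptions.

```ruby
# Hypothetical sketch of a builder that inserts a separator between
# appended strings, but not before the first or after the last.
class SeparatedStringBuilder
  def initialize(separator)
    @separator = separator
    @parts = []
  end

  # Returns self so calls can be chained.
  def append(text)
    @parts << text.to_s
    self
  end

  # Ruby-style format string, e.g. "%s => %s".
  def append_format(template, *args)
    append(template % args)
  end

  def to_s
    @parts.join(@separator)
  end
end

where = SeparatedStringBuilder.new(" AND ")
where.append("Status = 1").append_format("%s = %d", "TypeId", 3)
puts where.to_s
```

In Ruby, collecting the parts in an array and calling join does all the work, which is much of why the StringBuilder version is needed in C# in the first place.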

Notice, also, that we’re still using a CollectionUtils class. Ah, well. I’ve got to have something to look forward to fixing tomorrow…

The DictionaryUtils.AreEqual method conveniently maps to an instance level Equals method:

public override bool Equals(object obj)
{
    IDictionary other = obj as IDictionary;
    if (other == null) return false;
    Hash hash = new Hash(other);
    return hash.ToString() == ToString();
}

public override int GetHashCode()
{
    return proxiedHash.GetHashCode();
}

The syntax is much cleaner than the old DictionaryUtils class. It’s nicely encapsulated, fits conveniently into the framework overrides, and is object-oriented, allowing us to add other utility methods to the Hash class easily. It’s especially nice for testing, since the Equals method will work against any dictionary implementation, not just Hashes:

Assert.AreEqual(Hash.New("address", customer.Address), propertyBag);

The approach was simple, relying on proxying for fulfilling the IDictionary implementation (I’m probably abusing the word “proxying,” since we’re not doing anything with the interception. Really, this is nothing more than the Decorator design pattern). That was easy only because the framework actually provided an interface to subtype; the same isn’t true of String and Date. However, it isn’t true of StringBuilder either; if you look at the code, SeparatedStringBuilder looks like a StringBuilder, it talks like a StringBuilder, and it quacks like a StringBuilder, but there is no syntactic relationship between them since StringBuilder is sealed and doesn’t implement an interface. While the need for SeparatedStringBuilder may represent a special case, I think I’d prefer creating similar-looking objects rather than relying on a framework-provided xxx and a custom built xxxUtils class. Proxying, as used by Hash, generally makes such implementations trivial and clean, leaving you to spend your time developing what you really want without making the API unnecessarily ugly.
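For comparison, here is how the same wrap-and-forward approach might look in Ruby, using the standard Forwardable module. It is a sketch under assumptions: PrettyHash is a made-up name, and equality here compares the underlying hashes rather than their string forms as the C# Equals does.

```ruby
require "forwardable"

# Hypothetical wrapper: forwards the dictionary basics to the wrapped hash
# and adds a sorted, readable to_s plus equality against any object that
# can be converted with to_h.
class PrettyHash
  extend Forwardable
  def_delegators :@inner, :[], :[]=, :key?, :keys, :clear, :size

  def initialize(inner = {})
    @inner = inner
  end

  def to_h
    @inner
  end

  def to_s
    pairs = keys.sort_by(&:to_s).map { |key| "#{key.inspect} => #{self[key].inspect}" }
    "{" + pairs.join(", ") + "}"
  end

  def ==(other)
    other.respond_to?(:to_h) && to_h == other.to_h
  end
end

hash = PrettyHash.new({ "b" => 2, "a" => 1 })
puts hash.to_s
puts hash == { "a" => 1, "b" => 2 }
```

The forwarding boilerplate shrinks to a single def_delegators line, leaving only the methods that add something.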

All the code needed to compile and test the Hash class can be found here.

Written by Brandon Byars

August 28, 2007 at 11:40 pm

Posted in .NET, Design
