A Day In The Lyf

…the lyf so short, the craft so longe to lerne

Posts Tagged ‘C#’

RestMvc – RESTful Goodies for ASP.NET MVC


Last summer, I found myself building a RESTful ASP.NET MVC service that had an HTML admin UI. Oftentimes, the resource that was being edited in HTML was the same resource that needed to be sent out in XML via the service, which mapped nicely to the REST ‘multiple representations per resource’ philosophy.

There are obviously some very nice RESTful libraries for ASP.NET MVC, but none quite met my needs. Simply Restful Routing, which comes with MVC Contrib, takes a Rails-inspired approach of handing you a pre-built set of routes that more or less match a RESTful contract for a resource. While obviously convenient, that’s never been my preferred way to manage routing. It adds a bunch of routes that you probably have no intention of implementing. It keeps the routes centralized, which never seemed to read as well to me as the way Sinatra keeps the routing configuration next to the block that handles requests to that route.

Additionally, one of the problems I encountered with other routing libraries like Simply Restful is that they define the IRouteHandler internally, which removes your ability to add any custom hooks into the routing process. I needed just such a hook to add content negotiation. I also wanted some RESTful goodies, like responding with a 405 instead of a 404 status code when a request matches a resource’s URI template but uses an HTTP verb that the resource doesn’t support. I wanted the library to automatically deal with HEAD and OPTIONS requests. In the end, I created my own open-source library called RestMvc, which provides such goodies with Sinatra-like routing and content negotiation.

Routing

public class OrdersController : Controller
{
    [Get("/orders")]
    public ActionResult Index() { ... }

    [Post("/orders")]
    public ActionResult Create() { ... }

    [Get("/orders/{id}.{format}", "/orders/{id}")]
    public ActionResult Show(string id) { ... }

    [Put("/orders/{id}")]
    public ActionResult Edit(string id) { ... }

    [Delete("/orders/{id}")]
    public ActionResult Destroy(string id) { ... }
}

Adding the routes for the attributes above is done in Global.asax.cs, in a couple of different ways:


RouteTable.Routes.Map();
// or RouteTable.Routes.MapAssembly(Assembly.GetExecutingAssembly());

That is, in effect, the entire routing API of RestMvc. The Map and MapAssembly extension methods will do the following:

  • Create the routes defined by the HTTP methods and URI templates in the attributes. Even though System.Web.Routing does not allow you to prefix URI templates with either / or ~/, I find allowing those prefixes can enhance readability, and thus they are allowed.
  • Route the HEAD and OPTIONS methods for the two URI templates (“orders” and “orders/{id}”) to a method within RestMvc capable of handling those methods intelligently.
  • Route PUT and DELETE for /orders, and POST for /orders/{id}, to a method within RestMvc that knows to return a 405 HTTP status code (Method Not Allowed) with an appropriate Allow header. This method, and the ones that handle HEAD and OPTIONS, work without any subclassing of the controller shown above. However, if you need to customize their behavior (for example, to add a body to the OPTIONS response), you can subclass RestfulController and override the appropriate method.
  • Add routes for tunnelling PUT and DELETE through POST for HTML browser support. RestMvc takes the Rails approach of looking for a hidden form field called _method set to either PUT or DELETE. If you don’t want the default behavior, or you want the tunnelling but with a different form field, you can call ResourceMapper directly instead of accepting the defaults that the Map and MapAssembly extension methods provide.
  • Notice the optional format parameter on the Get attribute above the Show method. Routes with an extension are routed such that the extension gets passed as the format parameter, if the resource supports multiple representations (e.g. /orders/1.xml routes to Show with a format of xml). The ordering of the URI templates in the Get attribute is important: had I reversed the order, /orders/1.xml would have matched with an id of “1.xml” and an empty format.
  • The last point is a convenient way to handle multiple formats for a resource. Since the format is in the URL, it can be bookmarked and emailed, or tested through a browser, with the same representation regardless of the HTTP headers. Even if content negotiation is used, it allows you to bypass the standard negotiation process. Note that having different URLs for different representations of the same resource is generally frowned upon by REST purists; RestMvc does not automatically provide these routes for you, but lets you add them if you want.

Content Negotiation

Content negotiation is provided as a decorator to the standard RouteHandler. Doing it this way allows you to compose additional custom behavior that needs access to the IRouteHandler.

// In Global.asax.cs
var map = new MediaTypeFormatMap();
map.Add(MediaType.Html, "html");
map.Add(MediaType.Xhtml, "html");
map.Add(MediaType.Xml, "xml");

var connegRouter = new ContentNegotiationRouteProxy(new MvcRouteHandler(), map);

RouteTable.Routes.MapAssembly(Assembly.GetExecutingAssembly(), connegRouter);

In the absence of a route URI template specifying the format explicitly, the connegRouter will examine the Accept request header and pick the first media type supported in the map. Wildcard matches are supported (e.g. text/* matches text/html). The format parameter will be set for the route, based on the value added in the MediaTypeFormatMap.
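
To make the decorator idea concrete, here is a minimal sketch of what such a proxy could look like. This is not RestMvc’s actual source; the class name and the dictionary-based map are invented for illustration:

using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.Routing;

// Sketch of a content-negotiating IRouteHandler decorator (names invented)
public class ConnegRouteProxySketch : IRouteHandler
{
    private readonly IRouteHandler inner;
    private readonly IDictionary<string, string> mediaTypeToFormat;

    public ConnegRouteProxySketch(IRouteHandler inner, IDictionary<string, string> mediaTypeToFormat)
    {
        this.inner = inner;
        this.mediaTypeToFormat = mediaTypeToFormat;
    }

    public IHttpHandler GetHttpHandler(RequestContext requestContext)
    {
        var values = requestContext.RouteData.Values;

        // Only negotiate if the URI template didn't already capture a format
        if (!values.ContainsKey("format") || string.IsNullOrEmpty(values["format"] as string))
        {
            string[] acceptTypes = requestContext.HttpContext.Request.AcceptTypes ?? new string[0];
            string match = acceptTypes.FirstOrDefault(accept => mediaTypeToFormat.ContainsKey(accept));
            if (match != null)
                values["format"] = mediaTypeToFormat[match];
        }

        return inner.GetHttpHandler(requestContext);
    }
}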

The content negotiation is quite simple at the moment. The q parameter in the Accept header is completely ignored. By default, it tries to abide by the Accept header prioritization inferred from the order of the MIME types in the header. However, you can change it to allow the server ordering, as defined by the order MIME types are added to the MediaTypeFormatMap, to take priority. This was added to work around what I consider to be a bug in Google Chrome – despite being unable to natively render XML, it prioritizes XML over HTML in its Accept header. The library does not currently support sending back a 406 (Not Acceptable) HTTP status code when no acceptable MIME type is sent in the Accept header.

Next Steps

I haven’t worked on RestMvc in a few months, largely because I shifted focus at work and haven’t done any .NET programming in a while. However, I had planned to add automatic ETag support and to make the content negotiation more robust.

Contributors welcome! The code can be found on GitHub.

Written by Brandon Byars

January 6, 2011 at 5:02 pm

Posted in .NET


Funcletize This!

I was recently involved in troubleshooting a bug in our staging environment. We had some code that worked in every environment we put it in, except staging. Once there, performing the equivalent of an update on a field (using LINQ in C#) was greeted by a ChangeConflictException.

I’m embarrassed by how long it took to figure out what was wrong. It was obviously an optimistic locking problem, and when I first saw the exception I even suggested that the UPDATE statement wasn’t updating anything. Optimistic locking works by adding extra fields to the WHERE clause to make sure the data hasn’t changed since you loaded it. If one of those fields has changed, the WHERE clause won’t match anything, and the O/RM will assume that somebody has changed the data behind your back and throw an exception.

It turns out that failing to match any rows with the given filter isn’t the only way that LINQ will think no rows were updated; it’s also dependent on the NOCOUNT option in SQL Server. If the database is configured to have NOCOUNT ON, then the number of rows affected by each query won’t be sent back to the client. LINQ interprets this lack of information as 0 rows being updated, and thus throws the ChangeConflictException.
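
You can reproduce the behavior with plain ADO.NET. The table and connection string below are hypothetical, but the -1 return value under NOCOUNT is documented behavior:

using System;
using System.Data.SqlClient;

class NoCountDemo
{
    static void Main()
    {
        using (var connection = new SqlConnection("Server=.;Database=Demo;Integrated Security=SSPI"))
        {
            connection.Open();

            var command = new SqlCommand(
                "SET NOCOUNT ON; UPDATE Orders SET Status = 'Shipped' WHERE Id = 1",
                connection);

            // With NOCOUNT ON, SQL Server suppresses the rows-affected count,
            // so ExecuteNonQuery reports -1 even though a row was updated.
            Console.WriteLine(command.ExecuteNonQuery());
        }
    }
}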

In itself, the bug wasn’t very interesting. What is interesting is what we saw when we opened Reflector to look at the LINQ code around the exception:

IExecuteResult IProvider.Execute(Expression query)
{
    // …
    query = Funcletizer.Funcletize(query);
}

Love it. Uniquifiers, Funcletizers, and Daemonizers of the world unite.

Written by Brandon Byars

October 26, 2008 at 12:13 pm

Posted in .NET, Database

Tagged with ,

Orthogonality

Orthogonality means that features can be used in any combination, that the combinations all make sense, and that the meaning of a given feature is consistent, regardless of the other features with which it is combined. The name is meant to draw an explicit analogy to orthogonal vectors in linear algebra: none of the vectors in an orthogonal set depends on (or can be expressed in terms of) the others, and all are needed in order to describe the vector space as a whole.

– Michael Scott (Programming Language Pragmatics)

I’ve used Delphi and Visual Basic at previous jobs. I disliked both of them. VB has this annoying distinction between objects and primitives, so I’d consistently type

Dim server
server = "localhost"

Dim catalog
catalog = CreateObject("MyCompany.MyObject")

…only to be greeted by an ugly Windows popup box reading Object doesn't support this property or method. The problem, as all you grizzled VB programmers no doubt spotted immediately, is that the last line should start with the keyword Set. VB requires you to prefix assignments to “objects” with Set. But if you try to put a Set in front of assignments to what VB considers primitives (like the first assignment above), you get an error that reads Object required: '[string: "localhost"]'.

Delphi likewise frustrated me with mundane annoyances:

DoOneThing();

if value = goal then
    DoSomething();
else
    DoSomethingElse();

The code above doesn’t compile, and it won’t compile until you remove the semicolon from the call to DoSomething(). Semicolons complete a statement, and the statement is the entire if-else clause.

These problems in VB and Delphi are related to the concept of orthogonality mentioned in the opening quote. VB doesn’t let you compose assignment to objects the same way it lets you compose assignment to primitives. Delphi doesn’t let you end an if clause the same way it lets you end an else clause. These inconsistencies encourage even experienced programmers to make silly syntax mistakes and make the language harder to use.

What is orthogonality?

The key principles I extracted from Michael Scott’s quote above are consistency and composability. Composability means that features can be combined, and consistency stands in for the Principle of Least Surprise: features act the way you expect, regardless of how they’re combined. VB’s assignments lack consistency. Delphi’s semicolon parsing doesn’t act consistently when composed within a surrounding statement.

Scott claims that a highly orthogonal language is easier to understand and easier to use. Nowhere does he mention that it’s easier to implement. I’m sure the Delphi grammar was simplified considerably by refusing to allow the DoSomething statement to end in a semicolon when contained within an outer if-else clause. It’s also likely that the implementation of Visual Basic was simplified by having the programmer tell the compiler whether an assignment referred to an object or a primitive.

I suspect many non-orthogonal aspects of languages are there to make them easier to implement. However, some languages that are trivially easy to implement can also be amazingly orthogonal. Lisp is a prime example; it is a highly orthogonal language, and yet an entire Lisp interpreter written in Lisp fits on just one page of the Lisp 1.5 Programmer’s Manual.

I found it instructive to list out syntactic constructs that make languages less orthogonal. It’s amazing how mainstream most of them are:

Statements

Statements aren’t necessary, and a number of languages avoid them altogether. Having only expressions makes the language easier to work in. Compare the inconsistency of Delphi’s statement parsing to the composability of Ruby’s expressions:

puts(if x == 0
  "Zero"
else
  "Not Zero"
end)

The if clause is an expression; it returns the last value evaluated (in this case, either “Zero” or “Not Zero”), and that value can be composed within another expression.

So what happens when you don’t have a reasonable return value? Smalltalk always returns self, which is convenient because it allows method chaining. In the Ruby example above, the entire expression returns the result of the call to puts, which happens to be nil.

The beauty of expressions is that the composability doesn’t just go one level deep:

x = if a == 1; 1; else; 2 end

y = x = if a == 1; 1; else; 2; end

puts y = x = if a == 1; 1; else; 2; end

puts "returns nil" if (puts y = x = if a == 1; 1; else; 2; end).nil?

As the example above shows, orthogonality doesn’t guarantee good code. However, it allows the language to be used in unanticipated ways, which is A Good Thing. Moreover, since everything is an expression, you can put expressions where you wouldn’t normally expect them. For example, in Ruby, the < operator represents inheritance when the receiver is a class, but the superclass could be an expression:

class Test < if x == 1; OneClass; else; TwoClass; end
end

Pushing a language like this, only to find that it’s turtles all the way down, is a signpost for orthogonality.

Primitives

We already saw the clumsiness of VB primitives, but most mainstream languages share a similar problem. Java, for example, has a confusing dichotomy of longs and Longs, the first a primitive and the second a full-fledged object. C has stack-allocated primitives, which are freed automatically when they fall out of scope, and heap-allocated variables, which you have to free yourself. C# has value types, which force another abstraction – boxing – into the programmer’s lap.

private static object DBNullIf(int value, int nullValue)
{
    return (value != nullValue) ? (object)value : DBNull.Value;
}

The code above should be a head-scratcher. Isn’t object the superclass of all other types? And if so, why do we have to explicitly cast our int variable to an object to make this code compile? And why don’t we have to do the same for the DBNull? The answer is boxing: the conditional operator needs a common type for its two branches, and since there is no implicit conversion between the value type int and DBNull, we have to box the int to object ourselves.

I mentioned above that ease of implementation is a common source of non-orthogonality. With primitives, we can see another, more legitimate reason: performance. There is a cost to keeping primitives out of the language. Below, we’ll see several more non-orthogonal language features that make performance easier to optimize.

Nulls

Nulls have been problematic enough that an entire design pattern has been created to avoid them. Access violations and object reference exceptions are some of the most common programmer errors. Most languages divide object references into two types: those that point to an object, and those that point to nothing. There’s nothing intrinsically non-orthogonal about that division, except that in most languages the references that point to nothing work differently. Instead of returning a value, they throw an exception when dereferenced.

In Ruby, the division is still more or less there — some references point to objects, and some point to nothing — but both types of references still return a value. In effect, Ruby has built the Null Object pattern into the language, as the references that point to nothing return a nil value. But, like everything else in Ruby, nil is an object (of type NilClass), and can be used in expressions:

1.nil?      # returns false
nil.nil?    # returns true

You never get a NullReferenceException in Ruby. Instead, you get a NoMethodError when you try to call a method on nil that doesn’t exist, which is exactly the same error you’d get when calling a nonexistent method on any other object.

Magic Functions

Most object-oriented languages have certain functions that aren’t really methods. Instead, they’re special extensions to the language that make the class-based approach work.

public class Magic
{
    public static void ClassMethod()
    {
    }

    public Magic()
    {
    }

    public void NormalMethod()
    {
    }
}

Magic.ClassMethod();
Magic magic = new Magic();
magic.NormalMethod();

Notice the context shift we make when constructing new objects. Instead of making method calls via the normal syntax (receiver.method()), we use the keyword new and give the class name. But what is the class but a factory for instances, and what is the constructor but a creation method? In Ruby, the constructor is just a normal class-level method:

regex = Regexp.new(phone_pattern)

Typically, the new class-level method allocates an instance and delegates to the newly created instance’s initialize method (which is what most Ruby programmers call the “constructor”). But, since new is just a normal method, if you really wanted to, you could override it and do something different:

class Test
  def self.new
    2 + 2
  end
end

Test.new    # returns 4!!

Operators also tend to be magic methods in many languages. Ruby more or less treats them just like other methods, except the interpreter steps in to break the elegant consistency of everything-as-a-method to provide operator precedence. Smalltalk and Lisp, two other highly orthogonal languages, do not provide operator precedence. Operators are just normal methods (functions in Lisp), and work with the same precedence rules as any other method.

So here we have yet another reason, on top of ease of language implementation and performance, to add non-orthogonality to a language. Ruby adds operator precedence, even though it adds an element of inconsistency, presumably because it makes the language more intuitive. Since intuitiveness is one of the purported benefits of orthogonality, there is a legitimate conflict of interest here. I think I would prefer not having operator precedence, and leaving the language consistent, but it seems more of a question of style than anything else.

Static and sealed Methods

Class-based object-oriented languages impose a dichotomy between classes and instances of those classes. Most of them still allow behavior to exist on classes, but that behavior is treated differently than instance behavior. By declaring class methods as static, you’re telling the compiler that it’s free to compute the address of this function at compile-time, instead of allowing the dynamic binding that gives you polymorphism.

Ruby gives you some degree of polymorphism at the class level (although you can’t call superclass methods, for obvious reasons):

class Animal
  def self.description
    "kingdom Animalia"
  end
end

class Person < Animal
  def self.description
    "semi-evolved simians"
  end
end

In most mainstream languages, not even all instance methods are polymorphic. They are by default in Java, although you can make them statically-bound by declaring them final. C# and C++ take a more extreme approach, forcing you to declare them virtual if you want to use them polymorphically.

Behavior cannot be combined consistently between virtual and sealed (or final) methods. It’s a common complaint when developers try to extend a framework only to find out that the relevant classes are sealed.
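
Here is a small C# sketch of the difference, echoing the Ruby example above:

using System;

public class Animal
{
    // Non-virtual: bound at compile time to the declared type
    public string Description() { return "kingdom Animalia"; }

    // Virtual: bound at runtime to the actual type
    public virtual string Sound() { return "..."; }
}

public class Person : Animal
{
    public new string Description() { return "semi-evolved simians"; } // hides, doesn't override
    public override string Sound() { return "words"; }
}

class BindingDemo
{
    static void Main()
    {
        Animal animal = new Person();
        Console.WriteLine(animal.Description()); // "kingdom Animalia" -- static binding
        Console.WriteLine(animal.Sound());       // "words" -- dynamic binding
    }
}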

Instance Variables

Only the pure functionalists can get by without state; the rest of us need to remember things. But there’s no reason why the mechanism of retrieving stored values has to be treated differently from the mechanism of calling a parameter-less function that computes a value, nor is there a reason that the mechanism for storing a value has to be different from the mechanism of calling a setter function to store the value for you.

It is common OO dogma that state should be private, and if you need to expose it, you should do so through getters and setters. The evolution of the popular Rails framework recently reaffirmed this dogma. In prior versions, sessions were exposed to the controllers via the @session instance variable. When they needed to add some logic to storing and retrieving, they could no longer expose the simple variable, and refactored to a getter/setter attribute access. They were able to do so in a backward-compatible way, by making the @session variable a proxy to an object that managed the logic, but it was still a process that a more orthogonal language wouldn’t have required. The language should not force you to distinguish between field access and method access.

Both Lisp and Eiffel treat instance variables equivalently to function calls (at least when it comes to rvalues). Lisp simply looks up atoms in the environment, and if that atom is a function (lambda), then it can be called to retrieve the value no differently than if the atom is a variable containing the value. Eiffel, an object-oriented language, declares variables and methods using the same keyword (feature), and exposes them – both to the outside world and to the class itself – the same way (Bertrand Meyer called this the Uniform Access principle):

class POINT
feature
    x, y: REAL
            -- Abscissa and ordinate

    rho: REAL is
            -- Distance to origin (0,0)
        do
            Result := sqrt(x^2 + y^2)
        end

Once you get past Eiffel’s tradition of YELLING TYPES at you and hiding the actual code, Uniform Access makes a lot of sense. Instance variables are just like features without bodies. C# has properties, which provide similar benefits but force you to explicitly declare them. Instead of getRho and setRho methods, you can have a property that lets clients use the same syntax regardless of whether they’re using a property or a field. Because Ruby allows the = symbol as part of a method name, it allows a similar syntax.
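
For instance, the Eiffel POINT above maps naturally onto a C# property; a minimal sketch:

using System;

public class Point
{
    public double X; // stored state
    public double Y;

    // Computed on access, but read with the same syntax as a field
    public double Rho
    {
        get { return Math.Sqrt(X * X + Y * Y); }
    }
}

// var p = new Point { X = 3, Y = 4 };
// p.Rho == 5.0, accessed just like p.X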

However, the separation between variables and properties is superfluous. For example, there’s no need for them to have separate access levels. If other classes need the state exposed, then declare it public. If the language doesn’t offer instance variables, then you’re simply exposing a property method. If you run into the same problem that Rails ran into, and suddenly need to add behavior around exposed state, refactoring should be easy. Just add a private property method that is now the state, and leave the public property method.

So, in my hypothetical language, we might have the following:

class Controller
  public feature session
end

And when you feel you need to add behavior around the exposed session dictionary, it should be easy:

class Controller
  public feature session
    # added behavior goes here
    private_session
  end

  private feature private_session
end

One thing that this hypothetical syntax doesn’t allow is separate access levels for getting and setting, but it shows the basic idea.

Inconsistent Access Levels

Since we’re on the subject of access levels, Ruby’s private access level is not very orthogonal at all. Unlike C++-derived languages, Ruby’s private is object-level, not class-level, which means that even other instances of the same class can’t directly access a private method. That’s a reasonable constraint.

However, instead of making object-level private access orthogonal, the implementors simply disallowed specifying a receiver for private method calls. This undoubtedly made implementing object-level private access much easier. Unfortunately, it means that you can’t even use self as the receiver within the object itself, which makes moving a method from public or protected to private non-transparent, even if all references to the method are within the class itself:

class TestAccessLevels
  def say_my_name
    puts self.name
  end

  private
  def name
    "Snoopy"
  end
end

# The following line throws an exception
TestAccessLevels.new.say_my_name

Second-class types

Having special types, like null, is a special case of having primitives, and it’s a common performance optimization. Being a first-class type has a well-defined meaning. Specifically, its instances can:

  • be passed to functions as parameters
  • be returned from functions
  • be created and stored in variables
  • be created anonymously
  • be created dynamically

It’s becoming increasingly common for modern languages to move towards first-class functions. In Execution in the Kingdom of Nouns, Steve Yegge parodied the typical transmogrifications Java programmers had accustomed themselves to in order to sidestep the language’s lack of first-class functions. Java’s cousin (descendant?), C#, has more or less had them since .NET 2.0 in the form of anonymous delegates.

What neither Java nor C# has is first-class classes. Both have reflection, and both even allow you to create new types at runtime (painfully…). But, because both languages are statically typed, you can’t access these runtime-created types the same way you access normal types. Assuming Foo is a runtime-created type, the following code won’t compile:

Foo foo = new Foo();

The only way to make it work is by heavy use of reflection. Ruby, on the other hand, makes it trivially easy to add types at runtime. In fact, all types are added at runtime, since it’s an interpreted language.
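
To make “heavy use of reflection” concrete, here is roughly what you’re reduced to; the type and method names are hypothetical:

using System;
using System.Reflection;

class RuntimeTypeDemo
{
    static void Use(Type runtimeCreatedFoo)
    {
        // No compile-time Foo type exists, so everything goes through reflection
        object foo = Activator.CreateInstance(runtimeCreatedFoo);
        MethodInfo doSomething = runtimeCreatedFoo.GetMethod("DoSomething");
        doSomething.Invoke(foo, new object[0]);
    }
}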

Single Return Values

Since you can pass more than one argument to functions, why should you only be allowed to return one value from functions? Languages like ML add orthogonality by returning tuples. Ruby works similarly, unpacking arrays automatically for you if you use multiple lvalues in one expression.

def duplicate(value)
  [value, value]
end

first, second = duplicate("hi there")

This feature allows you to combine lvalues and rvalues more consistently.

Inconsistent Nesting

Nesting is the standard programming trick of limiting scope, and most languages provide blocks that you can nest indefinitely if you want to limit scope. For example, in C#:

public void TestNest()
{
    string outer = "outer";
    {
        string middle = "middle";
        {
            string inner = "inner";
        }
    }
}

In addition to blocks, Ruby allows you to nest functions, but the nesting doesn’t work in an orthogonal way:

def outer
  inner_var = "inner variable"

  def inner
    "inner method"
  end
end

outer
puts inner
puts inner_var

In this example, the last line will throw a NameError, complaining that inner_var is undefined. This is as we should expect – since it’s defined inside an inner scope from where we’re calling it, we should not be able to access it. However, the same is not true for the inner method defined in the call to outer. Despite the fact that it’s defined within a nested scope, it actually has the same scoping as outer.

Ruby’s scoping gets even weirder:

def outer
  begin
    inner_var = "inner"
  end

  inner_var
end

puts outer

This code works, printing “inner” to the console. It shouldn’t.

JavaScript similarly suffers from strange scoping rules. Because all variables have function scope, and not block scope, a variable is available everywhere within a function regardless of where it is defined:

function testScoping(predicate) {
  if (predicate) {
    var test = "in if block";    
  }
  alert(test);    // works, even though it's in an outer block
}

Fixing your mistakes

Speaking of JavaScript, few things bug me more about programming languages than those that try to fix your mistakes for you. JavaScript automatically appends a semicolon at the end of a line if you forget one. Most of the time that works fine, but every now and then it creates a ridiculously hard bug to diagnose:

return
{
  value: 0
}

The return statement above looks like it returns an object with a single property. What it really does, though, is simply return. A semicolon was appended on your behalf at the end of the return keyword, turning the following three lines into dead code. It is for this reason that Douglas Crockford recommends always using K&R-style braces in JavaScript in his book JavaScript: The Good Parts.

Summary

Orthogonality makes the language easier to extend in ways that the language implementors didn’t anticipate. For example, Ruby’s first-class types, combined with its duck typing, allow the popular mocking framework Mocha to have a very nice syntax:

logger = stub
logger.expects(:error).with("Execution error")

do_something(:blow_up, logger)

The fact that classes in Ruby are also objects means that the same syntax works for classes:

File.stubs(:exists?).returns(true)

I picked on JavaScript a couple of times, but it did a better job with literals than any other language I know of. JavaScript has literal numbers and strings, like most languages. It has literal regular expressions and lists, like Perl and Ruby. But where it really shines is in its literal object syntax. Objects are basically dictionaries, and Ruby and Perl have hash literals, but JavaScript’s object literals can include function literals:

function ajaxSort(columnName, sortDirection) {
    setNotice('Sorting...');
    new Ajax.Request("/reports/ajaxSort", {
            method: "get",
            parameters: { sort: columnName, sortDir: sortDirection },
            onSuccess: function (transport) { 
                $("dataTable").innerHTML = transport.responseText;
            },
            onComplete: function () { clearNotice(); }
        }
    );
}

In time, that syntax was leveraged to form the JSON format.

In a more dramatic example, when object-orientation became all the rage, Lisp was able to add object-orientation to the language without changing the language: the Common Lisp Object System (CLOS) is written entirely in Lisp. The reason Lisp was able to absorb an entire new paradigm is largely the language’s orthogonality. Not all function-like calls work the same way; some are called “special forms” because they act differently (for example, by providing lazy evaluation of their arguments). However, Lisp allows programmers to create their own special forms by writing macros, which are themselves written in Lisp.

It helps to have turtles all the way down.

Written by Brandon Byars

July 21, 2008 at 10:36 pm

Posted in Languages


Code Generation and Metaprogramming

I wanted to expand upon an idea that I first talked about in my previous post on Common Lisp. There is a common pattern between syntactic macros, runtime metaprogramming, and static code generation.

Runtime metaprogramming is code-generation. Just like C macros. Just like CL macros.

Ok, that’s a bit of an overstatement. Those three things aren’t really just like each other. But they are definitely related—they all write code that you’d rather not write yourself. Because it’s boring. And repetitious. And ugly.

In general, there are three points at which you can generate code in the development process, although the terminology leaves something to be desired: before compilation, during compilation (or interpretation), and during runtime. In the software development vernacular, only the first option is typically called code-generation (I’ll call it static code generation to avoid confusion). Code generation during compilation goes under the moniker of a ‘syntactic macro,’ and I’m calling runtime code generation ‘runtime metaprogramming.’

Since the “meta” in metaprogramming implies writing code that writes code, all three forms of code generation can be considered metaprogramming, which is why I snuck the “runtime” prefix into the third option above. Just in case you were wondering…

Static Code Generation

Static code generation is the easiest to understand and the weakest of the three options, but it’s often your only option due to language limitations. C macros are an example of static code generation, and they are the only metaprogramming option possible with C out of the box.

To take an example, on a previous project I generated code for lazy loading proxies in C#. A proxy, one of the standard GoF design patterns, sits in between a client and an object and intercepts messages that the client sends to the object. For lazy loading, this means that we can instantiate a proxy in place of a database-loaded object, and the client can use it without even knowing that it’s using a proxy. For performance reasons, the actual database object will only be loaded on first access of the proxy. Here’s a truncated example:

public class OrderProxy : IOrder
{
    private IOrder proxiedOrder = null;
    private long id;
    private bool isLoaded = false;

    public OrderProxy(long id)
    {
        this.id = id;
    }

    private void Load()
    {
        if (!isLoaded)
        {
           proxiedOrder = Find();
           isLoaded = true;
        }
    }

    private IOrder Find()
    {
        return FinderRegistry.OrderFinder.Find(id);
    }

    public string OrderNumber
    {
        get
        {
           Load();
           return proxiedOrder.OrderNumber;
        }
        set
        {
           Load();
           proxiedOrder.OrderNumber = value;
        }
    }

    public DateTime DateSubmitted
    {
        get
        {
           Load();
           return proxiedOrder.DateSubmitted;
        }
    }
}

This code is boring to write and boring to maintain. Every time the interface changes, a very repetitious change has to be made in the proxy. To make it worse, we have to do this for every database entity we’ll want to load (at least those we’re worried about lazy-loading). All I’d really like to say is “make this class implement the appropriate interface, and make it a lazy-loading proxy.” Fortunately, since the proxy is supposed to be a drop-in replacement for any other class implementing the same interface, we can use reflection to query the interface and statically generate the proxy.

There’s an important limitation to generating this code statically. Because we’re doing this before compilation, this approach requires a separated interfaces approach, where the binary containing the interfaces is separate from the assembly we’re generating the proxies for. We’ll have to compile the interfaces, use reflection on the compiled assembly to generate the source code for the proxies, and compile the newly generated source code.

But it’s do-able. Simply load the interface using reflection:

public static Type GetType(string name, string nameSpace, string assemblyFileName)
{
    if (!File.Exists(assemblyFileName))
        throw new IOException("No such file");

    Assembly assembly = Assembly.LoadFile(Path.GetFullPath(assemblyFileName));
    string qualifiedName = string.Format("{0}.{1}", nameSpace, name);
    return assembly.GetType(qualifiedName, true, true);
}

From there it’s pretty trivial to loop through the properties and methods and recreate the source code for them on the proxy, with a call to Load before delegating to the proxied object.
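
A rough sketch of that loop, handling properties only (the emitter details here are invented, not the original generator):

using System;
using System.Reflection;
using System.Text;

public static class ProxySourceSketch
{
    public static string GenerateProperties(Type interfaceType)
    {
        var source = new StringBuilder();

        foreach (PropertyInfo property in interfaceType.GetProperties())
        {
            source.AppendFormat("public {0} {1}\n{{\n",
                property.PropertyType.FullName, property.Name);

            // Each accessor loads the proxied object before delegating
            if (property.CanRead)
                source.AppendFormat(
                    "    get {{ Load(); return proxiedOrder.{0}; }}\n", property.Name);
            if (property.CanWrite)
                source.AppendFormat(
                    "    set {{ Load(); proxiedOrder.{0} = value; }}\n", property.Name);

            source.Append("}\n\n");
        }

        return source.ToString();
    }
}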

Runtime Metaprogramming

Now it turns out that when I wrote the code generation code above, there weren’t very many mature object-relational mappers in the .NET space. Fortunately, that’s changed, and the code above is no longer necessary. NHibernate will lazy-load for you, using a proxy approach similar to the one I used above, except that NHibernate writes the proxy code at runtime.

The mechanics of how this works are encapsulated in a nice little library called Castle.DynamicProxy. NHibernate uses reflection to read interfaces (or virtual classes) and calls DynamicProxy to generate code at runtime using the Reflection.Emit namespace. In C#, that’s a difficult thing to do, which is why I wouldn’t recommend doing it yourself unless you use DynamicProxy.

This is a much more powerful technique than static code generation. For starters, you no longer need two assemblies, one for the interfaces, and one for the proxies. But the power of runtime metaprogramming extends well beyond saving you a simple .NET assembly.

Ruby makes metaprogramming much easier than C#. The standard Rails object-relational mapper also uses proxies to manage associations, but the metaprogramming applies even to the model classes themselves (which are equivalent to the classes that implement our .NET interfaces). The truncated IOrder implementation above showed 3 properties: Id, OrderNumber, and DateSubmitted. Assuming we have those columns in our orders table in the database, then the following Ruby class completely implements the same interface:

class Order < ActiveRecord::Base
end

At runtime, the ActiveRecord::Base superclass will load the schema of the orders table and, for each column, add a property of the same name to the Order class. Now we really see the power of metaprogramming: it helps us keep our code DRY. If it’s already specified in the database schema, why should we have to specify it in our application code as well?

Syntactic Macros

It probably wouldn’t make much sense to generate lazy-loading proxies at compile time, but that doesn’t mean syntactic macros don’t have their place. Used appropriately, they can DRY up your code in ways that even runtime metaprogramming cannot.

Peter Seibel gives a good example of building a unit test framework in Common Lisp. The idea is that we’d like to assert certain code is true, but also show the asserted code in our report. For example:

pass ... (= (+ 1 2) 3)
pass ... (= (+ 1 2 3) 6)
pass ... (= (+ -1 -3) -4)

The code to make this work, assuming report-result is implemented correctly, looks like this:

(defun test-+ ()
  (report-result (= (+ 1 2) 3) '(= (+ 1 2) 3))
  (report-result (= (+ 1 2 3) 6) '(= (+ 1 2 3) 6))
  (report-result (= (+ -1 -3) -4) '(= (+ -1 -3) -4)))

Notice the ugly duplication in each call to report-result. We have the code that’s actually executed (the first parameter), and the quoted list to report (the second parameter). Runtime metaprogramming could not solve the problem because the first parameter will be evaluated before being passed to report-result. Static code-generation could remove the duplication, but would be ugly. We could DRY up the code at compile time, if only we had access to the abstract syntax tree. Fortunately, in CL, the source code is little more than a textual representation of the AST.

Here’s the macro that Seibel comes up with:

(defmacro check (&body forms)
  `(progn
    ,@(loop for f in forms collect `(report-result ,f ',f))))

Notice how the source code within the list (represented as the loop variable f) is both executed and quoted. The test now becomes much simpler:

(defun test-+ ()
  (check (= (+ 1 2) 3))
  (check (= (+ 1 2 3) 6))
  (check (= (+ -1 -3) -4)))

Summary

Finding ways to eliminate duplication is always A Good Thing. For a long time, if you were programming in a mainstream language, static code generation was your only option when code generation was needed. Things changed with the advent of reflection-based languages, particularly when Java and C# joined the list of mainstream languages. Even though their metaprogramming capabilities aren’t as powerful as those of languages like Smalltalk and Ruby, they at least introduced metaprogramming techniques to the masses.

Of course, Lisp has been around since, say, the 1950’s (I’m not sure how long macros have been around, however). Syntactic macros provide a very powerful way of generating code, even letting you change the language. But until more languages implement them, they will never become as popular as they should be.

Written by Brandon Byars

March 29, 2008 at 6:00 pm

Using Closures to Implement Undo

While it seems to be fairly common knowledge in the functional programming world, I don’t think most object-oriented developers realize that closures and objects can be used to implement each other. Ken Dickey showed how it can be done rather easily in Scheme, complete with multiple inheritance and dynamic dispatch.

That’s not to say, of course, that all OO programmers should drop their object hats and run over to the world of functional programming. There is room for multiple paradigms.

Take the well-known Command pattern, often advertised as having two advantages over a more traditional API:

  1. Commands can be easily decorated, giving you some measure of aspect-oriented programming. CruiseControl.NET uses a Command-pattern dispatch for its web interface, and decorates each command with error handling, etc., providing a nice separation of concerns.
  2. Commands can give you easy undo functionality. Rails migrations are a good example.

Recently, I had to retrofit Undo onto an existing legacy (and ugly) codebase, and I was able to do it quite elegantly with closures instead of commands.

What are closures?

Briefly (since better descriptions lie elsewhere), a closure is a procedure that “remembers” its bindings to free variables, where free variables are those variables that lie outside the procedure itself. The name comes from LISP, where the procedure (or “lambda,” as LISPers call it) was said to “close over” its lexical environment. In C# terms, a closure is simply an anonymous delegate with a reference to a free variable, as in:

string mark = "i wuz here";
DoSomething(delegate { Console.WriteLine(mark); });

Notice that the anonymous delegate references the variable mark. When the delegate is actually called, it will be within a lexical scope that does not include mark. To make that work, the compiler wraps the closure in a class that remembers both the code to execute and any variable bindings (remember – objects and closures can be interchanged).
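
Conceptually, the compiler rewrites the snippet above into something like the following; the generated class in reality has a compiler-mangled name:

using System;

// Roughly what the closure compiles down to (name invented)
class DisplayClassSketch
{
    public string mark; // the captured free variable becomes a field

    public void Invoke()
    {
        Console.WriteLine(mark);
    }
}

// string mark = "i wuz here";
// DoSomething(delegate { Console.WriteLine(mark); });
//
// ...is rewritten to roughly:
//
// DisplayClassSketch closure = new DisplayClassSketch();
// closure.mark = "i wuz here";
// DoSomething(closure.Invoke);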

As always, Wikipedia has a nice write-up. A C#-specific description can be found here.

What does a closure-based Undo look like?

The legacy code I needed to update maintained the entire object state serialized as XML. This was terrible for a number of reasons, but it did have the advantage of making undo easy in principle: just swap out the new XML for the XML as it was before the previous API call. I wanted something like this:

public delegate void Action();

public void AddItem(OrderItemStruct itemInfo)
{
    string originalXml = orderXml;
    Action todo = delegate
    {
        OrderApi.AddOrderItem(currentSession, ref itemInfo,
            ref orderXml, out errorCode, out errorMessage);
    };
    Action undo = delegate { orderXml = originalXml; };
    processor.Do(todo, undo);
}

In actual practice, the undo part of that could be wrapped up in some boilerplate code:

public void AddItem(OrderItemStruct itemInfo)
{
    CallApiMethod(delegate
    {
        OrderApi.AddOrderItem(currentSession, ref itemInfo,
            ref orderXml, out errorCode, out errorMessage);
    });
}

private void CallApiMethod(Action method)
{
    string originalXml = orderXml;
    processor.Do(method, delegate { orderXml = originalXml; });
    // error handling, etc…
}

Notice that the undo procedure is referencing originalXml. That variable will be saved with the closure, making for a rather lightweight syntax, even with the static typing.

Getting Started

Implementing a single undo is really quite easy. Here’s a simple test fixture for it:

[Test]
public void SingleUndo()
{
    CommandProcessor processor = new CommandProcessor(5);
    int testValue = 0;
    processor.Do(delegate { testValue++; },
        delegate { testValue--; });

    processor.Undo();

    Assert.AreEqual(0, testValue);
}

…and the code to make it work:

public delegate void Action();

public class CommandProcessor
{
    private CircularBuffer undoBuffer;

    public CommandProcessor(int capacity)
    {
        undoBuffer = new CircularBuffer(capacity);
    }

    public void Do(Action doAction, Action undoAction)
    {
        doAction();
        undoBuffer.Add(undoAction);
    }

    public void Undo()
    {
        if (!undoBuffer.IsEmpty)
        {
            Action action = undoBuffer.Pop();
            action();
        }
    }
}

I won’t go into how CircularBuffer works, but it’s such a simple data structure that you can figure it out.
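
For completeness, here is one possible shape for it, inferred from how CommandProcessor uses it (Add, Pop, IsEmpty, Clear); this is a sketch, not the original implementation:

using System.Collections.Generic;

// Action here is the delegate declared earlier in the post
public class CircularBuffer
{
    private readonly List<Action> items;
    private readonly int capacity;

    public CircularBuffer(int capacity)
    {
        this.capacity = capacity;
        items = new List<Action>(capacity);
    }

    public bool IsEmpty
    {
        get { return items.Count == 0; }
    }

    public void Add(Action action)
    {
        // Once full, silently discard the oldest entry
        if (items.Count == capacity)
            items.RemoveAt(0);
        items.Add(action);
    }

    public Action Pop()
    {
        // Last in, first out: undo the most recent action first
        Action last = items[items.Count - 1];
        items.RemoveAt(items.Count - 1);
        return last;
    }

    public void Clear()
    {
        items.Clear();
    }
}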

Naturally, with undo, we’ll want redo:

[Test]
public void SingleRedo()
{
    CommandProcessor processor = new CommandProcessor(5);
    int testValue = 0;
    processor.Do(delegate { testValue++; }, delegate { testValue--; });
    processor.Undo();

    processor.Redo();

    Assert.AreEqual(1, testValue);
}

Conceptually, this should be fairly easy:

public void Undo()
{
    PopAndDo(undoBuffer);
}

public void Redo()
{
    PopAndDo(redoBuffer);
}

private void PopAndDo(CircularBuffer buffer)
{
    if (!buffer.IsEmpty)
    {
        Action action = buffer.Pop();
        action();
    }
}

However, we’re not actually adding anything to the redo buffer yet. What we need to do is rather interesting—we don’t want to add to the redo buffer until Undo is called. Closures to the rescue:

public void  Do(Action doAction, Action undoAction)
{
    doAction();
    undoBuffer.Add(delegate
    {
        undoAction();
        redoBuffer.Add(doAction);
    });
}

But let’s say I undo, redo, and then want to undo and redo again. That won’t work as written, and making it work is starting to get pretty ugly:

public void Do(Action doAction, Action undoAction)
{
    doAction();
    undoBuffer.Add(delegate
    {
        undoAction();
        redoBuffer.Add(delegate
        {
            doAction();
            undoBuffer.Add(delegate
            {
                undoAction();
                redoBuffer.Add(doAction);
            });
        });
    });
}

It’s becoming apparent that what we really want is infinite recursion, lazily-evaluated. How ‘bout a closure?

public void  Do(Action doAction, Action undoAction)
{
    doAction();
    undoBuffer.Add(DecoratedAction(undoAction, undoBuffer, doAction, redoBuffer));
}

private Action DecoratedAction(Action undoAction, CircularBuffer undoBuffer,
        Action redoAction, CircularBuffer redoBuffer)
{
    return delegate
    {
        undoAction();
        redoBuffer.Add(DecoratedAction(
            redoAction, redoBuffer, undoAction, undoBuffer));
    };
}

Now we see how easy it is to decorate closures—remember that the ability to decorate commands is an oft-quoted advantage of them. However, closures provide a more lightweight approach to programming than commands.

The elegance of this approach is hard to deny. All it takes is getting over the conceptual hump that functions are just data. Think about it—we just added a function that took two functions as arguments and returned another function.

What also was apparent to me is how much TDD helped me get to this point. It may not be obvious from the few snippets I’ve shown here, but building up to the DecoratedAction abstraction was a very satisfying experience.

For reference, here’s the full CommandProcessor class. The bits I haven’t shown (CanUndo and CanRedo, along with an event that fires when either one changes) are there so that we know when to enable or disable a menu option in a UI.

public class CommandProcessor
{
    public event EventHandler UndoAbilityChanged;

    private CircularBuffer undoBuffer;
    private CircularBuffer redoBuffer;

    public CommandProcessor(int capacity)
    {
        undoBuffer = new CircularBuffer(capacity);
        redoBuffer = new CircularBuffer(capacity);
    }

    public void Do(Action doAction, Action undoAction)
    {
        FireEventIfChanged(delegate
        {
            doAction();

            // Redo only makes sense if we’re redoing a clean undo stack.
            // Once they do something else, redo would corrupt the state.
            redoBuffer.Clear();

            undoBuffer.Add(DecoratedAction(
                undoAction, undoBuffer, doAction, redoBuffer));
        });
    }

    private Action DecoratedAction(Action undoAction, CircularBuffer undoBuffer,
        Action redoAction, CircularBuffer redoBuffer)
    {
        return delegate
        {
            undoAction();
            redoBuffer.Add(DecoratedAction(
                redoAction, redoBuffer, undoAction, undoBuffer));
        };
    }

    public void Undo()
    {
        FireEventIfChanged(delegate { PopAndDo(undoBuffer); });
    }

    public void Redo()
    {
        FireEventIfChanged(delegate { PopAndDo(redoBuffer); });
    }

    public void Clear()
    {
        undoBuffer.Clear();
        redoBuffer.Clear();
    }

    public bool CanUndo
    {
        get { return !undoBuffer.IsEmpty; }
    }

    public bool CanRedo
    {
        get { return !redoBuffer.IsEmpty; }
    }

    private void PopAndDo(CircularBuffer buffer)
    {
        if (!buffer.IsEmpty)
        {
            Action action = buffer.Pop();
            action();
        }
    }

    private void FireEventIfChanged(Action action)
    {
        bool originalCanUndo = CanUndo;
        bool originalCanRedo = CanRedo;

        action();

        if (originalCanUndo != CanUndo || originalCanRedo != CanRedo)
            OnUndoAbilityChanged(EventArgs.Empty);
    }

    protected void OnUndoAbilityChanged(EventArgs e)
    {
        EventUtils.FireEvent(this, e, UndoAbilityChanged);
    }
}

Written by Brandon Byars

November 5, 2007 at 11:26 pm

Posted in .NET, Design Patterns, TDD


C# Enum Generation

Ayende recently asked on the ALT.NET mailing list about the various methods developers use to provide lookup values, with the question framed as one between lookup tables and enums. My own preference is to use both, but keep it DRY with code generation.

To demonstrate the idea, I wrote a Ruby script that generates a C# enum file from some metadata. I much prefer Ruby to pure .NET solutions like CodeSmith—I find it easier and more powerful (I do think CodeSmith is excellent if there is no Ruby expertise on the team, however). The full source for this example can be grabbed here.

The idea is simple. I want a straightforward and extensible way to provide metadata for lookup values, following the Ruby Way of convention over configuration. XML is very popular in the .NET world, but the Ruby world views it as overly verbose, and prefers lighter markup languages like YAML. For my purposes, I decided not to mess with markup at all (although I’m still considering switching to YAML—the hash of hashes approach describes what I want well). Here’s some example metadata:

enums = {
  'OrderType' => {},
  'MethodOfPayment' => {:table => 'PaymentMethod'},
  'StateProvince' => {:table => 'StateProvinces',
                      :name_column => 'Abbreviation',
                      :id_column => 'StateProvinceId',
                      :transformer => lambda {|value| value.upcase},
                      :filter => lambda {|value| !value.empty?}}
}

That hash, which is valid Ruby code, describes three enums, which will be named OrderType, MethodOfPayment, and StateProvince. The intention is that, where you followed your database standards, you should usually be able to get by without adding any extra metadata, as the OrderType example shows. The code generator will get the ids and enum names from the OrderType table (expecting the columns to be named OrderTypeId and Description) and create the enum from those values. As StateProvince shows, the table name and the two column names can be overridden.

More interestingly, you can both transform and filter the enum names by passing lambdas (which are like anonymous delegates in C#). The StateProvince example above will filter out any states whose names are empty after cleaning up illegal characters, and will upper-case the names that remain.
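
For a sense of the output, the generated Enums.cs might look something like this (the enum values here are hypothetical):

namespace Namespace
{
    public enum OrderType
    {
        Standard = 1,
        Expedited = 2,
        BackOrder = 3
    }

    public enum StateProvince
    {
        AK = 1,
        AL = 2
    }
}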

We use a pre-build event in our project to build the enum file. However, if you simply overwrite the file every time you build, you may slow down the build process considerably. MSBuild (used by Visual Studio) evidently sees that the timestamp has been updated, so it rebuilds the project, forcing a rebuild of all downstream dependent projects. A better solution is to only overwrite the file if there are changes:

require File.dirname(__FILE__) + '/enum_generator'

gen = EnumGenerator.new('localhost', 'database-name')
source = gen.generate_all('Namespace', enums)

filename = File.join(File.dirname(__FILE__), 'Enums.cs')
if Dir[filename].empty? || source != IO.read(filename)
  File.open(filename, 'w') {|file| file << source}
end

I define the basic templates straight in the EnumGenerator class, but allow them to be swapped out. In theory, the default name column and the default lambda for generating the id column name given the table name (or enum name) could be handled the same way. Below is the EnumGenerator code:

class EnumGenerator
  FILE_TEMPLATE = <<EOT
//------------------------------------------------------------------------------
// <auto-generated>
//     This code was generated by a tool from <%= catalog %> on <%= server %>.
//
//     Changes to this file may cause incorrect behavior and will be lost if
//     the code is regenerated.
// </auto-generated>
//------------------------------------------------------------------------------

namespace <%= namespace %>
{
    <%= enums %>
}
EOT

  ENUM_TEMPLATE = <<EOT
public enum <%= enum_name %>
{
<% values.keys.sort.each_with_index do |id, i| -%>
    <%= values[id] %> = <%= id %><%= ',' unless i == values.length - 1 %>
<% end -%>
}

EOT

  # Change the templates by calling these setters
  attr_accessor :enum_template, :file_template

  attr_reader :server, :catalog

  def initialize(server, catalog)
    @server, @catalog = server, catalog
    @enum_template, @file_template = ENUM_TEMPLATE, FILE_TEMPLATE
  end
end

The code generation uses erb, the standard Ruby templating language:

def transform(template, template_binding)
  erb = ERB.new(template, nil, '-')
  erb.result template_binding
end

template_binding describes the variables available for use in the template, in much the same way that Castle MonoRail’s PropertyBag describes the variables available to the views. The difference is that, because Ruby is dynamic, you don’t have to explicitly add values to the binding. The rest of the code is shown below:

def generate(enum_name, attributes)
  table = attributes[:table] || enum_name
  filter = attributes[:filter] || lambda {|value| true}
  values = enum_values(table, attributes)
  values.delete_if {|key, value| !filter.call(value)}
  transform enum_template, binding
end

def generate_all(namespace, metadata)
  enums = ''
  metadata.keys.sort.each {|enum_name| enums << generate(enum_name, metadata[enum_name])}
  enums = enums.gsub(/\n/m, "\n\t").strip
  transform file_template, binding
end

private
def enum_values(table, attributes)
  sql = get_sql table, attributes
  @dbh ||= DBI.connect("DBI:ADO:Provider=SQLNCLI;server=#{server};database=#{catalog};Integrated Security=SSPI")
  sth = @dbh.execute sql
  values = {}
  sth.each {|row| values[row['Id']] = clean(row['Name'], attributes[:transformer])}
  sth.finish

  values
end

def get_sql(table, attributes)
  id_column = attributes[:id_column] || "#{table}Id"
  name_column = attributes[:name_column] || "Description"
  "SELECT #{id_column} AS Id, #{name_column} AS Name FROM #{table} ORDER BY Id"
end

def clean(enum_value, transformer=nil)
  enum_value = '_' + enum_value if enum_value =~ /^\d/
  enum_value = enum_value.gsub /[^\w]/, ''
  transformer ||= lambda {|value| value}
  transformer.call enum_value
end

Caveat Emptor: I wrote this code from scratch today; it is not the same code we currently use in production. I think it’s better, but if you find a problem with it please let me know.

Written by Brandon Byars

October 21, 2007 at 9:54 pm

Posted in .NET, Code Generation, Ruby


Throw Out Those Utility Classes

How many times have you written an xxxUtils class, where xxx is some framework-supplied class that you can’t extend or subclass? I always seem to end up with several in any decent-sized project: StringUtils, DateUtils, DictionaryUtils, etc. In most cases, these classes are the result of language limitations. In Ruby and Smalltalk, for example, what would be the point of a StringUtils class when you could simply add methods to the String class directly? But C# and Java make String sealed (final), so you can’t even subclass it.

Utility classes like these tend to suffer from logical cohesion. In spite of the friendly-sounding name, logical cohesion is actually a fairly weak form of cohesion; it’s just a loose jumble of functions that have something in common. It can in no way be considered object-oriented.

Our DictionaryUtils makes an interesting case study because it was small. It only did two things: compare two dictionaries key-by-key for equality (useful in testing), and convert the entries to a Ruby-esque string. That last method made me a little jealous of how convenient Hashes are in Ruby:

middlestate:~ bhbyars$ irb
>> {'a' => 1, 'b' => 2, 'c' => 3}
=> {"a"=>1, "b"=>2, "c"=>3}

For the non-Ruby readers, I just created a 3-element Hash in one line. The command-line interpreter spit out a string representation. Our DictionaryUtils.ConvertToText could manage that last bit, but I wanted to be able to create hashtables as easily in C# as I could in Ruby. Naturally, that meant a third method on DictionaryUtils. Or did it?

C# on Hash

DictionaryUtils.Create seemed bloated and ugly as soon as I wrote it, so I quickly scratched it out and started a new class:

public class Hash
{
    public static Hashtable New(params object[] keysAndValues)
    {
        if (keysAndValues.Length % 2 != 0)
            throw new ArgumentException("Hash.New requires an even number of parameters");

        Hashtable hash = new Hashtable();
        for (int i = 0; i < keysAndValues.Length; i += 2)
        {
            hash[keysAndValues[i]] = keysAndValues[i + 1];
        }
        return hash;
    }
}
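For example, mirroring the irb session above:

// One-line creation, the C# counterpart to the Ruby example
Hashtable hash = Hash.New("a", 1, "b", 2, "c", 3);
// hash["a"] == 1, hash["b"] == 2, hash["c"] == 3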

This allowed me to create small, pre-populated Hashtables in one line, which was convenient, especially for test methods (although the syntax isn’t as explicit as Ruby’s). I then decided to merge the static DictionaryUtils methods into Hash as instance methods. First, of course, I had to make Hash an actual dictionary implementation. This was trivial:

// Hash implements IDictionary by delegating everything to a wrapped dictionary
private IDictionary proxiedHash;

public Hash(IDictionary dictionary)
{
    proxiedHash = dictionary;
}

public bool Contains(object key)
{
    return proxiedHash.Contains(key);
}

public void Add(object key, object value)
{
    proxiedHash.Add(key, value);
}

public void Clear()
{
    proxiedHash.Clear();
}

// etc…

Then I changed the return value of Hash.New to a Hash instead of a Hashtable. The last line became return new Hash(hash) instead of return hash.

Next I moved the ConvertToText method, which, as an instance method, conveniently mapped to ToString.

public override string ToString()
{
    SeparatedStringBuilder builder = new SeparatedStringBuilder(", ");
    ICollection keys = CollectionUtils.TryToSort(Keys);
    foreach (object key in keys)
    {
        builder.AppendFormat("{0} => {1}", Encode(key), Encode(this[key]));
    }
    return "{" + builder.ToString() + "}";
}

private object Encode(object value)
{
    if (value == null)
        return "<NULL>";

    IDictionary dictionary = value as IDictionary;
    if (dictionary != null)
        return new Hash(dictionary).ToString();

    if (value is string)
        return "\"" + value + "\"";

    return value;
}

The SeparatedStringBuilder class is a StringBuilder that adds a custom separator between each appended string. It’s very convenient whenever you’re building a comma-separated list, as above, and it’s proven handy in a variety of situations; for example, I’ve used it to build a SQL WHERE clause by making " AND " the separator. It’s included with the code download at the bottom of this article.
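The core idea fits in a few lines. Here’s a minimal sketch of how such a class can work (my reconstruction, not the downloadable source; the real class mirrors more of StringBuilder’s API):

using System.Text;

// Minimal sketch: prepend the separator before every string except the first
public class SeparatedStringBuilder
{
    private readonly StringBuilder builder = new StringBuilder();
    private readonly string separator;

    public SeparatedStringBuilder(string separator)
    {
        this.separator = separator;
    }

    public void Append(string text)
    {
        if (builder.Length > 0)
            builder.Append(separator);
        builder.Append(text);
    }

    public void AppendFormat(string format, params object[] args)
    {
        Append(string.Format(format, args));
    }

    public override string ToString()
    {
        return builder.ToString();
    }
}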

Notice, also, that we’re still using a CollectionUtils class. Ah, well. I’ve got to have something to look forward to fixing tomorrow…

The DictionaryUtils.AreEqual method conveniently maps to an instance-level Equals method:

public override bool Equals(object obj)
{
    IDictionary other = obj as IDictionary;
    if (other == null) return false;
    Hash hash = new Hash(other);
    return hash.ToString() == ToString();
}

public override int GetHashCode()
{
    // Derive the hash code from the same representation Equals compares,
    // so that equal dictionaries produce equal hash codes
    return ToString().GetHashCode();
}

The syntax is much cleaner than the old DictionaryUtils class. It’s nicely encapsulated, fits conveniently into the framework overrides, and is object-oriented, allowing us to add other utility methods to the Hash class easily. It’s especially nice for testing, since the Equals method will work against any dictionary implementation, not just Hashes:

Assert.AreEqual(Hash.New("address", customer.Address), propertyBag);

The approach was simple, relying on proxying to fulfill the IDictionary implementation (I’m probably abusing the word “proxying,” since we’re not doing anything with the interception; really, this is nothing more than the Decorator design pattern). That was easy only because the framework actually provided an interface to subtype; no such interface exists for String and Date, and the same is true of StringBuilder. If you look at the code, SeparatedStringBuilder looks like a StringBuilder, talks like a StringBuilder, and quacks like a StringBuilder, but there is no syntactic relationship between them, since StringBuilder is sealed and doesn’t implement an interface. While the need for SeparatedStringBuilder may represent a special case, I’d prefer creating similar-looking objects over relying on a framework-provided xxx and a custom-built xxxUtils class. Proxying, as used by Hash, generally makes such implementations trivial and clean, leaving you free to spend your time developing what you really want without making the API unnecessarily ugly.

All the code needed to compile and test the Hash class can be found here.

Written by Brandon Byars

August 28, 2007 at 11:40 pm

Posted in .NET, Design


Using Higher Order Functions in Windows Forms Applications

My wife is in the middle of a research project comparing diet to the age of reproduction in African house snakes. She has to collect quite a bit of data, and when I finally looked at the spreadsheets she was maintaining, I was ashamed that I had not written something for her earlier.

This was really the first Windows Forms application I’ve had the opportunity to write in years (my UIs aren’t very inspiring). However, I have to maintain a couple at work that were primarily written by former colleagues, and I’ve always been a bit dismayed at the enormous amount of duplication the standard event-driven application generates.

Despite the fact that the application I wrote for my wife was nothing more than a one-off, one you don’t expect to have to maintain, I focused on eliminating the duplication I see in the Windows applications at work. The result isn’t something I would even begin to consider done for a corporate application, but I found the duplication-removal techniques worth writing about. The code can be found here.

The biggest gains in removing duplication, and the technique most readers are likely to be least familiar with, came from the use of higher order functions. My impression is that most C# developers aren’t very comfortable with them. Actually, I think that’s probably true for most developers working within mainstream, commercially developed (Microsoft, Borland, Sun) languages. Higher order functions simply aren’t emphasized enough.

For example, all the forms had a ListView to display the data. All of them had to define the column header names and the data that goes in each row. It looked something like this:

protected override void AddHeaders()
{
    AddHeader("Weight");
    AddHeader("Length");
    AddHeader("HL");
    AddHeader("HW");
}

protected override void AddCells()
{
    AddCell(Weight);
    AddCell(Length);
    AddCell(HeadLength);
    AddCell(HeadWidth);
}

Having the subclass define the column header names and the data that goes in each row didn’t bother me. What did bother me was having to specify the ordering of the headers and the data in two different places. However, while the header names were static, the data would be different for each invocation. The solution was to specify the order only once, in an associative array (I used .NET 2.0’s generic Dictionary, which seemed to maintain the order I entered the items, although the framework makes no guarantee about enumeration order). The key would be the column name, and the value would be a function to retrieve the data value.

// The superclass for all Forms…
public class SnakeForm : Form
{
    protected delegate object GetterDelegate(object value);

    private IDictionary<string, GetterDelegate> associations;

    protected virtual void AddListViewAssociations(
        IDictionary<string, GetterDelegate> associations)
    {
        throw new NotImplementedException("Override...");
    }

    protected virtual IEnumerable ListViewHeaders
    {
        get
        {
            foreach (string header in associations.Keys)
                yield return header;
        }
    }

    protected virtual IEnumerable ListViewValues(object value)
    {
        foreach (GetterDelegate getter in associations.Values)
            yield return getter(value);
    }

    protected virtual void AddCells(object source)
    {
        // AddCell (not shown) appends a single cell to the current row
        foreach (object value in ListViewValues(source))
            AddCell(value);
    }

    private void SnakeForm_Load(object sender, EventArgs e)
    {
        associations = new Dictionary<string, GetterDelegate>();
        AddListViewAssociations(associations);
        AddHeaders();
    }

    private void AddHeaders()
    {
        foreach (string header in ListViewHeaders)
            AddHeader(header);
    }

    private void AddHeader(string name)
    {
        // lvData is the designer-generated ListView
        ColumnHeader header = new ColumnHeader();
        header.Text = name;
        lvData.Columns.Add(header);
    }
}

The important thing to note is that the subclass is passed, via a template method, a collecting parameter (associations), each entry of which represents a column name along with a way of retrieving the value for a row in that column. The delegate used to retrieve the value accepts a single state parameter, which the report forms use to pass in the source object for each row. Given that information, the superclass can manage most of the work. (AddListViewAssociations would have been abstract, except that Visual Studio’s designer doesn’t much care for abstract classes.)

For example, here are the associations for the measurement form used earlier to show the problem:

protected override void AddListViewAssociations(
    IDictionary<string, GetterDelegate> associations)
{
    associations.Add("Weight", delegate { return Weight; });
    associations.Add("Length", delegate { return Length; });
    associations.Add("HL", delegate { return HeadLength; });
    associations.Add("HW", delegate { return HeadWidth; });
}

One of the benefits of removing the ordering duplication is that each column name now sits beside the function that retrieves its values, making the code easier to understand. Notice that the GetterDelegate definition actually accepts an object parameter; C#’s anonymous delegate syntax lets you omit the parameter list when it’s unused, making for a somewhat more readable line.

One of the forms shows the information about feedings per snake, and needed that parameter. Below is the entire implementation of the form (aside from the designer-generated code).

// ReportForm is a subclass of SnakeForm
public partial class FeedingBySnakeReport : ReportForm
{
    public FeedingBySnakeReport()
    {
        InitializeComponent();
    }

    protected override void AddListViewAssociations(
        IDictionary<string, GetterDelegate> associations)
    {
        associations.Add("Snake", delegate(object obj)
            { return ((FeedingReportDto)obj).Snake; });
        associations.Add("Diet", delegate(object obj)
            { return ((FeedingReportDto)obj).Diet; });
        associations.Add("Date", delegate(object obj)
            { return ((FeedingReportDto)obj).Date; });
        associations.Add("Weight", delegate(object obj)
            { return ((FeedingReportDto)obj).Weight; });
        associations.Add("Food Weight", delegate(object obj)
            { return ((FeedingReportDto)obj).FoodWeight; });
        associations.Add("Ate?", delegate(object obj)
            { return ((FeedingReportDto)obj).Ate; });
        associations.Add("%BM", delegate(object obj)
            { return ((FeedingReportDto)obj).PercentBodyMass; });
        associations.Add("Comments", delegate(object obj)
            { return ((FeedingReportDto)obj).Comments; });
    }

    protected override IEnumerable GetReportValues()
    {
        FeedRepository repository = new FeedRepository();
        return repository.FeedingsBySnake(Snake);
    }
}

In case you’re wondering what this form does, it allows you to select a snake, or all snakes, and see the feeding information in the ListView. It also lets you export all the data to a CSV file. Not bad for 30 lines of code.

Another thing that bothered me about all the event handlers was how similar they looked. The workflow was abstracted in the superclass into a HandleEvent method:

protected delegate void EventHandlerDelegate();

protected virtual void HandleEvent(EventHandlerDelegate handler)
{
    Cursor = Cursors.WaitCursor;
    try
    {
        handler();
    }
    catch (Exception ex)
    {
        ShowError(ex.Message);
    }
    finally
    {
        Cursor = Cursors.Default;
    }
}

HandleEvent takes a function that handles the meat of the event handler and wraps it within the code that’s common to all event handlers. Here are a couple of examples:

// In DataEntryForm, an abstract superclass, and subclass of SnakeForm
private void btnSave_Click(object sender, EventArgs e)
{
    HandleEvent(delegate
    {
        if (!IsOkToSave())
            return;

        Save();
        AddRow(null);
        FinishListViewUpdate();
        Reset();
    });
}

// In ReportForm, an abstract superclass, and subclass of SnakeForm
private void btnShow_Click(object sender, EventArgs e)
{
    HandleEvent(delegate
    {
        lvData.Items.Clear();

        // GetReportValues() is a template method defined in the subclasses.
        IEnumerable reportValues = GetReportValues();
        foreach (object record in reportValues)
            AddRow(record);
    });
}

Managing the ListView proved to be fertile territory for removing duplication through higher order functions. For example, I used the first row’s data to set the column alignments automatically—if it looked like a number or date, right-align the data; otherwise left-align it.

private void SetAlignments(object record)
{
    int i = 0;

    // A bit hackish, but the report dtos currently provide strings only…
    foreach (object value in ListViewValues(record))
    {
        if (IsNumber(value) || IsDate(value))
            lvData.Columns[i].TextAlign = HorizontalAlignment.Right;
        else
            lvData.Columns[i].TextAlign = HorizontalAlignment.Left;

        i += 1;
    }
}

private bool IsNumber(object value)
{
    try
    {
        double.Parse(value.ToString().Replace("%", ""));
        return true;
    }
    catch
    {
        return false;
    }
}

private bool IsDate(object value)
{
    try
    {
        DateTime.Parse(value.ToString());
        return true;
    }
    catch
    {
        return false;
    }
}

Notice how alike IsNumber and IsDate are. We can simplify:

private delegate void ParseDelegate(string text);

private bool IsNumber(object value)
{
    return CanParse(value, delegate(string text)
        { double.Parse(text.Replace("%", "")); });
}

private bool IsDate(object value)
{
    return CanParse(value, delegate(string text) { DateTime.Parse(text); });
}

private bool CanParse(object value, ParseDelegate parser)
{
    try
    {
        parser(value.ToString());
        return true;
    }
    catch
    {
        return false;
    }
}
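As an aside, .NET 2.0’s TryParse methods would accomplish the same test without using exceptions for control flow, at the cost of losing the shared CanParse helper:

private bool IsNumber(object value)
{
    double ignored;
    return double.TryParse(value.ToString().Replace("%", ""), out ignored);
}

private bool IsDate(object value)
{
    DateTime ignored;
    return DateTime.TryParse(value.ToString(), out ignored);
}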

I used a similar trick to auto-size the column widths in the ListView based on the width of the largest item. Here’s the refactored code:

private delegate string GetTextDelegate(int index);

private void AutoSizeListView()
{
    int[] widths = new int[lvData.Columns.Count];
    FillSizes(widths, delegate(int i) { return lvData.Columns[i].Text; });

    foreach (ListViewItem item in lvData.Items)
    {
        FillSizes(widths, delegate(int i) { return item.SubItems[i].Text; });
    }

    for (int i = 0; i < lvData.Columns.Count; i++)
    {
        if (!IsHidden(lvData.Columns[i]))
        {
            lvData.Columns[i].Width = widths[i] + 12;
        }
    }
}

private void FillSizes(int[] widths, GetTextDelegate text)
{
    using (Graphics graphics = CreateGraphics())
    {
        for (int i = 0; i < lvData.Columns.Count; i++)
        {
            SizeF size = graphics.MeasureString(text(i), lvData.Font);
            if (size.Width > widths[i])
                widths[i] = (int)size.Width;
        }
    }
}

private bool IsHidden(ColumnHeader header)
{
    return header.Width == 0;
}

If this were a more long-lived application, I really should have bitten the bullet and created my own ListView subclass. The methods above reek of Feature Envy.

Being able to treat functions as first-class objects is extremely useful. For some reason, it doesn’t get the attention it deserves in most development books, and it’s often hidden behind intimidating-sounding names like “lambda expressions,” thanks to its roots in lambda calculus. However, much of what I was able to do in this application was possible only because I was able to treat functions as data and pass them as parameters. It also helped that I didn’t have to explicitly define each function as a method; I could create them anonymously, like any other data object (although C#’s anonymous delegate syntax is cluttered somewhat by the static typing).

Written by Brandon Byars

July 17, 2007 at 12:22 am

Posted in .NET, Design


C# Execute Around Method

Kent Beck called one of the patterns in Smalltalk Best Practice Patterns “Execute Around Method.” It’s a useful pattern for removing duplication in code that requires boilerplate to run both before and after the code you really want to write. It’s also a much lighter-weight approach than a template method, which can accomplish the same goal but requires subclassing.

As an example, I’ve written the following boilerplate ADO.NET code countless times:

public DataTable GetTable(string query, IDictionary parameters)
{
    using (SqlConnection connection = new SqlConnection(this.connectionString))
    {
        using (SqlCommand command = new SqlCommand(query, connection))
        {
            connection.Open();
            foreach (DictionaryEntry parameter in parameters)
            {
                command.Parameters.AddWithValue(
                    parameter.Key.ToString(), parameter.Value);
            }

            SqlDataAdapter adapter = new SqlDataAdapter(command);
            using (DataSet dataset = new DataSet())
            {
                adapter.Fill(dataset);
                return dataset.Tables[0];
            }
        }
    }
}

public void Exec(string query, IDictionary parameters)
{
    using (SqlConnection connection = new SqlConnection(this.connectionString))
    {
        using (SqlCommand command = new SqlCommand(query, connection))
        {
            connection.Open();
            foreach (DictionaryEntry parameter in parameters)
            {
                command.Parameters.AddWithValue(
                    parameter.Key.ToString(), parameter.Value);
            }

            command.ExecuteNonQuery();
        }
    }
}

Notice that the connection and parameter management overwhelms the actual work each method is trying to do. And the duplication means I have multiple places to change when I decide to do something differently. However, since the duplicated code surrounds the interesting code inside the using blocks, a simple Extract Method refactoring won’t eliminate it.

Here’s the result of applying an Execute Around Method pattern to it.

private delegate object SqlCommandDelegate(SqlCommand command);

public DataTable GetTable(string query, IDictionary parameters)
{
    return (DataTable)ExecSql(query, parameters, delegate(SqlCommand command)
    {
        SqlDataAdapter adapter = new SqlDataAdapter(command);
        using (DataSet dataset = new DataSet())
        {
            adapter.Fill(dataset);
            return dataset.Tables[0];
        }
    });
}

public void Exec(string query, IDictionary parameters)
{
    ExecSql(query, parameters, delegate(SqlCommand command)
    {
        return command.ExecuteNonQuery();
    });
}

private object ExecSql(string query, IDictionary parameters,
    SqlCommandDelegate action)
{
    using (SqlConnection connection = new SqlConnection(this.connectionString))
    {
        using (SqlCommand command = new SqlCommand(query, connection))
        {
            connection.Open();
            foreach (DictionaryEntry parameter in parameters)
            {
                command.Parameters.AddWithValue(
                    parameter.Key.ToString(), parameter.Value);
            }

            return action(command);
        }
    }
}

Much nicer, no?
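Calling code doesn’t change with the refactoring. A usage sketch (the enclosing Database class, connection string, and query are invented for illustration):

// Hypothetical caller; the post doesn't name the class exposing GetTable/Exec
IDictionary parameters = new Hashtable();
parameters["@name"] = "widgets";

Database db = new Database("Server=.;Database=Shop;Integrated Security=SSPI");
DataTable matches = db.GetTable(
    "SELECT * FROM Products WHERE Name = @name", parameters);
db.Exec("DELETE FROM Products WHERE Name = @name", parameters);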

Written by Brandon Byars

June 11, 2007 at 11:46 pm

Posted in .NET, Design Patterns


.NET Database Migrations

Pramod Sadalage and Scott Ambler have suggested using a series of numbered change scripts to version your database. Start with a base schema; every subsequent change gets its own change script, grabbing the next number. The current version number is stored in a table in the database, which makes updating easy: you just run, in order, every change script numbered higher than the version stored in your database.

The Ruby on Rails team implemented this technique in their migrations code. It’s quite elegant. This blog uses a Rails application called Typo; here’s one of its migrations:

class AddArticleUserId < ActiveRecord::Migration
  def self.up
    add_column :articles, :user_id, :integer

    puts "Linking article authors to users"
    Article.find(:all).each do |a|
      u=User.find_by_name(a.author)
      if(u)
        a.user=u
        a.save
      end
    end
  end

  def self.down
    remove_column :articles, :user_id
  end
end

That migration is called 3_add_article_user_id.rb, where 3 is the version number. Notice that it’s written in Ruby, not in SQL. It adds a column called user_id to the articles table and updates the data. The data update is particularly interesting—we get to use the ActiveRecord O/RM code instead of having to do it in SQL (although you can use SQL if you need to). The Rails migration code can also rollback changes; that’s what the down method is for.

The problem I’ve always had with this scheme is that we have many database objects that I’d like to version in their own files in our source control system. For example, here’s our directory structure:

db/
  functions/
  migrations/
  procedures/
  triggers/
  views/

We have several files in each directory, and it’s convenient to keep them that way so we can easily check the Subversion log and see the history of changes for each database object. For us to use the migrations scheme above, we’d have to create a stored procedure in one migration and later alter it in a separate migration. Since the two migrations would be in separate files, our source control wouldn’t give us a version history of that stored procedure.

We came up with a hybrid solution. Schema changes to the tables use a migration scheme like Rails. Database objects are versioned in separate files. Both the schema changes and the peripheral database object changes are updated when we update the database.

For this to work, we have to be a little careful with how we create the database objects. We want them to work regardless of whether we’re creating them for the first time or updating them, which means ALTER statements won’t work. The solution is simply to drop the object if it exists, and then create it. This is a fairly common pattern.

I wrote an NAnt task and an MSBuild task to do the dirty work. Each runs both the schema migrations and the database object updates. Both are optional, so if migrations are all you want, that’s all you need to use. The task expects all migrations to be in the same directory, matching the pattern 1.comment.sql, where 1 is the version number. The current version is stored in a database table whose default name is SchemaVersion, with the following structure:

CREATE TABLE SchemaVersion (
  Version int,
  MigrationDate datetime,
  Comment varchar(255)
)
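To make the mechanics concrete, here’s a minimal sketch of the version check the task performs (all the names here are mine; the real task also executes each script transactionally, honors GO batches, and applies the database object files):

using System;
using System.Collections.Generic;
using System.IO;

public class MigrationSketch
{
    public static void Main()
    {
        // In the real task: SELECT MAX(Version) FROM SchemaVersion
        int current = 2;

        // Migration files look like 1.comment.sql, 2.comment.sql, ...
        SortedDictionary<int, string> pending = new SortedDictionary<int, string>();
        foreach (string file in Directory.GetFiles("db/migrations", "*.sql"))
        {
            string name = Path.GetFileName(file);
            int version = int.Parse(name.Substring(0, name.IndexOf('.')));
            if (version > current)
                pending[version] = file;
        }

        // Run each pending script in numeric order, then record its version
        foreach (KeyValuePair<int, string> migration in pending)
        {
            Console.WriteLine("Would run {0}", migration.Value);
            // In the real task: execute the SQL and INSERT a SchemaVersion row
        }
    }
}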

I’ve only tested it on SQL Server, but I think the task should work for other DBMS’s as well (it uses OLEDB). Migrations can contain batches (using the SQL Server GO command) and are run transactionally. Unlike the Rails example, the .NET migrations use SQL, and I don’t yet have any rollback functionality.

You can include any extra SQL files you want in the DatabaseObjects property. Both NAnt and MSBuild have convenient ways to recursively add all files matching an extension.

Here’s an NAnt example:

<target name="migrate" description="Update the database">
    <loadtasks assembly="Migrations.dll" />
    <migrateDatabase
        connectionString="${connectionString}"
        migrationsDirectory="db/migrations"
        commandTimeout="600"
        batchSeparator="go">
        <fileset>
            <include name="db/functions/**/*.sql"/>
            <include name="db/procedures/**/*.sql"/>
            <include name="db/triggers/**/*.sql"/>
            <include name="db/views/**/*.sql"/>
        </fileset>
    </migrateDatabase>
</target>

And here it is using MSBuild:

<!-- Load the custom task, analogous to NAnt's loadtasks -->
<UsingTask TaskName="MigrateDatabase" AssemblyFile="Migrations.dll" />

<ItemGroup>
    <DatabaseObjects Include="db/functions/**/*.sql"/>
    <DatabaseObjects Include="db/procedures/**/*.sql"/>
    <DatabaseObjects Include="db/triggers/**/*.sql"/>
    <DatabaseObjects Include="db/views/**/*.sql"/>
</ItemGroup>

<Target Name="dbMigrate">
    <MigrateDatabase 
        ConnectionString="$(ConnectionString)"
        MigrationsDirectory="db/migrations"
        DatabaseObjects="@(DatabaseObjects)"
        CommandTimeout="600"
        TableName="version_info" />
</Target>

The source code and binaries can be found here.

Written by Brandon Byars

April 14, 2007 at 10:35 pm
