A Day In The Lyf

…the lyf so short, the craft so longe to lerne

Archive for the ‘Code Generation’ Category

Code Generation and Metaprogramming

I wanted to expand upon an idea that I first talked about in my previous post on Common Lisp. There is a common pattern between syntactic macros, runtime metaprogramming, and static code generation.

Runtime metaprogramming is code generation. Just like C macros. Just like CL macros.

Ok, that’s a bit of an overstatement. Those three things aren’t really just like each other. But they are definitely related—they all write code that you’d rather not write yourself. Because it’s boring. And repetitious. And ugly.

In general, there are three points in the development process at which you can generate code, although the terminology leaves something to be desired: before compilation, during compilation (or interpretation), and at runtime. In the software development vernacular, only the first option is typically called code generation (I’ll call it static code generation to avoid confusion). Code generation during compilation goes under the moniker of a ‘syntactic macro,’ and I’m calling runtime code generation ‘runtime metaprogramming.’

Since the “meta” in metaprogramming implies writing code that writes code, all three forms of code generation can be considered metaprogramming, which is why I snuck the “runtime” prefix into the third option above. Just in case you were wondering…

Static Code Generation

Static code generation is the easiest to understand and the weakest of the three options, but it’s often your only option due to language limitations. C macros are an example of static code generation, and they are the only metaprogramming option C gives you out of the box.

To take an example, on a previous project I generated code for lazy-loading proxies in C#. A proxy, one of the standard GoF design patterns, sits between a client and an object and intercepts the messages the client sends to the object. For lazy loading, this means we can instantiate a proxy in place of a database-loaded object, and the client can use it without even knowing it’s using a proxy. For performance reasons, the actual database object will only be loaded on first access of the proxy. Here’s a truncated example:

public class OrderProxy : IOrder
{
    private IOrder proxiedOrder = null;
    private long id;
    private bool isLoaded = false;

    public OrderProxy(long id)
    {
        this.id = id;
    }

    private void Load()
    {
        if (!isLoaded)
        {
           proxiedOrder = Find();
           isLoaded = true;
        }
    }

    private IOrder Find()
    {
        return FinderRegistry.OrderFinder.Find(id);
    }

    public string OrderNumber
    {
        get
        {
           Load();
           return proxiedOrder.OrderNumber;
        }
        set
        {
           Load();
           proxiedOrder.OrderNumber = value;
        }
    }

    public DateTime DateSubmitted
    {
        get
        {
           Load();
           return proxiedOrder.DateSubmitted;
        }
    }
}

This code is boring to write and boring to maintain. Every time the interface changes, a very repetitious change has to be made in the proxy. To make it worse, we have to do this for every database entity we’ll want to load (at least those we’re worried about lazy-loading). All I’d really like to say is “make this class implement the appropriate interface, and make it a lazy-loading proxy.” Fortunately, since the proxy is supposed to be a drop-in replacement for any other class implementing the same interface, we can use reflection to query the interface and statically generate the proxy.

There’s an important limitation to generating this code statically. Because we’re doing this before compilation, this approach requires a separated interfaces approach, where the binary containing the interfaces is separate from the assembly we’re generating the proxies for. We’ll have to compile the interfaces, use reflection on the compiled assembly to generate the source code for the proxies, and compile the newly generated source code.
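In build-script form, that pipeline might look something like this (a sketch only; GenerateProxies.exe is a hypothetical stand-in for the generator described below):

# Compile the interfaces, generate proxy source from the compiled
# assembly, then compile the generated source into its own assembly.
system 'csc /target:library /out:Interfaces.dll Interfaces\*.cs'
system 'GenerateProxies.exe Interfaces.dll GeneratedProxies'
system 'csc /target:library /reference:Interfaces.dll /out:Proxies.dll GeneratedProxies\*.cs'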

But it’s do-able. First, load the interface using reflection:

public static Type GetType(string name, string nameSpace, string assemblyFileName)
{
    if (!File.Exists(assemblyFileName))
        throw new IOException("No such file");

    Assembly assembly = Assembly.LoadFile(Path.GetFullPath(assemblyFileName));
    string qualifiedName = string.Format("{0}.{1}", nameSpace, name);
    return assembly.GetType(qualifiedName, true, true);
}

From there it’s pretty trivial to loop through the properties and methods and recreate the source code for them on the proxy, with a call to Load before delegating to the proxied object.
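Language aside, the emission step is just a loop over property descriptors. Here’s a sketch of its shape in Ruby, with a hard-coded array standing in for what reflection would actually return:

# Emit a proxy property for each descriptor, calling Load() before
# delegating to the proxied object. The descriptors are illustrative.
properties = [
  {:name => 'OrderNumber',   :type => 'string',   :settable => true},
  {:name => 'DateSubmitted', :type => 'DateTime', :settable => false}
]

properties.each do |prop|
  puts "    public #{prop[:type]} #{prop[:name]}"
  puts '    {'
  puts "        get { Load(); return proxiedOrder.#{prop[:name]}; }"
  puts "        set { Load(); proxiedOrder.#{prop[:name]} = value; }" if prop[:settable]
  puts '    }'
end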

Runtime Metaprogramming

Now it turns out that when I wrote the code generation code above, there weren’t many mature object-relational mappers in the .NET space. Fortunately, that’s changed, and the code above is no longer necessary. NHibernate will lazy-load for you, using a proxy approach similar to the one I used above. Except NHibernate writes the proxy code at runtime.

The mechanics of how this works are encapsulated in a nice little library called Castle.DynamicProxy. NHibernate uses reflection to read interfaces (or classes with virtual members) and calls DynamicProxy to generate code at runtime using the Reflection.Emit namespace. That’s a difficult thing to do by hand in C#, which is why I wouldn’t recommend it unless you go through DynamicProxy.

This is a much more powerful technique than static code generation. For starters, you no longer need two assemblies, one for the interfaces and one for the proxies. But the power of runtime metaprogramming extends well beyond saving you an extra .NET assembly.

Ruby makes metaprogramming much easier than C# does. The standard Rails object-relational mapper also uses proxies to manage associations, but the metaprogramming applies even to the model classes themselves (which are equivalent to the classes that implement our .NET interfaces). The truncated IOrder implementation above showed three properties: Id, OrderNumber, and DateSubmitted. Assuming we have those columns in our orders table in the database, the following Ruby class completely implements the same interface:

class Order < ActiveRecord::Base
end

At runtime, the ActiveRecord::Base superclass will load the schema of the orders table and, for each column, add a property of the same name to the Order class. Now we really see the power of metaprogramming: it helps us keep our code DRY. If it’s already specified in the database schema, why should we have to specify it in our application code as well?
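To see roughly what Base is doing on our behalf, here’s a minimal sketch of the technique (not ActiveRecord’s actual implementation; the column list is hard-coded where ActiveRecord would query the schema):

class Order
  columns = %w(id order_number date_submitted)  # stand-in for a schema query

  columns.each do |column|
    # define_method adds an accessor pair to the class at runtime
    define_method(column) { @attributes[column] }
    define_method("#{column}=") { |value| @attributes[column] = value }
  end

  def initialize
    @attributes = {}
  end
end

order = Order.new
order.order_number = 'A-1001'
order.order_number   # => "A-1001"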

Syntactic Macros

It probably wouldn’t make much sense to generate lazy-loading proxies at compile time, but that doesn’t mean syntactic macros don’t have their place. Used appropriately, they can DRY up your code in ways that even runtime metaprogramming cannot.

Peter Seibel gives a good example, in Practical Common Lisp, of building a unit test framework. The idea is that we’d like to assert that certain code is true, but also show the asserted code in our report. For example:

pass ... (= (+ 1 2) 3)
pass ... (= (+ 1 2 3) 6)
pass ... (= (+ -1 -3) -4)

The code to make this work, assuming report-result is implemented correctly, looks like this:

(defun test-+ ()
  (report-result (= (+ 1 2) 3) '(= (+ 1 2) 3))
  (report-result (= (+ 1 2 3) 6) '(= (+ 1 2 3) 6))
  (report-result (= (+ -1 -3) -4) '(= (+ -1 -3) -4)))

Notice the ugly duplication in each call to report-result: we have the code that’s actually executed (the first parameter) and the quoted list to report (the second parameter). Runtime metaprogramming can’t solve this problem, because the first parameter would be evaluated before being passed to report-result. Static code generation could remove the duplication, but it would be ugly. We could DRY up the code at compile time, if only we had access to the abstract syntax tree. Fortunately, in CL, the source code is little more than a textual representation of the AST.

Here’s the macro that Seibel comes up with:

(defmacro check (&body forms)
  `(progn
     ,@(loop for f in forms collect `(report-result ,f ',f))))

Notice how the source code within the list (represented as the loop variable f) is both executed and quoted. The test now becomes much simpler:

(defun test-+ ()
  (check (= (+ 1 2) 3))
  (check (= (+ 1 2 3) 6))
  (check (= (+ -1 -3) -4)))

Summary

Finding ways to eliminate duplication is always A Good Thing. For a long time, if you were programming in a mainstream language, static code generation was your only option when code generation was needed. Things changed with the advent of reflection-based languages, particularly when Java and C# joined the list of mainstream languages. Even though their metaprogramming capabilities aren’t as powerful as those of languages like Smalltalk and Ruby, they at least introduced metaprogramming techniques to the masses.

Of course, Lisp has been around since, say, the 1950s (I’m not sure how long macros have been around, however). Syntactic macros provide a very powerful way of generating code, even letting you change the language itself. But until more languages implement them, they will never become as popular as they should be.


Written by Brandon Byars

March 29, 2008 at 6:00 pm

C# Enum Generation

Ayende recently asked on the ALT.NET mailing list about the various methods developers use to provide lookup values, with the question framed as one between lookup tables and enums. My own preference is to use both, but keep it DRY with code generation.

To demonstrate the idea, I wrote a Ruby script that generates a C# enum file from some metadata. I much prefer Ruby to pure .NET solutions like CodeSmith—I find it easier and more powerful (I do think CodeSmith is excellent if there is no Ruby expertise on the team, however). The full source for this example can be grabbed here.

The idea is simple. I want a straightforward and extensible way to provide metadata for lookup values, following the Ruby Way of convention over configuration. XML is very popular in the .NET world, but the Ruby world views it as overly verbose, and prefers lighter markup languages like YAML. For my purposes, I decided not to mess with markup at all (although I’m still considering switching to YAML—the hash of hashes approach describes what I want well). Here’s some example metadata:

enums = {
  'OrderType' => {},
  'MethodOfPayment' => {:table => 'PaymentMethod'},
  'StateProvince' => {:table => 'StateProvinces',
                      :name_column => 'Abbreviation',
                      :id_column => 'StateProvinceId',
                      :transformer => lambda {|value| value.upcase},
                      :filter => lambda {|value| !value.empty?}}
}

That metadata, which is valid Ruby code, describes three enums, named OrderType, MethodOfPayment, and StateProvince. The intention is that, where you’ve followed your database conventions, you can usually get by without adding any extra metadata, as the OrderType example shows. The code generator will get the ids and enum names from the OrderType table (expecting the columns to be named OrderTypeId and Description) and create the enum from those values. As StateProvince shows, the table name and both column names can be overridden.

More interestingly, you can both transform and filter the enum names by passing lambdas (which are like anonymous delegates in C#). The StateProvince example above will uppercase each name and filter out any that, after illegal characters have been cleaned up, come out empty.
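In isolation, the two lambdas behave like this:

transformer = lambda {|value| value.upcase}
filter      = lambda {|value| !value.empty?}

transformer.call('tx')   # => "TX"
filter.call('')          # => false, so that value is dropped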

We use a pre-build event in our project to build the enum file. However, if you simply overwrite the file every time you build, you may slow down the build process considerably. MSBuild (used by Visual Studio) evidently sees that the timestamp has been updated, so it rebuilds the project, forcing a rebuild of all downstream dependent projects. A better solution is to only overwrite the file if there are changes:

require File.dirname(__FILE__) + '/enum_generator'

gen = EnumGenerator.new('localhost', 'database-name')
source = gen.generate_all('Namespace', enums)

filename = File.join(File.dirname(__FILE__), 'Enums.cs')
if Dir[filename].empty? || source != IO.read(filename)
  File.open(filename, 'w') {|file| file << source}
end

I define the basic templates straight in the EnumGenerator class, but allow them to be swapped out. In theory, the default name column and the default lambda for generating the id column name given the table name (or enum name) could be handled the same way. Below is the EnumGenerator code:

class EnumGenerator
  FILE_TEMPLATE = <<EOT
//------------------------------------------------------------------------------
// <auto-generated>
//     This code was generated by a tool from <%= catalog %> on <%= server %>.
//
//     Changes to this file may cause incorrect behavior and will be lost if
//     the code is regenerated.
// </auto-generated>
//------------------------------------------------------------------------------

namespace <%= namespace %>
{
    <%= enums %>
}
EOT

  ENUM_TEMPLATE = <<EOT
public enum <%= enum_name %>
{
<% values.keys.sort.each_with_index do |id, i| -%>
    <%= values[id] %> = <%= id %><%= ',' unless i == values.length - 1 %>
<% end -%>
}

EOT

  # Change the templates by calling these setters
  attr_accessor :enum_template, :file_template

  attr_reader :server, :catalog

  def initialize(server, catalog)
    @server, @catalog = server, catalog
    @enum_template, @file_template = ENUM_TEMPLATE, FILE_TEMPLATE
  end
end

The code generation uses erb, the standard Ruby templating language:

def transform(template, template_binding)
  erb = ERB.new(template, nil, '-')
  erb.result template_binding
end

template_binding describes the variables available to the template, in much the same way that Castle MonoRail’s PropertyBag describes the variables available to the views. The difference is that, because Ruby is dynamic, you don’t have to explicitly add values to the binding.
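Here’s the idea in isolation, apart from the generator (the method and variable names are made up):

require 'erb'

def render_greeting
  greeting = 'Hello'
  name = 'ERB'
  # binding captures the local variables above; the template can
  # reference them without any explicit registration step.
  ERB.new('<%= greeting %>, <%= name %>!').result(binding)
end

render_greeting   # => "Hello, ERB!"

The rest of the code is shown below: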

def generate(enum_name, attributes)
  table = attributes[:table] || enum_name
  filter = attributes[:filter] || lambda {|value| true}
  values = enum_values(table, attributes)
  values.delete_if {|key, value| !filter.call(value)}
  transform enum_template, binding
end

def generate_all(namespace, metadata)
  enums = ''
  metadata.keys.sort.each {|enum_name| enums << generate(enum_name, metadata[enum_name])}
  enums = enums.gsub(/\n/m, "\n\t").strip
  transform file_template, binding
end

private
def enum_values(table, attributes)
  sql = get_sql table, attributes
  @dbh ||= DBI.connect("DBI:ADO:Provider=SQLNCLI;server=#{server};database=#{catalog};Integrated Security=SSPI")
  sth = @dbh.execute sql
  values = {}
  sth.each {|row| values[row['Id']] = clean(row['Name'], attributes[:transformer])}
  sth.finish

  values
end

def get_sql(table, attributes)
  id_column = attributes[:id_column] || "#{table}Id"
  name_column = attributes[:name_column] || "Description"
  "SELECT #{id_column} AS Id, #{name_column} AS Name FROM #{table} ORDER BY Id"
end

def clean(enum_value, transformer=nil)
  enum_value = '_' + enum_value if enum_value =~ /^\d/
  enum_value = enum_value.gsub /[^\w]/, ''
  transformer ||= lambda {|value| value}
  transformer.call enum_value
end
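For instance, clean maps hypothetical inputs like these:

clean('1st Class')                   # => "_1stClass"
clean('n/a', lambda {|v| v.upcase})  # => "NA"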

Caveat Emptor: I wrote this code from scratch today; it is not the same code we currently use in production. I think it’s better, but if you find a problem with it please let me know.

Written by Brandon Byars

October 21, 2007 at 9:54 pm

Posted in .NET, Code Generation, Ruby


Ruby and SQL DMO

We have a rather large legacy system that we are trying to inject a healthy dose of agility into. One of the biggest challenges has always been versioning the database, which is large, clumsy, and cluttered. We managed to get the rest of the code under a continuous integration scheme, but every time somebody made a schema change, or even updated a stored procedure needed for some tests to work, manual intervention was needed.

Pramod Sadalage and Martin Fowler wrote one of the first articles on iterative database design, and Pramod later teamed up with Scott Ambler to collaborate on Refactoring Databases. The advice, adopted by the Ruby on Rails team, was to create a separate migration file for each schema change, and number them sequentially. For example, the first migration would be 1.sql, then 2.sql, and so on. You could store the latest migration file run on a database in a version table, which would make updating a database as easy as running every migration, in order, whose version number is greater than the one stored in your database table.
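In sketch form, an update then looks something like this (the schema_version table and DBI wiring here are illustrative assumptions, not our production code):

# Run every numbered migration above the version recorded in the
# database, lowest first, recording each version as we go.
def migrate(dbh, migrations_dir)
  current = dbh.select_one('SELECT version FROM schema_version')[0].to_i
  pending = Dir[File.join(migrations_dir, '*.sql')].
    map {|path| [File.basename(path, '.sql').to_i, path]}.
    select {|version, path| version > current}.
    sort

  pending.each do |version, path|
    dbh.execute IO.read(path)
    dbh.execute "UPDATE schema_version SET version = #{version}"
  end
end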

We managed to do something similar, although it required a number of changes. But before any of it could work, we needed a starting point: a base schema to create a new developer database from. We use SQL Server 2000, so initially I simply had Enterprise Manager generate a SQL script for me. Not only did it not work (I don’t think it sorted the dependencies correctly), it was a ghastly nightmare to look at.

Why do standard vendor-supplied code generation tools create such ugly code?

I decided to do the code generation myself using SQL DMO (the same COM interfaces Enterprise Manager was using, just using them poorly). I’d successfully used Ruby and ERB for code generation before, and discovered I could make very nice-looking code (Code Generation in Action by Jack Herrington describes the concept nicely). Within just a couple of hours, I had a SQL script that not only looked much nicer than anything Enterprise Manager spits out, but worked to boot.

First, I needed to connect to the database I wanted to dump into a SQL file:

require 'win32ole'

class SqlServer
  def initialize(server, catalog)
    sql = WIN32OLE.new("SQLDMO.SQLServer")
    sql.LoginSecure = 1
    sql.Connect(server)
    @db = sql.Databases(catalog)
  end
end

This uses Windows Authentication to connect to the appropriate SQL Server database. The DMO interface, described here, is one of those clumsy, wandering APIs commonly developed when VBScript is your target language (warning: the MSDN website for DMO only seems to work in IE, and poorly at that). I decided to wrap some of the classes to make them easier to use. First, a superclass:

class DBObject
  def initialize(object)
    @object = object
  end

  def name
    return @object.Name unless @object.Name =~ / /
    "[#{@object.Name}]"
  end

  def beautify(text)
    # strip brackets if they're not needed, and remove the .dbo prefix
    text.gsub!(/\[([^ ]*?)\]/, '\1')
    text.gsub(/dbo\./, '')
  end
end

Here I provide one of the keys to generating code that looks appealing. I really don’t like to look at all the noise of bracketing all the tables and column names, just in case the name contains a space in it. The following looks ugly to me:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[tablename] (
    [id] [int] NOT NULL,
    [description] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
) ON [PRIMARY]

That drives me wild. I don’t ever change any of the settings for ANSI nulls and the like, and I couldn’t care less what the collation is so long as it matches the database default. Nor do I care about the file group. It’s hard to see through all the noise.

Here’s what I want:

CREATE TABLE tablename (
    id int NOT NULL,
    description varchar(50) NOT NULL
)
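A quick sanity check that beautify gets us from the first form to the second:

obj = DBObject.new(nil)   # beautify doesn't touch the wrapped object
obj.beautify('CREATE TABLE [dbo].[tablename] ([id] [int] NOT NULL)')
# => "CREATE TABLE tablename (id int NOT NULL)"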

Our name and beautify methods will help us achieve prettier code. Here’s the most important subclass:

class Table < DBObject
  attr_reader :columns, :constraints, :keys, :indexes, :references

  def initialize(table, connection)
    super(table)
    @columns, @constraints, @keys = [], [], []
    table.Columns.each { |col| @columns << Column.new(col) }
    table.Checks.each { |check| @constraints << CheckConstraint.new(check) }
    table.Keys.each { |key| @keys << Key.new(key, connection) }
    get_indexes(table)
    get_references
  end

  private
  def get_references
    @references = []
    @keys.each do |key|
      if "" != key.referenced_table and name != key.referenced_table
        @references << key.referenced_table unless references.include?(key.referenced_table)
      end
    end
  end

  def get_indexes(table)
    @indexes = []
    # the Indexes collection includes objects already in Keys and statistics
    keys = @keys.map { |key| key.name }
    table.Indexes.each do |index|
      if not keys.include?(index.Name)
        if index.Type == 16 || index.Type == 0
          @indexes << Index.new(index)
        end
      end
    end
  end
end

You can find the classes it depends on by downloading all of the code here. Notice, however, that a database connection is needed for the Key constructor. As far as I could tell, there is no way, using nothing more than DMO, to find out whether a key cascade-deletes; I had to query the INFORMATION_SCHEMA views to find that information.
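The query involved looks something like this (a sketch of the idea, not the actual Key class code):

# DELETE_RULE comes back as 'CASCADE' for cascading foreign keys.
def cascade_delete?(connection, constraint_name)
  row = connection.select_one(<<EOT)
SELECT DELETE_RULE
FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS
WHERE CONSTRAINT_NAME = '#{constraint_name}'
EOT
  !row.nil? && row[0] == 'CASCADE'
end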

For our script to work, we’re going to need to order our dependencies correctly. The SQL script will fail if we try to add a foreign key to a table that doesn’t yet exist. The following should do the trick:

class SqlServer
  # returns a topological sort with parent tables in front of child tables
  def self.topologically_sorted(tables)
    sorted = []

    # We need a hash to navigate the references field, which gives only names
    table_hash = {}
    tables.each { |table| table_hash[table.name] = table }

    # first add all root tables to sorted
    sorted += tables.find_all { |table| 0 == table.references.length }

    while sorted.length < tables.length
      sorted += tables.find_all do |table|
        if sorted.include?(table)
          result = false
        else
          # all dependencies must already be in sorted
          dependencies = table.references.map { |ref| table_hash[ref] }
          result = (nil == dependencies.find { |ref| not sorted.include?(ref) })
        end
        result
      end
    end
    sorted
  end
end

Now, our code is as simple as binding to some ERB templates:

require 'erb'
require 'db_object'

class SchemaWriter
  def initialize(server, catalog)
    @db = SqlServer.new server, catalog
  end

  def generate_create_script
    generate_code_for(@db.user_defined_datatypes, "create_udt")
    generate_code_for(@db.rules, "create_rule")
    generate_code_for(SqlServer.topologically_sorted(@db.user_tables), "create_table")
  end

  def generate_code_for(objects, template_name)
    file_name = template_name + ".erb"
    template = ""
    File.open(file_name) { |file| template = file.read }
    objects.each do |object|
      erb = ERB.new(template, nil, '-')
      puts erb.result(binding)
    end
  end
end

if $0 == __FILE__
  writer = SchemaWriter.new(ARGV[0], ARGV[1])
  writer.generate_create_script
end

As an example, here’s the create_table.erb template:

create table <%= object.name %>(
<% object.columns.each_with_index do |column, i| -%>
    <%= column.name %> <%= column.text %><%= "," unless i == object.columns.length - 1 %>
<% end -%>
)

<% object.keys.each do |key| -%>
alter table <%= object.name %> add constraint <%= key.name %> <%= key.text %>
<% end -%>
<% object.constraints.each do |constraint| -%>
alter table <%= object.name %> add constraint <%= constraint.name %> <%= constraint.text %>
<% end -%>
<% object.indexes.each do |index| -%>
<%= index.text %>
<% end -%>

Written by Brandon Byars

March 17, 2007 at 3:28 pm

Posted in Code Generation, Database, Ruby
