Archive for the ‘Code Generation’ Category
C# Enum Generation
Ayende recently asked on the ALT.NET mailing list about the various methods developers use to provide lookup values, with the question framed as one between lookup tables and enums. My own preference is to use both, but keep it DRY with code generation.
To demonstrate the idea, I wrote a Ruby script that generates a C# enum file from some metadata. I much prefer Ruby to pure .NET solutions like CodeSmith—I find it easier and more powerful (I do think CodeSmith is excellent if there is no Ruby expertise on the team, however). The full source for this example can be grabbed here.
The idea is simple. I want a straightforward and extensible way to provide metadata for lookup values, following the Ruby Way of convention over configuration. XML is very popular in the .NET world, but the Ruby world views it as overly verbose, and prefers lighter markup languages like YAML. For my purposes, I decided not to mess with markup at all (although I’m still considering switching to YAML—the hash of hashes approach describes what I want well). Here’s some example metadata:
enums = {
  'OrderType' => {},
  'MethodOfPayment' => {:table => 'PaymentMethod'},
  'StateProvince' => {:table => 'StateProvinces',
                      :name_column => 'Abbreviation',
                      :id_column => 'StateProvinceId',
                      :transformer => lambda {|value| value.upcase},
                      :filter => lambda {|value| !value.empty?}}
}
That list, which is valid Ruby code, describes three enums, named OrderType, MethodOfPayment, and StateProvince. The intention is that, where you followed your database standards, you should usually be able to get by without adding any extra metadata, as the OrderType example shows. The code generator will get the ids and enum names from the OrderType table (expecting the columns to be named OrderTypeId and Description) and create the enum from those values. As StateProvince shows, the table name and both column names can be overridden.
More interestingly, you can both transform and filter the enum names by passing lambdas (which are similar to anonymous delegates in C#). The StateProvince example above strips any illegal characters from each name, upper-cases it, and then filters out any values that end up as empty strings.
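As a quick illustration of how those two lambdas behave (the raw values below are made up, not from a real StateProvinces table):

```ruby
# Hypothetical raw values, standing in for rows from a StateProvinces table
raw_names = ['wa', 'or', '', 'ca']

transformer = lambda {|value| value.upcase}
filter = lambda {|value| !value.empty?}

# transform each name, then drop the ones the filter rejects
cleaned = raw_names.map {|name| transformer.call(name)}
kept = cleaned.select {|name| filter.call(name)}

puts kept.inspect   # => ["WA", "OR", "CA"]
```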
We use a pre-build event in our project to build the enum file. However, if you simply overwrite the file every time you build, you may slow down the build process considerably. MSBuild (used by Visual Studio) evidently sees that the timestamp has been updated, so it rebuilds the project, forcing a rebuild of all downstream dependent projects. A better solution is to only overwrite the file if there are changes:
require File.dirname(__FILE__) + '/enum_generator'

gen = EnumGenerator.new('localhost', 'database-name')
source = gen.generate_all('Namespace', enums)
filename = File.join(File.dirname(__FILE__), 'Enums.cs')
if !File.exist?(filename) || source != IO.read(filename)
  File.open(filename, 'w') {|file| file << source}
end
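Here is a minimal, self-contained demonstration of that guard, using a throwaway file in the temp directory rather than the real Enums.cs:

```ruby
require 'tmpdir'

# A minimal version of the guard: only write when content differs, so an
# unchanged file keeps its timestamp and MSBuild skips the rebuild
def write_if_changed(filename, source)
  return if File.exist?(filename) && IO.read(filename) == source
  File.open(filename, 'w') {|file| file << source}
end

path = File.join(Dir.tmpdir, 'enums_demo.cs')
write_if_changed(path, 'enum Demo {}')
before = File.mtime(path)
write_if_changed(path, 'enum Demo {}')  # identical content: file untouched
```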
I define the basic templates straight in the EnumGenerator class, but allow them to be swapped out. In theory, the default name column and the default lambda for generating the id column name from the table name (or enum name) could be handled the same way. Below is the EnumGenerator code:
class EnumGenerator
  FILE_TEMPLATE = <<EOT
//------------------------------------------------------------------------------
// <auto-generated>
// This code was generated by a tool from <%= catalog %> on <%= server %>.
//
// Changes to this file may cause incorrect behavior and will be lost if
// the code is regenerated.
// </auto-generated>
//------------------------------------------------------------------------------

namespace <%= namespace %>
{
	<%= enums %>
}
EOT

  ENUM_TEMPLATE = <<EOT
public enum <%= enum_name %>
{
<% values.keys.sort.each_with_index do |id, i| -%>
	<%= values[id] %> = <%= id %><%= ',' unless i == values.length - 1 %>
<% end -%>
}
EOT

  # Change the templates by calling these setters
  attr_accessor :enum_template, :file_template
  attr_reader :server, :catalog

  def initialize(server, catalog)
    @server, @catalog = server, catalog
    @enum_template, @file_template = ENUM_TEMPLATE, FILE_TEMPLATE
  end
end
The code generation uses ERB, the standard Ruby templating library:

def transform(template, template_binding)
  erb = ERB.new(template, nil, '-')
  erb.result template_binding
end
template_binding describes the variables available to the template in much the same way that Castle MonoRail’s PropertyBag describes the variables available to the views. The difference is that, because Ruby is dynamic, you don’t have to explicitly add values to the binding. The rest of the code is shown below:
def generate(enum_name, attributes)
  table = attributes[:table] || enum_name
  filter = attributes[:filter] || lambda {|value| true}
  values = enum_values(table, attributes)
  values.delete_if {|key, value| !filter.call(value)}
  transform enum_template, binding
end

def generate_all(namespace, metadata)
  enums = ''
  metadata.keys.sort.each {|enum_name| enums << generate(enum_name, metadata[enum_name])}
  enums = enums.gsub(/\n/m, "\n\t").strip
  transform file_template, binding
end
private

def enum_values(table, attributes)
  sql = get_sql table, attributes
  @dbh ||= DBI.connect("DBI:ADO:Provider=SQLNCLI;server=#{server};database=#{catalog};Integrated Security=SSPI")
  sth = @dbh.execute sql
  values = {}
  sth.each {|row| values[row['Id']] = clean(row['Name'], attributes[:transformer])}
  sth.finish
  values
end

def get_sql(table, attributes)
  id_column = attributes[:id_column] || "#{table}Id"
  name_column = attributes[:name_column] || "Description"
  "SELECT #{id_column} AS Id, #{name_column} AS Name FROM #{table} ORDER BY Id"
end

def clean(enum_value, transformer=nil)
  enum_value = '_' + enum_value if enum_value =~ /^\d/
  enum_value = enum_value.gsub /[^\w]/, ''
  transformer ||= lambda {|value| value}
  transformer.call enum_value
end
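To make the cleaning rules concrete, here is the same clean logic as a standalone method, run against a few hypothetical inputs:

```ruby
def clean(enum_value, transformer=nil)
  # C# identifiers can't start with a digit, so prefix an underscore
  enum_value = '_' + enum_value if enum_value =~ /^\d/
  # strip anything that isn't a legal identifier character
  enum_value = enum_value.gsub(/[^\w]/, '')
  transformer ||= lambda {|value| value}
  transformer.call enum_value
end

puts clean('Credit Card')                          # => CreditCard
puts clean('2-Day Shipping')                       # => _2DayShipping
puts clean('wa', lambda {|value| value.upcase})    # => WA
```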
Caveat Emptor: I wrote this code from scratch today; it is not the same code we currently use in production. I think it’s better, but if you find a problem with it please let me know.
Ruby and SQL DMO
We have a rather large legacy system that we are trying to inject a healthy dose of agility into. One of the biggest challenges has always been versioning the database, which is large, clumsy, and cluttered. We managed to get the rest of the code under a continuous integration scheme, but every time somebody made a schema change, or even updated a stored procedure needed for some tests to work, manual intervention was needed.
Pramod Sadalage and Martin Fowler wrote one of the first articles on iterative database design, and Pramod later teamed up with Scott Ambler to collaborate on Refactoring Databases. The advice, adopted by the Ruby on Rails team, was to create a separate migration file for each schema change, and number them sequentially. For example, the first migration would be 1.sql, then 2.sql, and so on. You could store the latest migration file run on a database in a version table, which would make updating a database as easy as running every migration, in order, whose version number is greater than the one stored in your database table.
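The version check itself is simple enough to sketch; here the version table and the migration files are simulated with plain numbers (a real version would read the number from a table and run each file against the database):

```ruby
# Given every migration number on disk and the version stored in the
# database, return the migrations to run, in order
def pending_migrations(all_versions, current_version)
  all_versions.select {|version| version > current_version}.sort
end

migrations = [3, 1, 2, 5, 4]   # e.g. 1.sql, 2.sql, ...
puts pending_migrations(migrations, 2).inspect   # => [3, 4, 5]
```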
We managed to do something similar, although it required a number of changes. But before it can all work, you need a starting point: a base schema from which to create a new developer database. We use SQL Server 2000, so initially I simply had Enterprise Manager generate a SQL script for me. Not only did it not work (I don’t think it sorted the dependencies correctly), it was a ghastly nightmare to look at.
Why do standard vendor-supplied code generation tools create such ugly code?
I decided to do the code generation myself using SQL DMO (the same COM interfaces Enterprise Manager itself uses, just not as well). I’d successfully used Ruby and ERB for code generation before, and discovered I could make very nice-looking code (Code Generation in Action by Jack Herrington describes the concept nicely). Within just a couple of hours, I had a SQL script that not only looked much nicer than anything Enterprise Manager spits out, but worked to boot.
First, I needed to connect to the database I wanted to dump into a SQL file:
require 'win32ole'

class SqlServer
  def initialize(server, catalog)
    sql = WIN32OLE.new("SQLDMO.SQLServer")
    sql.LoginSecure = 1
    sql.Connect(server)
    @db = sql.Databases(catalog)
  end
end
This uses Windows Authentication to connect to the appropriate SQL Server database. The DMO interface, described here, is one of those clumsy, wandering APIs commonly developed when VBScript is your target language (warning, the MSDN website for DMO only seems to work in IE, and poorly at that). I decided to wrap some of the classes to make it easier to use. First, a superclass:
class DBObject
  def initialize(object)
    @object = object
  end

  def name
    return @object.Name unless @object.Name =~ / /
    "[#{@object.Name}]"
  end

  def beautify(text)
    # strip brackets if they're not needed, and remove the dbo. prefix
    text.gsub!(/\[([^ ]*?)\]/, '\1')
    text.gsub(/dbo\./, '')
  end
end
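As a quick spot check, the same beautify logic applied to a bracketed, schema-qualified fragment (the table name is made up):

```ruby
def beautify(text)
  # strip brackets around names with no spaces, then drop the dbo. prefix
  text.gsub!(/\[([^ ]*?)\]/, '\1')
  text.gsub(/dbo\./, '')
end

puts beautify('CREATE TABLE [dbo].[OrderType] ([OrderTypeId] [int] NOT NULL)')
# => CREATE TABLE OrderType (OrderTypeId int NOT NULL)
```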
Here I provide one of the keys to generating code that looks appealing. I really don’t like looking at all the noise of bracketing every table and column name just in case a name contains a space. The following looks ugly to me:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[tablename] (
[id] [int] NOT NULL,
[description] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
) ON [PRIMARY]
That drives me wild. I never change any of the settings for ANSI nulls and the like, and I couldn’t care less what the collation is so long as it matches the database default. Nor do I care about the filegroup. It’s hard to see through all the noise.
Here’s what I want:

CREATE TABLE tablename (
    id int NOT NULL,
    description varchar(50) NOT NULL
)
Our name and beautify methods will help us achieve prettier code. Here’s the most important subclass:
class Table < DBObject
  attr_reader :columns, :constraints, :keys, :indexes, :references

  def initialize(table, connection)
    super(table)
    @columns, @constraints, @keys = [], [], []
    table.Columns.each { |col| @columns << Column.new(col) }
    table.Checks.each { |check| @constraints << CheckConstraint.new(check) }
    table.Keys.each { |key| @keys << Key.new(key, connection) }
    get_indexes(table)
    get_references
  end

  private

  def get_references
    @references = []
    @keys.each do |key|
      if "" != key.referenced_table and name != key.referenced_table
        @references << key.referenced_table unless references.include?(key.referenced_table)
      end
    end
  end

  def get_indexes(table)
    @indexes = []
    # the Indexes collection includes objects already in Keys, plus statistics
    key_names = @keys.map { |key| key.name }
    table.Indexes.each do |index|
      if not key_names.include?(index.Name)
        if index.Type == 16 || index.Type == 0
          @indexes << Index.new(index)
        end
      end
    end
  end
end
You can find the classes it depends on by downloading all of the code here. Notice, however, that a database connection is needed for the Key constructor. As far as I could tell, there was no way, using DMO alone, to find out whether a key cascade-deletes; I had to query the INFORMATION_SCHEMA views for that information.
For our script to work, we’re going to need to order our dependencies correctly. The SQL script will fail if we try to add a foreign key to a table that doesn’t yet exist. The following should do the trick:
class SqlServer
  # returns a topological sort with parent tables in front of child tables
  def self.topologically_sorted(tables)
    sorted = []
    # we need a hash to navigate the references field, which gives only names
    table_hash = {}
    tables.each { |table| table_hash[table.name] = table }
    # first add all root tables to sorted
    sorted += tables.find_all { |table| 0 == table.references.length }
    while sorted.length < tables.length
      sorted += tables.find_all do |table|
        if sorted.include?(table)
          result = false
        else
          # all dependencies must already be in sorted
          dependencies = table.references.map { |ref| table_hash[ref] }
          result = (nil == dependencies.find { |ref| not sorted.include?(ref) })
        end
        result
      end
    end
    sorted
  end
end
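To see the sort work without a database, here is a standalone variant of the same algorithm exercised against a few hypothetical table stubs:

```ruby
TableStub = Struct.new(:name, :references)

# Repeated here in condensed form so the example runs on its own:
# seed with root tables, then keep adding tables whose parents are all sorted
def topo_sort(tables)
  table_hash = {}
  tables.each {|table| table_hash[table.name] = table}
  sorted = tables.find_all {|table| table.references.empty?}
  while sorted.length < tables.length
    sorted += tables.find_all do |table|
      next false if sorted.include?(table)
      table.references.all? {|ref| sorted.include?(table_hash[ref])}
    end
  end
  sorted
end

orders    = TableStub.new('Orders', ['Customers'])
customers = TableStub.new('Customers', [])
items     = TableStub.new('OrderItems', ['Orders'])

puts topo_sort([orders, customers, items]).map {|t| t.name}.inspect
# => ["Customers", "Orders", "OrderItems"]
```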
Now, our code is as simple as binding to some ERB templates:
require 'erb'
require 'db_object'

class SchemaWriter
  def initialize(server, catalog)
    @db = SqlServer.new server, catalog
  end

  def generate_create_script
    generate_code_for(@db.user_defined_datatypes, "create_udt")
    generate_code_for(@db.rules, "create_rule")
    generate_code_for(SqlServer.topologically_sorted(@db.user_tables), "create_table")
  end

  def generate_code_for(objects, template_name)
    template = File.read(template_name + ".erb")
    objects.each do |object|
      erb = ERB.new(template, nil, '-')
      puts erb.result(binding)
    end
  end
end
if $0 == __FILE__
  writer = SchemaWriter.new(ARGV[0], ARGV[1])
  writer.generate_create_script
end
As an example, here’s the create_table.erb template:
create table <%= object.name %>(
<% object.columns.each_with_index do |column, i| -%>
<%= column.name %> <%= column.text %><%= "," unless i == object.columns.length - 1 %>
<% end -%>
)
<% object.keys.each do |key| -%>
alter table <%= object.name %> add constraint <%= key.name %> <%= key.text %>
<% end -%>
<% object.constraints.each do |constraint| -%>
alter table <%= object.name %> add constraint <%= constraint.name %> <%= constraint.text %>
<% end -%>
<% object.indexes.each do |index| -%>
<%= index.text %>
<% end -%>