A Day In The Lyf

…the lyf so short, the craft so longe to lerne

Ruby and SQL DMO

We have a rather large legacy system that we are trying to inject a healthy dose of agility into. One of the biggest challenges has always been versioning the database, which is large, clumsy, and cluttered. We managed to get the rest of the code under a continuous integration scheme, but every time somebody made a schema change, or even updated a stored procedure needed for some tests to work, manual intervention was needed.

Pramod Sadalage and Martin Fowler wrote one of the first articles on iterative database design, and Pramod later teamed up with Scott Ambler to collaborate on Refactoring Databases. The advice, adopted by the Ruby on Rails team, was to create a separate migration file for each schema change, and number them sequentially. For example, the first migration would be 1.sql, then 2.sql, and so on. You could store the latest migration file run on a database in a version table, which would make updating a database as easy as running every migration, in order, whose version number is greater than the one stored in your database table.
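
The selection step in that scheme is small enough to sketch. This is only an illustration, assuming the migrations sit in one directory as `<number>.sql` files and that the current version has already been read from the version table; the `pending_migrations` name and its arguments are hypothetical and do not come from our actual code:

```ruby
# Hypothetical helper: given the migration file names and the version stored
# in the database, return the files that still need to run, in order.
def pending_migrations(file_names, current_version)
  numbered = file_names.map do |file_name|
    match = File.basename(file_name).match(/\A(\d+)\.sql\z/)
    match && [match[1].to_i, file_name]   # nil for anything that isn't N.sql
  end
  numbered.compact
          .select { |version, _| version > current_version }
          .sort_by { |version, _| version }   # numeric sort, so 10.sql follows 9.sql
          .map { |_, file_name| file_name }
end

files = ["10.sql", "2.sql", "1.sql", "9.sql", "readme.txt"]
puts pending_migrations(files, 2)
```

Sorting on the parsed integer rather than the file name matters once you pass nine migrations, since a plain string sort would put 10.sql ahead of 2.sql.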

We managed to do something similar, although it required a number of changes. But before any of it could work, we needed a starting point: a base schema to create a new developer database. We use SQL Server 2000, so initially I simply had Enterprise Manager create a SQL script for me. Not only did it not work (I don’t think it sorted the dependencies right), it was a ghastly nightmare to look at.

Why do standard vendor-supplied code generation tools create such ugly code?

I decided to do the code generation myself using SQL DMO (the same COM interfaces that Enterprise Manager uses, just poorly). I’d successfully used Ruby and ERB for code generation before, and discovered I could make very nice looking code (Code Generation in Action by Jack Herrington describes the concept nicely). Within just a couple of hours, I had a SQL script that not only looked much nicer than anything Enterprise Manager spits out, but also worked.

First, I needed to connect to the database I wanted to dump into a SQL file:

require 'win32ole'

class SqlServer
  def initialize(server, catalog)
    sql = WIN32OLE.new("SQLDMO.SQLServer")
    sql.LoginSecure = 1
    sql.Connect(server)
    @db = sql.Databases(catalog)
  end
end

This uses Windows Authentication to connect to the appropriate SQL Server database. The DMO interface, described here, is one of those clumsy, wandering APIs commonly developed when VBScript is your target language (warning, the MSDN website for DMO only seems to work in IE, and poorly at that). I decided to wrap some of the classes to make it easier to use. First, a superclass:

class DBObject
  def initialize(object)
    @object = object
  end

  def name
    return @object.Name unless @object.Name =~ / /
    "[#{@object.Name}]"
  end

  def beautify(text)
    # strip brackets if they're not needed, and remove the dbo. prefix
    # (non-destructive gsub, so the caller's string is left unmodified)
    text.gsub(/\[([^ ]*?)\]/, '\1').gsub(/dbo\./, '')
  end
end

Here I provide one of the keys to generating code that looks appealing. I really don’t like to look at all the noise of bracketing all the tables and column names, just in case the name contains a space in it. The following looks ugly to me:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[tablename] (
    [id] [int] NOT NULL,
    [description] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
) ON [PRIMARY]

That drives me wild. I don’t ever change any of the settings for ANSI nulls, etc., and I couldn’t care less what the collation is so long as it matches the database default. Nor do I care about the file group. It’s hard to see through all the noise.

Here’s what I want:

CREATE TABLE tablename (
    id int NOT NULL,
    description varchar(50) NOT NULL
)
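
To see what the two regexes in beautify do with text like the Enterprise Manager output, here is a small stand-alone sketch. It copies the logic into a free function so it can run without DMO; the sample strings are illustrative:

```ruby
# Stand-alone copy of the beautify logic, written so it does not modify its argument.
def beautify(text)
  text.gsub(/\[([^ ]*?)\]/, '\1').gsub(/dbo\./, '')
end

puts beautify("CREATE TABLE [dbo].[tablename] (")
puts beautify("    [id] [int] NOT NULL,")
puts beautify("[dbo].[order details]")  # names with a space keep their brackets
```

Because the bracket pattern refuses to match across a space, a name that really needs quoting stays quoted while everything else is stripped.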

Our name and beautify methods will help us achieve prettier code. Here’s the most important subclass:

class Table < DBObject
  attr_reader :columns, :constraints, :keys, :indexes, :references

  def initialize(table, connection)
    super(table)
    @columns, @constraints, @keys = [], [], []
    table.Columns.each { |col| @columns << Column.new(col) }
    table.Checks.each { |check| @constraints << CheckConstraint.new(check) }
    table.Keys.each { |key| @keys << Key.new(key, connection) }
    get_indexes(table)
    get_references
  end

  private
  def get_references
    @references = []
    @keys.each do |key|
      if "" != key.referenced_table and name != key.referenced_table
        @references << key.referenced_table unless references.include?(key.referenced_table)
      end
    end
  end

  def get_indexes(table)
    @indexes = []
    # the Indexes collection includes objects already in Keys and statistics
    keys = @keys.map { |key| key.name }
    table.Indexes.each do |index|
      if not keys.include?(index.Name)
        if index.Type == 16 || index.Type == 0
          @indexes << Index.new(index)
        end
      end
    end
  end
end

You can find the classes it depends on by downloading all of the code here. Notice, however, that a database connection is needed for the Key constructor. As far as I could tell, there was no way, using nothing more than DMO, to find out if a key cascade deletes. I had to query the INFORMATION_SCHEMA views to find that information.
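
For reference, the lookup can be sketched roughly as follows. The `INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS` view has a `DELETE_RULE` column for each foreign key; the helper names below are hypothetical, and the real Key class in the download may go about it differently:

```ruby
# Hypothetical helpers; the post does not show how its Key class runs this lookup.
# INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS exposes a DELETE_RULE column
# for each foreign key constraint.
def delete_rule_query(constraint_name)
  escaped = constraint_name.gsub("'", "''")  # double single quotes for a SQL literal
  "SELECT DELETE_RULE FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS " \
    "WHERE CONSTRAINT_NAME = '#{escaped}'"
end

# Interprets the value returned by the query above.
def cascade_delete?(delete_rule)
  delete_rule.to_s.strip.upcase == "CASCADE"
end

puts delete_rule_query("FK_orders_customers")
puts cascade_delete?("CASCADE")
puts cascade_delete?("NO ACTION")
```

The query string would be run over whatever connection is passed to the Key constructor; that part is left out here because it depends on the data access library in use.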

For our script to work, we’re going to need to order our dependencies correctly. The SQL script will fail if we try to add a foreign key to a table that doesn’t yet exist. The following should do the trick:

class SqlServer
  # returns a topological sort with parent tables in front of child tables
  def self.topologically_sorted(tables)
    sorted = []

    # We need a hash to navigate the references field, which gives only names
    table_hash = {}
    tables.each { |table| table_hash[table.name] = table }

    # first add all root tables to sorted
    sorted += tables.find_all { |table| 0 == table.references.length }

    while sorted.length < tables.length
      sorted += tables.find_all do |table|
        if sorted.include?(table)
          result = false
        else
          # all dependencies must already be in sorted
          dependencies = table.references.map { |ref| table_hash[ref] }
          result = (nil == dependencies.find { |ref| not sorted.include?(ref) })
        end
        result
      end
    end
    sorted
  end
end
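
To try the sort without a database, here is a stand-alone sketch of the same batch-by-batch idea. The `SortableTable` struct and `sort_parents_first` name are stand-ins of my own, and the guard that raises on a circular or missing reference is an addition that the method above does not have:

```ruby
# Stand-in for the Table class: only the fields the sort needs.
SortableTable = Struct.new(:name, :references)

# Same batch-by-batch idea as topologically_sorted, plus a guard (an addition)
# that raises instead of looping forever when no further table can be placed.
def sort_parents_first(tables)
  by_name = {}
  tables.each { |table| by_name[table.name] = table }
  sorted = []
  while sorted.length < tables.length
    ready = tables.reject { |table| sorted.include?(table) }.select do |table|
      table.references.all? { |ref| sorted.include?(by_name[ref]) }
    end
    raise ArgumentError, "circular or missing reference" if ready.empty?
    sorted += ready
  end
  sorted
end

tables = [
  SortableTable.new("order_lines", ["orders", "products"]),
  SortableTable.new("orders", ["customers"]),
  SortableTable.new("customers", []),
  SortableTable.new("products", [])
]
puts sort_parents_first(tables).map(&:name)
```

Each pass adds every table whose parents are already placed, so root tables go first and the deepest children last. Self-references never reach the sort because Table#get_references already filters them out.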

Now, our code is as simple as binding to some ERB templates:

require 'erb'
require 'db_object'

class SchemaWriter
  def initialize(server, catalog)
    @db = SqlServer.new server, catalog
  end

  def generate_create_script
    generate_code_for(@db.user_defined_datatypes, "create_udt")
    generate_code_for(@db.rules, "create_rule")
    generate_code_for(SqlServer.topologically_sorted(@db.user_tables), "create_table")
  end

  def generate_code_for(objects, template_name)
    template = File.read(template_name + ".erb")
    # '-' trim mode lets a closing -%> suppress the newline after a tag.
    # Ruby 2.6+ takes it as a keyword; older versions used ERB.new(template, nil, '-').
    erb = ERB.new(template, trim_mode: '-')
    objects.each do |object|
      puts erb.result(binding)
    end
  end
end

if $0 == __FILE__
  writer = SchemaWriter.new(ARGV[0], ARGV[1])
  writer.generate_create_script
end

As an example, here’s the create_table.erb template:

create table <%= object.name %>(
<% object.columns.each_with_index do |column, i| -%>
    <%= column.name %> <%= column.text %><%= "," unless i == object.columns.length - 1 %>
<% end -%>
)

<% object.keys.each do |key| -%>
alter table <%= object.name %> add constraint <%= key.name %> <%= key.text %>
<% end -%>
<% object.constraints.each do |constraint| -%>
alter table <%= object.name %> add constraint <%= constraint.name %> <%= constraint.text %>
<% end -%>
<% object.indexes.each do |index| -%>
<%= index.text %>
<% end -%>
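
To see how the `-%>` markers keep the output free of blank lines, here is a stand-alone rendering of the column portion of that template. The `SampleTable` and `SampleColumn` structs are stand-ins for the wrapped DMO objects, and the keyword form of `ERB.new` assumes a reasonably current Ruby:

```ruby
require 'erb'

# Stand-ins for the wrapped DMO objects; only the fields this fragment uses.
SampleColumn = Struct.new(:name, :text)
SampleTable = Struct.new(:name, :columns)

TEMPLATE = <<'TEMPLATE_END'
create table <%= object.name %>(
<% object.columns.each_with_index do |column, i| -%>
    <%= column.name %> <%= column.text %><%= "," unless i == object.columns.length - 1 %>
<% end -%>
)
TEMPLATE_END

def render_table(object)
  # trim_mode '-' makes a closing -%> swallow the newline that follows it
  ERB.new(TEMPLATE, trim_mode: '-').result(binding)
end

table = SampleTable.new("tablename", [
  SampleColumn.new("id", "int NOT NULL"),
  SampleColumn.new("description", "varchar(50) NOT NULL")
])
puts render_table(table)
```

Without the trim mode, each `<% ... %>` control line would leave an empty line behind in the generated SQL.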

Written by Brandon Byars

March 17, 2007 at 3:28 pm

Posted in Code Generation, Database, Ruby