Ruby and SQL DMO
We have a rather large legacy system that we are trying to inject a healthy dose of agility into. One of the biggest challenges has always been versioning the database, which is large, clumsy, and cluttered. We managed to get the rest of the code under a continuous integration scheme, but every time somebody made a schema change, or even updated a stored procedure needed for some tests to work, manual intervention was needed.
Pramod Sadalage and Martin Fowler wrote one of the first articles on iterative database design, and Pramod later teamed up with Scott Ambler to collaborate on Refactoring Databases. The advice, adopted by the Ruby on Rails team, was to create a separate migration file for each schema change, and number them sequentially. For example, the first migration would be 1.sql, then 2.sql, and so on. You could store the latest migration file run on a database in a version table, which would make updating a database as easy as running every migration, in order, whose version number is greater than the one stored in your database table.
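The "run every migration greater than the stored version" rule can be sketched in a few lines of Ruby. This is only an illustration: the method name `pending_migrations` and its arguments are hypothetical, and it assumes the file names have already been listed and the current version has already been read from the version table.

```ruby
# Sketch only: pick the numbered migration files that still need to run.
# `pending_migrations` and its arguments are hypothetical names.
def pending_migrations(file_names, current_version)
  file_names
    .select { |name| name =~ /\A\d+\.sql\z/ }         # ignore anything that is not N.sql
    .map { |name| [name.to_i, name] }                 # "12.sql".to_i gives 12
    .select { |version, _| version > current_version }
    .sort_by { |version, _| version }                 # numeric order, so 10.sql follows 9.sql
    .map { |_, name| name }
end

files = ["10.sql", "2.sql", "1.sql", "3.sql", "notes.txt"]
puts pending_migrations(files, 2)
```

Sorting numerically rather than alphabetically matters here, because a plain string sort would put 10.sql ahead of 2.sql.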
We managed to do something similar, although it required a number of changes. But before any of it could work, we needed a starting point: a base schema for creating a new developer database. We use SQL Server 2000, so initially I simply had Enterprise Manager create a SQL script for me. Not only did it not work (I don’t think it sorted the dependencies correctly), it was a ghastly nightmare to look at.
Why do standard vendor-supplied code generation tools create such ugly code?
I decided to do the code generation myself using SQL DMO (the same COM interfaces that Enterprise Manager uses, just poorly). I’d successfully used Ruby and ERB for code generation before, and had found that I could produce very nice-looking code that way (Code Generation in Action by Jack Herrington describes the concept nicely). Within just a couple of hours, I had a SQL script that not only looked much nicer than anything Enterprise Manager spits out, but worked to boot.
First, I needed to connect to the database I wanted to dump into a SQL file:
require 'win32ole'

class SqlServer
  def initialize(server, catalog)
    sql = WIN32OLE.new("SQLDMO.SQLServer")
    # LoginSecure = 1 selects Windows Authentication
    sql.LoginSecure = 1
    sql.Connect(server)
    @db = sql.Databases(catalog)
  end
end
This uses Windows Authentication to connect to the appropriate SQL Server database. The DMO interface, described here, is one of those clumsy, wandering APIs commonly developed when VBScript is your target language (warning: the MSDN website for DMO only seems to work in IE, and poorly at that). I decided to wrap some of the classes to make them easier to use. First, a superclass:
class DBObject
  def initialize(object)
    @object = object
  end

  # bracket the name only when it contains a space
  def name
    return @object.Name unless @object.Name =~ / /
    "[#{@object.Name}]"
  end

  def beautify(text)
    # strip brackets if they're not needed, and remove the dbo. prefix
    # (gsub rather than gsub!, so the caller's string is left untouched)
    text.gsub(/\[([^ ]*?)\]/, '\1').gsub(/dbo\./, '')
  end
end
Here I provide one of the keys to generating code that looks appealing. I really don’t like looking at all the noise of bracketing every table and column name just in case a name contains a space. The following looks ugly to me:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[tablename] (
[id] [int] NOT NULL,
[description] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
) ON [PRIMARY]
That drives me wild. I don’t ever change any of the settings for ANSI nulls, etc., and I couldn’t care less what the collation is so long as it matches the database default. Nor do I care about the file group. It’s hard to see through all the noise.
Here’s what I want:
CREATE TABLE tablename (
    id int NOT NULL,
    description varchar(50) NOT NULL
)
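To see the bracket stripping in isolation, here is a minimal sketch that applies the same two substitutions as the beautify method to a few sample strings. It is written as a standalone, non-mutating function for illustration, and the sample inputs are made up.

```ruby
# Standalone, non-mutating variant of the beautify idea, for illustration only.
def beautify(text)
  text.gsub(/\[([^ ]*?)\]/, '\1') # drop brackets around names without spaces
      .gsub(/dbo\./, '')          # drop the dbo. owner prefix
end

puts beautify("CREATE TABLE [dbo].[tablename] (")
puts beautify("[id] [int] NOT NULL,")
puts beautify("[order details]") # contains a space, so the brackets stay
```

Note that the owner pattern is deliberately crude: it would also strip "dbo." from the tail of a longer name, which is acceptable only if no such names exist in the schema.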
Our name and beautify methods will help us achieve prettier code. Here’s the most important subclass:
class Table < DBObject
  attr_reader :columns, :constraints, :keys, :indexes, :references

  def initialize(table, connection)
    super(table)
    @columns, @constraints, @keys = [], [], []
    table.Columns.each { |col| @columns << Column.new(col) }
    table.Checks.each { |check| @constraints << CheckConstraint.new(check) }
    table.Keys.each { |key| @keys << Key.new(key, connection) }
    get_indexes(table)
    get_references
  end

  private

  def get_references
    @references = []
    @keys.each do |key|
      # skip keys that reference nothing, and self-references
      if "" != key.referenced_table and name != key.referenced_table
        @references << key.referenced_table unless references.include?(key.referenced_table)
      end
    end
  end

  def get_indexes(table)
    @indexes = []
    # the Indexes collection includes objects already in Keys and statistics
    keys = @keys.map { |key| key.name }
    table.Indexes.each do |index|
      if not keys.include?(index.Name)
        if index.Type == 16 || index.Type == 0
          @indexes << Index.new(index)
        end
      end
    end
  end
end
You can find the classes it depends on by downloading all of the code here. Notice, however, that a database connection is needed for the Key constructor. As far as I could tell, there was no way, using nothing more than DMO, to find out whether a key cascades deletes. I had to query the INFORMATION_SCHEMA views to find that information.
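As a rough idea of what that lookup could involve, the sketch below builds a query against INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS, whose DELETE_RULE column reports CASCADE or NO ACTION for a foreign key. Which view the real Key class queries is an assumption on my part, and both helper names are hypothetical. Executing the query over the connection is left out.

```ruby
# Hypothetical helper: builds the lookup query; running it is left to the caller.
def delete_rule_query(constraint_name)
  escaped = constraint_name.gsub("'", "''") # double single quotes for a T-SQL literal
  "SELECT DELETE_RULE FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS " \
  "WHERE CONSTRAINT_NAME = '#{escaped}'"
end

# Hypothetical helper: interprets the DELETE_RULE value returned by the query.
def cascades_on_delete?(delete_rule)
  delete_rule.to_s.strip.upcase == "CASCADE"
end

puts delete_rule_query("FK_orders_customers")
```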
For our script to work, we’re going to need to order our dependencies correctly. The SQL script will fail if we try to add a foreign key to a table that doesn’t yet exist. The following should do the trick:
class SqlServer
  # returns a topological sort with parent tables in front of child tables
  def self.topologically_sorted(tables)
    sorted = []
    # We need a hash to navigate the references field, which gives only names
    table_hash = {}
    tables.each { |table| table_hash[table.name] = table }
    # first add all root tables to sorted
    sorted += tables.find_all { |table| 0 == table.references.length }
    # keep adding layers until every table has been placed
    while sorted.length < tables.length
      sorted += tables.find_all do |table|
        if sorted.include?(table)
          result = false
        else
          # all dependencies must already be in sorted
          dependencies = table.references.map { |ref| table_hash[ref] }
          result = (nil == dependencies.find { |ref| not sorted.include?(ref) })
        end
        result
      end
    end
    sorted
  end
end
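To try the ordering without a database, here is a self-contained sketch of the same layer-by-layer idea run against stand-in objects. FakeTable is a hypothetical substitute for the Table class, the table names are invented, and the guard against circular or unknown references is an addition that the listing above does not have.

```ruby
# Stand-in for the Table class: only the two fields the sort needs.
FakeTable = Struct.new(:name, :references)

# Layer-by-layer ordering: a table is ready once everything it references is placed.
def topologically_sorted(tables)
  by_name = {}
  tables.each { |table| by_name[table.name] = table }
  sorted = []
  while sorted.length < tables.length
    ready = tables.find_all do |table|
      !sorted.include?(table) &&
        table.references.all? { |ref| sorted.include?(by_name[ref]) }
    end
    # Added guard: without progress the loop would never end.
    raise ArgumentError, "circular or unknown reference" if ready.empty?
    sorted += ready
  end
  sorted
end

tables = [
  FakeTable.new("order_lines", ["orders", "products"]),
  FakeTable.new("orders", ["customers"]),
  FakeTable.new("customers", []),
  FakeTable.new("products", [])
]
puts topologically_sorted(tables).map(&:name).join(", ")
```

Each pass rescans the whole list, so this is quadratic in the number of tables. That is fine for a schema dump, and it keeps the code short.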
Now, our code is as simple as binding to some ERB templates:
require 'erb'
require 'db_object'

class SchemaWriter
  def initialize(server, catalog)
    @db = SqlServer.new server, catalog
  end

  def generate_create_script
    generate_code_for(@db.user_defined_datatypes, "create_udt")
    generate_code_for(@db.rules, "create_rule")
    generate_code_for(SqlServer.topologically_sorted(@db.user_tables), "create_table")
  end

  def generate_code_for(objects, template_name)
    file_name = template_name + ".erb"
    template = File.read(file_name)
    # trim_mode '-' lets the templates use -%> to suppress newlines
    # (keyword form; older Rubies wrote ERB.new(template, nil, '-'))
    erb = ERB.new(template, trim_mode: '-')
    objects.each do |object|
      # the template sees the current object through this binding
      puts erb.result(binding)
    end
  end
end

if $0 == __FILE__
  writer = SchemaWriter.new(ARGV[0], ARGV[1])
  writer.generate_create_script
end
As an example, here’s the create_table.erb template:
create table <%= object.name %>(
<% object.columns.each_with_index do |column, i| -%>
<%= column.name %> <%= column.text %><%= "," unless i == object.columns.length - 1 %>
<% end -%>
)
<% object.keys.each do |key| -%>
alter table <%= object.name %> add constraint <%= key.name %> <%= key.text %>
<% end -%>
<% object.constraints.each do |constraint| -%>
alter table <%= object.name %> add constraint <%= constraint.name %> <%= constraint.text %>
<% end -%>
<% object.indexes.each do |index| -%>
<%= index.text %>
<% end -%>
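To see what the column portion of such a template produces, the following sketch renders it against stand-in objects instead of a live database. StubColumn and StubTable are hypothetical substitutes for the real wrapper classes, the template is inlined rather than read from a file, and the keyword form of ERB.new assumes Ruby 2.6 or later.

```ruby
require 'erb'

# Stand-ins for the Column and Table wrappers: only the fields the template reads.
StubColumn = Struct.new(:name, :text)
StubTable  = Struct.new(:name, :columns)

# Column portion of the template, inlined so the example is self-contained.
TEMPLATE = <<~'ERB_TEMPLATE'
  create table <%= object.name %>(
  <% object.columns.each_with_index do |column, i| -%>
      <%= column.name %> <%= column.text %><%= "," unless i == object.columns.length - 1 %>
  <% end -%>
  )
ERB_TEMPLATE

def render_table(object)
  # trim_mode '-' makes -%> swallow the newline after a control line
  ERB.new(TEMPLATE, trim_mode: '-').result(binding)
end

table = StubTable.new("tablename", [
  StubColumn.new("id", "int NOT NULL"),
  StubColumn.new("description", "varchar(50) NOT NULL")
])
puts render_table(table)
```

The index check is what keeps the comma off the last column. The -%> endings keep the loop lines from leaving blank lines in the generated SQL.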