A Day In The Lyf

…the lyf so short, the craft so longe to lerne

Archive for the ‘Database’ Category

Funcletize This!

I was recently involved in troubleshooting a bug in our staging environment. We had some code that worked in every environment we had put it in, except staging. Once there, you perform the equivalent of an update on a field (using LINQ in C#), only to be greeted by a ChangeConflictException.

I’m embarrassed by how long it took to figure out what was wrong. It was obviously an optimistic locking problem, and I even mentioned that it was because the UPDATE statement wasn’t updating anything once I first saw the exception. Optimistic locking works by adding extra fields to the WHERE clause to make sure that the data hasn’t changed since you loaded it. If one of those fields had changed, the WHERE clause wouldn’t match anything, and the O/RM would assume that somebody’s changed the data behind your back and throw an exception.

It turns out that failing to match any rows with the given filter isn’t the only way that LINQ will think no rows were updated; it’s also dependent on the NOCOUNT option in SQL Server. If the database is configured to have NOCOUNT ON, then the number of rows affected by each query won’t be sent back to the client. LINQ interprets this lack of information as 0 rows being updated, and thus throws the ChangeConflictException.

In itself, the bug wasn’t very interesting. What is interesting is what we saw when we opened Reflector to look at the LINQ code around the exception:

IExecuteResult IProvider.Execute(Expression query)
    // …
    query = Funcletizer.Funcletize(query);

Love it. Uniquifiers, Funcletizers, and Daemonizers of the world unite.


Written by Brandon Byars

October 26, 2008 at 12:13 pm

Posted in .NET, Database

Tagged with ,

Configuring ActiveRecord to work with SQL Server 2005

As much as possible, I like a zero-install configuration. In other words, I want to simply checkout a codebase, run an automated build process, and start working. Ideally, I’d like to be able to do that on a clean machine.

It doesn’t always work, of course. For instance, even though most of our production code is written in .NET, we use Ruby extensively for automation. Installing Ruby is one of those dependencies that we live with. But installing Ruby isn’t enough; we also need Rails (or at least ActiveRecord) for some data management scripts we have, Watir, and some fixes for ActiveRecord to work with SQL Server.

All of that can be done fairly easily by somebody who knows what they’re doing, but new developers often don’t know what they’re doing, and I strive to be dispensable. We wrote a script that configured
Ruby and the necessary libraries to work for our development environment.

First, we needed to install the necessary gems. This is quite easy to do on the command line, but it took me a little digging before I figured out how to do it in code:

needed_gems = ["rails"]

require "rubygems"

needed_gems.each do |gem|
  puts "Installing gem #{gem}..."
  Gem::GemRunner.new.run ["install", gem, "--include-dependencies"]

SQL Server requires an ADO adapter that doesn’t ship with Ruby. You can read about it here. All that’s needed is to download the ADO.rb file (which we have versioned next to our setup script) and copy it to the right place, creating the directory if needed:

setup_dir = File.dirname(__FILE__)
ruby_dir = "C:/ruby/lib/ruby"
ruby_version = 1.8

# The ADO adapter needed for using dbi (the standard db access)
# with SQL Server, which does not come with ruby by default.
# See http://wiki.rubyonrails.org/rails/pages/HowtoConnectToMicrosoftSQLServer.

puts "creating ADO adapter..."
ado_dir = "#{ruby_dir}/site_ruby/#{ruby_version}/DBD/ADO"
system "if not exist #{ado_dir} mkdir #{ado_dir}"
system "copy /Y #{setup_dir}/ADO.rb #{ado_dir}"

Finally, we use SQL Server 2005, and we want to use Windows Authentication for all of our Ruby scripts. Neither SQL Server 2005 nor Windows Authentication is supported by rails out of the box. The problem, described on the SQL Server wiki for rails, is the connection string rails builds. At first, we were taking the suggestions of some of the comments on the wiki and changing the sqlserver_adapter.rb file that ships with rails. This obviously isn’t ideal, so now we monkey-patch it in our code that accesses the database:

module DBI
  # We have to monkey patch this because the SQL Server adapter that comes
  # with rails (active_record\connection_adapters\sqlserver_adapter.rb)
  # doesn't work with SQL Server 2005 or with Integrated Security yet.
  class << self
    alias_method :original_connect, :connect

    def connect(driver_url, username, password)
      # Change to SQL 2005
      driver_url.sub! "SQLOLEDB", "SQLNCLI"

      # Change to Windows Authentication
      driver_url.sub! /User Id=[^;]*;Password=[^;]*;/, "Integrated Security=SSPI;"

      original_connect(driver_url, username, password)

And that’s it. You still can’t checkout the codebase and start working on a clean machine, but it’s not bad. Install Ruby, run setup.rb. All database code has been patched to deal with our environment.

Written by Brandon Byars

October 23, 2007 at 7:13 pm

.NET Database Migrations

Pramod Sadalage and Scott Ambler have suggested using a series of numbered change scripts to version your database. Start with a base schema, and every subsequent change gets its own change script, grabbing the next number. That version number is stored in a table in the database, which makes it easy to update—you just run all change scripts, in order, greater than the version stored in your database.

The Ruby on Rails team implemented this technique in their migrations code. It’s quite elegant. This blog uses a Rails application called Typo; here’s one of its migrations:

class AddArticleUserId < ActiveRecord::Migration
  def self.up
    add_column :articles, :user_id, :integer

    puts "Linking article authors to users"
    Article.find(:all).each do |a|

  def self.down
    remove_column :articles, :user_id

That migration is called 3_add_article_user_id.rb, where 3 is the version number. Notice that it’s written in Ruby, not in SQL. It adds a column called user_id to the articles table and updates the data. The data update is particularly interesting—we get to use the ActiveRecord O/RM code instead of having to do it in SQL (although you can use SQL if you need to). The Rails migration code can also rollback changes; that’s what the down method is for.

The problem I’ve always had with this scheme is that we have many database objects that I’d like to version in their own files in our source control system. For example, here’s our directory structure:


We have several files in each directory, and it’s convenient to keep them that way so we can easily check a subversion log and see the history of changes for the database object. For us to use the migrations scheme above, we’d have to create a stored procedure in a migration, and later alter it in a separate migration. Since the two migrations will be in separate files, our source control wouldn’t give us a version history of that stored procedure.

We came up with a hybrid solution. Schema changes to the tables use a migration scheme like Rails. Database objects are versioned in separate files. Both the schema changes and the peripheral database object changes are updated when we update the database.

For this to work, we have to be a little careful with how we create the database objects. We want them to work regardless of whether we’re creating them for the first time or updating them, which means ALTER statements won’t work. The solution is simply to drop the object if it exists, and then create it. This is a fairly common pattern.

I wrote an NAnt and MSBuild task to do the dirty work. It runs both the schema migrations and the database object updates. Both are optional, so if migrations are all you want, that’s all you need to use. It expects all migrations to be in the same directory, and match the pattern 1.comment.sql, where 1 is the version number. It will be stored in a database table whose default name is SchemaVersion, with the following structure:

CREATE TABLE SchemaVersion (
  Version int,
  MigrationDate datetime,
  Comment varchar(255)

I’ve only tested it on SQL Server, but I think the task should work for other DBMS’s as well (it uses OLEDB). Migrations can contain batches (using the SQL Server GO command) and are run transactionally. Unlike the Rails example, the .NET migrations use SQL, and I don’t yet have any rollback functionality.

You can include any extra SQL files you want in the DatabaseObjects property. Both NAnt and MSBuild have convenient ways to recursively add all files matching an extension.

Here’s an NAnt example:

<target name="migrate" description="Update the database">
    <loadtasks assembly="Migrations.dll" />
            <include name="db/functions/**/*.sql"/>
            <include name="db/procedures/**/*.sql"/>
            <include name="db/triggers/**/*.sql"/>
            <include name="db/views/**/*.sql"/>

And here it is using MSBuild:

    <DatabaseObjects Include="db/functions/**/*.sql"/>
    <DatabaseObjects Include="db/procedures/**/*.sql"/>
    <DatabaseObjects Include="db/triggers/**/*.sql"/>
    <DatabaseObjects Include="db/views/**/*.sql"/>

<Target Name="dbMigrate">
        TableName="version_info" />

The source code and binaries can be found here.

Written by Brandon Byars

April 14, 2007 at 10:35 pm

Using Rails Fixtures For Storing Test Data

Yesterday I wrote about trying to retrofit agility into a large enterprise database. Even after we got our database scripted so that every developer could use a localhost database, we still had to switch back to an integration database too often to make it worthwhile. The problem was test data; we simply didn’t have enough to be useful, and it was difficult to maintain.

At first we tried SQL scripts, but quickly found them too hard to read. After creating a script to add data for one table, nary another script was written (random rant: why can’t the SQL INSERT statement follow the syntax of the UPDATE syntax? With big tables, the positional coupling between the column list and VALUES list is just too hard to maintain). Next, we tried a CSV file, which worked somewhat nicely because we could edit them in Excel. However, the BULK INSERT command we used to insert them caused too many problems. Any time a column was added, even if it was a nullable column, even if it was a computed column, you had to add it to the CSV file. And both the SQL file and the CSV files lacked the ability to add dynamic data. Many times we wanted to enter yesterday for a date, regardless of what actual date that happened to be. In some cases, we’d simply like to add 1000 rows, without regard to what data those rows contained.

Rails Fixtures

In the end, we hijacked some functionality from the Ruby on Rails testing framework. The Fixtures class allows you to make the following call in your tests to delete and re-add test data before each test:

fixtures :orders, :order_items

That one line of code will, by default, delete the order_items and orders tables (in that order), and then add all data from the orders.yml and order_items.yml file (in that order). YAML is preferred in the Ruby world for it’s readability; XML, like so many “enterprisey” technologies, is considered bulky and clumsy to the Rails team. Even better, the Rails code first runs your YAML files through ERB (a Ruby templating engine) before sending it to the YAML parser, so you can add dynamic content.

It turns out that using the fixtures code outside of Rails is really quite easy. First, we need to set up our connection information and define the necessary ActiveRecord classes. We can define all them in one file (I’ll call it driver.rb):

require 'active_record'
require 'active_record/fixtures'

  :adapter  => "sqlserver",
  :database =>  "db_name",
  :username => "db_user",
  :password => "password"

ActiveRecord::Base.logger = Logger.new(File.dirname(__FILE__) + "/debug.log")

class DisplayCategory < ActiveRecord::Base
  set_primary_key "DisplayCategoryId"
  set_table_name "DisplayCategories"

ActiveRecord is the Rails object-relational mapping framework, and like everything else in Rails, it’s built around the principle of “convention over configuration.” Since we’re dealing with a legacy database, and one not intended to be consumed by a Rails application, we’ll have to settle for a little bit of configuration.

By default, ActiveRecord expects the table name to be the plural version of the class name, with underscores between words (display_categories), and the primary key be an identity column called id. Our database has a different naming standard, with a healthy dose of standard violations, so we’ll have to add the call to set_primary_key and set_table_name to all of our ActiveRecord classes. DisplayCategories has an identity column, but I’ll show you an example below that does not.

When the script is parsed by Ruby, ActiveRecord will connect to the database and add a property for every column in the table. Metaprogramming is what allows ActiveRecord to be so DRY.

Then, we need the actual data. The Rails convention expects the YAML file to match the table name, so we’ll put the following in data/DisplayCategories.yml:

  DisplayCategoryId: 1
  DisplayName: Books
  SortIndex: 1
  DisplayCategoryId: 2
  DisplayName: Fiction
  SortIndex: 2
  ParentDisplayCategoryId: 1

You see how readable YAML is. The keys (books, fiction) are not inserted into the database. We’ll find a good use for them below for testing purposes, but for now we’ll simply use them to help us describe the record.

Let’s run it. The following code should work (in add_test_data.rb):

require 'driver.rb'
Fixtures.create_fixtures("data", [:DisplayCategories])

Unfortunately, we may get an error running add_test_data.rb. By default, the YAML parser simply dumps every record into a hashtable, with the identifier as the key. Since hashtables have no implicit ordering, we could run into a problem since our data requires that it be inserted in order (fiction has a ParentDisplayCategoryId foreign key pointing to books). If order doesn’t matter to you, use the above syntax. When it does matter, it’s a simple change:

--- !omap
- books:
    DisplayCategoryId: 1
    DisplayName: Books
    SortIndex: 1
- fiction:
    DisplayCategoryId: 2
    DisplayName: Fiction
    SortIndex: 2
    ParentDisplayCategoryId: 1

That omap syntax defines an ordered map, which solves our dependency problem. Now, running add_test_data.rb should work.

Dealing with Exceptions

DisplayCategories doesn’t quite meet the Rails conventions, but it’s close. What happens when we’re not close?

Let’s assume the following table definition:

  PersonNameId int NOT NULL
  FirstName varchar(100) NOT NULL,
  LastName varchar(100) NOT NULL,
  FullName AS FirstName + ' ' + LastName

INSERT INTO AuditPersonName(DateChanged, PersonNameId, FirstName, LastName)
SELECT GETDATE(), PersonNameId, FirstName, LastName
FROM deleted

Now, our configuration with ActiveRecord becomes more involved. There are four headaches we’ll run into here:

  • We don’t have an identity column
  • We have a computed column
  • Our table name isn’t plural
  • We have a trigger

Let’s tackle them one at a time. Instead of an identity column, let’s assume that we have to call a stored procedure called get_next_id to give us our next id as an output parameter. The following will work:

class PersonName < ActiveRecord::Base
  set_primary_key "PersonNameId"
  set_table_name "PersonName"

  def before_create
    sql <<-EOT
      DECLARE @id int
      EXEC get_next_id 'PersonName', @id OUTPUT
      SELECT @id
    self.id = connection.select_value(sql).to_i

We’re using one of the built-in hooks ActiveRecord provides to change how it gets the ids for us. In case you’ve never seen that EOT business, it’s called a here document; you can read about it here.

However, we still won’t be able to use that ActiveRecord implementation because it doesn’t understand computed columns, and will try to add FullName to the insert list. To work around that problem, add the following code to your PersonName class:

class PersonName
  @@computed_columns = ['FullName']

  def initialize(attributes = nil)
    @@computed_columns.each { |column| @attributes.delete(column) }

This is a bit hackish, but it gets the job done. I discovered the @attributes instance variable in ActiveRecord::Base when browsing for a solution, and found that I could simply remove the offending columns from it in the constructor.

Speaking of hacks, we can fix the problem of PersonName being singular by changing the way Rails understands the English language. Add the following to driver.rb:

require 'active_support/inflector'

Inflector.inflections do |inflect|
  inflect.uncountable 'PersonName'

The uncountable method was intended for words like fish and money. That’s ok, cheating seems like a pragmatic solution here. Chad Fowler mentions this solution in his book on Rails Recipes. Other options include:

Inflector.inflections do |inflect|
  inflect.plural /(ox)$/i, '\1en'
  inflect.singular /(ox)en/i, '\1'
  inflect.irregular 'person', 'people'

For more details, consult Fowler’s book, or dig through the source code. Now, add some test data and try to run it. We can add it to add_test_data.rb quite easily:

require 'driver.rb'
Fixtures.create_fixtures("data", [:DisplayCategories, :PersonName])

Try running it. If you’re using SQL Server, you should receive an exception that reads “Cannot create new connection because in manual or distributed transaction mode.” That’s because our trigger is getting in the way. Add SET NOCOUNT ON as the first line of your trigger, and try again. Everything should work now.

Using dynamic fixtures

I mentioned above that Fixtures first runs the YAML file through ERB before sending it to the parser. That means we can clean up our data to get rid of most of the noise for columns we don’t care about. The following is a valid YAML fixture:

def ignore_values
  Column1: 1
  UselessDate: 1/1/2000
  ImportantColumn: 1
<%= ignore_values %>
  ImportantColumn: 2
<%= ignore_values %>

Now, at least we’re only writing the columns we care about for each record. Sometimes, however, we can do a lot better. Imagine an address table that stores both a main and a shipping address for each person. The following would be a valid fixture:

@next_id = 0

def addresses_for(name, attributes)
  "#{address_for(name, 'main', attributes)}#{address_for(name, 'shipping', attributes)}"

def address_for(name, type, attributes)
  @next_id += 1
  AddressId: #{@next_id}
  PersonId: #{attributes[:PersonId]}
  AddressTypeId: #{type == 'main' ? 1 : 2}
  Address1: #{attributes[:Address1]}
  City: #{attributes[:City]}
  State: #{attributes[:State]}
  Zip: #{attributes[:Zip]}
  County: #{attributes[:County]}
<%= addresses_for('person1', {
  :PersonId => 1,
  :City => 'Marshall',
  :Address1 => '123 Some Road',
  :State => 'TX',
  :Zip => '75672',
  :County => 'Harrison'
}) %>
<%= addresses_for('person2', {
  :PersonId => 2,
  :City => 'Marshall',
  :Address1 => 'P.O. BOX 123',
  :State => 'TX',
  :Zip => '75672',
  :County => 'Harrison'
}) %>

That fixture will create 4 records, person1_main, person1_shipping, person2_main, and person2_shipping. Since we don’t care about having a separate billing address and shipping address for our test data, we can reduce the amount of information we have to add. And if we ever do care, it’s easy to write it out without calling our addresses_for method.


The Fixtures class was written for testing, and it has some nice features that give you access to each loaded record, for example, as an instance variable named after the record identifier (@person1, @fiction, etc). I had trouble getting that part to work with SQL Server, but even if you don’t ask Fixtures to load each record for you, you can still access them, which is quite useful if you’re using Ruby for any of your testing (for example, in Watir scripts). Let’s add the following to driver.rb:

require 'test/unit'

Test::Unit::TestCase.fixture_path = File.dirname(__FILE__) + "data/"

class Test::Unit::TestCase
  self.use_transactional_fixtures = false
  self.use_instantiated_fixtures = :no_instances

Now, we should be able to load the fixtures automatically in our test cases:

require 'test/unit'
require 'watir'
require 'driver'

class ProductBrowsingTest < Test::Unit::TestCase
  fixtures :DisplayCategories

  def test_something
    # create a browser instance, and navigate to the right page
    browser.link(:text, @DisplayCategories["fiction"]["DisplayName"]).click

Written by Brandon Byars

March 18, 2007 at 6:53 pm

Posted in Database, Ruby

Tagged with ,

Ruby and SQL DMO

We have a rather large legacy system that we are trying to inject a healthy dose of agility into. One of the biggest challenges has always been versioning the database, which is large, clumsy, and cluttered. We managed to get the rest of the code under a continuous integration scheme, but every time somebody made a schema change, or even updated a stored procedure needed for some tests to work, manual intervention was needed.

Pramod Sadalage and Martin Fowler wrote one of the first articles on iterative database design, and Pramod later teamed up with Scott Ambler to collaborate on Refactoring Databases. The advice, adopted by the Ruby on Rails team, was to create a separate migration file for each schema change, and number them sequentially. For example, the first migration would be 1.sql, then 2.sql, and so on. You could store the latest migration file run on a database in a version table, which would make updating a database as easy as running every migration, in order, whose version number is greater than the one stored in your database table.

We managed to do something similar, although it required a number of changes. But before it could all work, you need a starting point—a base schema to create a new developer database. We use SQL Server 2000, so initially I simply had Enterprise Manager create a SQL script for me. Not only did it not work (I don’t think it sorted the dependencies right), it was a ghastly nightmare to look at.

Why do standard vendor-supplied code generation tools create such ugly code?

I decided to do the code generation myself using SQL DMO (the same COM interfaces that Enterprise Manager was using, just poorly). I’d successfully used Ruby and ERB for code generation before, and discovered I could make very nice looking code (Code Generation in Action by Jack Herrington describes the concept nicely). Within just a couple hours, I had a SQL script that not only looked much nicer than anything Enterprise Manager spits out; but worked to boot.

First, I needed to connect to the database I wanted to dump into a SQL file:

require 'win32ole'

class SqlServer
  def initialize(server, catalog)
    sql = WIN32OLE.new("SQLDMO.SQLServer")
    sql.LoginSecure = 1
    @db = sql.Databases(catalog)

This uses Windows Authentication to connect to the appropriate SQL Server database. The DMO interface, described here, is one of those clumsy, wandering APIs commonly developed when VBScript is your target language (warning, the MSDN website for DMO only seems to work in IE, and poorly at that). I decided to wrap some of the classes to make it easier to use. First, a superclass:

class DBObject
  def initialize(object)
    @object = object

  def name
    return @object.Name unless @object.Name =~ / /

  def beautify(text)
    # strip brackets if they're not needed, and remove the .dbo prefix
    text.gsub!(/\[([^ ]*?)\]/, '\1')
    text.gsub(/dbo\./, '')

Here I provide one of the keys to generating code that looks appealing. I really don’t like to look at all the noise of bracketing all the tables and column names, just in case the name contains a space in it. The following looks ugly to me:

CREATE TABLE [dbo].[tablename] (
    [id] [int] NOT NULL,
    [description] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL

That drives me wild. I don’t ever change any of the settings for ANSI nulls, etc, and I could care less what the collation is so long as it matches the database default. Nor do I care about the file group. It’s hard to see through all the noise.

Here’s what I want:

CREATE TABLE tablename (
    id int NOT NULL
    description varchar(100) NOT NULL

Our name and beautify methods will help us achieve prettier code. Here’s the most important subclass:

class Table < DBObject
  attr_reader :columns, :constraints, :keys, :indexes, :references

  def initialize(table, connection)
    @columns, @constraints, @keys = [], [], []
    table.Columns.each { |col| @columns << Column.new(col) }
    table.Checks.each { |check| @constraints << CheckConstraint.new(check) }
    table.Keys.each { |key| @keys << Key.new(key, connection) }

  def get_references
    @references = []
    @keys.each do |key|
      if "" != key.referenced_table and name != key.referenced_table
        @references << key.referenced_table unless references.include?(key.referenced_table)

  def get_indexes(table)
    @indexes = []
    # the Indexes collection includes objects already in Keys and statistics
    keys = @keys.map { |key| key.name }
    table.Indexes.each do |index|
      if not keys.include?(index.Name)
        if index.Type == 16 || index.Type == 0
          @indexes << Index.new(index)

You can find the classes it depends on by downloading all of the code here. Notice, however, that a database connection is needed for the Key constructor. As far as I could tell, there was no way, using nothing more than DMO, to find out if a key cascade deletes. I had to query the INFORMATION_SCHEMA views to find that information.

For our script to work, we’re going to need to order our dependencies correctly. The SQL script will fail if we try to add a foreign key to a table that doesn’t yet exist. The following should do the trick:

class SqlServer
  # returns a topological sort with parent tables in front of child tables
  def self.topologically_sorted(tables)
    sorted = []

    # We need a hash to navigate the references field, which gives only names
    table_hash = {}
    tables.each { |table| table_hash[table.name] = table }

    # first add all root tables to sorted
    sorted += tables.find_all { |table| 0 == table.references.length }

    while tables.length < sorted.length
      sorted += tables.find_all do |table|
        if sorted.include?(table)
          result = FALSE
          # all dependencies must already be in sorted
          dependencies = table.references.map { |ref| table_hash[ref] }
          result = (nil == dependencies.find { |ref| not sorted.include?(ref) })

Now, our code is as simple as binding to some ERB templates:

require 'erb'
require 'db_object'

class SchemaWriter
  def initialize(server, catalog)
    @db = SqlServer.new server, catalog

  def generate_create_script
    generate_code_for(@db.user_defined_datatypes, "create_udt")
    generate_code_for(@db.rules, "create_rule")
    generate_code_for(SqlServer.topologically_sorted(@db.user_tables), "create_table")

  def generate_code_for(objects, template_name)
    file_name = template_name + ".erb"
    template = ""
    File.open(file_name) { |file| template = file.read }
    objects.each do |object|
      erb = ERB.new(template, nil, '-')
      puts erb.result(binding)

if $0 == __FILE__
  writer = SchemaWriter.new(ARGV[0], ARGV[1])

As an example, here’s the create.table.erb template:

create table <%= object.name %>(
<% object.columns.each_with_index do |column, i| -%>
    <%= column.name %> <%= column.text %><%= "," unless i == object.columns.length - 1 %>
<% end -%>

<% object.keys.each do |key| -%>
alter table <%= object.name %> add constraint <%= key.name %> <%= key.text %>
<% end -%>
<% object.constraints.each do |constraint| -%>
alter table <%= object.name %> add constraint <%= constraint.name %> <%= constraint.text %>
<% end -%>
<% object.indexes.each do |index| -%>
<%= index.text %>
<% end -%>

Written by Brandon Byars

March 17, 2007 at 3:28 pm

Posted in Code Generation, Database, Ruby

Tagged with ,