A Day In The Lyf

The lyf so short, the craft so longe to lerne

Getting it to work is not the only goal

Posted by Brandon Byars Wed, 13 Jun 2007 03:44:00 GMT

I’ve had the good fortune of working with some really bad developers in the past. I’m sure the content of that sentence suggests some sarcasm to you; I assure you that I mean none. At one point, I myself was one such developer and didn’t realize it. This seems to be a common pattern—bad developers don’t know that they’re bad, and quite often they think they’re really good. Evidently, the pattern isn’t restricted to software developers (see this study for example).

Working with bad developers has made me a better one, because I’ve seen the results of bad practices firsthand. One of the things I now realize, for instance, is that getting something to work isn’t the only goal when working on a task. You also have to make sure that it will keep working. If you say you’re done as soon as whatever you’re working on appears to work, you’re only doing half of your job. And probably the easier half.

Making sure it will keep working means different things to different people. In my view, it means making the code as simple as possible and getting it under a set of automated tests so that future developers aren’t afraid to change it. Too often I’ve seen bad developers point the finger at someone who has the gall to change “their” code when the change doesn’t work. “It’s complicated” is one of the most immature and insulting of those complaints, as if it’s ok to erect a solid barrier around code you’ve written because you’re the only one smart enough to understand it. In my experience, that attitude reflects a lack of aesthetics when it comes to coding. It’s the result of saying you’re done when you get it to work, not when it’s simple. Simple Ain’t Easy. It’s also the result of thinking that testing is something only people who aren’t smart enough to be developers do.

The desire to stop when you get something working isn’t a problem that only bad developers have; it’s just highlighted more against a backdrop of bad developers. But it’s a discipline that even good developers break from time to time.

The team I’m currently working with is easily the most competent group of developers I’ve ever worked with. We’re actually pretty decent at keeping things simple and testing as much of our code as we know how. But there’s some code that we just haven’t found good ways to test easily yet. And unfortunately, we still break code when we change it.

In my opinion, being afraid to change the code isn’t the answer. In fact, being afraid to change the code seems like the worst possible solution. It’s surrendering. It’s giving up on the aesthetics that make coding fun and fulfilling.

We’re still working on a solution. My idea for the moment is to use the bugs that do get through as opportunities to improve our integration tests. We use Watir, for example, to test the web site, and the controller and view code in the web site has proven too difficult to unit test effectively. I think a good mindset is to apply the lesson about not stopping when you get something to work to bug fixing as well. If a bug does get through, don’t say it’s fixed until you see a failing integration test that exposes the bug, and then see that test pass after you’ve made the fix. In many cases, the fix itself then becomes the easiest part of bug fixing.

It’s funny how many developers think that’s overkill. I remember, way back when I first saw the traditional waterfall model, seeing something like this:

Requirements – Design – Code – Test – Deploy

It doesn’t matter whether you’re doing waterfall or a more iterative cycle: coding, while obviously important, doesn’t dominate that cycle. Too many developers act as if it’s the only part of development.


C# Execute Around Method

Posted by Brandon Byars Tue, 12 Jun 2007 04:46:00 GMT

Kent Beck called one of the patterns in Smalltalk Best Practice Patterns “Execute Around Method.” It’s a useful pattern for removing duplication when boilerplate code has to run both before and after the code you really want to write. It’s much lighter weight than a Template Method (no subclassing), which can accomplish the same goal.

As an example, I’ve written the following boilerplate ADO.NET code countless times:

// Requires: using System.Collections; using System.Data; using System.Data.SqlClient;
public DataTable GetTable(string query, IDictionary parameters)
{
    using (SqlConnection connection = new SqlConnection(this.connectionString))
    {
        using (SqlCommand command = new SqlCommand(query, connection))
        {
            connection.Open();
            foreach (DictionaryEntry parameter in parameters)
            {
                command.Parameters.AddWithValue(parameter.Key.ToString(), parameter.Value);
            }

            SqlDataAdapter adapter = new SqlDataAdapter(command);
            using (DataSet dataset = new DataSet())
            {
                adapter.Fill(dataset);
                return dataset.Tables[0];
            }
        }
    }
}

public void Exec(string query, IDictionary parameters)
{
    using (SqlConnection connection = new SqlConnection(this.connectionString))
    {
        using (SqlCommand command = new SqlCommand(query, connection))
        {
            connection.Open();
            foreach (DictionaryEntry parameter in parameters)
            {
                command.Parameters.AddWithValue(parameter.Key.ToString(), parameter.Value);
            }

            command.ExecuteNonQuery();
        }
    }
}

Notice that the connection and parameter management overwhelms the actual code that each method is trying to get to. And the duplication means I have multiple places to change when I decide to do something differently. However, since the using block encloses the relevant code, a simple Extract Method refactoring is not as easy to see.

Here’s the result of applying the Execute Around Method pattern to it.

private delegate object SqlCommandDelegate(SqlCommand command);

public DataTable GetTable(string query, IDictionary parameters)
{
    return (DataTable)ExecSql(query, parameters, delegate(SqlCommand command)
    {
        SqlDataAdapter adapter = new SqlDataAdapter(command);
        using (DataSet dataset = new DataSet())
        {
            adapter.Fill(dataset);
            return dataset.Tables[0];
        }
    });
}

public void Exec(string query, IDictionary parameters)
{
    ExecSql(query, parameters, delegate(SqlCommand command)
    {
        return command.ExecuteNonQuery();
    });
}

// The boilerplate now lives in one place; callers supply only the middle.
private object ExecSql(string query, IDictionary parameters, SqlCommandDelegate action)
{
    using (SqlConnection connection = new SqlConnection(this.connectionString))
    {
        using (SqlCommand command = new SqlCommand(query, connection))
        {
            connection.Open();
            foreach (DictionaryEntry parameter in parameters)
            {
                command.Parameters.AddWithValue(parameter.Key.ToString(), parameter.Value);
            }

            return action(command);
        }
    }
}

Much nicer, no?
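For comparison, here is a minimal Ruby sketch of the same pattern, where a block plays the role of the C# anonymous delegate. FakeConnection, exec_sql, and get_table are hypothetical names; FakeConnection only records calls so the example runs without a database.

```ruby
# Hypothetical stand-in for a database connection; it just logs calls.
class FakeConnection
  attr_reader :log

  def initialize
    @log = []
  end

  def open
    @log << :open
  end

  def close
    @log << :close
  end

  def execute(query, params)
    @log << [:execute, query, params]
    "rows for #{query}"
  end
end

# Execute Around Method: setup and teardown are written once, here.
# The caller passes only the interesting part as a block.
def exec_sql(connection, query, params = {})
  connection.open
  begin
    yield connection, query, params
  ensure
    connection.close # runs even if the block raises
  end
end

# A "public" method now states only what differs.
def get_table(connection, query, params = {})
  exec_sql(connection, query, params) { |c, q, p| c.execute(q, p) }
end
```

Using `ensure` mirrors what the nested `using` blocks do in C#: the connection is closed whether or not the block raises.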


Big Methods Considered Harmful

Posted by Brandon Byars Mon, 28 May 2007 15:43:00 GMT

Several years back, when I was a young programmer out of school who thought he understood OOP inside and out, I had a conversation with a colleague about having to take over somebody else’s code. My colleague was upset because the original programmer used so many small methods that it was hard to figure out what anything was doing. Isn’t it so much easier, he rationalized (and I agreed), to just use a few methods, and a few objects, and make it obvious what you’re doing?

Years later, with my colleague long gone, parts of our system are a mess, and the big-method parts are now the hardest to understand, maintain, and extend. With the accretion of features, fixes, and cruft, some of those methods have grown to over 1,000 lines of code, and they appear impervious to refactoring because we simply can’t understand what the hell they actually do. It’s a tremendous counter-example to our earlier rationalizations.

I now recognize the “bigger is better” attitude as the mark of an immature object-oriented developer, somebody who hasn’t yet understood the real power of OO. Kent Beck makes the point vividly with his Composed Method pattern in Smalltalk Best Practice Patterns. Large methods do indeed make it easier to follow the flow of control, but they do so at the expense of flexibility and composability.

Small methods allow you to isolate assumptions. Small methods allow you to say things once and only once, leading to code that is DRY and elegant. Small methods let you easily see the big picture without getting lost in the details (our earlier naive fallacy was wanting to see the details up front). Small methods help you discover new responsibilities; feature envy stands out more. Small methods help you isolate rates of change, keeping responsibilities that have to change in every subclass tucked away in one set of methods, and those that don’t in another. Small methods allow you to see everything at the same level of abstraction. Small methods make most comments unnecessary. Small methods make unit testing easier, since the units are smaller. Small methods aid in creating cohesive systems, where each method has only one reason to change. And small methods make performance tuning easier.

Yes, small methods can help performance. A lot of people, particularly those from the C or C++ world, seem to have trouble believing that. It’s true that method calls carry some overhead to maintain the stack, but for 99.999% of applications that overhead simply isn’t worth worrying about, and if it is worth worrying about, you’re probably not writing in an object-oriented language anyhow. Algorithmic improvements are usually orders of magnitude more important than inlining method calls. And small methods, which isolate assumptions so well, make algorithmic improvements easier to spot. Want to use a memoization cache? You’ll likely have to change only one method.
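To make the memoization point concrete, here is a hypothetical Ruby sketch in the Composed Method style. Invoice, tax, tax_rate, and expensive_rate_lookup are made up for illustration, as are the rates; the point is that adding the cache touched only tax_rate.

```ruby
# Hypothetical invoice whose total is composed of small methods.
class Invoice
  attr_reader :lookups # how many times the "expensive" lookup actually ran

  def initialize(lines)
    @lines = lines # array of [amount, state] pairs
    @lookups = 0
    @rates = {}
  end

  # Reads at a single level of abstraction.
  def total
    @lines.sum { |amount, state| amount + tax(amount, state) }
  end

  private

  def tax(amount, state)
    amount * tax_rate(state)
  end

  # The only method that changed when memoization was added.
  # key? (rather than ||=) also caches nil or false results correctly.
  def tax_rate(state)
    return @rates[state] if @rates.key?(state)
    @rates[state] = expensive_rate_lookup(state)
  end

  # Stand-in for a slow call such as a database or web service query.
  # The rates are invented for the example.
  def expensive_rate_lookup(state)
    @lookups += 1
    { "TX" => 0.0625, "OK" => 0.125 }.fetch(state, 0.0)
  end
end
```

Because total and tax never see how the rate is obtained, the cache can be added or removed without touching them.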

The C language has macros, which are textually substituted in a preprocessing step to simulate function calling without incurring the overhead. Consider the following quote:

There is a tendency among older C programmers to write macros instead of functions for very short computations that will be executed frequently… The reason is performance: a macro avoids the overhead of a function call. This argument was weak even when C was first defined, a time of slow machines and expensive function calls; today it is irrelevant. With modern machines and compilers, the drawbacks of function macros outweigh their benefits.

The quote comes from The Practice of Programming by Brian Kernighan and Rob Pike; Kernighan also happens to be the co-author of the first book on the C language.

Many people think (or are even taught) that the only reason to break apart a method is if you want to reuse a part of it. That line of thinking is indefensible. The most difficult part of programming is not maximizing reuse; it’s minimizing complexity. Reuse is just one of the tools we use to minimize complexity; writing clean code that communicates well is another.


TDD'ing a Markov Chain

Posted by Brandon Byars Sun, 06 May 2007 17:40:00 GMT

In The Practice of Programming, Brian Kernighan and Rob Pike develop a simple algorithm that scrambles text while keeping the output from turning into gibberish. The idea is simple to understand:

  • Parse the input into a list of prefixes. A two-word prefix seems to work well.
  • For each prefix in the input text, keep a list of suffixes, where a suffix is the word immediately following the prefix. If the same suffix exists in the input multiple times for a prefix, it should be listed multiple times in the suffix list.
  • To create the output text, starting with an initial prefix, randomly select a suffix for that prefix. Then slide your prefix over one word so that it now includes the last word of the previous prefix and the new suffix. Rinse, lather, repeat.

Kernighan and Pike suggest using sentinel values to help start and end the process. The algorithm is a simple example of a Markov chain algorithm.
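The steps above, plus the sentinel idea, fit in a few lines of Ruby. This is a minimal sketch of the algorithm as summarized here, not the downloadable code from this post; NONWORD, build_table, and generate are hypothetical names, and the optional rng parameter exists only so tests can pass in a seeded Random.

```ruby
NONWORD = "\n"    # sentinel: String#split on whitespace never yields it
PREFIX_LENGTH = 2 # two-word prefixes

# Map each prefix (an Array of words) to every word that followed it.
# Duplicates are kept, so frequent suffixes are chosen more often.
def build_table(text)
  table = Hash.new { |hash, prefix| hash[prefix] = [] }
  prefix = [NONWORD] * PREFIX_LENGTH
  (text.split + [NONWORD]).each do |word|
    table[prefix] << word
    prefix = prefix.drop(1) + [word] # slide the prefix over one word
  end
  table
end

# Start from the sentinel prefix and keep picking random suffixes
# until the end sentinel comes up or max_words is reached.
def generate(table, max_words, rng = Random.new)
  prefix = [NONWORD] * PREFIX_LENGTH
  words = []
  max_words.times do
    word = table[prefix].sample(random: rng)
    break if word == NONWORD
    words << word
    prefix = prefix.drop(1) + [word]
  end
  words.join(" ")
end
```

Building a new prefix array on each step, instead of mutating the old one, keeps the arrays already used as hash keys intact.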

I decided to give a Ruby implementation a go in a TDD fashion to see if I could learn something. I had the advantage of having already seen a few implementations by Kernighan and Pike, but to keep it somewhat fair, I didn’t consult the book during the implementation.

If you want to follow along with the code, you can download it here. To give you a sense of where we’re going, here’s the first stanza of Edgar Allan Poe’s The Raven, after scrambling:

Once upon a bust of Pallas just above my chamber door; - This it is, and this mystery explore; - ‘Tis the wind and nothing more.


Beer, Software, and Hypocrisy

Posted by Brandon Byars Sat, 28 Apr 2007 02:17:00 GMT

I love free software. I love it when it’s free as in speech, and I really love it when it’s free as in beer. The development environment I currently work in lives on an open source infrastructure, and most of the best developers I’ve talked to have contributed in some way to the open source world.

I’ve also spoken to many developers who use Stallman’s notion of free software as nothing more than an excuse to pirate licensed software. The argument, as I understand it, is that software vendors have a moral obligation to make their software free, and since they don’t, we’re perfectly justified in using cracked versions of their products. Apparently (the reasons aren’t clear to me), the argument extends to music, movies, and TV shows as well.

I’ll remain mute on the moral argument, and I admit to having used “borrowed” software in my younger and rasher days. What I find absurd, though, is the hypocrisy of using the moral argument of free software to justify pirating software just because you don’t want to pay for it. Most developers, and this certainly includes the ones I’ve spoken with about pirated software, simply aren’t competent enough to deal with the consequences of free software.

Richard Stallman sagely noted that, were software made free (as in speech), there would still be money to be made, but it would not be made by the average developer of today. To be a developer in Stallman’s utopia would require both a passion and competence found only in the upper echelon of today’s developers. Richard Stallman has the skills to back it up. So do many developers. But not most. And for the rest of us, saying that software should be “free as in speech” has too often become a cop-out, when really all we want is a free drink.


.NET Database Migrations

Posted by Brandon Byars Sun, 15 Apr 2007 03:35:00 GMT

Pramod Sadalage and Scott Ambler have suggested using a series of numbered change scripts to version your database. Start with a base schema, and every subsequent change gets its own change script, grabbing the next number. That version number is stored in a table in the database, which makes updating easy: you just run, in order, every change script numbered higher than the version stored in your database.
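The heart of the scheme is deciding which scripts to run. Here is a minimal Ruby sketch, assuming change scripts named like 1.create_base_schema.sql, where the leading number is the version; pending_migrations is a hypothetical helper, not part of any tool mentioned here.

```ruby
# Hypothetical helper: returns the scripts still to run, in numeric order.
# Assumes file names of the form "<version>.<comment>.sql".
def pending_migrations(filenames, current_version)
  filenames
    .grep(/\A\d+\./)                                   # ignore files without a leading version
    .map { |name| [Integer(name[/\A\d+/], 10), name] } # pair each file with its version
    .select { |version, _name| version > current_version }
    .sort_by { |version, _name| version }              # numeric sort: 10 comes after 2
    .map { |_version, name| name }
end

files = %w[1.base.sql 2.add_orders.sql 10.add_index.sql 3.add_customers.sql README]
pending_migrations(files, 2) # => ["3.add_customers.sql", "10.add_index.sql"]
```

Sorting on the parsed integer matters: a plain string sort would run 10.add_index.sql before 3.add_customers.sql.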

The Ruby on Rails team implemented this technique in their migrations code. It’s quite elegant. This blog uses a Rails application called Typo; here’s one of its migrations:

  class AddArticleUserId < ActiveRecord::Migration
    def self.up
      add_column :articles, :user_id, :integer

      puts "Linking article authors to users"
      Article.find(:all).each do |a|
        u=User.find_by_name(a.author)
        if(u)
          a.user=u
          a.save
        end
      end
    end

    def self.down
      remove_column :articles, :user_id
    end
  end

That migration is called 3_add_article_user_id.rb, where 3 is the version number. Notice that it’s written in Ruby, not in SQL. It adds a column called user_id to the articles table and updates the data. The data update is particularly interesting: we get to use the ActiveRecord O/RM code instead of having to do it in SQL (although you can use SQL if you need to). The Rails migration code can also roll back changes; that’s what the down method is for.

The problem I’ve always had with this scheme is that we have many database objects that I’d like to version in their own files in our source control system. For example, here’s our directory structure:

db/
  functions/
  migrations/
  procedures/
  triggers/
  views/

We have several files in each directory, and it’s convenient to keep them that way so we can easily check a Subversion log and see the history of changes for each database object. To use the migrations scheme above, we’d have to create a stored procedure in one migration and later alter it in a separate migration. Since the two migrations would be in separate files, our source control wouldn’t give us a version history of that stored procedure.

We came up with a hybrid solution. Schema changes to the tables use a migration scheme like Rails. Database objects are versioned in separate files. Both the schema changes and the peripheral database object changes are updated when we update the database.

For this to work, we have to be a little careful with how we create the database objects. We want them to work regardless of whether we’re creating them for the first time or updating them, which means ALTER statements won’t work. The solution is simply to drop the object if it exists, and then create it. This is a fairly common pattern.

I wrote a task for both NAnt and MSBuild to do the dirty work. It runs both the schema migrations and the database object updates. Both are optional, so if migrations are all you want, that’s all you need to use. It expects all migrations to be in the same directory and to match the pattern 1.comment.sql, where 1 is the version number. The current version is stored in a database table whose default name is SchemaVersion, with the following structure:

CREATE TABLE SchemaVersion (
  Version int, 
  MigrationDate datetime, 
  Comment varchar(255)
)

I’ve only tested it on SQL Server, but I think the task should work for other DBMSs as well (it uses OLEDB). Migrations can contain multiple batches (separated by GO, as in SQL Server scripts) and are run transactionally. Unlike the Rails example, the .NET migrations use SQL, and I don’t yet have any rollback functionality.

You can include any extra SQL files you want in the DatabaseObjects property. Both NAnt and MSBuild have convenient ways to recursively add all files matching an extension.

Here’s an NAnt example:

    <target name="migrate" description="Update the database">
         <loadtasks assembly="Migrations.dll" />
         <migrateDatabase
             connectionString="Provider=SQLNCLI;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=Northwind"
             migrationsDirectory="db/migrations"
             commandTimeout="600"
             batchSeparator="go"
         >
            <fileset>
                <include name="db/functions/**/*.sql"/>
                <include name="db/procedures/**/*.sql"/>
                <include name="db/triggers/**/*.sql"/>
                <include name="db/views/**/*.sql"/>
            </fileset>
         </migrateDatabase>
     </target>

And here it is using MSBuild:

  <PropertyGroup>
      <!-- Quoting both sides is the usual MSBuild idiom for an "is this property empty?" test -->
      <ConnectionString Condition="'$(ConnectionString)'==''">Provider=SQLNCLI;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=Northwind</ConnectionString>
  </PropertyGroup>

  <ItemGroup>
       <DatabaseObjects Include="db/functions/**/*.sql"/>
       <DatabaseObjects Include="db/procedures/**/*.sql"/>
       <DatabaseObjects Include="db/triggers/**/*.sql"/>
       <DatabaseObjects Include="db/views/**/*.sql"/>
  </ItemGroup>

  <Target Name="dbMigrate">
      <MigrateDatabase 
          ConnectionString="$(ConnectionString)"
          MigrationsDirectory="db/migrations"
          DatabaseObjects="@(DatabaseObjects)"
          CommandTimeout="600"
          TableName="version_info"
      />
  </Target>

The source code and binaries can be found here.


Using Rails Fixtures For Storing Test Data

Posted by Brandon Byars Sun, 18 Mar 2007 23:53:00 GMT

Yesterday I wrote about trying to retrofit agility into a large enterprise database. Even after we got our database scripted so that every developer could use a localhost database, we still had to switch back to an integration database too often for the local copy to be worthwhile. The problem was test data; we simply didn’t have enough to be useful, and what we had was difficult to maintain.

At first we tried SQL scripts, but quickly found them too hard to read. After creating a script to add data for one table, nary another script was written (random rant: why can’t the SQL INSERT statement follow the syntax of the UPDATE statement? With big tables, the positional coupling between the column list and the VALUES list is just too hard to maintain). Next, we tried CSV files, which worked somewhat nicely because we could edit them in Excel. However, the BULK INSERT command we used to load them caused too many problems. Any time a column was added, even if it was a nullable column, even if it was a computed column, you had to add it to the CSV file. And both the SQL file and the CSV files lacked the ability to add dynamic data. Many times we wanted to enter yesterday for a date, regardless of what actual date that happened to be. In some cases, we’d simply like to add 1000 rows, without regard to what data those rows contained.

Rails Fixtures

In the end, we hijacked some functionality from the Ruby on Rails testing framework. The Fixtures class allows you to make the following call in your tests to delete and re-add test data before each test:

  fixtures :orders, :order_items

That one line of code will, by default, delete all rows from the order_items and orders tables (in that order), and then add all data from the orders.yml and order_items.yml files (in that order). YAML is preferred in the Ruby world for its readability; XML, like so many “enterprisey” technologies, is considered bulky and clumsy by the Rails team. Even better, the Rails code first runs your YAML files through ERB (a Ruby templating engine) before sending them to the YAML parser, so you can add dynamic content.
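The ERB-then-YAML pipeline is easy to reproduce with Ruby’s standard erb and yaml libraries. This sketch is not the Fixtures source; it only shows the two steps, using a made-up fixture in the style of the DisplayCategories table that generates three records in a loop.

```ruby
require "erb"
require "yaml"

# A made-up fixture: the ERB loop writes three YAML records.
FIXTURE = <<~'TEMPLATE'
  <% 1.upto(3) do |i| %>
  category<%= i %>:
    DisplayCategoryId: <%= i %>
    DisplayName: Category <%= i %>
    SortIndex: <%= i * 10 %>
  <% end %>
TEMPLATE

# Step 1: ERB expands the embedded Ruby into plain YAML text.
yaml_text = ERB.new(FIXTURE).result(binding)

# Step 2: the YAML parser turns that text into a hash of named records.
records = YAML.safe_load(yaml_text)
```

After both steps, records maps "category1", "category2", and "category3" to hashes of column values, which is the shape a fixture loader would then insert row by row.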

It turns out that using the fixtures code outside of Rails is really quite easy. First, we need to set up our connection information and define the necessary ActiveRecord classes. We can define them all in one file (I’ll call it driver.rb):

require 'active_record'
require 'active_record/fixtures'

ActiveRecord::Base.establish_connection(
  :adapter  => "sqlserver",
  :database =>  "db_name",
  :username => "db_user",
  :password => "password"
)

ActiveRecord::Base.logger = Logger.new(File.dirname(__FILE__) + "/debug.log")

class DisplayCategory < ActiveRecord::Base
  set_primary_key "DisplayCategoryId"
  set_table_name "DisplayCategories"
end

ActiveRecord is the Rails object-relational mapping framework, and like everything else in Rails, it’s built around the principle of “convention over configuration.” Since we’re dealing with a legacy database, and one not intended to be consumed by a Rails application, we’ll have to settle for a little bit of configuration.

By default, ActiveRecord expects the table name to be the plural of the class name, with underscores between words (display_categories), and the primary key to be an identity column called id. Our database has a different naming standard, with a healthy dose of standard violations, so we’ll have to add calls to set_primary_key and set_table_name to all of our ActiveRecord classes. DisplayCategories has an identity column, but I’ll show you an example below that does not.

When the class is first used, ActiveRecord reads the table’s columns from the database and adds a property for every one of them. Metaprogramming is what allows ActiveRecord to be so DRY.
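Here is a simplified illustration of that kind of metaprogramming, not ActiveRecord’s actual implementation: Record and its columns class method are hypothetical, and the column names are passed in by hand where ActiveRecord would read them from the database schema.

```ruby
# Hypothetical base class that generates accessors at runtime.
class Record
  # Defines a reader and a writer for each column name given.
  def self.columns(*names)
    names.each do |name|
      define_method(name) { @attributes[name] }
      define_method("#{name}=") { |value| @attributes[name] = value }
    end
  end

  def initialize(attributes = {})
    @attributes = attributes # column name => value
  end
end

class DisplayCategory < Record
  # ActiveRecord would discover these names from the table itself.
  columns "DisplayCategoryId", "DisplayName", "SortIndex"
end

category = DisplayCategory.new({ "DisplayName" => "Books" })
category.SortIndex = 1
```

No accessor is written by hand, which is the DRY payoff: adding a column means changing only the list (or, in ActiveRecord’s case, only the table).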

Then, we need the actual data. The Rails convention expects the YAML file to match the table name, so we’ll put the following in data/DisplayCategories.yml:

books:
  DisplayCategoryId: 1
  DisplayName: Books
  SortIndex: 1
fiction:
  DisplayCategoryId: 2
  DisplayName: Fiction
  SortIndex: 2
  ParentDisplayCategoryId: 1

You can see how readable YAML is. The keys (books, fiction) are not inserted into the database. We’ll find a good use for them below for testing purposes, but for now we’ll simply use them to help us describe each record.

Let’s run it. The following code should work (in add_test_data.rb):

require 'driver.rb'
Fixtures.create_fixtures("data", [:DisplayCategories])

Unfortunately, we may get an error running add_test_data.rb. By default, the YAML parser simply dumps every record into a hashtable, with the identifier as the key. Since hashtables have no implicit ordering, we could run into a problem since our data requires that it be inserted in order (fiction has a ParentDisplayCategoryId foreign key pointing to books). If order doesn’t matter to you, use the above syntax. When it does matter, it’s a simple change:

--- !omap
- books:
    DisplayCategoryId: 1
    DisplayName: Books
    SortIndex: 1
- fiction:
    DisplayCategoryId: 2
    DisplayName: Fiction
    SortIndex: 2
    ParentDisplayCategoryId: 1

That omap syntax defines an ordered map, which solves our dependency problem. Now, running add_test_data.rb should work.

Dealing with Exceptions

DisplayCategories doesn’t quite meet the Rails conventions, but it’s close. What happens when we’re not close?

Let’s assume the following table definition:

CREATE TABLE PersonName (
  PersonNameId int NOT NULL 
    CONSTRAINT PK_PersonName PRIMARY KEY,
  FirstName varchar(100) NOT NULL,
  LastName varchar(100) NOT NULL,
  FullName AS FirstName + ' ' + LastName
)
GO

CREATE TRIGGER Audit_On_Update ON PersonName AFTER UPDATE
AS
INSERT INTO AuditPersonName(DateChanged, PersonNameId, FirstName, LastName)
SELECT GETDATE(), PersonNameId, FirstName, LastName
FROM deleted

Now, our configuration with ActiveRecord becomes more involved. There are four headaches we’ll run into here:
  • We don’t have an identity column
  • We have a computed column
  • Our table name isn’t plural
  • We have a trigger

Let’s tackle them one at a time. Instead of an identity column, let’s assume that we have to call a stored procedure called get_next_id to give us our next id as an output parameter. The following will work:

class PersonName < ActiveRecord::Base
  set_primary_key "PersonNameId"
  set_table_name "PersonName"

  def before_create
    sql = <<-EOT
      DECLARE @id int
      EXEC get_next_id 'PersonName', @id OUTPUT
      SELECT @id
    EOT
    self.id = connection.select_value(sql).to_i
  end
end

We’re using one of the built-in hooks ActiveRecord provides to change how it gets the ids for us. In case you’ve never seen that EOT business, it’s called a here document; you can read about it here.

However, we still won’t be able to use that ActiveRecord implementation because it doesn’t understand computed columns, and will try to add FullName to the insert list. To work around that problem, add the following code to your PersonName class:

class PersonName
  @@computed_columns = ['FullName']

  def initialize(attributes = nil)
    super
    @@computed_columns.each { |column| @attributes.delete(column) }
  end
end

This is a bit hackish, but it gets the job done. I discovered the @attributes instance variable in ActiveRecord::Base when browsing for a solution, and found that I could simply remove the offending columns from it in the constructor.

Speaking of hacks, we can fix the problem of PersonName being singular by changing the way Rails understands the English language. Add the following to driver.rb:

require 'active_support/inflector'

Inflector.inflections do |inflect|
  inflect.uncountable 'PersonName'
end

The uncountable method was intended for words like fish and money. That’s ok; cheating seems like a pragmatic solution here. Chad Fowler mentions this solution in his book Rails Recipes. Other options include:

Inflector.inflections do |inflect|
  # Parentheses avoid Ruby's "ambiguous first argument" warning for regex literals
  inflect.plural(/(ox)$/i, '\1en')
  inflect.singular(/(ox)en/i, '\1')
  inflect.irregular 'person', 'people'
end

For more details, consult Fowler’s book, or dig through the source code. Now, add some test data and try to run it. We can add it to add_test_data.rb quite easily:

  require 'driver.rb'
  Fixtures.create_fixtures("data", [:DisplayCategories, :PersonName])  

Try running it. If you’re using SQL Server, you should receive an exception that reads “Cannot create new connection because in manual or distributed transaction mode.” That’s because our trigger is getting in the way. Add SET NOCOUNT ON as the first statement in the body of your trigger (right after AS), and try again. Everything should work now.

Using dynamic fixtures

I mentioned above that Fixtures first runs the YAML file through ERB before sending it to the parser. That means we can clean up our data to get rid of most of the noise for columns we don’t care about. The following is a valid YAML fixture:

<%
def ignore_values
  <<EOT
  Column1: 1
  UselessDate: 1/1/2000
  SomeNullColumn:
EOT
end
%>
identifier1:
  ImportantColumn: 1
<%= ignore_values %>
identifier2:
  ImportantColumn: 2
<%= ignore_values %>

Now, at least we’re only writing the columns we care about for each record. Sometimes, however, we can do a lot better. Imagine an address table that stores both a main and a shipping address for each person. The following would be a valid fixture:

<%
@next_id = 0

def addresses_for(name, attributes)
  "#{address_for(name, 'main', attributes)}#{address_for(name, 'shipping', attributes)}"
end

def address_for(name, type, attributes)
  @next_id += 1
  <<EOT
#{name}_#{type}:
  AddressId: #{@next_id}
  PersonId: #{attributes[:PersonId]}
  AddressTypeId: #{type == 'main' ? 1 : 2}
  Geocode: 
  Address1: #{attributes[:Address1]}
  Address2: 
  Address3: 
  City: #{attributes[:City]}
  State: #{attributes[:State]}
  Zip: #{attributes[:Zip]}
  County: #{attributes[:County]}
EOT
end
%>
<%= addresses_for('person1', {
  :PersonId => 1,
  :City => 'Marshall',
  :Address1 => '123 Some Road',
  :State => 'TX',
  :Zip => '75672',
  :County => 'Harrison'
}) %>
<%= addresses_for('person2', {
  :PersonId => 2,
  :City => 'Marshall',
  :Address1 => 'P.O. BOX 123',
  :State => 'TX',
  :Zip => '75672',
  :County => 'Harrison'
}) %>

That fixture will create four records: person1_main, person1_shipping, person2_main, and person2_shipping. Since we don’t care about having different main and shipping addresses in our test data, we can reduce the amount of information we have to add. And if we ever do care, it’s easy to write the records out without calling our addresses_for method.

Testing

The Fixtures class was written for testing, and it has some nice features that give you access to each loaded record, for example, as an instance variable named after the record identifier (@person1, @fiction, etc). I had trouble getting that part to work with SQL Server, but even if you don’t ask Fixtures to load each record for you, you can still access them, which is quite useful if you’re using Ruby for any of your testing (for example, in Watir scripts). Let’s add the following to driver.rb:

require 'test/unit'

Test::Unit::TestCase.fixture_path = File.dirname(__FILE__) + "/data/"
$LOAD_PATH.unshift(Test::Unit::TestCase.fixture_path)

class Test::Unit::TestCase
  self.use_transactional_fixtures = false
  self.use_instantiated_fixtures = :no_instances
end

Now, we should be able to load the fixtures automatically in our test cases:

require 'test/unit'
require 'watir'
require 'driver'

class ProductBrowsingTest < Test::Unit::TestCase
  fixtures :DisplayCategories

  def test_something
    # create a browser instance, and navigate to the right page
    browser.link(:text, @DisplayCategories["fiction"]["DisplayName"]).click
  end
end

I’m working on a code generation template that will give me similar access to the test data for C# fixtures. I’ll try to post the solution when I get it ready.


Ruby and SQL DMO

Posted by Brandon Byars Sat, 17 Mar 2007 20:28:00 GMT

We have a rather large legacy system that we are trying to inject a healthy dose of agility into. One of the biggest challenges has always been versioning the database, which is large, clumsy, and cluttered. We managed to get the rest of the code under a continuous integration scheme, but every time somebody made a schema change, or even updated a stored procedure needed for some tests to work, manual intervention was needed.

Pramod Sadalage and Martin Fowler wrote one of the first articles on iterative database design, and Pramod later teamed up with Scott Ambler to collaborate on Refactoring Databases. The advice, adopted by the Ruby on Rails team, was to create a separate migration file for each schema change, and number them sequentially. For example, the first migration would be 1.sql, then 2.sql, and so on. You could store the latest migration file run on a database in a version table, which would make updating a database as easy as running every migration, in order, whose version number is greater than the one stored in your database table.
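The bookkeeping side of that scheme is small. Here’s a minimal Ruby sketch, assuming migrations are named N.sql and that the current version has already been read from the version table (pending_migrations is a hypothetical helper, not part of Rails). Note that the files have to be ordered numerically, because as strings "10.sql" sorts before "2.sql".

```ruby
# Hypothetical helper: given migration file names and the version stored in
# the database's version table, return the files still to run, in order.
def pending_migrations(file_names, current_version)
  file_names
    .map { |f| [File.basename(f, '.sql').to_i, f] }   # "12.sql" -> [12, "12.sql"]
    .select { |version, _| version > current_version }
    .sort_by { |version, _| version }                  # numeric, so 10 runs after 9
    .map { |_, f| f }
end

files = %w[1.sql 2.sql 10.sql 9.sql 3.sql]
p pending_migrations(files, 2)  # ["3.sql", "9.sql", "10.sql"]
```

After running them, the version table would be updated to the number of the last file applied.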

We managed to do something similar, although it required a number of changes (I’ll write up the solution soon). But before any of it could work, we needed a starting point: a base schema to create a new developer database. We use SQL Server 2000, so initially I simply had Enterprise Manager generate a SQL script for me. Not only did it not work (I don’t think it sorted the dependencies correctly), it was a ghastly nightmare to look at.

Why do standard vendor-supplied code generation tools create such ugly code?

I decided to do the code generation myself using SQL DMO (the same COM interfaces that Enterprise Manager was using, just poorly). I’d successfully used Ruby and ERB for code generation before, and had discovered I could make very nice-looking code (Code Generation in Action by Jack Herrington describes the concept nicely). Within just a couple of hours, I had a SQL script that not only looked much nicer than anything Enterprise Manager spits out, but worked to boot.

First, I needed to connect to the database I wanted to dump into a SQL file (forgive the lack of syntax highlighting; I’m working on getting that working):

  require 'win32ole'

  class SqlServer
    def initialize(server, catalog)
      sql = WIN32OLE.new("SQLDMO.SQLServer")
      sql.LoginSecure = 1
      sql.Connect(server)
      @db = sql.Databases(catalog)
    end
  end

This uses Windows Authentication to connect to the appropriate SQL Server database. The DMO interface, described here, is one of those clumsy, wandering APIs commonly developed when VBScript is your target language (a warning: the MSDN website for DMO only seems to work in IE, and poorly at that). I decided to wrap some of the classes to make them easier to use. First, a superclass:

  class DBObject
    def initialize(object)
      @object = object
    end

    def name
      return @object.Name unless @object.Name =~ / /
      "[#{@object.Name}]"
    end

    def beautify(text)
      # strip brackets that aren't needed (no space in the name) and remove
      # the dbo. owner prefix, without modifying the caller's string
      text.gsub(/\[([^ ]*?)\]/, '\1').gsub(/dbo\./, '')
    end
  end

Here I provide one of the keys to generating code that looks appealing. I really don’t like to look at all the noise of bracketing all the tables and column names, just in case the name contains a space in it. The following looks ugly to me:

  SET ANSI_NULLS ON
  GO
  SET QUOTED_IDENTIFIER ON
  GO
  SET ANSI_PADDING ON
  GO
  CREATE TABLE [dbo].[tablename] (
    [id] [int] NOT NULL,
    [description] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
  ) ON [PRIMARY]

That drives me wild. I don’t ever change any of the settings for ANSI nulls, etc., and I couldn’t care less what the collation is so long as it matches the database default. Nor do I care about the file group. It’s hard to see through all the noise.

Here’s what I want:

  CREATE TABLE tablename (
    id int NOT NULL,
    description varchar(50) NOT NULL
  )
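To see what the two tidying rules do in isolation, here’s a standalone sketch. It applies the same bracket-stripping regular expression and dbo. removal as DBObject#beautify, plus the space check from DBObject#name, to lines like the ones above. The function names are hypothetical. A name containing a space keeps its brackets.

```ruby
# Standalone sketch of the tidying rules in DBObject (names are hypothetical).
BRACKETED_NAME = /\[([^ ]*?)\]/   # a bracketed identifier with no space in it

# mirrors DBObject#name: only bracket names that contain a space
def bracket_if_needed(name)
  name.include?(' ') ? "[#{name}]" : name
end

# mirrors DBObject#beautify: drop unneeded brackets, then the dbo. prefix
def tidy(sql)
  sql.gsub(BRACKETED_NAME, '\1').gsub(/dbo\./, '')
end

puts tidy("CREATE TABLE [dbo].[tablename] (")    # CREATE TABLE tablename (
puts tidy("  [id] [int] NOT NULL,")               #   id int NOT NULL,
puts tidy("SELECT * FROM [dbo].[order details]")  # SELECT * FROM [order details]
```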

Our name and beautify methods will help us achieve prettier code. Here’s the most important subclass:

  class Table < DBObject
    attr_reader :columns, :constraints, :keys, :indexes, :references

    def initialize(table, connection)
      super(table)
      @columns, @constraints, @keys = [], [], []
      table.Columns.each { |col| @columns << Column.new(col) }
      table.Checks.each { |check| @constraints << CheckConstraint.new(check) }
      table.Keys.each { |key| @keys << Key.new(key, connection) }
      get_indexes(table)
      get_references
    end

    private
    def get_references
      @references = []
      @keys.each do |key|
        if "" != key.referenced_table and name != key.referenced_table
          @references << key.referenced_table unless references.include?(key.referenced_table)
        end
      end
    end

    def get_indexes(table)
      @indexes = []
      # the Indexes collection includes objects already in Keys and statistics
      keys = @keys.map { |key| key.name }
      table.Indexes.each do |index|
        if not keys.include?(index.Name)
          if index.Type == 16 || index.Type == 0
            @indexes << Index.new(index)
          end
        end
      end
    end
  end

You can find the classes it depends on by downloading all of the code here. Notice, however, that the Key constructor needs a database connection. As far as I could tell, there was no way, using nothing more than DMO, to find out whether a foreign key cascades deletes, so I had to query the INFORMATION_SCHEMA views for that information.
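Here’s a hedged sketch of what the Key wrapper might need. It assumes SQL Server’s INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS view, whose DELETE_RULE column reads CASCADE or NO ACTION for each foreign key. Both helper names are hypothetical, and actually running the query over the connection is left out.

```ruby
# Hypothetical helper: the query a Key wrapper could run for its constraint.
def delete_rule_query(constraint_name)
  escaped = constraint_name.gsub("'", "''")   # double single quotes for T-SQL
  "SELECT DELETE_RULE FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS " \
    "WHERE CONSTRAINT_NAME = '#{escaped}'"
end

# Hypothetical helper: the tail appended to the generated foreign key definition.
def on_delete_clause(delete_rule)
  delete_rule.to_s.strip.upcase == 'CASCADE' ? ' ON DELETE CASCADE' : ''
end

puts delete_rule_query('FK_Orders_Customers')
puts "FOREIGN KEY (CustomerId) REFERENCES Customers (Id)" + on_delete_clause('CASCADE')
```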

For our script to work, we’re going to need to order our dependencies correctly. The SQL script will fail if we try to add a foreign key to a table that doesn’t yet exist. The following should do the trick:

  class SqlServer
    # returns a topological sort with parent tables in front of child tables
    def self.topologically_sorted(tables)
      sorted = []

      # We need a hash to navigate the references field, which gives only names
      table_hash = {}
      tables.each { |table| table_hash[table.name] = table }

      # first add all root tables to sorted
      sorted += tables.find_all { |table| 0 == table.references.length }

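      while tables.length > sorted.length
        ready = tables.find_all do |table|
          if sorted.include?(table)
            result = false
          else
            # all dependencies must already be in sorted
            dependencies = table.references.map { |ref| table_hash[ref] }
            result = (nil == dependencies.find { |ref| not sorted.include?(ref) })
          end
          result
        end
        # nothing could be placed: a circular or missing reference would loop forever
        raise "cannot order tables: circular or missing references" if ready.empty?
        sorted += ready
      end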
      sorted
    end
  end
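Ruby’s standard library also ships a TSort module that can do this ordering. It raises TSort::Cyclic when two tables reference each other, so you don’t have to detect the cycle yourself. Here’s a minimal sketch, assuming the references are available as a hash from table name to referenced table names (TableGraph is a hypothetical name):

```ruby
require 'tsort'

# Hypothetical wrapper: maps each table name to the names of the tables it
# references (what Table#references returns), and lets TSort order them.
class TableGraph
  include TSort

  def initialize(references)
    @references = references
  end

  def tsort_each_node(&block)
    @references.each_key(&block)
  end

  # TSort emits a node's children before the node itself; treating the
  # referenced tables as children puts parent tables first
  def tsort_each_child(table, &block)
    @references.fetch(table, []).each(&block)
  end
end

graph = TableGraph.new({
  'LineItems' => ['Orders', 'Products'],
  'Orders'    => ['Customers'],
  'Customers' => [],
  'Products'  => []
})
order = graph.tsort
puts order.inspect

begin
  TableGraph.new({ 'A' => ['B'], 'B' => ['A'] }).tsort
rescue TSort::Cyclic => e
  puts "cannot order tables: #{e.message}"
end
```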

Now, our code is as simple as binding to some ERB templates:

  require 'erb'
  require 'db_object'

  class SchemaWriter
    def initialize(server, catalog)
      @db = SqlServer.new server, catalog
    end

    def generate_create_script
      generate_code_for(@db.user_defined_datatypes, "create_udt")
      generate_code_for(@db.rules, "create_rule")
      generate_code_for(SqlServer.topologically_sorted(@db.user_tables), "create_table")
    end

    def generate_code_for(objects, template_name)
      file_name = template_name + ".erb"
      template = ""
      File.open(file_name) { |file| template = file.read }
      objects.each do |object|
        erb = ERB.new(template, nil, '-')
        puts erb.result(binding)
      end
    end
  end

  if $0 == __FILE__
    writer = SchemaWriter.new(ARGV[0], ARGV[1])
    writer.generate_create_script
  end

As an example, here’s the create_table.erb template:

  create table <%= object.name %>(
  <% object.columns.each_with_index do |column, i| -%>
      <%= column.name %> <%= column.text %><%= "," unless i == object.columns.length - 1 %>
  <% end -%>
  )

  <% object.keys.each do |key| -%>
  alter table <%= object.name %> add constraint <%= key.name %> <%= key.text %>
  <% end -%>
  <% object.constraints.each do |constraint| -%>
  alter table <%= object.name %> add constraint <%= constraint.name %> <%= constraint.text %>
  <% end -%>
  <% object.indexes.each do |index| -%>
  <%= index.text %>
  <% end -%>
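Since the template only needs an object in the binding, it can be exercised without a database. This sketch renders the column portion of the template against Struct stand-ins, which are hypothetical substitutes for the DMO wrappers. It uses the keyword form ERB.new(template, trim_mode: '-'), available since Ruby 2.6. The listing above uses the older positional arguments.

```ruby
require 'erb'

# Hypothetical stand-ins for the Column and Table wrappers
ColumnStub = Struct.new(:name, :text)
TableStub  = Struct.new(:name, :columns)

TEMPLATE = <<~'ERB'
  create table <%= object.name %>(
  <% object.columns.each_with_index do |column, i| -%>
      <%= column.name %> <%= column.text %><%= "," unless i == object.columns.length - 1 %>
  <% end -%>
  )
ERB

# the method's local `object` is what the template sees through binding
def render(object)
  ERB.new(TEMPLATE, trim_mode: '-').result(binding)
end

table = TableStub.new('tablename', [ColumnStub.new('id', 'int NOT NULL'),
                                    ColumnStub.new('description', 'varchar(50) NOT NULL')])
puts render(table)
```

The -%> markers are what keep the loop lines from leaving blank lines in the output.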


Cohesion

Posted by Brandon Byars Wed, 21 Feb 2007 21:32:00 GMT

Wherefore Design?

When I first started programming, I naturally assumed that programming was the hardest part of programming. What a fool I was.

There are a number of things that good developers do that, at first glance, appear to slow down how fast they write code. A great deal of time is spent communicating with users and other developers; tests are written; code that already appears to work is refactored. And a great deal of time is spent in the ivory tower world of design. Perhaps not up-front (I prefer the agile motto of designing all the time), but design is an activity that takes time. If all these tasks take so much time away from writing code (which, on the surface of things, seems to be the job of developers), then why bother?

It turns out that programming is the easiest part of programming. It’s the same mistake managers make when they revolt against their developers pairing up to solve a problem. Writing code is not about typing. It’s not even so much about writing code. It’s about designing a system to meet the customers’ needs, and it’s about ensuring that the system will continue to be able to meet the customers’ needs in the future.

Unfortunately, the first large system I wrote was with a team where everybody, myself included, thought that programming was about writing code. I got an enormous amount of code written in a very short amount of time. A year and a half later, the system still had not gone live, and I had completely lost confidence in my ability to make a change without breaking some essential functionality. This is a common phenomenon, which I can informally depict as a productivity curve:

I followed the red line for the system. I think most developers probably do. The problem is that initial surge of productivity. It’s intoxicating.

Avoiding the red curve is what separates good developers from everybody else.

Design is one of those extra things good developers do in an attempt to follow the black line in the graph above. You miss out on the initial addictive surge, but good developers recognize it as fool’s gold anyhow.

Remember Alice? It’s a song about Alice.

In his study of structured design, Larry Constantine identified cohesion as the central metric in creating good designs. The object-oriented evolution did little to change this fact; indeed, object orientation is a natural result of trying to increase cohesion over procedural code. If you want good design, then you need to understand cohesion. A system without cohesion is a real Nugger-Tugger.

Non-cohesive designs ramble, and they say the same thing over and over again, just in different places. They never really do what they’re supposed to. It’s like listening to Arlo Guthrie sing Alice’s Restaurant. You really can get anything you want at Alice’s Restaurant.

Uncle Bob helped restate cohesion for the OO world as the Single Responsibility Principle, which basically states that a class should have only one reason to change. Constantine’s definition included any module, so it’s fair to say the same thing about methods.

Cohesion is about minding your own business, and it is the design principle that explains why Feature Envy stinks. For example, imagine an order needing to calculate a subtotal. Here’s one way to do it:

  public class Order
  {
      // ...

      public Money Subtotal
      {
          get
          {
              Money subtotal = Money.Zero;
              foreach (LineItem item in items)
              {
                  subtotal += item.Product.UnitPrice * item.Quantity;
              }
              return subtotal;
          }
      }
  }

There are a couple of things wrong with this code. First, the feature envy: the Subtotal property seems more interested in LineItem’s data than in its own class’s data. Clearly, Order has more than one reason to change. It has to change if any of Order’s responsibilities change, and it just might have to change if LineItem’s responsibilities change.

Second, notice how the Subtotal property is actually reaching into the Product’s data as well—yet another reason to change. This is an example of violating the Law of Demeter. LoD purists like to spout out a very precise definition for Demeter. I prefer to think of it as the “two dot” rule. If I see two dots in a single expression, I probably need to rethink what I’m doing. I’ve now coupled myself both to the LineItem class (which is ok, since Order contains LineItem objects) as well as the Product class (which is unnecessary).

This is important: increasing cohesion reduces coupling. Constantine saw them as two sides of the same coin.

Let’s try again:

  public class Order
  {
      // ...

      public Money Subtotal
      {
          get
          {
              Money subtotal = Money.Zero;
              foreach (LineItem item in items)
              {
                  subtotal += item.Subtotal;
              }
              return subtotal;
          }
      }
  }

  public class LineItem
  {
      // ...

      public Money Subtotal
      {
          get { return Product.UnitPrice * Quantity; }
      }
  }
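For comparison, here’s a minimal Ruby sketch of the same refactoring. Integer cents stand in for the Money type, and the constructors are assumptions, since the C# listing elides them. Array#sum needs Ruby 2.4 or later.

```ruby
# Ruby sketch of the refactored Order/LineItem. Amounts are Integer cents
# (an assumption standing in for Money); the constructors are hypothetical.
Product = Struct.new(:unit_price)

class LineItem
  def initialize(product, quantity)
    @product = product
    @quantity = quantity
  end

  # only LineItem knows that its subtotal depends on the product's price
  def subtotal
    @product.unit_price * @quantity
  end
end

class Order
  def initialize(items)
    @items = items
  end

  # Order asks each of its own line items; it never reaches into Product
  def subtotal
    @items.sum(&:subtotal)
  end
end

order = Order.new([LineItem.new(Product.new(250), 2), LineItem.new(Product.new(100), 3)])
puts order.subtotal  # 800
```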

Now Order is minding its own business. In programming, spreading ignorance is A Good Thing.

Different levels of cohesion

Rather than speaking of modules which either have or lack cohesion, Constantine identified different levels of cohesion. While I don’t consider it too important to know them by name, I do find it helpful to reflect on a few of my mistakes in reference to those concepts.

Logical cohesion

Sometimes we group functionality into a class simply because it naively seems to go together. In that first system I mentioned above, I wrote a class called Validation. As you may expect, Validation had many responsibilities. In this case, it validated addresses, phone numbers, accounts, and a host of other unrelated things. Validation suffered from what Constantine called logical cohesion.

More common (and less damaging) examples of logical cohesion are found in languages like C# and Java, and are hard to get around due to language limitations. Consider the System.Math class in .NET. What’s its purpose? Well, it calculates arc-cosines and square roots and ceilings and exponents, as well as rounding and truncating. The same thing is true of the utility or helper classes that predominate in the mainstream world. Find yourself doing the same thing over and over again with text? Create a StringUtils class that does it for you.

Those methods really belong on the string and float classes, not tucked away in some utility classes. More powerful languages allow you to do this. Want to add a natural log method to Float? Here’s Ruby code that does just that:

  class Float
    # natural logarithm as a method on Float itself
    def ln
      Math.log(self)
    end
  end

Float already exists—it ships with Ruby. That’s ok; Ruby has no problem with you adding methods to classes that already exist. Rumor has it that the next version of C# may let you do the same thing. It will be interesting to gauge the mainstream response to this feature.

Temporal and Procedural cohesion

Constantine identified two levels of cohesion, temporal and procedural, that have to do with putting unrelated tasks together simply because they happen to occur at more or less the same time. The only difference between the two is that procedural cohesion requires some procedural relationship between the two elements. When I first started programming, I was extraordinarily guilty of creating unnecessary procedural cohesion, due to some delusional belief that minimizing the number of loops would improve performance.

As an example of procedural cohesion, let’s look at an alternative implementation of Order:

  public class Order
  {
      private Money subtotal;
      private Money tax;
      private Money shipping;
      private double taxPercent;
      private Money total;

      // ...

      public void CalculateTotals()
      {
          subtotal = Money.Zero;
          tax = Money.Zero;

          foreach (LineItem item in items)
          {
              subtotal += item.Subtotal;
              tax += item.Tax;
          }

          shipping = ShippingFor(subtotal);
          if (ChargeTaxOnShipping(state))
          {
              tax += shipping * taxPercent;
          }
          total = subtotal + shipping + tax;
      }

      public Money Subtotal
      {
          get { return subtotal; }
      }

      public Money Tax
      {
          get { return tax; }
      }

      public Money Shipping
      {
          get { return shipping; }
      }

      public Money Total
      {
          get { return total; }
      }
  }

The code in CalculateTotals is very procedural in nature. What’s worse, the method does not have a single responsibility, and suffers from procedural cohesion as a result (notice how we’re now talking about the method’s cohesion instead of the class’s; Constantine was writing before object orientation was all the rage). Notice in particular the ugly dependence between tax and shipping. We’re calculating some of the tax in the loop along with the subtotal, but we may have to add more after the loop if the state charges tax on shipping.

Getting rid of the cohesion problem is easy:

  public class Order
  {
      // ...

      public Money Subtotal
      {
          get 
          { 
              Money subtotal = Money.Zero;
              foreach (LineItem item in items)
              {
                  subtotal += item.Subtotal;
              }
              return subtotal;
          }
      }

      public Money Tax
      {
          get 
          {
              Money tax = Money.Zero;
              foreach (LineItem item in items)
              {
                  tax += item.Tax;
              }
              if (ChargeTaxOnShipping(state))
              {
                  tax += Shipping * TaxPercent;
              }
              return tax;
          }
      }

      public Money Shipping
      {
          get { return ShippingFor(Subtotal); }
      }

      public Money Total
      {
          get { return Subtotal + Shipping + Tax; }
      }
  }

Much nicer. Now each method has only one responsibility, and is easy to understand. However, it’s annoying to have to duplicate those loops everywhere. Some of the nicer languages have internal iterators or closures to help. C# 2.0 has similar constructs, but they’re a bit clumsy to use because of all the static typing noise:

  public class Order
  {
      // ...

      public Money Subtotal
      {
          get 
          { 
              return Sum(items, delegate(LineItem item) 
                  { return item.Subtotal; }); 
          }
      }

      public Money Tax
      {
          get 
          {
              Money tax = Sum(items, delegate(LineItem item) 
                  { return item.Tax; });
              if (ChargeTaxOnShipping(state))
              {
                  tax += Shipping * TaxPercent;
              }
              return tax;
          }
      }

      public Money PaymentTotal
      {
          get 
          { 
              return Sum(payments, delegate(Payment payment) 
                  { return payment.Amount; }); 
          }
      }

      private delegate Money MonetaryPropertyDelegate<T>(T item);

      private Money Sum<T>(ICollection<T> collection,
          MonetaryPropertyDelegate<T> moneyGetter)
      {
          Money result = Money.Zero;
          foreach (T item in collection)
          {
              result += moneyGetter(item);
          }
          return result;
      }
  }
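In Ruby, the equivalent of the Sum&lt;T&gt; helper is a method that takes a block, with none of the delegate declarations. This is a minimal sketch under loud assumptions: amounts are Integer cents, LineItem already knows its own subtotal and tax, and the constructor arguments and flat shipping charge are hypothetical stand-ins for the parts the C# listing elides.

```ruby
# Ruby sketch of the closure-based Order. Amounts are Integer cents; the
# constructor arguments and flat shipping charge are hypothetical.
LineItem = Struct.new(:subtotal, :tax)
Payment  = Struct.new(:amount)

class Order
  def initialize(items, payments, tax_rate, tax_on_shipping)
    @items = items
    @payments = payments
    @tax_rate = tax_rate
    @tax_on_shipping = tax_on_shipping
  end

  def subtotal
    sum_of(@items) { |item| item.subtotal }
  end

  def shipping
    500 # hypothetical flat shipping charge, in cents
  end

  def tax
    result = sum_of(@items) { |item| item.tax }
    result += (shipping * @tax_rate).round if @tax_on_shipping
    result
  end

  def payment_total
    sum_of(@payments) { |payment| payment.amount }
  end

  def total
    subtotal + shipping + tax
  end

  private

  # plays the role of the C# Sum<T> helper: the block replaces the
  # delegate, and no type parameters are needed
  def sum_of(collection)
    collection.inject(0) { |total, element| total + yield(element) }
  end
end

order = Order.new([LineItem.new(1000, 80), LineItem.new(2000, 160)], [Payment.new(1000)], 0.08, true)
puts order.total  # 3780
```

Since Ruby 2.4, Array#sum takes a block as well (`@items.sum { |item| item.subtotal }`), which would make the helper unnecessary.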


Nugger Tugger Designs

Posted by Brandon Byars Thu, 15 Feb 2007 05:12:00 GMT

Does your design ever look like this?

You’ll notice that, although humans, sharks, elephants, birds, bears, beavers, zebras, wasps, cats, and hyenas are not amphibians, Nugger-Tugger himself is an amphibian. Really quite incredible when you think about it.

What works for art doesn’t work for design. I really like Nugger-Tugger, because I’ve seen some designs that remind me of a kid just trying to get a little bit of everything. And “Nugger-Tugger” has the added advantage of sounding a lot like “Mother-Tugger,” which is roughly what I hear developers say when they have to work in such systems.

