A Day In The Lyf

…the lyf so short, the craft so longe to lerne

Archive for the ‘Design’ Category

Throw Out Those Utility Classes

How many times have you written an xxxUtils class, where xxx is some framework supplied class that you can’t extend or subclass? I always seem to end up with several in any decent sized project, StringUtils, DateUtils, DictionaryUtils, etc. In most cases, these classes are the result of language limitations. In Ruby and Smalltalk, for example, what would be the point of a StringUtils class when you could simply add methods to the String class directly? But C# and Java make String sealed (final) so you can’t even subclass it.

Utility classes like these tend to suffer from logical cohesion. In spite of the friendly-sounding name, logical cohesion is actually a fairly weak form of cohesion; it’s just a loose jumble of functions that have something in common. It can in no way be considered object-oriented.

Our DictionaryUtils makes an interesting case study because it was small. It only did two things: compared two dictionaries key-by-key for equality (useful in testing), and converting the entries to a Ruby-esque string. That last method made me a little jealous of how convenient Hashes are in Ruby:

middlestate:~ bhbyars$ irb
>> {'a' => 1, 'b' => 2, 'c' => 3}
=> {"a"=>1, "b"=>2, "c"=>3}

For the non-Ruby readers, I just created a 3-element Hash in one line. The command-line interpreter spit out a string representation. Our DictionaryUtils.ConvertToText could manage that last bit, but I wanted to be able to create hashtables as easily in C# as I could in Ruby. Naturally, that meant a third method on DictionaryUtils. Or did it?

C# on Hash

DictionaryUtils.Create seemed bloviated and ugly as soon as I first wrote it, so I quickly scratched it out and started a new class:

public class Hash
{
    public static Hashtable New(params object[] keysAndValues)
    {
        if (keysAndValues.Length % 2 != 0)
            throw new ArgumentException(“Hash.New requires an even number of parameters”);

        Hashtable hash = new Hashtable();
        for (int i = 0; i < keysAndValues.Length; i += 2)
        {
            hash[keysAndValues[i]] = keysAndValues[i + 1];
        }
        return hash;
    }
}

This allowed me to create small loaded Hashtables in one line, which was convenient, especially for test methods (although the syntax isn’t as explicit as Ruby’s). I then decided to merge the static DictionaryUtils methods into Hash, as instance methods. First, of course, I had to make Hash an actual dictionary implementation. This was trivial:

private IDictionary proxiedHash;

public Hash(IDictionary dictionary)
{
    proxiedHash = dictionary;
}

public bool Contains(object key)
{
    return proxiedHash.Contains(key);
}

public void Add(object key, object value)
{
    proxiedHash.Add(key, value);
}

public void Clear()
{
    proxiedHash.Clear();
}

// etc…

Then I changed the return value of Hash.New to a Hash instead of a Hashtable. The last line became return new Hash(hash) instead of return hash.

Next I moved the ConvertToText method, which, as an instance method, conveniently mapped to ToString.

public override  string ToString()
{
    SeparatedStringBuilder builder = new SeparatedStringBuilder(", ");
    ICollection keys = CollectionUtils.TryToSort(Keys);
    foreach (object key in keys)
    {
        builder.AppendFormat("{0} => {1}", Encode(key), Encode(this[key]));
    }
    return "{" + builder.ToString() + "}";
}

private object Encode(object value)
{
    if (value == null)
        return "<NULL>";

    IDictionary dictionary = value as IDictionary;
    if (dictionary != null)
        return new Hash(dictionary).ToString();

    if (value is string)
        return "\"" + value + "\"";

    return value;
}

The SeparatedStringBuilder class is a StringBuilder that adds a custom separator between each string. It’s very convenient whenever you’re a building a comma-separated list, as above. It’s proven to be handy in a variety of situations. For example, I’ve used it to build a SQL WHERE clause by making ” AND ” the separator. It’s included with the code download at the bottom of this article.

Notice, also, that we’re still using a CollectionUtils class. Ah, well. I’ve got to have something to look forward to fixing tomorrow…

The DictionaryUtils.AreEqual method conveniently maps to an instance level Equals method:

public override bool Equals(object obj)
{
    IDictionary other = obj as IDictionary;
    if (other == null) return false;
    Hash hash = new Hash(other);
    return hash.ToString() == ToString();
}

public override int GetHashCode()
{
    return proxiedHash.GetHashCode();
}

The syntax is much cleaner than the old DictionaryUtils class. It’s nicely encapsulated, fits conveniently into the framework overrides, and is object-oriented, allowing us to add other utility methods to the Hash class easily. It’s especially nice for testing, since the Equals method will work against any dictionary implementation, not just Hashes:

Assert.AreEqual(Hash.New(“address”, customer.Address), propertyBag);

The approach was simple, relying on proxying for fulfilling the IDictionary implementation (I’m probably abusing the word “proxying,” since we’re not doing anything with the interception. Really, this is nothing more than the Decorator design pattern). That was easy only because the framework actually provided an interface to subtype; the same isn’t true of String and Date. However, it isn’t true of StringBuilder either; if you look at the code, SeparatedStringBuilder looks like a StringBuilder, it talks like a StringBuilder, and it quacks like a StringBuilder, but there is no syntactic relationship between them since StringBuilder is sealed and doesn’t implement an interface. While the need for SeparatedStringBuilder may represent a special case, I think I’d prefer creating similar-looking objects rather than relying on a framework-provided xxx and a custom built xxxUtils class. Proxying, as used by Hash, generally makes such implementations trivial and clean, leaving you to spend your time developing what you really want without making the API unnecessarily ugly.

All the code needed to compile and test the Hash class can be found here.

Advertisements

Written by Brandon Byars

August 28, 2007 at 11:40 pm

Posted in .NET, Design

Tagged with

Dealing with Exceptions

We had an interesting discussion on the XP news group this past week about dealing with exceptions. The discussion seemed to focus on functions that return something rather than functions called for their side effects. Four general approaches were mentioned:

Return a special value (null or an error code)

This was generally seen as the weakest approach. Returning a null forces a null-check on each client of your method, which has the unfortunate effect of distributing complexity instead of localizing it. The canonical example is with lists—it is much better to return an empty list than a null one. Client code can work without any changes with an empty list. The same principle applies to error codes.

Throw an exception

This was the most controversial option. Several developers argued that exception handling is prone to the same weakness as the special return value, that it spreads complexity around. As Ron Jeffries’ said: “Throwing an exception doesn’t solve the problem any more than throwing up in the morning solves one’s drinking problem.”

Nevertheless, throwing an exception has proven to be the mainstream choice in modern languages, and the way exceptions bubble up the call stack can come in handy when used judiciously. Anthony Williams advocated using them, noting that they clearly signal an abortive operation (rather than relying on clients to check a return value), and allow application developers to write a global exception handler that catches any uncaught exceptions throughout the application.

The discussion intrigued me largely because I continue to use exceptions when convenient, although I’m interested in learning alternative approaches. I think that at least some of the hostility towards exceptions can be traced to the ugliness of checked exceptions (as in Java), which force clients to write try/catch blocks even if they’d prefer the exception bubble up.

Return a polymorphic value

This is probably the purest object-oriented solution, but I’m skeptical that it can work as a general case. Woolf’s Null Object Pattern is a special case—the empty list mentioned above being a simple example. We use a more sophisticated example in the content management system we developed for our website at work. When you go to ”/Content/AboutUs,” for example, the application knows “AboutUs” is supposed to be a content-managed page, and requests the content from the database. If the page doesn’t exist, rather than return null or throw an exception, it returns a new web page whose content matches our 404 page content.

If a page is content-managed, then users can edit it directly on the page in a fairly typical way. The content-managed portion is outlined with an ‘edit’ link at the top that puts the content in a WYSIWYG editor. However-this is the crucial point-the web page returned when it doesn’t exist in the database is a content-managed page. Yes, it has the 404 content, and anybody visiting that URL without content editing privileges sees what they are supposed to see. But if you do have content editing privileges, you see the familiar border with the edit link, and you can use the exact same procedure that you use to update other pages to create new pages.

The advantage to this approach is obvious—clients do not need to behave any differently regardless of whether the method call succeeded or failed. It was suggested that perhaps the same approach could be used even if some other exception occurred. I think in some cases it may be possible. I looked over some of our code for an example only to discover that the large majority of methods we use that throw exceptions have a void return type, but found an example of an exception-throwing method in our credit card processing code. The code simply sends an XML document to the company we use to process credit cards and returns a value representing the result of that transaction. It looks something like this:

public CreditCardResponse Send(string xml)
{
    IServerXMLHTTPRequest server = GetServer();
    try
    {
        server.send(message);
    }
    catch (Exception ex)
    {
        throw new ApplicationException("connection failed");
    }
    return Parse(server.responseText);
}

This code clearly could benefit from removing the thrown exception and enhancing or subclassing CreditCardResponse. That return type is already responsible for reporting errors given in the return XML by the credit card processor; why would clients want to treat failing to connect to the credit card processor any differently than one of those errors?

Callbacks

At first glance, I don’t see why a callback would be any easier than an exception handler for dealing with errors, other than perhaps making it easier to localize handling certain exceptions in one place. They have the unwanted effect of cluttering up the API by adding exception handling callback parameters.

Written by Brandon Byars

August 27, 2007 at 9:35 pm

Posted in Design

Using Higher Order Functions in Windows Forms Applications

My wife is in the middle of a research project comparing diet to the age of reproduction in African house snakes. She has to collect quite a bit of data, and when I finally looked at the spreadsheets she was maintaining, I was ashamed that I had not written something for her earlier.

This was really the first Windows Forms application that I’ve had the opportunity to do in years (my UI’s aren’t very inspiring). However, I have to maintain a couple at work that were primarily written by former colleagues, and I’ve always been a bit dismayed at the enormous amount of duplication that the standard event-driven application generates.

Despite the fact that the application I wrote for my wife was nothing more than a one-off application, one which you don’t expect to have to maintain, I focused on eliminating the duplication I see in the Windows applications at work. The result isn’t something that I would even begin to consider done for a corporate application, but I found the duplication removal techniques worth writing about. The code can be found here.

The biggest gains in removing duplication, and the ones most readers are likely to be least familiar with, are the use of higher order functions. My impression is that most C# developers aren’t very comfortable with higher order functions. Actually, I think that’s probably true for most developers working within mainstream commercially developed (Microsoft, Borland, Sun) languages. They’re simply not emphasized enough.

For example, all the forms had a ListView to display the data. All of them had to define the column header names and the data that goes in each row. It looked something like this:

protected override void AddHeaders()
{
    AddHeader(“Weight”);
    AddHeader(“Length”);
    AddHeader(“HL”);
    AddHeader(“HW”);
}

protected override void AddCells()
{
    AddCell(Weight);
    AddCell(Length);
    AddCell(HeadLength);
    AddCell(HeadWidth);
}

Having the subclass define the column header names and the data that goes in each row didn’t bother me. What did bother me was having to specify the order that the headers and data needed to be shown in two different place. However, while the header names were static, the data would be different for each invocation. The result was to specify the order only once, in an associative array (I used .NET 2.0’s generic Dictionary, which seemed to maintain the order I entered the items). The key would be the column name, and the value would be a function to retrieve the data value.

// The superclass for all Forms…
public class SnakeForm : Form
{
    protected delegate object GetterDelegate(object value);

    private IDictionary associations;

    protected virtual void AddListViewAssociations(
        IDictionary associations)
    {
        throw new NotImplementedException(“Override…”);
    }

    protected virtual IEnumerable ListViewHeaders
    {
        get
        {
            foreach (string header in associations.Keys)
                yield return header;
        }
    }

    protected virtual IEnumerable ListViewValues(object value)
    {
        foreach (GetterDelegate getter in associations.Values)
            yield return getter(value);
    }

    protected virtual void AddCells(object source)
    {
        foreach (object value in ListViewValues(source))
            AddCell(value);
    }

    private void SnakeForm_Load(object sender, EventArgs e)
    {
        associations = new Dictionary();
        AddListViewAssociations(associations);
        AddHeaders();
    }

    private void AddHeaders()
    {
        foreach (string header in ListViewHeaders)
            AddHeader(header);
    }

    private void AddHeader(string name)
    {
        ColumnHeader header = new ColumnHeader();
        header.Text = name;
        lvData.Columns.Add(header);
    }
}

The important things to note are that the subclass is passed, in a template method, a collecting parameter, associations, each entry of which represents a column name along with a way of retrieving the value for a row in that column. The delegate used to retrieve the value can be passed a single state parameter, which is needed by the report forms that need to pass in the source object for each row. Given that information, the superclass can manage most of the work. (AddListViewAssociations would have been abstract, except for the fact that Visual Studio’s designer doesn’t much care for abstract classes.)

For example, here is the information for the measurement form that was first given to show the problem:

protected override void AddListViewAssociations(
    IDictionary associations)
{
    associations.Add(“Weight”, delegate { return Weight; });
    associations.Add(“Length”, delegate { return Length; });
    associations.Add(“HL”, delegate { return HeadLength; });
    associations.Add(“HW”, delegate { return HeadWidth; });
}

One of the benefits of removing the ordering duplication is that the column names now sit beside the functions for retrieving the values, making it easier to understand. Notice that the GetterDelegate definition actually accepts an object parameter. C#’s anonymous delegate syntax lets you ignore unused parameters, making for a somewhat more readable line.

One of the forms shows the information about feedings per snake, and needed that parameter. Below is the entire implementation of the form (aside from the designer-generated code).

// ReportForm is a subclass of SnakeForm
public partial class FeedingBySnakeReport : ReportForm
{
    public FeedingBySnakeReport()
    {
        InitializeComponent();
    }

    protected override void AddListViewAssociations(
        IDictionary associations)
    {
        associations.Add(“Snake”, delegate(object obj)
            { return ((FeedingReportDto)obj).Snake; });
        associations.Add(“Diet”, delegate(object obj)
            { return ((FeedingReportDto)obj).Diet; });
        associations.Add(“Date”, delegate(object obj)
            { return ((FeedingReportDto)obj).Date; });
        associations.Add(“Weight”, delegate(object obj)
            { return ((FeedingReportDto)obj).Weight; });
        associations.Add(“Food Weight”, delegate(object obj)
            { return ((FeedingReportDto)obj).FoodWeight; });
        associations.Add(“Ate?”, delegate(object obj)
            { return ((FeedingReportDto)obj).Ate; });
        associations.Add(“%BM”, delegate(object obj)
            { return ((FeedingReportDto)obj).PercentBodyMass; });
        associations.Add(“Comments”, delegate(object obj)
            { return ((FeedingReportDto)obj).Comments; });
    }

    protected IEnumerable GetReportValues()
    {
        FeedRepository repository = new FeedRepository();
        return repository.FeedingsBySnake(Snake);
    }
}

In case you’re wondering what this form does, it allows you to select a snake, or all snakes, and see the feeding information in the ListView. It also lets you export all the data to a CSV file. Not bad for 30 lines of code.

Another thing that bothered me about all the event handlers was how similar they looked. The workflow was abstracted in the superclass into a HandleEvent method:

protected delegate void EventHandlerDelegate();

protected virtual void HandleEvent(EventHandlerDelegate handler)
{
    Cursor = Cursors.WaitCursor;
    try
    {
        handler();
    }
    catch (Exception ex)
    {
        ShowError(ex.Message);
    }
    finally
    {
        Cursor = Cursors.Default;
    }
}

HandleEvent takes a function that handles the meat of the event handler and wraps it within the code that’s common to all event handlers. Here’s a couple examples:

// In DataEntryForm, an abstract superclass, and subclass of SnakeForm
private void btnSave_Click(object sender, EventArgs)
{
    HandleEvent(delegate
    {
        if (!IsOkToSave())
            return;

        Save();
        AddRow(null);
        FinishListViewUpdate();
        Reset();
    });
}

// In ReportForm, an abstract superclass, and subclass of SnakeForm
private void btnShow_Click(object sender, EventArgs e)
{
    HandleEvent(delegate
    {
        lvData.Items.Clear();

        // GetReportValues() is a template method defined in the subclasses.
        IEnumerable reportValues = GetReportValues();
        foreach (object record in reportValues)
            AddRow(record);
    });
}

Managing the ListView proved to be fertile territory for removing duplication through higher order functions. For example, I used the first row’s data to set the column alignments automatically—if it looked like a number or date, right-align the data; otherwise left-align it.

private void SetAlignments(object record)
{
    int i = 0;

    // A bit hackish, but the report dtos currently provide strings only…
    foreach (object value in ListViewValues(record))
    {
        if (IsNumber(value) || IsDate(value))
            lvData.Columns[i].TextAlign = HorizontalAlignment.Right;
        else
            lvData.Columns[i].TextAlign = HorizontalAlignment.Left;

        i += 1;
    }
}

private bool IsNumber(object value)
{
    try
    {
        double.Parse(value.ToString().Replace(”%”, ””));
        return true;
    }
    catch
    {
        return false;
    }
}

private bool IsDate(object value)
{
    try
    {
        DateTime.Parse(value.ToString());
        return true;
    }
    catch
    {
        return false;
    }
}

Look how alike IsNumber and IsDate look. We can simplify:

private delegate void ParseDelegate(string text);

private bool IsNumber(object value)
{
    return CanParse(value, delegate(string text)
        { double.Parse(text.Replace(”%”, ””)); });
}

private bool IsDate(object value)
{
    return CanParse(value, delegate(string text) { DateTime.Parse(text); });
}

private bool CanParse(object value, ParseDelegate parser)
{
    try
    {
        parser(value.ToString());
        return true;
    }
    catch
    {
        return false;
    }
}

I used a similar trick to auto-size the column widths in the ListView based on the width of the largest item. Here’s the refactored code:

private delegate string GetTextDelegate(int index);

private void AutoSizeListView()
{
    int[] widths = new int[lvData.Columns.Count];
    FillSizes(widths, delegate(int i) { return lvData.Columns[i].Text; });

    foreach (ListViewItem item in lvData.Items)
    {
        FillSizes(widths, delegate(int i) { return item.SubItems[i].Text; });
    }

    for (int i = 0; i < lvData.Columns.Count; i++)
    {
        if (!IsHidden(lvData.Columns[i]))
        {
            lvData.Columns[i].Width = widths[i] + 12;
        }
    }
}

private void FillSizes(int[] widths, GetTextDelegate text)
{
    using (Graphics graphics = CreateGraphics())
    {
        for (int i = 0; i < lvData.Columns.Count; i++)
        {
            SizeF size = graphics.MeasureString(text(i), lvData.Font);
            if (size.Width > widths[i])
                widths[i] = (int)size.Width;
        }
    }
}

private bool IsHidden(ColumnHeader header)
{
    return header.Width == 0;
}

If this were a more long-lived application, I really should have bit the bullet and created my own ListView subclass. The methods above reek of Feature Envy.

Being able to treat functions as first-class objects is extremely useful. For some reason, it doesn’t get the attention it deserves in most development books. And it’s often somewhat obscured by intimidating sounding names like “lambda expressions” thanks to its roots in lambda calculus. However, much of what I was able to do in this application was possible only because I was able treat functions as data and pass them as parameters. And it was helped by the fact that I didn’t have to explicitly define each function as a method; I could create them anonymously like any other data object (although C#’s anonymous delegate syntax is somewhat obscured by the static typing).

Written by Brandon Byars

July 17, 2007 at 12:22 am

Posted in .NET, Design

Tagged with

Big Methods Considered Harmful

Several years back, as a young programmer out of school who thought he understood OOP inside and out, I remember a conversation with a colleague about having to take over somebody else’s code. My colleague was upset because the original programmer used so many small methods that it was hard to figure out what anything was doing. Isn’t it so much easier, he rationalized (and I agreed), to just use a few methods, and a few objects, and make it obvious what you’re doing?

Years later, my colleague having moved on, we’re left with a mess of a system in certain parts—and those big method parts are now the hardest to understand, maintain, and extend. With the accretion of features, fixes, and cruft, some of those methods have morphed to over 1000 lines of code, and appear impervious to refactoring due to our complete inability to understand what the hell the method actually does. It’s a tremendous counter-example to our earlier rationalizations.

I now recognize the “bigger is better” attitude as the mark of an immature object-oriented developer, somebody who hasn’t understood the real power of OO yet. Kent Beck make the point vividly with his Composed Method pattern in Smalltalk Best Practice Patterns. Large methods do indeed make it easier to follow the flow of control, but they do so at the expense of flexibility and composability.

Small methods allow you to isolate assumptions. Small methods allow you to say things once and only once, leading to code that is DRY and elegant. Small methods let you easily see the big picture without getting lost in the details (our earlier naive fallacy was wanting to see the details up front). Small methods help you discover new responsibility—feature envy stands out more. Small methods help you isolate rates of change, keeping responsibilities that have to change in every subclass tucked away in one set of methods, and those that don’t have to change in another set of methods. Small methods allow you to see everything at the same level of abstraction. Small methods make most comments unnecessary. Small methods make unit testing easier, since the units are smaller. Small methods aid in creating cohesive systems, where each method has only one reason to change. And small methods make performance tuning easier.

Yes, small methods can help performance. A lot of people, particularly those from the C or C++ world, seem to have trouble believing that. It’s true that methods have some overhead to maintain the stack, but for 99.999% of applications the overhead that incurs simply isn’t worth worrying about, and if it is worth worrying about then you’re probably not writing in an object-oriented language anyhow. Algorithmic improvements are several orders of magnitude more important than inlining method calls. And small methods, which isolate assumptions so well, make algorithmic improvements easier to spot. Want to use a memoization cache? You’ll likely have to affect only one method.

The C language has macros, which are textually substituted in a preprocessing step to simulate function calling without incurring the overhead. Consider the following quote:

There is a tendency among older C programmers to write macros instead of functions for very short computations that will be executed frequently… The reason is performance: a macro avoids the overhead of a function call. This argument was weak even when C was first defined, a time of slow machines and expensive function calls; today it is irrelevant. With modern machines and compilers, the drawbacks of function macros outweigh their benefits.

The author of that quote is Brian Kernighan (The Practice of Programming), who also happens to be the co-author of the first book on the C language.

Many people think (or are even trained) that the only reason to break apart a method is if you want to reuse a part of it. That line of thinking is indefensible. The most difficult part of programming is not maximizing reuse; it’s minimizing complexity. Reuse is just one of the tools we use to minimize complexity; writing clean code that communicates well is another.

Written by Brandon Byars

May 28, 2007 at 10:43 am

Posted in Design, Readability

Cohesion

Wherefore Design?

When I first started programming, I naturally assumed that programming was the hardest part of programming. What a fool I was.

There are a number of things that good developers do that, at first glance, appear to slow down how fast they write code. A great deal of time is spent communicating with users and other developers; tests are written; code that already appears to work is refactored. And a great deal of time is spent in the ivory tower world of design. Perhaps not up-front (I prefer the agile motto of designing all the time), but design is an activity that takes time. If all these tasks take so much time away from writing code (which, on the surface of things, seems to be the job of developers), then why bother?

It turns out that programming is the easiest part of programming. It’s the same mistake managers make when they revolt against their developers pairing up to solve a problem. Writing code is not about typing. It’s not even so much about writing code. It’s about designing a system to meet the customers’ needs, and it’s about ensuring that the system will continue to be able to meet the customers’ needs in the future.

Unfortunately, the first large system I wrote was with a team where everybody, myself included, thought that programming was about writing code. I got an enormous amount of code written in a very short amount of time. A year and a half later, the system still had not gone live, and I had completely lost confidence in my ability to make a change without breaking some essential functionality. This is a common phenomenon, which I can informally depict as a productivity curve:

I followed the red line for the system. I think most developers probably do. The problem is that initial surge of productivity. It’s intoxicating.

Avoiding the red curve is what separates good developers from everybody else.

Design is one of those extra things good developers do in an attempt to follow the black line in the graph above. You miss out on the initial addictive surge, but good developers recognize it as fools’ gold anyhow.

Remember Alice? It’s a song about Alice.

In his study of structural design, Larry Constantine identified cohesion as the central metric in creating good designs. The object-oriented evolution did little to change this fact—indeed, object-orientation is a natural result of trying to increase cohesion over procedural code. If you want good design, then you need to understand cohesion. A system without cohesion is a real Nugger-Tugger

Non-cohesive designs ramble, and they say the same thing over and over again, just in different places. They never really do what they’re supposed to. It’s like listening to Arlo Guthrie sing Alice’s Restaraunt. You really can get anything you want at Alice’s restaurant.

Uncle Bob helped restate cohesion for the OO world as the Single Responsibility Principle, which basically states that a class should have only one reason to change. Constantine’s definition included any module, so it’s fair to say the same thing about methods.

Cohesion is about minding your own business, and it is the design principle that explains why Feature Envy stinks. For example, imagine an order needing to calculate a subtotal. Here’s one way to do it:

public class Order
{
    // ...

    public Money Subtotal
    {
        get
        {
            Money subtotal = Money.Zero;
            foreach (LineItem item in items)
            {
                subtotal += item.Product.UnitPrice * item.Quantity;
            }
            return subtotal;
        }
    }
}

There are a couple of things wrong with this code. First the feature envy—the Subtotal property seems more interested in LineItem’s data than its own class’s data. Clearly, Order has more than one reason to change. It has to change if any of Order’s responsibilities change, and it just might have to change if LineItem’s responsibilities change.

Second, notice how the Subtotal property is actually reaching into the Product’s data as well—yet another reason to change. This is an example of violating the Law of Demeter. LoD purists like to spout out a very precise definition for Demeter. I prefer to think of it as the “two dot” rule. If I see two dots in a single expression, I probably need to rethink what I’m doing. I’ve now coupled myself both to the LineItem class (which is ok, since Order contains LineItem objects) as well as the Product class (which is unnecessary).

This is important: increasing cohesion reduces coupling. Constantine saw them as two sides of the same coin.

Let’s try again:

public class Order
{
    // ...

    public Money Subtotal
    {
        get
        {
            Money subtotal = Money.Zero;
            foreach (LineItem item in items)
            {
                subtotal += item.Subtotal;
            }
            return subtotal;
        }
    }
}

public class LineItem
{
    // ...

    public Money Subtotal
    {
        get { return Product.UnitPrice * Quantity; }
    }
}

Now Order is minding its own business. In programming, spreading ignorance is A Good Thing.

Different levels of cohesion

Rather than speaking of modules which either have or lack cohesion, Constantine identified different levels of cohesion. While I don’t consider it too important to know them by name, I do find it helpful to reflect on a few of my mistakes in reference to those concepts.

Logical cohesion

Sometimes, we tend to group functionality into a class simply because it naively seems to go together. In that first system I mentioned above, I wrote a class called Validation. As you may expect, Validation has many reponsibilities. in this case, it validated addresses, phone numbers, accounts, and a host of other unrelated things. Validation suffers from what Constantine called logical cohesion.

More common (and less damaging) examples of logical cohesion are found in languages like C# and Java, and are hard to get around due to language limitations. Consider the System.Math class in .NET. What’s its purpose? Well, it calculates arc-cosines and square roots and ceilings and exponents, as well as rounding and truncating. The same thing is true of the utility or helper classes that predominate in the mainstream world. Find yourself doing the same thing over and over again with text? Create a StringUtils class that does it for you.

Those methods really belong on the string and float classes, not tucked away in some utility classes. More powerful languages allow you to do this. Want to add a natural log method to Float? Here’s Ruby code that does just that:

class Float
  def ln
    # ... code that performs a natural logarithm
  end
end

Float already exists—it ships with Ruby. That’s ok; Ruby has no problem with you adding methods to classes that already exist. Rumor has it that the next version of C# may let you do the same thing. It will be interesting to gauge the mainstream response to this feature.

Temporal and Procedural cohesion

Constantine identified two levels of cohesion, temporal and procedural, that have to do with putting unrelated tasks together simply because they happen to occur at more or less the same time. The only difference between the two is that procedural cohesion requires some procedural relationship between the two elements. When I first started programming, I was extraordinarily guilty of creating unnecessary procedural cohesion, due to some delusional belief that minimizing the number of loops would improve performance.

As an example of procedural cohesion, let’s look at an alternative implementation of Order:

public class Order
{
    private Money subtotal;
    private Money tax;
    private Money shipping;
    private double taxPercent;
    private Money total;

    // ...

    public void CalculateTotals()
    {
        subtotal = Money.Zero;
        tax = Money.Zero;

        foreach (LineItem item in items)
        {
            subtotal += item.Subtotal;
            tax += item.Tax;
        }

        shipping = ShippingFor(subtotal);
        if (ChargeTaxOnShipping(state))
        {
            tax += shipping * taxPercent;
        }
        total = subtotal + shipping + tax;
    }

    public Money Subtotal
    {
        get { return subtotal; }
    }

    public Money Tax
    {
        get { return tax; }
    }

    public Money Shipping
    {
        get { return shipping; }
    }

    public Money Total
    {
        get { return total; }
    }
}

The code in CalculateTotals is very procedural in nature. What’s worse, the method does not have a single responsibility, and suffers from procedural cohesion as a result (notice how we’re now talking about the method’s cohesion, instead of the class’s—Constantine was writing before object-orientation was all the rave). Notice in particular the ugly dependence between tax and shipping. We’re calculating some of the tax in the loop along with the subtotal, but we may have to add more after the loop if the state charges tax on shipping.

Getting rid of the cohesion problem is easy:

public class Order
{
    // ...

    public Money Subtotal
    {
        get
        {
            Money subtotal = Money.Zero;
            foreach (LineItem item in items)
            {
                subtotal += item.Subtotal;
            }
            return subtotal;
        }
    }

    public Money Tax
    {
        get
        {
            Money tax = Money.Zero;
            foreach (LineItem item in items)
            {
                tax += item.Tax;
            }
            if (ChargeTaxOnShipping(state))
            {
                tax += Shipping * TaxPercent;
            }
            return tax;
        }
    }

    public Money Shipping
    {
        get { return ShippingFor(Subtotal); }
    }

    public Money Total
    {
        get { return Subtotal + Shipping + Tax; }
    }
}

Much nicer. Now each method has only one responsibility, and is easy to understand. However, it’s annoying to have to duplicate those loops everywhere. Some of the nicer languages have internal iterators or closures to help. C# 2.0 has similar constructs, but they’re a bit clumsy to use because of all the static typing noise:

public class Order
{
    // ...

    public Money Subtotal
    {
        get
        {
            return Sum(items, delegate(LineItem item)
                { return item.Subtotal; });
        }
    }

    public Money Tax
    {
        get
        {
            Money tax = Sum(items, delegate(LineItem item)
                { return item.Tax; });
            if (ChargeTaxOnShipping(state))
            {
                tax += Shipping * TaxPercent;
            }
            return tax;
        }
    }

    public Money PaymentTotal
    {
        get
        {
            return Sum(payments, delegate(Payment payment)
                { return payment.Amount; });
        }
    }

    private delegate Money MonetaryPropertyDelegate<T>(T item);

    private Money Sum<T>(ICollection<T> collection,
        MonetaryPropertyDelegate<T> moneyGetter)
    {
        Money result = Money.Zero;
        foreach (T item in collection)
        {
            result += moneyGetter(item);
        }
        return result;
    }
}

Written by Brandon Byars

February 21, 2007 at 3:32 pm

Posted in Design

Tagged with

Nugger Tugger Designs

Does your design ever look like this?

You’ll notice that, while neither humans, sharks, elephants, birds, bears, beavers, zebras, wasps, cats, or hyenas are amphibians, Nugger-Tugger himself is an amphibian. Really quite incredible when you think about it.

What works for art doesn’t work for design. I really like Nugger-Tugger, because I’ve seen some designs that remind me of a kid just trying to get a little bit of everything. And “Nugger-Tugger” has the added advantage of sounding a lot like “Mother-Tugger,” which is roughly what I hear developers say when they have to work in such systems.

Written by Brandon Byars

February 14, 2007 at 11:12 pm

Posted in Design