A Day In The Lyf

…the lyf so short, the craft so longe to lerne

Archive for February 2007

Cohesion

Wherefore Design?

When I first started programming, I naturally assumed that programming was the hardest part of programming. What a fool I was.

There are a number of things that good developers do that, at first glance, appear to slow down how fast they write code. A great deal of time is spent communicating with users and other developers; tests are written; code that already appears to work is refactored. And a great deal of time is spent in the ivory tower world of design. Perhaps not up-front (I prefer the agile motto of designing all the time), but design is an activity that takes time. If all these tasks take so much time away from writing code (which, on the surface of things, seems to be the job of developers), then why bother?

It turns out that programming is the easiest part of programming. It’s the same mistake managers make when they revolt against their developers pairing up to solve a problem. Writing code is not about typing. It’s not even so much about writing code. It’s about designing a system to meet the customers’ needs, and it’s about ensuring that the system will continue to be able to meet the customers’ needs in the future.

Unfortunately, the first large system I wrote was with a team where everybody, myself included, thought that programming was about writing code. I got an enormous amount of code written in a very short amount of time. A year and a half later, the system still had not gone live, and I had completely lost confidence in my ability to make a change without breaking some essential functionality. This is a common phenomenon, which I can informally depict as a productivity curve:

I followed the red line for the system. I think most developers probably do. The problem is that initial surge of productivity. It’s intoxicating.

Avoiding the red curve is what separates good developers from everybody else.

Design is one of those extra things good developers do in an attempt to follow the black line in the graph above. You miss out on the initial addictive surge, but good developers recognize it as fools’ gold anyhow.

Remember Alice? It’s a song about Alice.

In his study of structural design, Larry Constantine identified cohesion as the central metric in creating good designs. The object-oriented evolution did little to change this fact; indeed, object-orientation is a natural result of trying to increase cohesion over procedural code. If you want good design, then you need to understand cohesion. A system without cohesion is a real Nugger-Tugger.

Non-cohesive designs ramble, and they say the same thing over and over again, just in different places. They never really do what they’re supposed to. It’s like listening to Arlo Guthrie sing Alice’s Restaurant. You really can get anything you want at Alice’s restaurant.

Uncle Bob helped restate cohesion for the OO world as the Single Responsibility Principle, which basically states that a class should have only one reason to change. Constantine’s definition included any module, so it’s fair to say the same thing about methods.

Cohesion is about minding your own business, and it is the design principle that explains why Feature Envy stinks. For example, imagine an order needing to calculate a subtotal. Here’s one way to do it:

public class Order
{
    // ...

    public Money Subtotal
    {
        get
        {
            Money subtotal = Money.Zero;
            foreach (LineItem item in items)
            {
                subtotal += item.Product.UnitPrice * item.Quantity;
            }
            return subtotal;
        }
    }
}

There are a couple of things wrong with this code. First the feature envy—the Subtotal property seems more interested in LineItem’s data than its own class’s data. Clearly, Order has more than one reason to change. It has to change if any of Order’s responsibilities change, and it just might have to change if LineItem’s responsibilities change.

Second, notice how the Subtotal property is actually reaching into the Product’s data as well—yet another reason to change. This is an example of violating the Law of Demeter. LoD purists like to spout out a very precise definition for Demeter. I prefer to think of it as the “two dot” rule. If I see two dots in a single expression, I probably need to rethink what I’m doing. I’ve now coupled myself both to the LineItem class (which is ok, since Order contains LineItem objects) as well as the Product class (which is unnecessary).

This is important: increasing cohesion reduces coupling. Constantine saw them as two sides of the same coin.

Let’s try again:

public class Order
{
    // ...

    public Money Subtotal
    {
        get
        {
            Money subtotal = Money.Zero;
            foreach (LineItem item in items)
            {
                subtotal += item.Subtotal;
            }
            return subtotal;
        }
    }
}

public class LineItem
{
    // ...

    public Money Subtotal
    {
        get { return Product.UnitPrice * Quantity; }
    }
}

Now Order is minding its own business. In programming, spreading ignorance is A Good Thing.

Different levels of cohesion

Rather than speaking of modules which either have or lack cohesion, Constantine identified different levels of cohesion. While I don’t consider it too important to know them by name, I do find it helpful to reflect on a few of my mistakes in reference to those concepts.

Logical cohesion

Sometimes, we tend to group functionality into a class simply because it naively seems to go together. In that first system I mentioned above, I wrote a class called Validation. As you may expect, Validation had many responsibilities. In this case, it validated addresses, phone numbers, accounts, and a host of other unrelated things. Validation suffered from what Constantine called logical cohesion.
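To make the contrast concrete, here is a minimal Ruby sketch. It is not the original Validation class, which isn't shown here. The class names PhoneNumber and ZipCode and the rules themselves (ten digits, five digits) are hypothetical, chosen only to show the shape of the problem.

```ruby
# Logical cohesion: these checks live together only because each one
# is "a validation". The rules are hypothetical placeholders.
class Validation
  def self.valid_phone_number?(text)
    text.gsub(/\D/, '').length == 10
  end

  def self.valid_zip_code?(text)
    text.match?(/\A\d{5}\z/)
  end
end

# One alternative: each concept knows how to validate itself, so a
# change to phone number rules touches only PhoneNumber.
class PhoneNumber
  def initialize(text)
    @digits = text.gsub(/\D/, '') # keep only the digits
  end

  def valid?
    @digits.length == 10
  end
end

class ZipCode
  def initialize(text)
    @text = text
  end

  def valid?
    @text.match?(/\A\d{5}\z/)
  end
end

puts PhoneNumber.new('(555) 010-1234').valid? # prints true
puts ZipCode.new('1234').valid?               # prints false
```

In the second form, each class has one reason to change, which is the Single Responsibility Principle applied to the same checks.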

More common (and less damaging) examples of logical cohesion are found in languages like C# and Java, and are hard to get around due to language limitations. Consider the System.Math class in .NET. What’s its purpose? Well, it calculates arc-cosines and square roots and ceilings and exponents, as well as rounding and truncating. The same thing is true of the utility or helper classes that predominate in the mainstream world. Find yourself doing the same thing over and over again with text? Create a StringUtils class that does it for you.

Those methods really belong on the string and float classes, not tucked away in some utility classes. More powerful languages allow you to do this. Want to add a natural log method to Float? Here’s Ruby code that does just that:

class Float
  def ln
    # ... code that performs a natural logarithm
  end
end

Float already exists—it ships with Ruby. That’s ok; Ruby has no problem with you adding methods to classes that already exist. Rumor has it that the next version of C# may let you do the same thing. It will be interesting to gauge the mainstream response to this feature.
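Filling in the body is short. The following is one possible version, assuming the standard library's Math.log is an acceptable implementation (called with a single argument, it returns the natural logarithm):

```ruby
# Reopen the built-in Float class and add a natural-log method.
class Float
  # Delegates to Math.log, which computes the natural logarithm
  # when given a single argument.
  def ln
    Math.log(self)
  end
end

puts 1.0.ln     # prints 0.0
puts Math::E.ln # prints 1.0
```

The method lands only on Float. Calling 2.ln would raise a NoMethodError, because 2 is an Integer; covering integers as well would mean reopening Integer or Numeric too.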

Temporal and Procedural cohesion

Constantine identified two levels of cohesion, temporal and procedural, that have to do with putting unrelated tasks together simply because they happen to occur at more or less the same time. The only difference between the two is that procedural cohesion requires some procedural relationship between the two elements. When I first started programming, I was extraordinarily guilty of creating unnecessary procedural cohesion, due to some delusional belief that minimizing the number of loops would improve performance.

As an example of procedural cohesion, let’s look at an alternative implementation of Order:

public class Order
{
    private Money subtotal;
    private Money tax;
    private Money shipping;
    private double taxPercent;
    private Money total;

    // ...

    public void CalculateTotals()
    {
        subtotal = Money.Zero;
        tax = Money.Zero;

        foreach (LineItem item in items)
        {
            subtotal += item.Subtotal;
            tax += item.Tax;
        }

        shipping = ShippingFor(subtotal);
        if (ChargeTaxOnShipping(state))
        {
            tax += shipping * taxPercent;
        }
        total = subtotal + shipping + tax;
    }

    public Money Subtotal
    {
        get { return subtotal; }
    }

    public Money Tax
    {
        get { return tax; }
    }

    public Money Shipping
    {
        get { return shipping; }
    }

    public Money Total
    {
        get { return total; }
    }
}

The code in CalculateTotals is very procedural in nature. What’s worse, the method does not have a single responsibility, and suffers from procedural cohesion as a result (notice how we’re now talking about the method’s cohesion, instead of the class’s; Constantine was writing before object-orientation was all the rage). Notice in particular the ugly dependence between tax and shipping. We’re calculating some of the tax in the loop along with the subtotal, but we may have to add more after the loop if the state charges tax on shipping.

Getting rid of the cohesion problem is easy:

public class Order
{
    // ...

    public Money Subtotal
    {
        get
        {
            Money subtotal = Money.Zero;
            foreach (LineItem item in items)
            {
                subtotal += item.Subtotal;
            }
            return subtotal;
        }
    }

    public Money Tax
    {
        get
        {
            Money tax = Money.Zero;
            foreach (LineItem item in items)
            {
                tax += item.Tax;
            }
            if (ChargeTaxOnShipping(state))
            {
                tax += Shipping * TaxPercent;
            }
            return tax;
        }
    }

    public Money Shipping
    {
        get { return ShippingFor(Subtotal); }
    }

    public Money Total
    {
        get { return Subtotal + Shipping + Tax; }
    }
}

Much nicer. Now each method has only one responsibility, and is easy to understand. However, it’s annoying to have to duplicate those loops everywhere. Some of the nicer languages have internal iterators or closures to help. C# 2.0 has similar constructs, but they’re a bit clumsy to use because of all the static typing noise:

public class Order
{
    // ...

    public Money Subtotal
    {
        get
        {
            return Sum(items, delegate(LineItem item)
                { return item.Subtotal; });
        }
    }

    public Money Tax
    {
        get
        {
            Money tax = Sum(items, delegate(LineItem item)
                { return item.Tax; });
            if (ChargeTaxOnShipping(state))
            {
                tax += Shipping * TaxPercent;
            }
            return tax;
        }
    }

    public Money PaymentTotal
    {
        get
        {
            return Sum(payments, delegate(Payment payment)
                { return payment.Amount; });
        }
    }

    private delegate Money MonetaryPropertyDelegate<T>(T item);

    private Money Sum<T>(ICollection<T> collection,
        MonetaryPropertyDelegate<T> moneyGetter)
    {
        Money result = Money.Zero;
        foreach (T item in collection)
        {
            result += moneyGetter(item);
        }
        return result;
    }
}
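For comparison, here is a rough sketch of the same idea in Ruby, one of the languages with internal iterators and closures. It is not a port of the full Order class. LineItem is a hypothetical Struct, amounts are plain integer cents standing in for a Money type, and the tax-on-shipping rule is left out to keep it short.

```ruby
# Hypothetical stand-in for the C# LineItem; amounts are integer cents.
LineItem = Struct.new(:unit_price, :quantity, :tax) do
  def subtotal
    unit_price * quantity
  end
end

class Order
  def initialize(items)
    @items = items
  end

  # Enumerable#sum is an internal iterator: the block says what to
  # add up, and the collection takes care of the looping.
  def subtotal
    @items.sum { |item| item.subtotal }
  end

  def tax
    @items.sum { |item| item.tax }
  end
end

order = Order.new([LineItem.new(250, 2, 40), LineItem.new(100, 1, 8)])
puts order.subtotal # prints 600
puts order.tax      # prints 48
```

The block plays the role of the anonymous delegate above, without a delegate type declaration or a hand-written Sum helper.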

Written by Brandon Byars

February 21, 2007 at 3:32 pm

Posted in Design

Nugger Tugger Designs

Does your design ever look like this?

You’ll notice that, while humans, sharks, elephants, birds, bears, beavers, zebras, wasps, cats, and hyenas are not amphibians, Nugger-Tugger himself is an amphibian. Really quite incredible when you think about it.

What works for art doesn’t work for design. I really like Nugger-Tugger, because I’ve seen some designs that remind me of a kid just trying to get a little bit of everything. And “Nugger-Tugger” has the added advantage of sounding a lot like “Mother-Tugger,” which is roughly what I hear developers say when they have to work in such systems.

Written by Brandon Byars

February 14, 2007 at 11:12 pm

Posted in Design

Heuristics for Code Readability

Code readability is hard to define. There are some things that everybody can agree on, like how Hungarian notation was a sin for which Microsoft will die a thousand deaths. Certainly, the biggest gains can be made by following simple rules, like using intention-revealing names, and fluent use of Extract Method refactorings to further clarify intent. However, how do you decide between things like abbreviations and commonly used prefixes like underscores to denote private variables? Certainly, the readability gains are smaller, but perhaps they’re still worth discussing.

While this is a work in progress, I’m proposing the following three heuristics:

  1. In general, code is more readable if you can read it more easily than the alternative (duh!)
  2. In general, code is more readable if your fellow developers can read it more easily than the alternative (duh!)
  3. In general, code is more readable if your non-developer customer can read it more easily than the alternative (huh?)

All three heuristics have exceptions. For example,

  1. While leaving comments and variable names in your native language is easiest for you to comprehend, none of your team members actually speak your native tongue.
  2. Your team is developing an API for clients that will have a different knowledge base. Using team conventions for the public interface may not be appropriate.
  3. Non-developers would find English sentences more readable than code.

However, I think finding the right balance between those three heuristics is a worthy goal. The first two go without saying—you and your team will be the maintainers of the code. The third one is a bit more controversial, but I think it’s defensible. To an extent, it’s what Eric Evans was talking about in Domain Driven Design when he mentioned the Ubiquitous Language. As much as possible, use the same words in your code that you use in communication with your customers.

But what about abbreviations and those ubiquitous underscore prefixes? I have a preference, but it’s not strong enough to be dogmatic about it. I think, in general, they decrease readability compared to the alternatives for the third heuristic above. Some developers may argue that, while customers may suffer reading the code a little, developers suffer by not using those underscores. That’s fine—heuristic 2 trumps heuristic 3. But in the absence of such a complaint, I think the default should be no underscores, and sparse use of abbreviations.

I prefer wording the third heuristic in the strong form given above, but in a weaker form it could also apply to programmers unfamiliar with a language. For example, a common Ruby idiom looks something like this:

$:.unshift(File.dirname(__FILE__))

The meaning behind the line above is often a necessary evil in Ruby, and it’s frequently written that way. I prefer a slightly more verbose syntax, however. The following is equivalent:

$LOAD_PATH.unshift(File.dirname(__FILE__))

While you still may not know exactly what that line of code does, I suspect one of them is easier to read than the other.
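For readers who do want to know what it does: the line puts the directory of the current file at the front of the list of directories Ruby searches when a script calls require. The short sketch below checks that the two spellings refer to the same array. The lib_dir variable and the include? guard are additions for illustration, not part of the idiom.

```ruby
# $: and $LOAD_PATH are two names for the same array object.
puts $:.equal?($LOAD_PATH) # prints true

# Put this file's directory at the front of the search path, unless
# it is already listed somewhere (the guard is an optional addition).
lib_dir = File.expand_path(File.dirname(__FILE__))
$LOAD_PATH.unshift(lib_dir) unless $LOAD_PATH.include?(lib_dir)
```

Since both names point at one object, the choice between them is purely about who can read the line.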

Written by Brandon Byars

February 13, 2007 at 8:24 pm

Posted in Readability

Code Haiku

Code that you can read 
Reads like a storybook plot
Not disconnected

I like to think writing code is a lot like writing a haiku. Haikus have a very rigid structure, and anyone reading one can immediately feel the effect of the structural confines on the poem’s flow. Similarly, code has a strict set of syntactical rules it has to follow that are apparent to even a casual bystander.

However, within the confines of the language, there are still ways to communicate effectively. I know my wife is kidding when she tells me that my job consists of randomly stringing punctuation marks together, and I’ve thought about using her as a test of how readable my code is. She knows (everybody does) that code has to follow rules, and she will not know the rules. But will she be able to at least have an idea of what the code is doing? Will my customers, were I to show them?

As Martin Fowler has noted, it’s actually fairly easy to write code that a computer can understand – the trick is to write code that people can understand. I think showing code to a non-developer is one excellent way of improving in that regard. Show her your code and other code, using different readability conventions, to see which is better. Certain tricks, like using a Ubiquitous Language (see Evans, Domain Driven Design), should help: if you can communicate in your code using the same domain concepts that the end users refer to, then you’re doing well.

But what about abbreviations? I was pairing with a colleague today, and the thing that stood out was that, but for one exception, it was nearly impossible to tell which one of us wrote the code. The one exception was with our variable names. I invariably spelled them out, so I had a DataRow named row, a DataTable named table, and a domain-specific type MercedesMonthlySummary named month. For the same types but in different methods, my colleague had dr, dt, and mms.
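The difference is easier to judge side by side. The sketch below is hypothetical, not the code from that pairing session: the "table" is a plain Ruby array of hashes standing in for a DataTable, and the :amount key is made up. Both methods do exactly the same thing.

```ruby
# Abbreviated names, in the style of dr and dt.
def ttl(dt)
  dt.inject(0) { |t, dr| t + dr[:amount] }
end

# The same logic with the names spelled out.
def total_amount(table)
  table.inject(0) { |total, row| total + row[:amount] }
end

table = [{ amount: 120 }, { amount: 80 }]
puts ttl(table)          # prints 200
puts total_amount(table) # prints 200
```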

I’ve seen code written by far better developers than me that used abbreviations where I would have preferred an English word, so it’s hard to discredit them. Developers get used to it, and certain abbreviations (like dt for a DataTable) become commonplace. And still, I can’t help but think that we’re losing our expressiveness when we use them.

If you don’t believe me, show your code to your wife, and see what she says.

Code that you must read 
dr dt mms
Someone understands?

Written by Brandon Byars

February 11, 2007 at 5:18 pm

Posted in Readability
