A Day In The Lyf

…the lyf so short, the craft so longe to lerne

Archive for the ‘Readability’ Category

Big Methods Considered Harmful

Several years back, as a young programmer out of school who thought he understood OOP inside and out, I remember a conversation with a colleague about having to take over somebody else’s code. My colleague was upset because the original programmer used so many small methods that it was hard to figure out what anything was doing. Isn’t it so much easier, he rationalized (and I agreed), to just use a few methods, and a few objects, and make it obvious what you’re doing?

Years later, my colleague having moved on, we’re left with a mess of a system in certain parts—and those big method parts are now the hardest to understand, maintain, and extend. With the accretion of features, fixes, and cruft, some of those methods have morphed to over 1000 lines of code, and appear impervious to refactoring due to our complete inability to understand what the hell the method actually does. It’s a tremendous counter-example to our earlier rationalizations.

I now recognize the “bigger is better” attitude as the mark of an immature object-oriented developer, somebody who hasn’t understood the real power of OO yet. Kent Beck makes the point vividly with his Composed Method pattern in Smalltalk Best Practice Patterns. Large methods do indeed make it easier to follow the flow of control, but they do so at the expense of flexibility and composability.

Small methods allow you to isolate assumptions. Small methods allow you to say things once and only once, leading to code that is DRY and elegant. Small methods let you easily see the big picture without getting lost in the details (our earlier naive fallacy was wanting to see the details up front). Small methods help you discover new responsibility—feature envy stands out more. Small methods help you isolate rates of change, keeping responsibilities that have to change in every subclass tucked away in one set of methods, and those that don’t have to change in another set of methods. Small methods allow you to see everything at the same level of abstraction. Small methods make most comments unnecessary. Small methods make unit testing easier, since the units are smaller. Small methods aid in creating cohesive systems, where each method has only one reason to change. And small methods make performance tuning easier.
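To make the “same level of abstraction” point concrete, here is a minimal Ruby sketch of a composed method. The Invoice class, its method names, and the flat 8% tax are all hypothetical, invented purely for illustration; amounts are integer cents to keep the arithmetic exact.

```ruby
# Hypothetical example: none of these names come from a real system.
class Invoice
  TAX_PERCENT = 8 # assumed flat rate, for illustration only

  # line_items is an array of [unit_price_in_cents, quantity] pairs
  def initialize(line_items)
    @line_items = line_items
  end

  # Reads at a single level of abstraction; the details live in the
  # small methods below, each with one reason to change.
  def total
    subtotal + tax
  end

  def subtotal
    @line_items.sum { |unit_price, quantity| unit_price * quantity }
  end

  def tax
    subtotal * TAX_PERCENT / 100 # integer division, rounds down
  end
end

invoice = Invoice.new([[1000, 2], [500, 1]])
puts invoice.total
```

A reader who only cares about the big picture can stop at total; a subclass that needs a different tax rule overrides tax and nothing else.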

Yes, small methods can help performance. A lot of people, particularly those from the C or C++ world, seem to have trouble believing that. It’s true that method calls carry some overhead to maintain the stack, but for 99.999% of applications that overhead simply isn’t worth worrying about, and if it is worth worrying about then you’re probably not writing in an object-oriented language anyhow. Algorithmic improvements can be several orders of magnitude more important than inlining method calls. And small methods, which isolate assumptions so well, make algorithmic improvements easier to spot. Want to use a memoization cache? You’ll likely have to affect only one method.
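As a rough sketch of the memoization point, assume a hypothetical PriceList class where the calculation already sits behind one small method. Adding a cache then touches only that method; callers such as total stay exactly as they were. The class, the 10% discount, and the computations counter are assumptions made up for this example.

```ruby
# Hypothetical example: a price list with a per-SKU discount calculation.
class PriceList
  attr_reader :computations # counts real calculations, to show the cache working

  def initialize(prices_in_cents)
    @prices = prices_in_cents
    @discount_cache = {}
    @computations = 0
  end

  # Callers never learn that a cache exists.
  def total(skus)
    skus.sum { |sku| discounted_price(sku) }
  end

  # The only public method that changed when the cache was added.
  # Note: ||= would recompute nil or false results; prices here are integers.
  def discounted_price(sku)
    @discount_cache[sku] ||= compute_discounted_price(sku)
  end

  private

  def compute_discounted_price(sku)
    @computations += 1
    @prices.fetch(sku) * 90 / 100 # assumed 10% discount
  end
end

prices = PriceList.new("a" => 1000, "b" => 500)
puts prices.total(%w[a a b])
puts prices.computations
```

Had the discount arithmetic been buried in the middle of a 1000-line method, finding a safe place for the cache would have been a much harder job.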

The C language has macros, which are textually substituted in a preprocessing step to simulate function calling without incurring the overhead. Consider the following quote:

There is a tendency among older C programmers to write macros instead of functions for very short computations that will be executed frequently… The reason is performance: a macro avoids the overhead of a function call. This argument was weak even when C was first defined, a time of slow machines and expensive function calls; today it is irrelevant. With modern machines and compilers, the drawbacks of function macros outweigh their benefits.

The quote comes from The Practice of Programming, which Brian Kernighan wrote with Rob Pike. Kernighan also happens to be the co-author of the first book on the C language.

Many people think (or are even trained) that the only reason to break apart a method is if you want to reuse a part of it. That line of thinking is indefensible. The most difficult part of programming is not maximizing reuse; it’s minimizing complexity. Reuse is just one of the tools we use to minimize complexity; writing clean code that communicates well is another.

Written by Brandon Byars

May 28, 2007 at 10:43 am

Posted in Design, Readability

Heuristics for Code Readability

Code readability is hard to define. There are some things that everybody can agree on, like how Hungarian notation was a sin for which Microsoft will die a thousand deaths. Certainly, the biggest gains can be made by following simple rules, like using intention-revealing names and making fluent use of Extract Method refactorings to further clarify intent. However, how do you decide on smaller matters, like abbreviations or commonly used prefixes such as underscores to denote private variables? Certainly, the readability gains are smaller, but perhaps they’re still worth discussing.

While this is a work in progress, I’m proposing the following three heuristics:

  1. In general, code is more readable if you can read it more easily than the alternative. (duh!)
  2. In general, code is more readable if your fellow developers can read it more easily than the alternative. (duh!)
  3. In general, code is more readable if your non-developer customer can read it more easily than the alternative. (huh?)

All three heuristics have exceptions. For example,

  1. While leaving comments and variable names in your native language is easiest for you to comprehend, none of your team members actually speak your native tongue.
  2. Your team is developing an API for clients that will have a different knowledge base. Using team conventions for the public interface may not be appropriate.
  3. Non-developers would find English sentences more readable than code.

However, I think finding the right balance between those three heuristics is a worthy goal. The first two go without saying—you and your team will be the maintainers of the code. The third one is a bit more controversial, but I think it’s defensible. To an extent, it’s what Eric Evans was talking about in Domain Driven Design when he mentioned the Ubiquitous Language. As much as possible, use the same words in your code that you use in communication with your customers.

But what about abbreviations and those ubiquitous underscore prefixes? I have a preference, but it’s not strong enough to be dogmatic about it. I think, in general, they decrease readability compared to the alternatives for the third heuristic above. Some developers may argue that, while customers may suffer reading the code a little, developers suffer by not using those underscores. That’s fine—heuristic 2 trumps heuristic 3. But in the absence of such a complaint, I think the default should be no underscores, and sparse use of abbreviations.

I prefer wording the third heuristic in the strong form given above, but in a weaker form it could also apply to programmers unfamiliar with a language. For example, a common Ruby idiom looks something like this:

$:.unshift(File.dirname(__FILE__))

What that line does is often a necessary evil in Ruby, and it’s frequently written that way. I prefer a slightly more verbose syntax, however. The following is equivalent:

$LOAD_PATH.unshift(File.dirname(__FILE__))

While you still may not know exactly what that line of code does, I suspect one of them is easier to read than the other.
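For anyone who wants to check the equivalence, here is a small sketch. It relies only on core Ruby; the lib_dir variable name and the include? guard are my own additions for illustration.

```ruby
# $: and $LOAD_PATH are two names for the same array of directories
# that require searches.
same_object = $:.equal?($LOAD_PATH)
puts same_object

# Put the current file's directory at the front of that list.
# expand_path and the include? guard are optional extras (assumptions
# of this sketch): they avoid relative or duplicate entries.
lib_dir = File.expand_path(File.dirname(__FILE__))
$LOAD_PATH.unshift(lib_dir) unless $LOAD_PATH.include?(lib_dir)
```

Because both names point at the same array, a directory added through one is visible through the other.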

Written by Brandon Byars

February 13, 2007 at 8:24 pm

Posted in Readability

Code Haiku

Code that you can read 
Reads like a storybook plot
Not disconnected

I like to think writing code is a lot like writing a haiku. Haikus have a very rigid structure, and anyone reading one can immediately feel the effect of the structural confines on the poem’s flow. Similarly, code has a strict set of syntactical rules it has to follow that are apparent to even a casual bystander.

However, within the confines of the language, there are still ways to communicate effectively. I know my wife is kidding when she tells me that my job consists of randomly stringing punctuation marks together, and I’ve thought about using her as a test of how readable my code is. She knows (everybody does) that code has to follow rules, and she will not know the rules. But will she be able to at least have an idea of what the code is doing? Will my customers, were I to show them?

As Martin Fowler has noted, it’s actually fairly easy to write code that a computer can understand; the trick is to write code that people can understand. I think showing code to a non-developer is one excellent way of improving in that regard. Show them your code and other code, using different readability conventions, to see which is better. Certain tricks, like using a Ubiquitous Language (see Evans, Domain Driven Design), should help: if you can communicate in your code using the same domain concepts that the end users refer to, then you’re doing well.

But what about abbreviations? I was pairing with a colleague today, and the thing that stood out was that, with one exception, it was nearly impossible to tell which one of us wrote the code. The one exception was our variable names. I invariably spelled them out, so I had a DataRow named row, a DataTable named table, and a domain-specific type MercedesMonthlySummary named month. For the same types but in different methods, my colleague had dr, dt, and mms.
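Here is a rough side-by-side of the two styles, sketched in Ruby rather than the language we were pairing in. The “table” is just an array of row hashes, and the method names and :units key are made up for illustration.

```ruby
# Hypothetical stand-ins: a "table" is an array of row hashes.

# Abbreviated style
def total_units_abbreviated(dt)
  dt.sum { |dr| dr[:units] }
end

# Spelled-out style: same behavior, different names
def total_units(table)
  table.sum { |row| row[:units] }
end

sales = [{ units: 3 }, { units: 4 }]
puts total_units_abbreviated(sales)
puts total_units(sales)
```

Both methods do exactly the same thing; the only difference is what a reader has to already know in order to follow them.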

I’ve seen code written by far better developers than me that used abbreviations where I would have preferred an English word, so it’s hard to discredit them. Developers get used to it, and certain abbreviations (like dt for a DataTable) become commonplace. And still, I can’t help but think that we’re losing our expressiveness when we use them.

If you don’t believe me, show your code to your wife, and see what she says.

Code that you must read 
dr dt mms
Someone understands?

Written by Brandon Byars

February 11, 2007 at 5:18 pm

Posted in Readability