A Day In The Lyf

…the lyf so short, the craft so longe to lerne

Archive for February 2008

True Multiple Dispatch

When I first started learning about Design Patterns some years ago, I found myself having to revisit the Visitor pattern over and over again. Something about it just didn’t click with me.

Several years later, I now have experience using the Visitor pattern in a few applications. But the idea of multiple dispatch still seems a bit foreign to me. The authors of Design Patterns claimed that Visitors allowed you to imitate multiple dispatch in languages that didn’t natively support it. That was a rather mind-blowing statement to me – how can a language natively support something as strange as multiple dispatch?

Lately, I’ve been learning a little bit about Lisp, and Common Lisp (CL) does natively support multiple dispatch. It turns out that understanding CL’s approach to object-orientation is useful for more than just a better understanding of the Visitor pattern.

The Fundamental Unit of Behavior

Most programmers in mainstream OO languages will tell you that the class is the fundamental unit of behavior. There are subtleties – for example, in Smalltalk and Ruby, classes are actually objects, which arguably makes the object the fundamental unit, but both of those languages still use classes to organize behavior. Languages like JavaScript don’t use classes at all, but achieve similar results through adding behavior to an object’s prototype.

Common Lisp is class-based, but in CL, the class simply holds the data, like a C struct. When it comes to behavior, the method reigns supreme.

Putting methods into classes, which would have made CL look more like other object-oriented languages, never worked out very well. The message passing paradigm popularized by Smalltalk made methods and traditional Lisp functions semantically different, meaning much of the power that stems from Lisp’s functional roots was lost. So the authors of the Common Lisp object system (CLOS) unified methods with functions, giving birth to the generic function.

Generic functions are like abstract methods in other languages. They define an interface for behavior, but defer the implementation of that behavior. But – and this is an awfully big but – generic functions are not attached to any class.

(defgeneric area (shape)
   (:documentation "Returns the area of the given shape."))

In typical functional style, the area function takes the object as a parameter, instead of sending the object an area message as you would in traditional OO languages. So far, this isn’t too different from Python, which requires you to pass self to all methods.

The actual implementation of a generic function is defined by methods. But methods don’t belong to classes; they belong to the generic function. Each method for a generic function is a specialization of that function for specific classes. For example:

(defmethod area ((shape square))
   (* (width shape) (width shape)))

(defmethod area ((shape circle))
   (* (radius shape) (radius shape) *pi*))

In a traditional OO language, you would have the Shape class define an abstract Area method, which was implemented in both the Square and Circle subclasses. In contrast, when the area generic function is called in CL, it compares the actual argument to the formal arguments in each of its applicable methods. If it finds a matching method, then its code is executed (and, although its outside the scope of this article, if it finds multiple matching methods, they are all called according to certain combination rules).

Multimethods

The area example above is simple because it takes only one argument (equivalent to taking no arguments in message passing languages (except Python (of course!))).

It is obviously possible to have generic methods that operate on multiple arguments. And – and this is an awfully big and – methods can specialize on each of those arguments, not just the first one. Methods that specialize on multiple parameters are called multimethods (the following silly example was adapted from here):

(defgeneric test (x y)
   (:documentation "Returns the type of the two arguments))

(defmethod test ((x number) (y number))
   '(num num))

(defmethod test ((i integer) (y number))
   '(int num))

(defmethod test ((x number) (j integer))
   '(num int))

(defmethod test ((i integer) (j integer))
   '(int int))

(test 1 1)      => (int int)
(test 1 1/2)    => (int num)
(test 1/2 1)    => (num int)
(test 1/2 1/2)  => (num num)

Specializing on more than one parameter doesn’t help with the combinatorial explosion, but it is occasionally useful And it’s not something you can do in a message passing language natively, which is why you need support from design patterns. The great thing about native support for multiple dispatch is that you can do it without any of the plumbing code that design patterns require (no confusing Visit and Accept methods). There is a common complaint against design patterns that says they exist simply to cover up language limitations. At least in the case of the poor Visitor, that certainly seems to be the case.

Written by Brandon Byars

February 14, 2008 at 10:57 am

Posted in Design Patterns, Languages, Lisp

Tagged with

Understanding Syntactic Macros

Paul Graham finally persuaded me to pick up a little Lisp. I’d already learned some Scheme, but the books I read were mainly academic in nature (here and here), and didn’t show off the power that Paul keeps talking about. So I switched over to Practical Common Lisp hoping that would give me a better view.

It’s very obvious that a number of programming concepts that have been added in various languages to “revolutionize” the language landscape are nothing more than rediscoveries of Lisp concepts. But the One Big Thing that Lisp has and no other language does is its macro system. I don’t think I’d being doing Lisp a disservice by saying that macros are at the heart of why Lisp programmers claim their language is the most powerful one on the planet.

Lisp macros take some getting used to. I found it useful to compare them to related concepts from other languages.

Preprocessor Substitution

It is perhaps unfortunate that the word “macro” is used to describe superficially similar concepts in both C and Lisp. In actuality, the two concepts are as different as night and day.

C’s macro system is nothing more than a preprocessor textual substitution. It really was never intended as a tool for abstraction, although some rather miserable attempts were made to treat them as such (remember MFC message maps?). Rather it was added as a performance hack, like a low-level inline function. There are some good uses of C macros, but in general they’re overused.

Let’s pretend, though, that we wish to go ahead and abstract looping into a C macro. The idiomatic C loop looks something like this:

int i;
for (i=0; i < n; i++) {
    /* loop body */
}

There’s an awful lot of duplication in doing that again and again, so let’s try and move it into a macro:

/* The \ character at the end of a line is a continuation character */
#define LOOP(n) \
    int i; \
    for (i=0; i < (n); i++)

Now the looping code might look something like this:

LOOP(n) {
    /* loop body */
}

The obvious problem with this macro is that you can only use it once in any given lexical scope, and only if you haven’t already declared the variable i. C macros don’t define their own scope as a function would, which is one large reason for they struggle to act as good abstractions.

But, to continue the fantasy, let’s pretend that LOOP works just fine for us. So good, in fact, that we use it often enough to notice the following recurring pattern:

LOOP(10) {
    printf("%d", i);
}

Beautiful code, without question, but because we see it so often, we’d like to abstract it out further. Something like this:

#define LOOP10 \
    LOOP(10) { \
        printf("%d", i); \
    }

Except, of course, that doesn’t work. The preprocessor only makes one pass through the file, meaning a macro can’t call another macro.

Code Generation

So the C macro system is a simple code generation scheme. We could, in theory, expand the code generation capability to allow multiple passes through a file, which would allow one C-like-macro to expand into another C-like-macro, which is then expanded into real code in a subsequent pass.

However, without some static analysis, we still won’t be able to generate unique variable names within any given lexical scope. Actually, even with static analysis, we won’t be able to do so. The problem is that any run-time code generation, like using Reflection.Emit in C#, or instance_variable_set in Ruby, thwarts the static analysis.

To make matters worse, a code generator sits outside the surrounding code. To make it work, you have to write the instructions in a pseudo-language, different from the surrounding language. C macros, for example, have rules different from normal C code. It’s common to uppercase them, which helps make it obvious that this clip of code you’re looking at isn’t standard C code.

Metaprogramming

Reflection in powerful languages like Ruby can give you some of the same benefits of macros, but they are very different concepts. Metaprogramming is a run-time concept. Macros are like metaprogramming at compile time (or interpretation-time—Lisp lets you decide if you want to compile the code first or not). What’s the difference?

Run-time metaprogramming uses dynamic data types to remove duplication. The ActiveRecord class in Ruby on Rails is a wonderful example. Unlike traditional object-relational mappers, you don’t have to specify the mapping from columns to properties using ActiveRecord. In fact, you don’t even have to specify the properties at all!

At run-time, an ActiveRecord class queries the database table whose name matches the name of the class, according to a well-known (and configurable) naming scheme. It then automatically adds property getters and setters for each column in the table. So you could have the following class definition:

class Order < ActiveRecord::Base
end

And you could use it like this:

order = Order.find(1)
puts order.id
puts order.ship_to_name

Notice that we never had to define id or ship_to_name. Rails used metaprogramming to define them for us. Some languages, like Ruby and Smalltalk, also let you intercept messages sent to objects for which no method could be found to bind to. Rails uses this feature to also give the appearance of adding special find methods:

Order.find_by_ship_to_name
Order.find_by_city_and_state

Metaprogramming is essentially code-generation at run-time, which provides extreme flexibility at the cost of some performance that static code-generation might be able to provide. But – and here’s what separates metaprogramming from syntactic macros – metaprogramming magic is bound by the rules of the language. You cannot change the language at run-time.

Syntactic Macros

So how do Lisp macros differ from static code-generation or metaprogramming?

To answer that question, we first need to understand something about Lisp source code. Lisp is one of the few languages that is truly homoiconic, or self-representing. Take a look at the following Lisp lists:

(2 3)
(> 2 3)
(if (> 2 3) True False)

The first list is just data – a collection of atoms. The second list, if interpreted, returns T (true) or NIL (false), depending on the result of calling > with the given arguments. The third list returns the atom True (not the same as T) or False, depending on the answer to the predicate.

But here’s the mind-stretching part: all three of these lists are just data. if and > are nothing more than atoms in a list. Lisp just has special processing rules that say, if the first atom in a list binds to a function name, apply that function with the rest of the list. The fact that Lisp has those processing rules doesn’t change the fact that its code is nothing more than a list of data.

Let’s look at a more involved if statement:

(if (hungry Spot)
    (progn
      (bark Spot)
      (wag-tail Spot)))

If Spot is hungry, he barks and wags his tail. Typically, an if statement allows only one clause if the predicate is true (and, optionally, one if it’s false). progn is a special Lisp form that lets you group multiple clauses together. It’s like the curly braces in C:

if (hungry(spot)) {
    bark(spot);
    wagTail(spot);
}

Without those braces, wagTail would happen unconditionally. Notice that Lisp provides the same functionality without any special syntax. progn is just another Lisp atom in a list, which, like any atom, is interpreted as a function (or something like a function) if it happens to be the first element of a list. Also notice that the if expression shown above is the source code for a full program – and that source code is just a list of atoms!

Maybe I shouldn’t get too carried away with that. In his book, Peter Seibel says too many people get so distracted by Lisp’s list-heritage that they miss some of the more practical aspects of the language, and in fact, lists probably aren’t used much in day-to-day usage. But the fact that there is no representational difference between Lisp data and Lisp source code means that a Lisp program can interpret its source code like any other data. Understanding that single point is the key to understanding macros.

Seibel introduces macros with a silly example, but one that’s useful for showing the code-as-data mentality. Here’s hello, world in Lisp:

(format t "hello, world")

The t atom (like true in C# and Java) tells format to print to standard output. Now imagine that we wrote the following macro:

(defmacro backwards (expr) (reverse expr))

backwards is the name of the macro, which takes an expression (represented as a list), and reverses it. Here’s hello, world again, this time using the macro:

(backwards ("hello, world" t format))

When the Lisp compiler sees that line of code, it looks at the first atom in the list (backwards), and notices that it names a macro. It passes the unevaluated list ("hello, world" t format) to the macro, which rearranges the list to (format t "hello, world"). The resulting list replaces the macro expression, and it is what will be evaluated at run-time. The Lisp environment will see that its first atom (format) is a function, and evaluate it, passing it the rest of the arguments.

So Lisp macros provide compile-time code generation, but in doing so, you have access not just to the code-generation instructions – which are just Lisp data – but also to the entire Lisp language and environment. And the resulting code doesn’t suffer a run-time performance penalty—the call to reverse above happens at compile time; the run-time environment sees only the expanded macro.

This really opens your eyes to new ways of removing duplication. Take the (if ... (progn ...)) example listed above. While progn is nice in that it gives you functionality similar to C’s braces without any special syntax, it can get a bit bulky to have to type it all the time. Common Lisp provides a when macro to make it easier; you’d use it like this:

(when (hungry Spot)
   (bark Spot)
   (wag-tail Spot))

But when doesn’t have to be baked into the language. Given if and progn, we could compile-time generate it with the following macro:

(defmacro when (condition &rest body)
   `(if ,condition (progn ,@body)))

OK, there’s some funky Lisp syntax in there (more than I want to go into to), but underneath the covers, it’s still just lists and atoms. The &rest atom gathers up any remaining parameters (after the condition parameter has been satisfied), and puts them into a single list which you can access through the variable body. The if list is quoted because we don’t want the macro to evaluate the if expression – remember it will try to if the first atom matches a function (or special form, or macro). Instead, we want to return the unevaluated list that will be evaluated later, at run-time. The comma and comma-at-sign are ways of splicing the parameters into the list, similar to string interpolation in Ruby.

So, in our when example with Spot the dog, (hungry Spot) will be the condition parameter, and ((bark Spot) (wag-tail Spot)) will be the body parameter. After when evaluates its arguments, it will return a list (source code) that looks just like the (if...(progn...)) that we wrote by hand.

Lisp also has a handy way of dealing with the variable naming problem we saw with our C macro, where we hard-coded the variable name i. You can use the function gensym to create a variable name that is guaranteed to be unique within its lexical scope. You can assign the result of gensym to another variable, and use that other variable in your macro. It will be replaced with the actual variable name when the macro expands. (Nifty, no?)

Finally, Lisp solves the multiple-pass code-generation problem. Macros can expand into macros, which can themselves expand into macros, for as long as you want. In fact, macros are so integral to the language, that this seems to be pretty common. Macros can call other macros or other functions. Some macros can become some large and complex that you might have an entire package dedicated to supporting functions for it.

The great thing about macros is that they let you actually change the language (to an extent), something that metaprogramming cannot accomplish. Let’s say you don’t care for Lisp’s clunky looping mechanisms. You’ve seen list comprehensions in Python or Erlang and want to bring some of that elegance to Lisp.

Too late – somebody else already did. The loop macro is incredibly powerful and complex. Some old-school Lispers hate it, because it really does change the language, which, for Lispers, is a way of saying it’s not using as many parentheses as it should. The following example is taken from Practical Common Lisp:

(loop for i in *random*
   counting (evenp i) into evens
   counting (oddp i) into odds
   summing i into total
   maximizing i into max
   minimizing i into min
   finally (return (list min max total evens odds)))

The result of that expression is a list containing the minimum, maximum, and sum of the numbers in the *random* global variable, as well as a list containing all even numbers and a list containing all odd numbers.

Try doing that in C.

Written by Brandon Byars

February 12, 2008 at 2:54 pm

Posted in Languages, Lisp

Tagged with ,

Follow

Get every new post delivered to your Inbox.