A Day In The Lyf

The lyf so short, the craft so longe to lerne

Parse-Time Execution

Posted by Brandon Byars Mon, 05 May 2008 02:24:00 GMT

Update: A few short hours after posting this article, Ola Bini left a comment explaining how calling what Ruby does ‘parse-time execution’ is just plain wrong. In hindsight, it seems like a silly mistake to make. Of course Ruby parses a file before executing it – how many times had I seen the standard parse error? Ah well, that’s why we blog, no? Making silly mistakes is all part of the learning process.

Rather than try to hide my idiocy, I decided instead to clarify – to myself as much as anyone – what my thinking was that led me to the mistake. I had just learned about syntactic macros in Lisp, and was a bit overzealous in applying the same concepts to Ruby. However, I do believe that Ruby provides many of the same benefits. Being able to include things that are typically syntactic definitions as part of the execution environment makes it less important whether the code executes at parse time or execution time. Ruby provides that ability. If you just ignore the silly ‘parse-time’ phrase, and focus instead on the fact that we are defining syntactic constructs at execution time, then the advantage becomes clear.

End Update

My recent foray into understanding Lisp macros got me thinking more about code that executes at parse-time. While few other languages have access to the raw parse tree, many other languages have parse-time executable code. Having worked in some of those languages, I’d seen it before, but never really thought about the difference between parse-time and run-time execution until recently.

Developers tend to think of executable code as a run-time concept. This is especially true in the mainstream static languages of the day. Indeed, the very term “run-time” acts as a conceptual block for many of us (including, until recently, me), for it is intended to describe the environment in which code can run. What is ignored in that concept is that interpreted languages can execute code as it’s parsed, before the entire application has finished parsing, and some languages let you execute code as the source code is compiling.

In most cases, the difference isn’t worth thinking about:

puts "Hello, world!"
puts "Goodbye!"

The Ruby code above shows a trivial example. The first line of code is parsed and executed. “Hello, world!” prints on our console before the second line is even parsed. You can verify this simply by adding a syntactic error on the second line:

puts "Hello, world!"
blarf "Goodbye!"

As you may suspect, blarf isn’t a predefined function in the Ruby language, and when the second line executes, the Ruby interpreter will spit out a NoMethodError to us. However, it will not do so until first saying hi to us! The first line parses just fine, and before parsing the second line, it is executed.
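One way to check the difference between an error found while parsing and an error raised while running is to hand both variants to eval and capture what gets printed. The sketch below is only illustrative: run_snippet is a hypothetical helper, not part of Ruby, and the second snippet borrows the stray closing parenthesis suggested in the comments on this post.

```ruby
require 'stringio'

# Hypothetical helper: evaluates a source string and returns whatever it
# printed, plus the class of any error it raised (or nil).
def run_snippet(source)
  captured = StringIO.new
  original_stdout = $stdout
  $stdout = captured
  error_class = nil
  begin
    eval(source)
  rescue SyntaxError, NameError => e
    error_class = e.class
  ensure
    $stdout = original_stdout
  end
  [captured.string, error_class]
end

# blarf is an unknown method, so the first line still runs and prints.
unknown_method = run_snippet(<<~RUBY)
  puts "Hello, world!"
  blarf "Goodbye!"
RUBY

# A stray ")" makes the whole snippet unparseable, so nothing runs at all.
broken_syntax = run_snippet(<<~RUBY)
  puts "Hello, world!"
  blarf "Goodbye!")
RUBY
```

The first call reports the greeting together with a NoMethodError; the second reports no output and a SyntaxError, which matches the correction described in the update at the top of this post.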

OK, so what? The example above isn’t very interesting. Let’s show another example. Ruby allows open classes, which means that you should be able to change a class even after it’s been defined:

class Dog
  def wag_tail
    puts "tail wagged..."
  end
end

class Dog
  def bark
    puts "Wuff!"
  end
end

So now a Dog can both wag its tail and bark. But notice what happens if we get Evil (which basically means we start thinking like Microsoft or Sun), and want to seal our class:

class Dog
  def wag_tail
    puts "tail wagged..."
  end
end

Dog.freeze

class Dog
  def bark
    puts "Wuff!"
  end
end

The only difference between this code and the code above is the call to freeze, which makes the receiver immutable. The receiver in this case is the Dog class, and so when we get to the next line, which tries to reopen the class, the interpreter throws a TypeError. So a line of code that has already executed causes an exception to be thrown when the next line is parsed.
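The failure can be observed by rescuing it. This is a minimal sketch using only core Ruby; the methods return strings instead of printing so the result is easy to inspect. The exact error class depends on the Ruby version (TypeError in the interpreter this post was written against, FrozenError in Ruby 2.5 and later), so the sketch rescues StandardError and records whatever was raised.

```ruby
class Dog
  def wag_tail
    "tail wagged..."
  end
end

Dog.freeze

frozen_error = nil
begin
  class Dog
    # This def tries to modify the frozen class, which raises.
    def bark
      "Wuff!"
    end
  end
rescue StandardError => e
  frozen_error = e
end

# The original method survives; the new one was never added.
still_wags = Dog.new.wag_tail
can_bark = Dog.method_defined?(:bark)
```

After this runs, frozen_error holds the exception, still_wags is the original string, and can_bark is false.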

That’s a little bit more interesting, but not very instructive unless you want to be a framework developer for Microsoft or Sun. Let’s find a better example; take a look at the following code:

class Blog
  def title
    @title
  end

  def title=(value)
    @title = value
  end
end

This code still gets executed as it’s parsed, but all it does is add the Blog class to the symbol table. The getter and setter methods are added to the class, but the code within those methods isn’t executed.

Of course, no self-respecting Ruby programmer would write the code above. Instead, it’d look like this:

class Blog
  attr_accessor :title
end

Once the Ruby interpreter gets to the end keyword, it knows it has parsed a complete executable instruction and executes it. Again, the Blog class is added to the symbol table. But we’re no longer statically adding methods to the class. Instead, attr_accessor, a method of Module (and therefore available inside every class body), is executed. attr_accessor adds the getter and setter to the class when it gets executed (essentially by eval’ing the boilerplate code above). We depend upon attr_accessor running at parse time! Otherwise, our getter and setter wouldn’t exist.
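To make concrete what attr_accessor does when it runs, here is a rough, hypothetical equivalent written in plain Ruby. The names Accessors and my_attr_accessor are made up for illustration, and the real attr_accessor is implemented inside the interpreter rather than like this.

```ruby
module Accessors
  # Hypothetical stand-in for attr_accessor: defines a getter and a setter
  # for each name on whatever class calls it.
  def my_attr_accessor(*names)
    names.each do |name|
      define_method(name) { instance_variable_get("@#{name}") }
      define_method("#{name}=") { |value| instance_variable_set("@#{name}", value) }
    end
  end
end

class Blog
  extend Accessors
  my_attr_accessor :title   # runs while the class body is being executed
end

blog = Blog.new
blog.title = "A Day In The Lyf"
```

Nothing in the Blog class body spells out title or title=, yet both methods exist once the macro call has run.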

Parse-time executable statements are often called macros in Ruby, which denotes their similarity to Lisp macros (although Ruby lacks access to the parse tree). Rails has become popular in part for its ability to use macros to simplify your configuration:

class Blog < ActiveRecord::Base
  has_many :articles
end

Here we’re managing our object-relational mapping relationships using a macro. As the has_many statement gets parsed, the various methods to manage the relationship get added to the Blog class.
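The same mechanism can be sketched without Rails. The toy has_many below is an assumption-laden illustration, not how ActiveRecord implements associations: ToyAssociations is a made-up module, and the generated method returns an in-memory array instead of querying a database.

```ruby
module ToyAssociations
  # Hypothetical stand-in for has_many: defines a reader named after the
  # association that lazily creates an in-memory collection.
  def has_many(name)
    define_method(name) do
      @associations ||= {}
      @associations[name] ||= []
    end
  end
end

class Blog
  extend ToyAssociations
  has_many :articles
end

blog = Blog.new
blog.articles << "Parse-Time Execution"
```

The real macro generates many more methods than this, but the shape is the same: a class-level call that adds instance methods as a side effect.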

When you first grok macros, you start seeing duplication that you never would have noticed before:

class Rollback < ActiveRecordError
end

class DangerousAttributeError < ActiveRecordError
end

class MissingAttributeError < NoMethodError
end

The code above was stolen from ActiveRecord::Base. We could, if we chose, eliminate the duplication with something like this:

  expose_exceptions ActiveRecordError, :Rollback, :DangerousAttributeError
  expose_exceptions NoMethodError, :MissingAttributeError

expose_exceptions, as shown here, takes the exception’s superclass as the first argument, followed by a list of exception class names. It would be trivial to implement, but it’s not the approach Rails takes, and for good reason. While there is indeed duplication in the Rails code, it is justifiable, since it allows a body of comments (stripped out in my example above) to explain what the exception is there for.
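For the curious, one possible implementation is sketched below. It is hypothetical throughout: expose_exceptions does not exist in Rails, ExceptionMacros and Records are made-up names, and ActiveRecordError is defined here as a stand-in so the example runs without Rails loaded.

```ruby
# Stand-in for the real ActiveRecord::ActiveRecordError (assumption).
class ActiveRecordError < StandardError; end

module ExceptionMacros
  # Defines one empty exception class per name, all sharing a superclass.
  def expose_exceptions(superclass, *names)
    names.each do |name|
      const_set(name, Class.new(superclass))
    end
  end
end

module Records
  extend ExceptionMacros

  expose_exceptions ActiveRecordError, :Rollback, :DangerousAttributeError
  expose_exceptions NoMethodError, :MissingAttributeError
end
```

Each generated constant, such as Records::Rollback, behaves like a hand-written empty subclass and can be raised and rescued as usual.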

What else does parse-time execution get us? On a previous project, we used Rails fixtures to store our test data, even though we weren’t using Rails (I wrote about this here). Rails fixtures were designed for testing, but because they facilitate a fairly nice way of storing test data (in YAML), we co-opted them for that purpose. To make it work, we had to write an ActiveRecord class for each of our tables. It was a pretty mechanical process, but we had to override certain things since we didn’t abide by Rails naming conventions (and what conventions we claimed to abide by were applied inconsistently). The following definitions were typical:

class Account < ActiveRecord::Base
  set_table_name 'Accounts'
  set_primary_key 'AccountId'
end

class Order < ActiveRecord::Base
  set_table_name 'Orders'
  set_primary_key 'Id'
end

After several of these definitions, the duplication became obvious. We couldn’t completely eliminate it because of our own naming inconsistencies, but at least we could make reasonable defaults that could be overridden if needed. In particular, notice the table names are the plural of the class names (Rails expects this too, but in lowercase with underscores). set_table_name is a class method, not an instance method, so it may not be immediately clear how to eliminate that duplication.

Ruby provides certain hook methods during parse-time events. One such hook is called when your class is subclassed. We used that to trigger a call to set_table_name:

class StandardTable < ActiveRecord::Base
  set_primary_key 'Id'

  def self.inherited(subclass)
    subclass.class_eval "set_table_name '#{subclass.name.pluralize}'"
    super
  end
end

class Account < StandardTable
  set_primary_key 'AccountId'
end

class Order < StandardTable; end

As soon as a class statement that subclasses StandardTable is evaluated (before the body of the subclass runs), the class-level inherited hook is invoked, which evals the default set_table_name. Because the body runs after the hook, the subclass can still override the macros if needed, as Account shows with set_primary_key.
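The hook can be tried out without ActiveRecord. In the sketch below, ToyRecord is a made-up base class standing in for ActiveRecord::Base, its set_table_name and set_primary_key just store a value, and pluralization is faked by appending an "s" (the original relies on the pluralize method from Rails). Unlike the original, the toy hook also sets the default primary key, because the toy base class does not inherit class-level settings.

```ruby
# Made-up stand-in for ActiveRecord::Base (assumption).
class ToyRecord
  class << self
    attr_reader :table_name, :primary_key

    def set_table_name(name)
      @table_name = name
    end

    def set_primary_key(key)
      @primary_key = key
    end
  end
end

class StandardTable < ToyRecord
  # Called by Ruby each time a class inherits from StandardTable.
  def self.inherited(subclass)
    super
    subclass.set_table_name("#{subclass.name}s")  # naive pluralization
    subclass.set_primary_key('Id')
  end
end

class Account < StandardTable
  set_primary_key 'AccountId'   # overrides the default set by the hook
end

class Order < StandardTable; end
```

Here Account ends up with the table name Accounts and the primary key AccountId, while Order gets Orders and Id. Account's override sticks because the hook has already run by the time the subclass body executes.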

Executing code at parse-time is an extremely powerful technique, even without access to the parse tree. Languages that don’t allow you the option of parse-time execution weaken your ability to make powerful abstractions. Working in languages that allow parse-time execution expands your thinking in useful directions, allowing you to spot duplication that you might never have seen before.

Comments

  1. Ola Bini
    about 4 hours later:
    Sorry mate, but the word "parse-time execution" is actually totally wrong with regards to Ruby. Nothing is executed during parse-time. A file is always completely parsed before any execution happens. The reason that your "blarf" example works is because it's a semantic error, not a syntactic error. Try replacing it with something syntactic, like doing puts "hello world" blarf "hello world") The closing parenthesis will trigger a syntax error, and you won't see any output. See, there really only is one execution time in Ruby and that is runtime. Metaprogramming tricks of all kinds works because you can manipulate class/modules and methods at runtime, so that's what you end up doing. In fact, even defining a class is mostly just straight up execution (which is why you can use if statements around your def's)
  2. Brandon Byars
    about 10 hours later:
    Thanks for the clarification Ola. Always nice to have a language implementor put you in your place :)
    I see that parse-time execution is a poor turn of phrase for the Ruby example. But I think you nailed the differentiator I was trying to describe -- making definitions straight execution, so you can manipulate them (as in the Dog.freeze example), gives you some of the same flexibility that syntactic macros give you.
