A Day In The Lyf

…the lyf so short, the craft so longe to lerne

RestMvc – RESTful Goodies for ASP.NET MVC


Last summer, I found myself building a RESTful ASP.NET MVC service that had an HTML admin UI. Oftentimes, the resource that was being edited in HTML was the same resource that needed to be sent out in XML via the service, which mapped nicely to the REST ‘multiple representations per resource’ philosophy.

There are obviously some very nice RESTful libraries for ASP.NET MVC, but none quite met my needs. Simply Restful Routing, which comes with MVC Contrib, takes a Rails-inspired approach of handing you a pre-built set of routes that more or less match a RESTful contract for a resource. While obviously convenient, that’s never been my preferred way to manage routing. It adds a bunch of routes that you probably have no intention of implementing. It keeps the routes centralized, which never seemed to read as well to me as the way Sinatra keeps the routing configuration next to the block that handles requests to that route.

Additionally, one of the problems I encountered with other routing libraries like Simply Restful is that they define the IRouteHandler internally, which removes your ability to add any custom hooks into the routing process. I needed just such a hook to add content negotiation. I also wanted some RESTful goodies, like responding with a 405 instead of a 404 status code if we did route to a resource (identified by a URI template), but not to a requested HTTP verb on that resource. I wanted the library to automatically deal with HEAD and OPTIONS requests. In the end, I created my own open-source library called RestMvc which provides such goodies with Sinatra-like routing and content negotiation.

Routing

public class OrdersController : Controller
{
    [Get("/orders")]
    public ActionResult Index() { ... }

    [Post("/orders")]
    public ActionResult Create() { ... }

    [Get("/orders/{id}.{format}", "/orders/{id}")]
    public ActionResult Show(string id) { ... }

    [Put("/orders/{id}")]
    public ActionResult Edit(string id) { ... }

    [Delete("/orders/{id}")]
    public ActionResult Destroy(string id) { ... }
}

Adding the routes for the attributes above is done in Global.asax.cs, in a couple of different ways:


RouteTable.Routes.Map<OrdersController>();
// or RouteTable.Routes.MapAssembly(Assembly.GetExecutingAssembly());

That is, in effect, the entire routing API of RestMvc. The Map and MapAssembly extension methods will do the following:

  • Create the routes defined by the HTTP methods and URI templates in the attributes. Even though System.Web.Routing does not allow you to prefix URI templates with either / or ~/, I find allowing those prefixes can enhance readability, and thus they are allowed.
  • Routes HEAD and OPTIONS methods for the two URI templates (“orders” and “orders/{id}”) to a method within RestMvc capable of handling those methods intelligently.
  • Routes PUT and DELETE for /orders, and POST for /orders/{id}, to a method within RestMvc that knows to return a 405 HTTP status code (Method Not Allowed) with an appropriate Allow header. This method, and the ones that handle HEAD and OPTIONS, work without any subclassing for the Controller as shown above. However, if you need to customize their behavior (for example, to add a body to OPTIONS), you can subclass RestfulController and override the appropriate method.
  • Adds routes for tunnelling PUT and DELETE through POST for HTML browser support. RestMvc takes the Rails approach of looking for a hidden form field called _method set to either PUT or DELETE. If you don’t want the default behavior, or you do want the tunnelling but with a different form field, you can call ResourceMapper directly instead of accepting the defaults that the Map and MapAssembly extension methods provide.
  • Notice the optional format parameter on the Get attribute above the Show method. Routes with an extension are routed such that the extension gets passed as the format parameter, if the resource supports multiple representations (e.g. /orders/1.xml routes to Show with a format of xml). The ordering of the URI templates in the Get attribute is important. Had I reversed the order, /orders/1.xml would have matched with an id of “1.xml” and an empty format.
  • The last point is a convenient way to handle multiple formats for a resource. Since it’s in the URL, it can be bookmarked and emailed, or tested through a browser, with the same representation regardless of the HTTP headers. Even if content negotiation is used, it allows you to bypass the standard negotiation process. Note that having different URLs for different representations of the same resource is generally frowned upon by REST purists. RestMvc does not automatically provide these routes for you, but lets you add them if you want.
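To make the 405, OPTIONS, HEAD, and tunnelling behavior above concrete, here is a minimal Ruby sketch. It is not RestMvc's code: the TinyRouter class and all of its method names are hypothetical, and it only illustrates the decision being described (a URI template matched but the verb did not, so answer 405 with an Allow header rather than 404).

```ruby
# Hypothetical sketch, not RestMvc's implementation: a tiny route table keyed
# by URI template, used to choose between 200, 404 and 405 responses.
class TinyRouter
  def initialize
    @routes = Hash.new { |hash, template| hash[template] = {} }
  end

  # Register a handler for an HTTP verb on a template such as "/orders/{id}".
  def add(verb, template, &handler)
    @routes[normalize(template)][verb.to_s.upcase] = handler
  end

  # Returns [status, headers, body] for a request.
  def call(verb, path, form = {})
    verb = effective_verb(verb.to_s.upcase, form)
    _template, handlers = @routes.find { |template, _| matches?(template, path) }
    return [404, {}, "Not Found"] unless handlers

    allow = { "Allow" => allowed_verbs(handlers).join(", ") }
    if handlers.key?(verb)
      [200, {}, handlers[verb].call]
    elsif verb == "OPTIONS"
      [200, allow, ""]
    elsif verb == "HEAD" && handlers.key?("GET")
      [200, {}, ""]
    else
      # The resource exists, but not for this verb.
      [405, allow, "Method Not Allowed"]
    end
  end

  private

  # Allow "/orders" and "~/orders" as spellings of "orders".
  def normalize(template)
    template.sub(%r{\A~?/}, "")
  end

  # Segment-by-segment match; a "{name}" segment matches anything.
  def matches?(template, path)
    wanted = template.split("/")
    actual = path.sub(%r{\A/}, "").split("/")
    wanted.length == actual.length &&
      wanted.zip(actual).all? { |w, a| w.start_with?("{") || w == a }
  end

  # Tunnel PUT and DELETE through POST via a hidden _method form field.
  def effective_verb(verb, form)
    override = form["_method"].to_s.upcase
    verb == "POST" && %w[PUT DELETE].include?(override) ? override : verb
  end

  def allowed_verbs(handlers)
    verbs = handlers.keys
    verbs += ["HEAD"] if handlers.key?("GET")
    (verbs + ["OPTIONS"]).uniq
  end
end

router = TinyRouter.new
router.add(:get, "/orders") { "all orders" }
router.add(:post, "/orders") { "created" }
router.add(:get, "/orders/{id}") { "one order" }
router.add(:put, "/orders/{id}") { "updated" }

status, headers, _body = router.call("DELETE", "/orders")
puts "#{status} Allow: #{headers['Allow']}"
```

In this sketch a DELETE to /orders is answered with a 405 and an Allow header listing the verbs that would have worked, while a request for an unknown path still gets a 404.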

Content Negotiation

Content negotiation is provided as a decorator to the standard RouteHandler. Doing it this way allows you to compose additional custom behavior that needs access to the IRouteHandler.

// In Global.asax.cs
var map = new MediaTypeFormatMap();
map.Add(MediaType.Html, "html");
map.Add(MediaType.Xhtml, "html");
map.Add(MediaType.Xml, "xml");

var connegRouter = new ContentNegotiationRouteProxy(new MvcRouteHandler(), map);

RouteTable.Routes.MapAssembly(Assembly.GetExecutingAssembly(), connegRouter);

In the absence of a route URI template specifying the format explicitly, the connegRouter will examine the Accept request header and pick the first media type supported in the map. Wildcard matches are supported (e.g. text/* matches text/html). The format parameter will be set for the route, based on the value added in the MediaTypeFormatMap.

The content negotiation is quite simple at the moment. The q parameter in the Accept header is completely ignored. By default, it tries to abide by the Accept header prioritization inferred from the order of the MIME types in the header. However, you can change it to allow the server ordering, as defined by the order MIME types are added to the MediaTypeFormatMap, to take priority. This was added to work around what I consider to be a bug in Google Chrome – despite being unable to natively render XML, it prioritizes XML over HTML in its Accept header. The library does not currently support sending back a 406 (Not Acceptable) HTTP status code when no acceptable MIME type is sent in the Accept header.
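The negotiation rules just described are simple enough to sketch in a few lines of Ruby. This is only an illustration under stated assumptions: FormatNegotiator, format_for, and the prefer_server_order option are made-up names rather than RestMvc's API, and the Accept header in the usage lines is an abbreviated example of the Chrome-style ordering mentioned above.

```ruby
# Hypothetical sketch of the negotiation rules described above; these names
# are illustrative and are not part of RestMvc.
class FormatNegotiator
  # map: ordered Hash of media type => format, e.g. "text/html" => "html".
  def initialize(map, prefer_server_order: false)
    @map = map
    @prefer_server_order = prefer_server_order
  end

  # Returns the format of the first acceptable media type, or nil if none match.
  def format_for(accept_header)
    # Strip parameters such as ";q=0.9"; the q value is ignored entirely.
    accepted = accept_header.to_s.split(",").map do |entry|
      entry.split(";").first.to_s.strip.downcase
    end
    accepted.reject!(&:empty?)

    # Build [supported, requested] pairs in the order that should win.
    pairs =
      if @prefer_server_order
        @map.keys.product(accepted)                 # order of the map wins
      else
        accepted.product(@map.keys).map(&:reverse)  # order of the header wins
      end

    match, _requested = pairs.find { |candidate, requested| matches?(requested, candidate) }
    match && @map[match]
  end

  private

  # "text/*" matches "text/html"; "*/*" matches anything.
  def matches?(requested, supported)
    req_type, req_sub = requested.split("/", 2)
    sup_type, sup_sub = supported.split("/", 2)
    (req_type == "*" || req_type == sup_type) &&
      (req_sub == "*" || req_sub == sup_sub)
  end
end

map = {
  "text/html" => "html",
  "application/xhtml+xml" => "html",
  "application/xml" => "xml"
}
chrome_like = "application/xml,application/xhtml+xml,text/html;q=0.9,*/*;q=0.5"

puts FormatNegotiator.new(map).format_for(chrome_like)
puts FormatNegotiator.new(map, prefer_server_order: true).format_for(chrome_like)
```

With the header's own ordering the XML representation is chosen; letting the map's ordering take priority picks HTML instead, which is the workaround described above. Returning nil when nothing matches is where a 406 would go if it were supported.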

Next Steps

I haven’t worked on RestMvc in a few months, largely because I shifted focus at work and haven’t done any .NET programming in a while. However, I had planned on doing some automatic etagging, and to make the content negotiation more robust.

Contributors welcome! The code can be found on github.

Written by Brandon Byars

January 6, 2011 at 5:02 pm

Posted in .NET


Picking Up The Pen Again


800.

That’s the number of days my Muse has slumbered. That’s the number of days since I set my pen down. 800 days ago, I wrote my last blog post.

Quite a bit has happened since then. The economy fell out from under us. The US elected its first black President in history. Michael Jackson died.

718 days ago, my wife delivered our second son, 3,622 days after she delivered our first son. 520 days ago, my family moved to another country. And, yesterday, my Muse awoke.

My Muse surveyed the landscape after shaking off his hibernation languor, and decided that my old blog just looked, like, so last decade. So, I moved it, and I updated it, and I changed the DNS to point to the new location [1].

800 days after setting my pen down, I have a new blog. And tomorrow, my Muse and I start discussing what to do about that.

[1] While most of the links moved over fine, the RSS feed has changed.

Written by Brandon Byars

January 4, 2011 at 12:43 pm

Posted in Writing

Funcletize This!

I was recently involved in troubleshooting a bug in our staging environment. We had some code that worked in every environment we had put it in, except staging. Once there, we would perform the equivalent of an update on a field (using LINQ to SQL in C#), only to be greeted by a ChangeConflictException.

I’m embarrassed by how long it took to figure out what was wrong. It was obviously an optimistic locking problem, and I even mentioned that it was because the UPDATE statement wasn’t updating anything once I first saw the exception. Optimistic locking works by adding extra fields to the WHERE clause to make sure that the data hasn’t changed since you loaded it. If one of those fields had changed, the WHERE clause wouldn’t match anything, and the O/RM would assume that somebody’s changed the data behind your back and throw an exception.

It turns out that failing to match any rows with the given filter isn’t the only way that LINQ will think no rows were updated; it’s also dependent on the NOCOUNT option in SQL Server. If the database is configured to have NOCOUNT ON, then the number of rows affected by each query won’t be sent back to the client. LINQ interprets this lack of information as 0 rows being updated, and thus throws the ChangeConflictException.
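The mechanism is easier to see in miniature. The Ruby sketch below is an in-memory stand-in, not LINQ's actual code: run_update, save, and ChangeConflict are hypothetical names, the "table" is just an array of hashes, and the nocount flag mimics a server that does not report how many rows were affected.

```ruby
# Hypothetical in-memory sketch of optimistic locking; not LINQ's code.
class ChangeConflict < StandardError; end

# Pretend UPDATE ... WHERE id = ? AND <every original value still matches>.
# Returns the rows-affected count, or nil to mimic NOCOUNT ON.
def run_update(table, id, original, changes, nocount: false)
  rows = table.select do |row|
    row[:id] == id && original.all? { |column, value| row[column] == value }
  end
  rows.each { |row| row.merge!(changes) }
  nocount ? nil : rows.length
end

# The O/RM side: no reported rows is treated as "someone changed the data".
def save(table, id, original, changes, nocount: false)
  affected = run_update(table, id, original, changes, nocount: nocount)
  raise ChangeConflict, "row not found or changed" if affected.to_i.zero?
  affected
end

orders = [{ id: 1, status: "new" }]
save(orders, 1, { status: "new" }, { status: "paid" })   # matches one row

begin
  # The update itself succeeds, but the missing count looks like a conflict.
  save(orders, 1, { status: "paid" }, { status: "shipped" }, nocount: true)
rescue ChangeConflict => e
  puts "conflict reported: #{e.message} (status is #{orders.first[:status]})"
end
```

The second save changes the row and still raises, which is the staging behavior described above: the data was updated, but the client never heard about it.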

In itself, the bug wasn’t very interesting. What is interesting is what we saw when we opened Reflector to look at the LINQ code around the exception:

IExecuteResult IProvider.Execute(Expression query)
{
    // …
    query = Funcletizer.Funcletize(query);
}

Love it. Uniquifiers, Funcletizers, and Daemonizers of the world unite.

Written by Brandon Byars

October 26, 2008 at 12:13 pm

Posted in .NET, Database


Windows is a Ghetto

Over the years, I’ve tolerated a lot of operating system abuse. Earlier in my career, I tended to keep my mouth shut (mainly because I didn’t know any better), but over time, the annoyances have been building up into a crescendo of anger, such that at the end of the day I consistently find myself red-faced, foaming at the mouth, and screaming adjective interjection adjective noun, exclamation [1] at my monitor. As the blood boils and the temper flares at the inanity I find myself forced to endure, I’ve consoled myself by deciding that I would some day vent my frustrations to the world at large.

This is that rant.

The Actors

I’ve used variants of two main operating systems. I’ve decided to give them fake names to protect the guilty.

The first operating system, the older of the two, is known as Eunuchs. The first time I used Eunuchs, I found it strange and cryptic. I couldn’t understand its attachment to the command line, nor could I grasp why so many developers and universities were so enamored with it. My very first paid programming project involved me writing a deployment tool that was used to release Eunuchs applications to production. However (and I found this quite strange at the time), the developers wanted the tool written in that other operating system, so they could get a nice usable GUI for it.

That other operating system, which I’ll call MacroHard Windows, was all about the GUI. Their command line was terrible, and it is terrible still. Back then, being paid to write a GUI deployment tool, I didn’t see why that mattered.

I’ve written a lot of code since then, both in Eunuchs and with MacroHard tools, and I’ve decided that MacroHard is appropriately named. It’s hard. Really hard.

The Rant

Windows is the worst development environment on the face of the planet

To be clear, I mean that without the slightest hint of exaggeration. For example, I understand times were hard in the early days of computing. I never had the opportunity to work on the ENIAC, but apparently it once went almost 5 days without a failure, something I have yet to achieve on any MacroHard platform. The first time I ever programmed anything, it was a blackjack program on my calculator. Programming on Windows makes me yearn for those halcyon calculator days of yesteryear.

For example?

Eunuchs has files and processes. Sure, it has pipes and sockets and runtime process information too, but those are just files, really. Sure it has network servers and daemons and startup commands, but those are nothing more than processes, and configuring them means putting a bunch of stuff in a bunch of files. If you want to manage them, you use file commands. Or process commands. Yes, sometimes they’re cryptic, and sometimes the names of things look like they were invented by 19 year old kids whose idea of social interaction was playing World of Warcraft, but it’s really not that hard to figure out once you get the hang of it.

Windows has files and processes and registry hives and COM+ catalogs and IIS metabases and app pools and Active Directory stores and Windows services and performance counters and event logs and global assembly caches. If you want to manage them, you open a barely usable GUI, and if you click into enough nested dialog boxes, you might get lucky and find the setting you need. Or you buy “enterprise” tools to manage them like MacroHard Operations Manager™, which has an entirely different maze of dialog boxes to follow. A colleague once put it better than I possibly could: The solution to MacroHard problems is to buy more MacroHard solutions.

Somehow, far too many IT executives have been sold on the idea that this is easier for people to understand. It isn’t.

Oh sure, you can script this multitude of entities that Windows supports, if you have a new enough version of Windows. Of course, every component has a different scriptable interface. Hell, even IIS has multiple scriptable interfaces. You can use ADSI or WMI, but only WMI if you’re configuring an IIS 6.0 server. You can use C++ or VB to use ABO (admin base objects), or C++ to use specialized COM interfaces.

And sometimes, the canonical UIs that MacroHard wants you to use interact with the subsystem you’re trying to script in such a way as to effectively remove any confidence in your scripts. For example, it’s possible to script your way into a disabled Windows service, where apparently the only way to remove the service is to shut down the services MacroHard Management Console Snap-In™. If you’re a Eunuchs developer, you may want to re-read that last sentence, because it probably doesn’t appear even remotely possible to you that having a GUI open could prevent you from scripting the removal of a daemon.

But aren’t you being too hard on MacroHard?

I suppose there are some things that MacroHard has done better than Eunuchs. For example, MacroHard has Active Directory, which is an enterprise-ready LDAP security solution, not something you get out of the box in Eunuchs. And with Active Directory comes Windows Integration, where a bunch of applications that MacroHard wrote are integrated into the operating system in such a way that they automatically authenticate based on your Windows login. For example, they wrote one application (a particularly bad one, but quite widely used) called Internet Destroyer. It’s possible, as a web developer, for you to set your web site up requiring integrated Windows authentication. Anyone using Internet Destroyer to visit your website will automatically authenticate against your domain controller, which is occasionally quite handy.

However, the magic that makes that work is sometimes too obscure to fathom. For example, you can set up system-wide database settings in Windows called DSNs. The DSN gives you two ways to connect to the database – using Windows Authentication, or using a SQL login. MacroHard also has a component called perfmon, which collects performance information for you and has an option to log that information to a database given a DSN. Strangely, using a SQL login doesn’t work with perfmon, although good luck finding documentation that says so. If you’re a Eunuchs developer, you might want to re-read that last sentence, since it probably doesn’t appear even remotely possible that a process can’t connect to a database using a connection string with a database login.

Ok, but why are you really mad?

I’m mad because once upon a time, I wanted to know why a build failed, and that build happened to be using the MacroHard test framework called MHTest, as opposed to the open source alternatives, which are known for their annoying insistence on telling you why your build fails when it does fail. Here’s what I got with MHTest:

Test Failure: VerifyLogFilePath

At first I just found it amusing that MHTest told me what failed, but not why it failed. After all, that annoyance only manifests itself on the command line; if I use Visual Studio to run the MHTests, it’ll give me more information.

After a while, I thought it’d be nice to see how much of our code was actually covered by MHTest. Naturally, I tried to integrate that feedback into our build, which meant calling it on the command line. It turns out that MHTest’s code coverage spits out a binary file. If you want to interpret it, you’ll need to use Visual Studio.

But the good news is that, unlike the open source alternatives, MHTest randomly adds files to your projects and only works if you add magic GUIDs to your project files. This is actually a great feature if, like me, you get billed by the hour and you want large checkins to pretend that you accomplished a lot.

I’m mad because I wanted to run both versions 1 and 2 of Powershell on my machine (Powershell is MacroHard’s attempt at a less sucky command line), so that we could test against both versions, but MacroHard won’t let you run both of them on one machine. I’m mad because, for years, I thought that the Windows command line would randomly hang, but recently I discovered that if you accidentally click a mouse in a command window while it’s running a process spitting out console messages, and you happen to be in Quick Edit Mode (which is a Windows command line euphemism for completely unusable, but less so than the alternative), then the process will be paused until you right-click in the same command window. I’m mad because I like to test websites with multiple versions of Internet Destroyer, but without some crazy hackery, it’s not possible to run multiple versions on one machine. I’m mad because I installed MacroHard .NET 3.5 Service Pack 1, which forced me to close down every application on my machine – yes, forced, not asked – and reboot my machine, only to find out that it’s buggy enough that MacroHard feels compelled to supersede it with .NET 3.5 Service Pack 1 Service Release 1.

I’m mad because when I finally figure out the right combination of nested modal dialog boxes to pop up to configure IIS, I find a text box into which I cannot paste and from which I cannot copy, and whose ‘OK’ button is only enabled after accidentally clicking in the upper left hand quadrant of grayspace on the form. I’m mad because when I want to add a keyboard shortcut to Visual Studio I get a box showing me 3.87 commands at a time (out of hundreds) from which to select the command to assign the shortcut to, with no way to expand my view. I’m mad because I have absolutely no clue what error 0x00001ad59add means, and I am the goddamned system administrator I’m supposed to contact.

Mainly I’m mad because I love software development, and MacroHard has done much to lessen the joy that comes with hacking.

Reasons for optimism

Eunuchs is indeed an ironic name, given that modern derivatives can claim a veritable biblical list of progenitors: Eunuchs begat Minix, which begat Linux, which begat Knoppix… Those children have traditionally been considered a better server platform than MacroHard, although MacroHard has made unfortunate inroads (in part thanks to one good thing MacroHard did – they were the first to realize that a “server” didn’t have to mean big iron). But the traditional advantage of MacroHard came from its desktop market, where until recently it had no competitors. It is one of the ironies of our industry that the competitor that now exists is the same one which MacroHard, in its infancy, was commissioned to help, and that the competitor now runs a derivative of Eunuchs.

OS X is becoming the de facto operating system for those who want to continue having fun developing software, but who like pretty and usable GUIs. It will never have much of a server market, but it doesn’t matter: what you develop on a Mac can be deployed to a Linux server with only minor gotchas like filename case-sensitivity. Many developers have never needed GUIs, or have been satisfied with the uglier (but improving) ones found in the Linux world, and they continue to have fun developing software. But I understand now why those developers who acted as the customers for my first paid project wanted a GUI: the command line is absolutely essential to development, but GUIs are helpful from time to time as well. Large portions of the Linux world have understood that principle for a long time, but it took Apple’s expertise on usability and their operating system’s Eunuchs heritage to provide the first OS that was both user-friendly and developer-friendly.

Unfortunately for MacroHard, they are too weighted down by the enormous size of Windows to respond effectively. It took them five years to get Vista out the door, and that was only after removing every interesting feature from it. And even now, nobody uses it. As a software consultant with a history of working on MacroHard platforms, you’d think I’d know something about Vista, but honestly, I haven’t even seen it.

The development world is changing, and that’s a good thing. I have a feeling that the future of software development will be much more fun than it is now.

[1] This clever phrase was stolen from Steve Yegge: http://steve-yegge.blogspot.com/2008/09/programmings-dirtiest-little-secret.html

Written by Brandon Byars

October 19, 2008 at 12:20 am

Posted in Operating Systems


Orthogonality

Orthogonality means that features can be used in any combination, that the combinations all make sense, and that the meaning of a given feature is consistent, regardless of the other features with which it is combined. The name is meant to draw an explicit analogy to orthogonal vectors in linear algebra: none of the vectors in an orthogonal set depends on (or can be expressed in terms of) the others, and all are needed in order to describe the vector space as a whole.

– Michael Scott (Programming Language Pragmatics)

I’ve used Delphi and Visual Basic at previous jobs. I disliked both of them. VB has this annoying distinction between objects and primitives, so I’d consistently type

Dim server
server = "localhost"

Dim catalog
catalog = CreateObject("MyCompany.MyObject")

…only to be greeted by an ugly Windows popup box announcing that Object doesn't support this property or method. The problem, as all you grizzled VB programmers no doubt spotted immediately, is the last line should start with the keyword Set. VB requires you to prefix assignments on “objects” with Set. But if you try to put a Set in front of assignments on what VB considers primitives (like the first assignment above), you get an error that reads Object required: '[string: "localhost"]'.

Delphi likewise frustrated me with mundane annoyances:

DoOneThing();

if value = goal then
    DoSomething();
else
    DoSomethingElse();

The code above doesn’t compile, and it won’t compile until you remove the semicolon from the call to DoSomething(). Semicolons complete a statement, and the statement is the entire if-else clause.

These problems in VB and Delphi are related to the concept of orthogonality mentioned in the opening quote. VB doesn’t let you compose assignment to objects the same way it lets you compose assignment to primitives. Delphi doesn’t let you end an if clause the same way it lets you end an else clause. These inconsistencies encourage even experienced programmers to make silly syntax mistakes and make the language harder to use.

What is orthogonality?

The key principles I extracted from Michael Scott’s quote listed above are consistency and composability. Composability means that features can be combined, and consistency stands in for the Principle of Least Surprise—features act how you expect they would, regardless of how they’re being combined. VB’s assignments lack consistency. Delphi’s semicolon parsing doesn’t act consistently when composed within a surrounding statement.

Scott claims that a highly orthogonal language is easier to understand and easier to use. Nowhere does he mention that it’s easier to implement. I’m sure the Delphi grammar was simplified considerably by refusing to allow the DoSomething statement to end in a semicolon when contained within an outer if-else clause. It’s also likely that the implementation of Visual Basic was simplified by having the programmer tell the compiler whether an assignment referred to an object or a primitive.

I suspect many non-orthogonal aspects of languages are there to make them easier to implement. However, some languages that are trivially easy to implement can also be amazingly orthogonal. Lisp is a prime example; it is a highly orthogonal language, and yet an entire Lisp interpreter written in Lisp fits on just one page of the Lisp 1.5 Programmer’s Manual.

I found it instructive to list out syntactic constructs that make languages less orthogonal. It’s amazing how mainstream most of them are:

Statements

Statements aren’t necessary, and a number of languages avoid them altogether. Having only expressions makes the language easier to work in. Compare the inconsistency of Delphi’s statement parsing to the composability of Ruby’s expressions:

puts if x == 0
  "Zero"
else
  "Not Zero"
end

The if clause is an expression; it returns the last value evaluated (e.g., either “Zero” or “Not Zero”). And that value can be composed within another expression.

So what happens when you don’t have a reasonable return value? Smalltalk always returns self, which is convenient because it allows method chaining. In the Ruby example above, the entire expression returns the result of the call to puts, which happens to be nil.
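Ruby doesn't return self by default, but the Smalltalk convention is easy to imitate by hand. The following is a small sketch with a made-up Order class, just to show why handing back the receiver makes chaining possible:

```ruby
# Hypothetical example: a method with no natural return value hands back
# self, Smalltalk-style, so that calls can be chained.
class Order
  attr_reader :items

  def initialize
    @items = []
  end

  def add(item)
    @items << item
    self
  end
end

order = Order.new.add("pen").add("ink").add("paper")
puts order.items.length
```

Had add returned nil the way puts does, the second call in the chain would have failed.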

The beauty of expressions is that the composability doesn’t just go one level deep:

x = if a == 1; 1; else; 2 end

y = x = if a == 1; 1; else; 2; end

puts y = x = if a == 1; 1; else; 2; end

puts "returns nil" if (puts y = x = if a == 1; 1; else; 2; end).nil?

As the example above shows, orthogonality doesn’t guarantee good code. However, it allows the language to be used in unanticipated ways, which is A Good Thing. Moreover, since everything is an expression, you can put expressions where you wouldn’t normally expect them. For example, in Ruby, the < operator represents inheritance when the receiver is a class, but the superclass could be an expression:

class Test < if x == 1; OneClass; else; TwoClass; end
end

Pushing a language like this, only to find that it’s turtles all the way down, is a signpost for orthogonality.

Primitives

We already saw the clumsiness of VB primitives, but most mainstream languages share a similar problem. Java, for example, has a confusing dichotomy of longs and Longs, the first a primitive and the second a full-fledged object. C has stack-allocated primitives, which are freed automatically when they fall out of scope, and heap-allocated variables, which you have to free yourself. C# has value types, which force another abstraction – boxing – into the programmer’s lap.

private static object DBNullIf(int value, int nullValue)
{
    return (value != nullValue) ? (object)value : DBNull.Value;
}

The code above should be a head-scratcher. Isn’t object the superclass of all other types? And if so, why do we have to explicitly cast our int variable to an object to make this code compile? Why don’t we have to do the same for the DBNull?

I mentioned above that ease of implementation is a common source of non-orthogonality. With primitives, we can see another, more legitimate reason: performance. There is a cost to keeping primitives out of the language. Below, we’ll see several more non-orthogonal language features that make performance easier to optimize.

Nulls

Nulls have been problematic enough that an entire design pattern has been created to avoid them. Access violations and object reference exceptions are some of the most common programmer errors. Most languages divide object references into two types: those that point to an object, and those that point to nothing. There’s nothing intrinsically non-orthogonal about that division, except that in most languages the references that point to nothing work differently. Instead of returning a value, they throw an exception when dereferenced.

In Ruby, the division is still more or less there — some references point to objects, and some point to nothing — but both types of references still return a value. In effect, Ruby has built the Null Object pattern into the language, as the references that point to nothing return a nil value. But, like everything else in Ruby, nil is an object (of type NilClass), and can be used in expressions:

1.nil?      # returns false
nil.nil?    # returns true

You never get a NullReferenceException in Ruby. Instead, you get a NoMethodError when you try to call a method on nil that doesn’t exist, which is exactly the same error you’d get if you called a nonexistent method on any other object.
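A quick way to see this for yourself is the sketch below. The helper error_class_for and the method name frobnicate are invented for the illustration; everything else is core Ruby.

```ruby
# Returns the class of the NoMethodError raised by the block, or nil if the
# block completes normally. (Hypothetical helper for this illustration.)
def error_class_for
  yield
  nil
rescue NoMethodError => e
  e.class
end

nil_error = error_class_for { nil.frobnicate }   # method missing on nil
int_error = error_class_for { 1.frobnicate }     # method missing on an Integer

puts nil_error == int_error

# nil is a real object, so methods it does define work as usual.
puts nil.class
puts nil.to_a.inspect
```

Both calls fail in the same way, and nil answers the messages it does understand, such as to_a and to_s, like any other object.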

Magic Functions

Most object-oriented languages have certain functions that aren’t really methods. Instead, they’re special extensions to the language that make the class-based approach work.

public class Magic
{
    public static void ClassMethod()
    {
    }

    public Magic()
    {
    }

    public void NormalMethod()
    {
    }
}

Magic.ClassMethod();
Magic magic = new Magic();
magic.NormalMethod();

Notice the context shift we make when constructing new objects. Instead of making method calls via the normal syntax (receiver.method()), we use the keyword new and give the class name. But what is the class but a factory for instances, and what is the constructor but a creation method? In Ruby, the constructor is just a normal class-level method:

regex = Regexp.new(phone_pattern)

Typically, the new class-level method allocates an instance and delegates to the newly created instance’s initialize method (which is what most Ruby programmers call the “constructor”). But, since new is just a normal method, if you really wanted to, you could override it and do something different:

class Test
  def self.new
    2 + 2
  end
end

Test.new    # returns 4!!
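For comparison, here is roughly what the default new does, written out as an ordinary class method. The Point class and the name build are made up for the example; allocate and initialize are the real Ruby hooks.

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # A hand-written stand-in for the built-in new: allocate a bare instance,
  # then run its (private) initialize method with the given arguments.
  def self.build(*args)
    instance = allocate
    instance.send(:initialize, *args)
    instance
  end
end

built   = Point.build(1, 2)
created = Point.new(1, 2)
puts [built.x, built.y] == [created.x, created.y]
```

Nothing here needs special syntax; construction is just a method call on the class object.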

Operators also tend to be magic methods in many languages. Ruby more or less treats them just like other methods, except the interpreter steps in to break the elegant consistency of everything-as-a-method to provide operator precedence. Smalltalk and Lisp, two other highly orthogonal languages, do not provide operator precedence. Operators are just normal methods (functions in Lisp), and work with the same precedence rules as any other method.

So here we have yet another reason, on top of ease of language implementation and performance, to add non-orthogonality to a language. Ruby adds operator precedence, even though it adds an element of inconsistency, presumably because it makes the language more intuitive. Since intuitiveness is one of the purported benefits of orthogonality, there is a legitimate conflict of interest here. I think I would prefer not having operator precedence, and leaving the language consistent, but it seems more of a question of style than anything else.
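The tension is visible in a few lines of Ruby. In this sketch (the Money class is a made-up example), operators are invoked and defined like any other method, yet the infix form obeys precedence rules that explicit method calls do not:

```ruby
# Operators can be called like any other method...
sum  = 1.+(2)
sent = 1.send(:+, 2)

# ...but the infix form gets precedence from the interpreter, while explicit
# method calls simply evaluate left to right.
infix   = 1 + 2 * 3      # parsed as 1 + (2 * 3)
chained = 1.+(2).*(3)    # (1 + 2) * 3

# Defining an operator is ordinary method definition.
class Money
  attr_reader :cents

  def initialize(cents)
    @cents = cents
  end

  def +(other)
    Money.new(cents + other.cents)
  end
end

total = Money.new(150) + Money.new(275)
puts [sum, sent, infix, chained, total.cents].inspect
```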

Static and sealed Methods

Class-based object-oriented languages impose a dichotomy between classes and instances of those classes. Most of them still allow behavior to exist on classes, but that behavior is treated differently than instance behavior. By declaring class methods as static, you’re telling the compiler that it’s free to compute the address of this function at compile-time, instead of allowing the dynamic binding that gives you polymorphism.

Ruby gives you polymorphism at the class level, because classes are themselves objects: class methods are inherited, can be overridden, and an override can even call super:

class Animal
  def self.description
    "kingdom Animalia"
  end
end

class Person < Animal
  def self.description
    "semi-evolved simians"
  end
end
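To see the dynamic binding at work, here is a minimal sketch that repeats the two classes above and adds a hypothetical Dog subclass with no override of its own. The method is looked up on whichever class object receives the message.

```ruby
class Animal
  def self.description
    "kingdom Animalia"
  end
end

class Person < Animal
  def self.description
    "semi-evolved simians"
  end
end

# Hypothetical subclass with no override; it inherits Animal's class method.
class Dog < Animal
end

# The same message is sent to three different class objects.
descriptions = [Animal, Person, Dog].map { |klass| klass.description }
```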

In most mainstream languages, not even all instance methods are polymorphic. They are by default in Java, although you can make them statically-bound by declaring them final. C# and C++ take a more extreme approach, forcing you to declare them virtual if you want to use them polymorphically.

Behavior cannot be combined consistently between virtual and sealed (or final) methods. It’s a common complaint when developers try to extend a framework only to find out that the relevant classes are sealed.

Instance Variables

Only the pure functionalists can get by without state; the rest of us need to remember things. But there’s no reason why the mechanism of retrieving stored values has to be treated differently from the mechanism of calling a parameter-less function that computes a value, nor is there a reason that the mechanism for storing a value has to be different from the mechanism of calling a setter function to store the value for you.

It is common OO dogma that state should be private, and if you need to expose it, you should do so through getters and setters. The evolution of the popular Rails framework recently reaffirmed this dogma. In prior versions, sessions were exposed to the controllers via the @session instance variable. When they needed to add some logic to storing and retrieving, they could no longer expose the simple variable, and refactored to a getter/setter attribute access. They were able to do so in a backward-compatible way, by making the @session variable a proxy to an object that managed the logic, but it was still a process that a more orthogonal language wouldn’t have required. The language should not force you to distinguish between field access and method access.

Both Lisp and Eiffel treat instance variables equivalently to function calls (at least when it comes to rvalues). Lisp simply looks up atoms in the environment, and if that atom is a function (lambda), then it can be called to retrieve the value no differently than if the atom is a variable containing the value. Eiffel, an object-oriented language, declares variables and methods using the same keyword (feature), and exposes them – both to the outside world and to the class itself – the same way (Bertrand Meyer called this the Uniform Access principle):

class POINT
feature
    x, y: REAL
            -- Abscissa and ordinate

    rho: REAL is
            -- Distance to origin (0,0)
        do
            Result := sqrt(x^2 + y^2)
        end

Once you get past Eiffel’s tradition of YELLING TYPES at you and hiding the actual code, Uniform Access makes a lot of sense. Instance variables are just like features without bodies. C# has properties, which provide similar benefits, but force you to explicitly declare them. Instead of a getRho and setRho method, you can have a property that allows clients to use the same syntax regardless of whether they’re using a property or a field. Because Ruby allows the = symbol as part of a method name, it allows a similar syntax.
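Here is a rough Ruby translation of the Eiffel POINT, as a sketch rather than a faithful port. The rho= writer is my own hypothetical addition, included only to show a method name ending in =.

```ruby
class Point
  attr_accessor :x, :y          # stored state, exposed through generated methods

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Computed value; callers cannot tell it apart from stored state.
  def rho
    Math.sqrt(x**2 + y**2)
  end

  # Hypothetical writer: rescales the point so that rho becomes new_rho.
  def rho=(new_rho)
    scale = new_rho / rho
    self.x = x * scale
    self.y = y * scale
  end
end

point = Point.new(3.0, 4.0)
original_rho = point.rho        # reads like a field, runs a computation
point.rho = 10.0                # reads like an assignment, calls rho=
```

Client code uses point.x and point.rho the same way, which is the whole point of Uniform Access.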

However, the separation between variables and properties is superfluous. For example, there’s no need for them to have separate access levels. If other classes need the state exposed, then declare it public. If the language doesn’t offer instance variables, then you’re simply exposing a property method. If you run into the same problem that Rails ran into, and suddenly need to add behavior around exposed state, refactoring should be easy. Just add a private property method that is now the state, and leave the public property method.

So, in my hypothetical language, we might have the following:

class Controller
  public feature session
end

And when you feel you need to add behavior around the exposed session dictionary, it should be easy:

class Controller
  public feature session
    # added behavior goes here
    private_session
  end

  private feature private_session
end

One thing that this hypothetical syntax doesn’t allow is separate access levels for getting and setting, but it shows the basic idea.
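Ruby gets reasonably close to this if you never touch the instance variable directly from outside the class. The sketch below is hypothetical (the class names and the read counter are mine, not Rails code); it shows the same client code working before and after behavior is added around session.

```ruby
# Version 1: session is plain state exposed through a generated reader.
class Controller
  attr_reader :session

  def initialize
    @session = {}
  end
end

# Version 2 (hypothetical refactoring): same public name, added behavior.
class LoggingController
  attr_reader :reads

  def initialize
    @private_session = {}
    @reads = 0
  end

  def session
    @reads += 1          # added behavior goes here
    private_session
  end

  private

  attr_reader :private_session
end

# Client code is identical for both versions.
def remember_user(controller)
  controller.session[:user] = "snoopy"
  controller.session[:user]
end
```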

Inconsistent Access Levels

Since we’re on the subject of access levels, Ruby’s private access level is not very orthogonal at all. Unlike C++-derived languages, Ruby’s private is object-level, not class-level, which means that even other instances of the same class can’t directly access the private method. That’s a reasonable constraint.

However, instead of making the object-level private access level orthogonal, the implementors simply disallowed specifying an explicit receiver for private methods. This undoubtedly made implementing object-level private access much easier. Unfortunately, it means that you can’t even use self as the receiver within the object itself (at least in Ruby versions before 2.7, which relaxed the rule for a literal self), which makes moving a method from public or protected to private non-transparent, even if all references to the method are within the class itself:

class TestAccessLevels
  def say_my_name
    puts self.name
  end

  private
  def name
    "Snoopy"
  end
end

# The following line raises a NoMethodError (on Ruby versions before 2.7)
TestAccessLevels.new.say_my_name
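The sketch below sticks to behavior that does not depend on how a particular Ruby version treats self as a receiver: an implicit receiver always works, and another object as receiver never does. The Beagle class is a hypothetical example.

```ruby
class Beagle
  def initialize(name)
    @name = name
  end

  def say_my_name
    name              # implicit receiver: always allowed for private methods
  end

  def name_of(other)
    other.name        # explicit receiver: rejected, even for another Beagle
  end

  private

  def name
    @name
  end
end

snoopy = Beagle.new("Snoopy")
spike = Beagle.new("Spike")

own_name = snoopy.say_my_name

# Capture the error instead of letting it terminate the script.
error = begin
  snoopy.name_of(spike)
rescue NoMethodError => e
  e
end
```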

Second class types

Having special types, like null, is a special case of having primitives, and it’s a common performance optimization. Being a first-class type has a well-defined meaning. Specifically, its instances can:

  • be passed to functions as parameters
  • be returned from functions
  • be created and stored in variables
  • be created anonymously
  • be created dynamically
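As a quick illustration of those criteria, here is a sketch using Ruby lambdas. The method names apply_twice and adder are hypothetical.

```ruby
# Passed to a function as a parameter
def apply_twice(fn, value)
  fn.call(fn.call(value))
end

# Returned from a function, and created dynamically by closing over amount
def adder(amount)
  ->(n) { n + amount }
end

add_three = adder(3)                              # stored in a variable
doubled_twice = apply_twice(->(n) { n * 2 }, 5)   # created anonymously
nine = apply_twice(add_three, 3)
```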

It’s becoming increasingly common for modern languages to move towards first-class functions. In Execution in the Kingdom of Nouns, Steve Yegge parodied the typical transmogrifications Java programmers had accustomed themselves to in order to sidestep the language’s lack of first-class functions. Java’s cousin (descendant?), C#, has more or less had them since .NET 2.0 in the form of anonymous delegates.

What neither Java nor C# has is first-class classes. Both have reflection, and even allow you to create new types at runtime (painfully…). But, because both languages are statically typed, you can’t access these runtime-created types the same way you can access normal types. Assuming Foo is a runtime-created type, the following code won’t compile:

Foo foo = new Foo();

The only way to make it work is by heavy use of reflection. Ruby, on the other hand, makes it trivially easy to add types at runtime. In fact, all types are added at runtime, since it’s an interpreted language.
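A minimal sketch of what that looks like in Ruby follows. The define_record helper is hypothetical; Class.new, attr_reader, define_method, and instance_variable_set are standard Ruby.

```ruby
# Build a class at runtime from a list of attribute names.
# define_record is a hypothetical helper, not part of Ruby.
def define_record(*attributes)
  Class.new do
    attr_reader(*attributes)

    define_method(:initialize) do |*values|
      attributes.zip(values).each do |attribute, value|
        instance_variable_set("@#{attribute}", value)
      end
    end
  end
end

foo_class = define_record(:name, :size)   # an anonymous class stored in a variable
foo = foo_class.new("widget", 3)          # used like any other class
```

No reflection gymnastics are needed on the calling side; the runtime-created class answers new like every other class.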

Single Return Values

Since you can pass more than one argument to functions, why should you only be allowed to return one value from functions? Languages like ML add orthogonality by returning tuples. Ruby works similarly, unpacking arrays automatically for you if you use multiple lvalues in one expression.

def duplicate(value)
  [value, value]
end

first, second = duplicate("hi there")

This feature allows you to combine lvalues and rvalues more consistently.
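A few more variations on the same idea, as a sketch. Integer#divmod is part of Ruby's core library; summarize is a hypothetical helper.

```ruby
quotient, remainder = 17.divmod(5)      # divmod returns a two-element array

# Hypothetical helper returning three values at once.
def summarize(numbers)
  [numbers.min, numbers.max, numbers.sum]
end

low, high, total = summarize([4, 8, 15])

first, *rest = [1, 2, 3]                # a splat collects the remaining values
```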

Inconsistent Nesting

Nesting is the standard programming trick of limiting scope, and most languages provide blocks that you can nest indefinitely if you want to limit scope. For example, in C#:

public void TestNest()
{
    string outer = "outer";
    {
        string middle = "middle";
        {
            string inner = "inner";
        }
    }
}

In addition to blocks, Ruby allows you to nest functions, but the nesting doesn’t work in an orthogonal way:

def outer
  inner_var = "inner variable"

  def inner
    "inner method"
  end
end

outer
puts inner
puts inner_var

In this example, the last line will throw a NameError, complaining that inner_var is undefined. This is as we should expect – since it’s defined inside an inner scope from where we’re calling it, we should not be able to access it. However, the same is not true for the inner method defined in the call to outer. Despite the fact that it’s defined within a nested scope, it actually has the same scoping as outer.
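Putting the same nesting inside a class makes the behavior easier to observe. In this sketch (Greeter is a hypothetical class), the nested def only runs when outer is called, and it then adds inner to the class itself rather than to some nested scope.

```ruby
class Greeter
  def outer
    def inner
      "inner method"
    end
    "outer method"
  end
end

greeter = Greeter.new
before = greeter.respond_to?(:inner)                  # inner does not exist yet
greeter.outer                                         # running outer executes the nested def
after = greeter.respond_to?(:inner)
on_other_instance = Greeter.new.respond_to?(:inner)   # defined for every Greeter
```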

Ruby’s scoping gets even weirder:

def outer
  begin
    inner_var = "inner"
  end

  inner_var
end

puts outer

This code works, printing “inner” to the console. It shouldn’t.
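For contrast, a block passed to a method does get its own scope, which makes the begin/end behavior feel even less consistent. The sketch below uses defined? to ask whether the name is visible after each construct; the method names are hypothetical.

```ruby
def begin_end_scope
  begin
    inner_var = "inner"
  end
  defined?(inner_var)     # begin/end does not open a new scope
end

def block_scope
  1.times do
    inner_var = "inner"   # local to the block
  end
  defined?(inner_var)     # the name is unknown out here
end

visible_after_begin = begin_end_scope
visible_after_block = block_scope
```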

JavaScript similarly suffers from strange scoping rules. Because all variables have function scope, and not block scope, a variable is available everywhere within a function regardless of where it is defined:

function testScoping(predicate) {
  if (predicate) {
    var test = "in if block";    
  }
  alert(test);    // works, even though it's in an outer block
}

Fixing your mistakes

Speaking of JavaScript, few things bug me more about programming languages than those that try to fix your mistakes for you. JavaScript automatically appends semicolons for you at the end of a line if you forget to. Most of the time, that works fine, but every now and then it creates a ridiculously hard bug to diagnose:

return
{
  value: 0
}

The return statement above looks like it returns an object with a single property. What it really does, though, is simply return. The semicolon was appended on your behalf at the end of the return keyword, turning the following three lines into dead code. It is for this reason that, in JavaScript: The Good Parts, Douglas Crockford recommends always using K&R-style braces in JavaScript.

Summary

Orthogonality makes the language easier to extend in ways that the language implementors didn’t anticipate. For example, Ruby’s first-class types, combined with its duck-typing, allow the popular mocking framework Mocha to have a very nice syntax:

logger = stub
logger.expects(:error).with("Execution error")

do_something(:blow_up, logger)

The fact that classes in Ruby are also objects means that the same syntax works for classes:

File.stubs(:exists?).returns(true)
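I have not looked at how Mocha implements this, so the following is only a toy sketch of why the symmetry comes for free. The stub_method helper and the Report class are hypothetical; define_singleton_method is standard Ruby and works on any object, classes included.

```ruby
# stub_method is a hypothetical helper, far simpler than anything Mocha does.
def stub_method(target, name, canned_value)
  target.define_singleton_method(name) { |*_args| canned_value }
end

class Report          # hypothetical class used only for this sketch
  def self.exists?(_path)
    false
  end
end

logger = Object.new
stub_method(logger, :error, :logged)    # stubbing an ordinary object
stub_method(Report, :exists?, true)     # stubbing a class uses the same call
```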

I picked on JavaScript a couple of times, but it did a better job than any other language I know of with literals. JavaScript has literal numbers and strings, like most languages. It has literal regular expressions and lists like Perl and Ruby. But where it really shines is in its literal object syntax. Objects are basically dictionaries, and Ruby and Perl have hash literals, but JavaScript’s objects include function literals:

function ajaxSort(columnName, sortDirection) {
    setNotice('Sorting...');
    new Ajax.Request("/reports/ajaxSort", {
            method: "get",
            parameters: { sort: columnName, sortDir: sortDirection },
            onSuccess: function (transport) { 
                $("dataTable").innerHTML = transport.responseText;
            },
            onComplete: function () { clearNotice(); }
        }
    );
}

In time, that syntax was leveraged to form the JSON format.

In a more dramatic example, when object-orientation became all the rage, Lisp was able to add object-orientation to the language without changing the language. The Common Lisp Object System (CLOS) is written entirely in Lisp. The reason Lisp was able to absorb an entire new paradigm is largely due to the language’s orthogonality. Not all function-like calls work the same way; some are called “special forms” because they act differently (for example, by providing lazy evaluation of the arguments). However, Lisp allows the programmer to create their own special forms by writing macros, which are themselves written in Lisp.

It helps to have turtles all the way down.

Written by Brandon Byars

July 21, 2008 at 10:36 pm

Posted in Languages

Coding Values

I always enjoy reading Kent Beck’s books. In a way, they remind me of a poster my mom has hanging in her house: All I Really Need to Know I Learned in Kindergarten. With the frenzied pace of change in software development, it’s easy to forget about the basics. A good remedy for that is to sit down and read one of Beck’s books.

Recently, I read Implementation Patterns, and while I enjoyed Smalltalk Best Practice Patterns more, I still got something out of the book (despite being neither a Java nor a Smalltalk developer). As with XP, Beck explains his programming practices in terms of values and principles. I always appreciated that, in Extreme Programming Explained (both editions), he not only described his values, but laid out concrete practices that embodied those values. In Implementation Patterns, he takes a similar approach, claiming the three values of Communication, Simplicity, and Flexibility are what inspire his programming practices.

He describes them as listed in priority order. Where they clash, which is rare, it is better to be communicative than overly simple. And simplicity trumps flexibility. Too much terrible software is written under the presupposition that it must be flexible.

What Beck did not write about were values that run counter to quality code. I tried coming up with some as an exercise – those that I’ve witnessed in myself and others. I was able to come up with the following:

Ownership

I’ve worked with one developer who held on to code he had written so tightly that, even after we had instituted a code-sharing policy, it took an enormous amount of effort to loosen his grip. We’d go and refactor his code, and he’d come in after us and refactor it back. We’d add functionality to his code, and he’d come in after us and change it.

There were certain very odd aspects of his behavior:

  • His code was really bad, so it was strange that he resisted outside help so much
  • Even after we started rotating coding tasks to get people involved in parts of the code they had previously not touched, he showed little interest in actually doing any of those tasks
  • He didn’t care to hear any suggestions about how to improve the system where it involved changing “his” code, because, as he liked to say, “it’s complicated.”

His behavior ultimately got him fired, but not before he had time to leave a legacy of terrible code that others now struggle to maintain. In his defense, his sense of ownership exhibited a high degree of pride and aesthetics. Unfortunately, his sense of aesthetics was also quite poor, and his refusal to allow feedback means that it’s unlikely to have improved much today.

Cleverness

I’ve found myself guilty of this sin a few too many times. Cleverness often manifests itself in extraordinarily terse code, using some backdoor of the language. It’s become passé these days to show off clever Ruby code, as it once was in Perl, but overly clever solutions can be found in all languages.

A related value is Knowledge, or the verisimilitude thereof. Often I find myself (or others) eager to show off that they know obscure parts of a language and use them for some temporary benefit. Knowledge-Show-Offs are easy to spot: look for phrases like “verisimilitude thereof.”

Both Ownership and Cleverness run counter to Beck’s primary value of Communication.

Comfort

How often have you seen C code in Java? Or Java code in Ruby? We are creatures of habit, and it often takes a conscious effort to break out of your comfort zone. Too often, we hold on to old ideas because they’re comfortable ideas, and label new ideas as “passing fads,” a particularly common slur in our industry.

Why is it so hard to make unit testing mainstream? Why don’t more people pair program? Both practices remove people from their comfort zone, and require learning a new way of programming.

Note that none of these values are bad in the general sense. I’d like to feel a sense of ownership about my work, to be clever, and to feel comfortable. But values collide, and we need to understand that certain values are more applicable in certain aspects of our lives than others.

Written by Brandon Byars

July 14, 2008 at 7:56 am

Posted in Agile

Beating Sapir-Whorf

The Sapir-Whorf hypothesis is a famous linguistic theory that postulates that what you think is constrained by what you can say. In other words, the way you understand the world is limited by the language(s) you speak.

The hypothesis is regarded as incorrect by modern linguists. However, the spirit of the theory appears to be largely true if we refer only to computer languages – our solution space is limited by the number of ways we know to express solutions. Paul Graham was indirectly talking about Sapir-Whorf when he described the Blub Paradox. Blub, the language that sits halfway up the hypothetical totem pole that represents the relative power of computer languages, has certain constructs that make languages lower on the pole seem downright dysfunctional. The Blub programmer would look down, realize that the language he’s looking at doesn’t even have some language feature X, and scoff at how much extra work it would be to write programs without X. X, the Blub programmer argues, is so fundamental to designing your application that it would be criminal to use any language without X.

When the Blub programmer looks up the totem pole, though, he just sees weird languages. What he doesn’t realize, of course, is that programmers in those languages are looking down at Blub and wondering how anybody could possibly program without language feature Y.

The Blub paradox makes a nice story, and while it has value, it oversimplifies the language landscape. There isn’t a linear totem pole for expressive power when it comes to languages. Graham suggests looking at the size of the parse tree, which is indeed a useful single metric, but ignores aspects like long-term maintainability, readability, syntactic sugar, etc. The computer language landscape is diverse precisely because there are different opinions on the best way to solve different problems.

What should be obvious is that not all software problems should be solved with the same tool. Too many programmers go down that path, and use Java (or C#, or C++…) as the hammer and let everything become a nail.

Granted, we don’t always have control over which language we use. We write programs in Java (or C#, or C++…) because that’s our job, even though we may suspect that it’s not always the best tool. Fair enough. But if our ability to describe solutions to a problem is constrained by the programming languages we know, doesn’t it make sense to learn other languages even if we can’t use them on the job?

Knowing a diverse set of programming paradigms can help you become a better programmer. It’s helped me. For example, I wrote a closure-based undo in C#. A more typical OO-based undo would have resorted to the command design pattern, but the closure-based implementation required far fewer changes to the codebase. I don’t think I would have come up with the solution if I had not had some prior exposure to functional languages.
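To give a flavor of the idea, here is a minimal Ruby sketch of a closure-based undo. It is not the C# implementation mentioned above; the Document class and its methods are hypothetical. Each operation pushes a closure that knows how to reverse it, so no separate command classes are needed.

```ruby
# Hypothetical editor whose operations record their own inverse as a closure.
class Document
  attr_reader :text

  def initialize
    @text = ""
    @undo_stack = []
  end

  def append(string)
    previous = @text
    @text = previous + string
    @undo_stack.push(-> { @text = previous })   # closure captures the old state
  end

  def undo
    inverse = @undo_stack.pop
    inverse.call if inverse     # nothing to undo when the stack is empty
    self
  end
end

document = Document.new
document.append("hello")
document.append(" world")
document.undo
```

Adding a new undoable operation only requires pushing one more lambda, which is why this style tends to touch so little existing code.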

The key is to make sure that when you pick up a new language, it’s different enough from languages you already know that you force yourself to wrap your head around some new concept. You don’t get that, for instance, going from Java to C#. I’ve tried learning at least a little about several languages, and found useful new concepts in the following:

In the object-oriented space:

  • C# / Java (statically typed OO)
  • Eiffel (design-by-contract, anchored types)
  • Ruby (open classes, incredible reflection, metaprogramming)
  • Smalltalk (like Ruby, but with keyword arguments and a very lightweight syntax)

In the hybrid OO space:

  • Perl / Python (dynamic typing, default variables, list comprehensions)
  • C++ / Delphi (destructors)
  • JavaScript (prototypes instead of classes, hybrid functional)

In the procedural space:

  • C (weak typing, systems level procedural, teaches good understanding of computer memory)

In the functional space:

  • Scheme / Common Lisp (homoiconic, syntactic macros, multiple dispatch, continuations, no penalty for tail recursion)
  • Erlang (nice distributed model, pattern matching)

In the “everything-else” space:

  • shell scripting (using processes as the unit of abstraction)
  • SQL (yes, even the much-maligned relational, set-based model provides another, occasionally superior, way of solving problems)
  • XSLT (declarative)

In the “I wish I could speak intelligently about” space:

  • Prolog (logic programming)
  • Forth (stack-based programming)

PragDave recommends that you learn a new programming language each year. It’s terrific advice. Pick a language, find some coding exercises, and try to wrap your head around solving those problems idiomatically in the new language. Don’t be surprised if you start out solving them the same way you would have in a language you already knew, but remember the end goal is to learn new problem solving techniques, not just new syntax. Since you probably won’t be using the language on the job, you’ll almost certainly forget the syntax after a while. Start posting questions to mailing lists, blog about some of your sample work, and don’t be afraid to make a fool of yourself. The way I see it, if you’re still afraid to make a fool of yourself, then you clearly haven’t made a fool of yourself enough.

Written by Brandon Byars

May 13, 2008 at 11:09 pm

Posted in Languages
