Sunday, June 10, 2007

"Typesafe" Enum in Ruby

In which I attempt to TDD a syntactically sweet Ruby enum with limited success.

Update Nov 2 2007: See my post of moments ago for a gem that implements something like the enum implemented in this post.

The other day I was thinking about the syntactic sugar you can add to Ruby yourself, and my mind wandered to the type-safe enumeration pattern. One of the bits of syntactic sugar Java 5 introduced was to convert the type-safe enum from a best practice that required some pretty verbose code to a first-class language feature. In Java, you've got to wait for things like that, but in Ruby, we can do a lot on our own. I decided to see how clean a syntax I could come up with that would give me the same functionality in Ruby.

[Of course it can't be quite the same: Ruby's not statically typed, and type-checking is one of the main touted benefits of typed enums. Still, the enum does a number of handy tricks that I'd like to have in my Ruby projects.]

Pre-Java5, a type-safe enum looked something like this:

// Java < 5
public class Suit extends TypeSafeEnum { // You have to write that superclass yourself.
 public static final Suit SPADES = new Suit("SPADES");
 public static final Suit CLUBS = new Suit("CLUBS");
 public static final Suit DIAMONDS = new Suit("DIAMONDS");
 public static final Suit HEARTS = new Suit("HEARTS");
 private Suit(String value) { super(value); } // assumes TypeSafeEnum has a (String) constructor
}

That TypeSafeEnum class should provide a mechanism to look up constants by their value, a reasonable toString() method, and some ugly stuff to trick out serialization so that you can do things like suit==SPADES and get the expected results. (Note you could sometimes get bitten by your class getting loaded by multiple classloaders, breaking ==. I won't get into the details because we use Ruby now and don't have to worry about it. Phew!)

Java 5 cleaned that up by allowing you to just write this:

// Java >= 5
public enum Suit { SPADES, CLUBS, DIAMONDS, HEARTS }

It's way better. In fact, it's got so little in the way of distracting keywords and punctuation that it barely feels like Java! I can't let Ruby be outdone. Surely we can be just as terse and clear and still get the features.

Here's what I'd see as the ideal Ruby syntax for declaring an enumeration of values:

enum Status
  NOT_STARTED
  IN_PROGRESS
  COMPLETE
end

I can't see a quick way to make that valid Ruby though, so let's start with the following and hope we don't have to tweak it unless it's in the direction of the even loftier goal above:

class Status < EnumeratedValue
  NOT_STARTED
  IN_PROGRESS
  COMPLETE
end

So what do we want this thing to do? Here's a first test:

def test_constant_values_are_instances_of_enumerated_value_type
  assert_equal Status, Status::NOT_STARTED.class
end

Before we can start working on the goal of the test, we have to get past some NameErrors. First we create EnumeratedValue, then we override Module#const_missing for that class.

class EnumeratedValue
  def self.const_missing sym
  end
end

With an empty implementation (implicitly returning nil) we get a useful failure: NOT_STARTED is a NilClass instead of a Status. So we want that const_missing to create us a Status and stick it in a constant. Easy enough.

class EnumeratedValue
  def self.const_missing sym
    const_set sym, self.new
  end
end

Next up, let's get to_s reasonable.

def test_to_s_gives_fully_qualified_constant_name
  assert_equal 'Status::NOT_STARTED', Status::NOT_STARTED.to_s
end

That passes once we do

class EnumeratedValue
  def self.const_missing sym
    const_set sym, self.new(sym)
  end
  
  def initialize name
    @name = name
  end
  
  def to_s
    "#{self.class.name}::#{@name}"
  end
end

What next? Comparison operators would be nice, as would making each EnumeratedValue class Enumerable over its values. But before I start implementing those features, I realize there's something pretty uncool I haven't dealt with yet. This test fails.

def test_misspelled_constant_name_raises_NameError
  assert_raises(NameError) { Status::NOTS_TARTED }
end

Oops. With the uber-open const_missing approach, using a constant that isn't there creates it just in time for you to get behavior you weren't expecting. One of the best reasons to use constants instead of symbols is that you catch misspellings with a big loud NameError instead of some strange test failure. Our current implementation loses this checking. What are we going to do?
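To make the problem concrete, here is a small self-contained sketch. It repeats the EnumeratedValue class built so far and shows the misspelled constant quietly springing into existence. Passing false to constants limits the list to constants defined directly on Status.

```ruby
# The const_missing implementation from this post, repeated so the snippet
# runs on its own.
class EnumeratedValue
  def self.const_missing(sym)
    const_set sym, new(sym)
  end

  def initialize(name)
    @name = name
  end

  def to_s
    "#{self.class.name}::#{@name}"
  end
end

class Status < EnumeratedValue
  NOT_STARTED
  IN_PROGRESS
  COMPLETE
end

typo = Status::NOTS_TARTED   # misspelled, but no NameError is raised
puts typo                    # Status::NOTS_TARTED

# The typo is now a real constant alongside the three intended ones.
puts Status.constants(false).include?(:NOTS_TARTED)   # true
puts Status.constants(false).size                     # 4
```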

I'd be happy to flip a bit once I'm finished defining my enumeration of values and check for that in const_missing, but I don't want to clutter my enum with that.

class Status < EnumeratedValue
  NOT_STARTED
  IN_PROGRESS
  COMPLETE
  done!       # Poo.
end

I thought for a moment I could use Module.nesting to save my bacon, setting it up so you'd get lazy constant creation inside the enumerated value class but not from outside. Unfortunately I need the nesting at the point the missing constant was encountered, not the nesting of the const_missing method. (What is that method for anyway?)
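For what it's worth, Module.nesting reports the lexical nesting at the point where it is called. That is why calling it inside const_missing can't tell us where the missing constant was referenced. A quick illustration (Outer and Inner are made-up names):

```ruby
module Outer
  class Inner
    # Captured while the class body is being evaluated.
    NESTING_HERE = Module.nesting

    # The nesting is fixed by where this method is written,
    # not by where it is called from.
    def self.nesting_inside_method
      Module.nesting
    end
  end
end

p Outer::Inner::NESTING_HERE           # [Outer::Inner, Outer]
p Outer::Inner.nesting_inside_method   # [Outer::Inner, Outer]
```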

There's no hook method that will kick in when the class definition ends, so I turn to the Ruby feature I always think of when I need to set up some state for some code to run then clean that state up: block methods. With a minor change in the enum declaration, we should have the hook we need to make sure constants are only declared intentionally.

I change Status to this:

class Status < EnumeratedValue
  values do
    NOT_STARTED
    IN_PROGRESS
    COMPLETE
  end
end
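The values method itself isn't shown in this post. A minimal sketch of how it could work, assuming a class-level @declaring flag (my name for it, not necessarily what the original code used) that const_missing consults:

```ruby
class EnumeratedValue
  # Hypothetical helper: constants may only be created while this block runs.
  def self.values
    @declaring = true
    yield
  ensure
    @declaring = false
  end

  def self.const_missing(sym)
    # Outside a `values` block, fall back to the default NameError.
    return super unless @declaring
    const_set sym, new(sym)
  end

  def initialize(name)
    @name = name
  end

  def to_s
    "#{self.class.name}::#{@name}"
  end
end

class Status < EnumeratedValue
  values do
    NOT_STARTED
    IN_PROGRESS
    COMPLETE
  end
end

puts Status::NOT_STARTED   # Status::NOT_STARTED

begin
  Status::NOTS_TARTED      # misspelled on purpose
rescue NameError => e
  puts "NameError: #{e.message}"
end
```

The block sits lexically inside the class body, so the bare constants are still looked up against Status, and the ensure clause resets the flag even if the block raises.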

Tests pass! Exciting. But that declaration is pretty wordy, wordier than the one I marked as "Poo" above. Can't we collapse the whole thing into just one method call, passing a block full of constants to be defined?

I change Status to this:

enum :Status do
  NOT_STARTED
  IN_PROGRESS
  COMPLETE
end

and create this method at the top level:

def enum sym, &block
  type = Class.new(EnumeratedValue)
  Object.const_set sym, type
  type.class_eval &block
end

We run our tests and ... failure! The block passed to the enum method is a closure bound to the top level where it occurs in the code. The class_eval method changes the value of self but not the binding, so Ruby is looking for the constants in Object, not in Status.
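Here is a small demonstration of that difference, using a made-up Container class. The block form of class_eval resolves constants lexically, where the block was written, while the string form compiles the code inside the class.

```ruby
class Container
  GREETING = 'hello from Container'
end

# Block form: self becomes Container, but the constant is still looked up
# from the top level, where the block was written.
block_result = begin
  Container.class_eval { GREETING }
rescue NameError => e
  "NameError: #{e.message}"
end

# String form: the code is evaluated with Container as its lexical scope.
string_result = Container.class_eval('GREETING')

puts block_result    # a NameError message
puts string_result   # hello from Container
```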

Can we get around this? Kernel.eval allows us to provide an arbitrary binding, but will only accept a string, not a block or Proc. With that restriction, it appears we can't be as brief and clean as we'd like to be. Either we go with the double-nested solution above, or we settle for declaring with symbols or strings, like this:

enum :Status, %w{ NOT_STARTED IN_PROGRESS COMPLETE }

where enum is defined as something like:

def enum type_name, values
  eval <<-END
    class #{type_name} < EnumeratedValue
      values do
        #{values.join(';')}
      end
    end
  END
end

It's pretty good, though I'm sad to have come so close to our ideal and been unable to make it work without compromise. If you see something I've missed, let me know.
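One variation, sketched under my own assumptions rather than taken from the code above: once the values arrive as a list of names, the eval string isn't strictly needed. The class can be built with Class.new and the constants set directly, which also leaves const_missing alone, so misspellings raise NameError on their own.

```ruby
class EnumeratedValue
  def initialize(name)
    @name = name
  end

  def to_s
    "#{self.class.name}::#{@name}"
  end
end

# Hypothetical variant of `enum` that builds the class directly instead of
# eval-ing a string of source code.
def enum(type_name, value_names)
  type = Class.new(EnumeratedValue)
  Object.const_set type_name, type   # assigning to a constant names the class
  value_names.each { |name| type.const_set name, type.new(name) }
  type
end

enum :Status, %w{ NOT_STARTED IN_PROGRESS COMPLETE }

puts Status::IN_PROGRESS   # Status::IN_PROGRESS
```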

Are we defeated? Java still has the cleaner enum declaration. But it took them years and a new compiler. We were able to come up with something just as terse, even if it wasn't quite as clean as we'd dreamed it might be, in a matter of minutes. On the way we learned a little about Ruby's eval methods, constant lookup, const_missing, and ourselves.

Let's call it a draw and be thankful we get to work with Ruby.

For reference, here's a similar implementation on ruby-talk. (It looks like it's quoted from another message, but it doesn't seem to be.) Here's a page with that implementation as well as a few others collected more readably.

I may implement this the rest of the way and turn it into a gem. At this point I don't think it's valuable enough.

7 comments:

Anonymous said...

You could do something cunning by aliasing Object's const_missing out of the way and adding your own version that adds the constants to your target namespace, then, once the declaration block has finished executing, put the 'real' const_missing back.

Only catch is, you might end up clashing with constants that are already 'visible' from Object, but I think that's a problem with any const_missing based solution.
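A rough sketch of what this suggestion might look like. This is not code from the commenter: instead of aliasing, it defines a temporary singleton const_missing on Object and removes it afterwards, which has the same effect of restoring the default. All names here are illustrative.

```ruby
class EnumeratedValue
  def initialize(name)
    @name = name
  end

  def to_s
    "#{self.class.name}::#{@name}"
  end
end

def enum(type_name, &block)
  type = Class.new(EnumeratedValue)
  Object.const_set type_name, type

  # While the block runs, any constant missing at the top level is created
  # on the new enum class. Note that this hook is global while it is active.
  Object.define_singleton_method(:const_missing) do |sym|
    type.const_set sym, type.new(sym)
  end

  begin
    block.call
  ensure
    # Remove the temporary hook so the default const_missing applies again.
    Object.singleton_class.send(:remove_method, :const_missing)
  end
  type
end

enum :Status do
  NOT_STARTED
  IN_PROGRESS
  COMPLETE
end

puts Status::COMPLETE   # Status::COMPLETE
```

As the comment says, a name that already resolves from the top level never reaches const_missing, so it would be skipped silently.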

John Hume said...

That's an interesting idea that would definitely work, but I wouldn't be comfortable doing a temporary substitution of something so global.

Regarding the clash being a problem with any const_missing-based solution, I don't see it as a problem when you're only overriding const_missing in the target module itself, as I've done above. (Obviously a clash is still possible, but it won't happen unless you're drunk.)

John Hume said...

PS -- Piers, thanks for being my first commenter!

Shane Harvie said...

Nice work John - I was thinking (for the first time in a long while I might add) that Java has a cleaner solution than Ruby the other day for exactly this reason. Glad I wasn't missing something entirely... I don't mind the %w notation though - it's better than anything I came up with.

Anonymous said...

Hey, I've been working on this same problem myself for quite a while and decided to solve it before googling for others' solutions. Here's what I got for the enum declaration:

#Sample Enumeration type
#
class Types < Enum
  # set up the constants with parameters
  set :INTEGER, 4
  set :BOOLEAN, 1
  set :STRING, 2
  set :FLOAT, 8
  set :CHAR, 1
  set :DOUBLE, 6

  # define the constructor
  set_constructor do |context, args|
    context.set_var :num_bytes, args[0]
  end

  # define any methods
  set_method :num_bytes do |context, args|
    context.get_var :num_bytes
  end

  # call the initialization routine
  init
end

I tried to avoid %w for 'ideological' reasons and to make it more challenging, and this is the best I could get. You can probably guess most of the implementation from the example. Any ways to improve this?

Aslak Hellesøy said...

This looks complicated. What's wrong with symbols and why do you need typesafe enums in the first place?

John Hume said...

Aslak,
Certainly just using magic symbols is far simpler. I like the really clear "NameError: uninitialized constant" message you get when you misspell something, but a good test/spec suite should ensure that you haven't misspelled anything if you go with symbols, so that's not terribly compelling.

What's more compelling to me, though I have to admit I've used it very little, is the ability to put behavior into the value objects. I didn't go into that in this post, but the (Rspec) example code for the gem I just released includes it.

(Note that I put "typesafe" in scare quotes. Anyone looking for type-safety is looking in the wrong language.)