"Typesafe" Enum in Ruby
In which I attempt to TDD a syntactically sweet Ruby enum with limited success.
Update Nov 2 2007: See my post of moments ago for a gem that implements something like the enum implemented in this post.
The other day I was thinking about the syntactic sugar you can add to Ruby yourself, and my mind wandered to the type-safe enumeration pattern. One of the bits of syntactic sugar Java 5 introduced was to convert the type-safe enum from a best practice that required some pretty verbose code to a first-class language feature. In Java, you've got to wait for things like that, but in Ruby, we can do a lot on our own. I decided to see how clean a syntax I could come up with that would give me the same functionality in Ruby.
[Of course it can't be quite the same: Ruby's not statically typed, and type-checking is one of the main touted benefits of typed enums. Still, the enum does a number of handy tricks that I'd like to have in my Ruby projects.]
Pre-Java5, a type-safe enum looked something like this:
// Java < 5
public class Suit extends TypeSafeEnum { // You have to write that superclass yourself.
    public static final Suit SPADES = new Suit("SPADES");
    public static final Suit CLUBS = new Suit("CLUBS");
    public static final Suit DIAMONDS = new Suit("DIAMONDS");
    public static final Suit HEARTS = new Suit("HEARTS");
    // Assumes the hand-written TypeSafeEnum has a constructor taking the name.
    private Suit(String value) { super(value); }
}
That TypeSafeEnum class should provide a mechanism to look up constants by their value, a reasonable toString() method, and some ugly stuff to trick out serialization so that you can do things like suit==SPADES and get the expected results. (Note you could sometimes get bit by your class getting loaded by multiple classloaders, breaking ==. I won't get into the details because we use Ruby now and don't have to worry about it. Phew!)
Java 5 cleaned that up by allowing you to just write this:
// Java >= 5
public enum Suit { SPADES, CLUBS, DIAMONDS, HEARTS }
It's way better. In fact, it's got so little in the way of distracting keywords and punctuation that it barely feels like Java! I can't let Ruby be outdone. Surely we can be just as terse and clear and still get the features.
Here's what I'd see as the ideal Ruby syntax for declaring an enumeration of values:
enum Status
  NOT_STARTED
  IN_PROGRESS
  COMPLETE
end
I can't see a quick way to make that valid Ruby though, so let's start with the following and hope we don't have to tweak it unless it's in the direction of the even loftier goal above:
class Status < EnumeratedValue
  NOT_STARTED
  IN_PROGRESS
  COMPLETE
end
So what do we want this thing to do? Here's a first test:
def test_constant_values_are_instances_of_enumerated_value_type
  assert_equal Status, Status::NOT_STARTED.class
end
Before we can start working on the goal of the test, we have to get past some NameErrors. First we create EnumeratedValue, then we override Module#const_missing in it.
class EnumeratedValue
  def self.const_missing sym
  end
end
With an empty implementation (implicitly returning nil) we get a useful failure: NOT_STARTED is a NilClass instead of a Status. So we want that const_missing to create us a Status and stick it in a constant. Easy enough.
class EnumeratedValue
  def self.const_missing sym
    const_set sym, self.new
  end
end
Next up, let's get to_s reasonable.
def test_to_s_gives_fully_qualified_constant_name
  assert_equal 'Status::NOT_STARTED', Status::NOT_STARTED.to_s
end
That passes once we do
class EnumeratedValue
  def self.const_missing sym
    const_set sym, self.new(sym)
  end

  def initialize name
    @name = name
  end

  def to_s
    "#{self.class.name}::#{@name}"
  end
end
What next? Comparison operators would be nice, as would making each EnumeratedValue class Enumerable over its values. But before I start implementing those features, I realize there's something pretty uncool I haven't dealt with yet. This test fails.
def test_misspelled_constant_name_raises_NameError
  assert_raises(NameError) { Status::NOTS_TARTED }
end
Oops. With the uber-open const_missing approach, using a constant that isn't there creates it just in time for you to get behavior you weren't expecting. One of the best reasons to use constants instead of symbols is that you catch misspellings with a big loud NameError instead of some strange test failure. Our current implementation loses this checking. What are we going to do?
I'd be happy to flip a bit once I'm finished defining my enumeration of values and check for that in const_missing, but I don't want to clutter my enum with that.
class Status < EnumeratedValue
  NOT_STARTED
  IN_PROGRESS
  COMPLETE
  done! # Poo.
end
I thought for a moment I could use Module.nesting to save my bacon, setting it up so you'd get lazy constant creation inside the enumerated value class but not from outside. Unfortunately I need the nesting at the point the missing constant was encountered, not the nesting of the const_missing method. (What is that method for anyway?)
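For what it's worth, Module.nesting reports the lexical nesting at the spot where the call is written, which is exactly why it can't help here. A minimal sketch (Outer, Inner, Probe, Elsewhere and where_am_i are made-up names for illustration, not part of the enum code):

```ruby
module Outer
  class Inner
    # Evaluated right here, so it reflects this lexical position.
    NESTING = Module.nesting
  end
end

class Probe
  # Module.nesting is resolved lexically, so this always describes
  # the body of Probe, no matter where the method is called from.
  def self.where_am_i
    Module.nesting
  end
end

module Elsewhere
  FROM_ELSEWHERE = Probe.where_am_i
end

p Outer::Inner::NESTING      # [Outer::Inner, Outer]
p Elsewhere::FROM_ELSEWHERE  # [Probe]
```

Called from inside const_missing, it would likewise describe where const_missing was defined, not where the missing constant was referenced.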
There's no hook method that will kick in when the class definition ends, so I turn to the Ruby feature I always think of when I need to set up some state for some code to run then clean that state up: block methods. With a minor change in the enum declaration, we should have the hook we need to make sure constants are only declared intentionally.
I change Status to this:
class Status < EnumeratedValue
  values do
    NOT_STARTED
    IN_PROGRESS
    COMPLETE
  end
end
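The values method itself isn't shown here. One way it might look, as a sketch rather than the actual implementation: values flips a class-level flag while the block runs (the @defining name is an assumption made for this example), and const_missing only creates constants while that flag is set.

```ruby
class EnumeratedValue
  # Run the block with lazy constant creation switched on, and switch
  # it off again afterwards even if the block raises.
  def self.values
    @defining = true
    yield
  ensure
    @defining = false
  end

  def self.const_missing(sym)
    # Outside a values block, fall back to the usual NameError.
    return super unless @defining
    const_set(sym, new(sym))
  end

  def initialize(name)
    @name = name
  end

  def to_s
    "#{self.class.name}::#{@name}"
  end
end

class Status < EnumeratedValue
  values do
    NOT_STARTED
    IN_PROGRESS
    COMPLETE
  end
end

puts Status::NOT_STARTED

begin
  Status::NOTS_TARTED
rescue NameError => e
  puts "NameError: #{e.message}"
end
```

The block is written inside the class body, so the bare constants are looked up in Status and reach its const_missing. Running with warnings enabled will likely grumble about constants in void context.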
Tests pass! Exciting. But that declaration is pretty wordy (wordier than the one I marked as "Poo" above). Can't we collapse the whole thing into just one method call, passing a block full of constants to be defined?
I change Status to this:
enum :Status do
  NOT_STARTED
  IN_PROGRESS
  COMPLETE
end
and create this method at the top level:
def enum sym, &block
  type = Class.new(EnumeratedValue)
  Object.const_set sym, type
  type.class_eval(&block)
end
We run our tests and ... failure! The block passed to the enum method is a closure bound to the top level where it occurs in the code. The class_eval method changes the value of self but not the binding, so Ruby is looking for the constants in Object, not in Status.
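The difference is easy to see in isolation. A small sketch, using a made-up Box class and ITEM constant, comparing the block form of class_eval with the string form:

```ruby
class Box
  ITEM = :inside_box
end

ITEM = :top_level

# Block form: self becomes Box, but constants are still resolved
# where the block was written (here, the top level).
from_block = Box.class_eval { [self, ITEM] }

# String form: the code is compiled inside Box, so constant lookup
# starts there.
from_string = Box.class_eval("[self, ITEM]")

p from_block   # [Box, :top_level]
p from_string  # [Box, :inside_box]
```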
Can we get around this? Kernel.eval allows us to provide an arbitrary binding, but will only accept a string, not a block or Proc. With that restriction, it appears we can't be as brief and clean as we'd like to be. Either we go with the double-nested solution above, or we settle for declaring with symbols or strings, like this:
enum :Status, %w{ NOT_STARTED IN_PROGRESS COMPLETE }
where enum is defined as something like:
def enum type_name, values
  # Build the double-nested declaration as a string and evaluate it.
  eval <<-END
    class #{type_name} < EnumeratedValue
      values do
        #{values.join(';')}
      end
    end
  END
end
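Since the names arrive as plain strings in this variant, the string eval isn't strictly required. A sketch of an eval-free alternative, assuming the simple EnumeratedValue from earlier (just a name and to_s, with no const_missing override) and creating each constant directly with const_set:

```ruby
class EnumeratedValue
  def initialize(name)
    @name = name
  end

  def to_s
    "#{self.class.name}::#{@name}"
  end
end

# Hypothetical eval-free variant of enum: build the subclass, give it
# a name by assigning it to a constant, then define one constant per value.
def enum(type_name, values)
  type = Class.new(EnumeratedValue)
  Object.const_set(type_name, type)
  values.each { |value| type.const_set(value, type.new(value)) }
  type
end

enum :Status, %w{ NOT_STARTED IN_PROGRESS COMPLETE }

puts Status::IN_PROGRESS  # Status::IN_PROGRESS
```

With no const_missing in play, a misspelled constant gets the ordinary NameError for free.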
It's pretty good, though I'm sad to have come so close to our ideal and been unable to make it work without compromise. If you see something I've missed, let me know.
Are we defeated? Java still has the cleaner enum declaration. But it took them years and a new compiler. We were able to come up with something just as terse, even if it wasn't quite as clean as we'd dreamed it might be, in a matter of minutes. On the way we learned a little about Ruby's eval methods, constant lookup, const_missing, and ourselves.
Let's call it a draw and be thankful we get to work with Ruby.
For reference, here's a similar implementation on ruby-talk. (It looks like it's quoted from another message, but it doesn't seem to be.) Here's a page with that implementation as well as a few others collected more readably.
I may implement this the rest of the way and turn it into a gem. At this point I don't think it's valuable enough.
7 comments:
You could do something cunning by aliasing Object's const_missing out of the way and adding your own version that adds the constants to your target namespace, then, once the declaration block has finished executing, put the 'real' const_missing back.
Only catch is, you might end up clashing with constants that are already 'visible' from Object, but I think that's a problem with any const_missing based solution.
That's an interesting idea that would definitely work, but I wouldn't be comfortable doing a temporary substitution of something so global.
Regarding the clash being a problem with any const_missing-based solution, I don't see it as a problem when you're only overriding const_missing in the target module itself, as I've done above. (Obviously a clash is still possible, but it won't happen unless you're drunk.)
PS -- Piers, thanks for being my first commenter!
Nice work John - I was thinking (for the first time in a long while I might add) that Java has a cleaner solution than Ruby the other day for exactly this reason. Glad I wasn't missing something entirely... I don't mind the %w notation though - it's better than anything I came up with.
Hey, I've been working on this same problem myself for quite awhile and decided to solve it before googling for others' solutions. Here's what I got for the enum declaration:
# Sample Enumeration type
#
class Types < Enum
  # set up the constants with parameters
  set :INTEGER, 4
  set :BOOLEAN, 1
  set :STRING, 2
  set :FLOAT, 8
  set :CHAR, 1
  set :DOUBLE, 6

  # define the constructor
  set_constructor do |context, args|
    context.set_var :num_bytes, args[0]
  end

  # define any methods
  set_method :num_bytes do |context, args|
    context.get_var :num_bytes
  end

  # call the initialization routine
  init
end
I tried to avoid %w for 'ideological' reasons and to make it more challenging, and this is the best I could get. You can probably guess most of the implementation from the example. Any ways to improve this?
This looks complicated. What's wrong with symbols and why do you need typesafe enums in the first place?
Aslak,
Certainly just using magic symbols is far simpler. I like the really clear "NameError: uninitialized constant" message you get when you misspell something, but a good test/spec suite should ensure that you haven't misspelled anything if you go with symbols, so that's not terribly compelling.
What's more compelling to me, though I have to admit I've used it very little, is the ability to put behavior into the value objects. I didn't go into that in this post, but the (Rspec) example code for the gem I just released includes it.
(Note that I put "typesafe" in scare quotes. Anyone looking for type-safety is looking in the wrong language.)