Friday, September 12, 2008

Cleaner Utility Modules with Module#module_function or extend self

[Disclaimer: GLoc is a nice and generally well written library, and I'd encourage anyone who needs to translate strings to consider it. Also strongly consider i18n, particularly if you're using Rails 2.x.]

My pair and I were looking at the code for GLoc this week. When a class includes the GLoc module, GLoc's methods become available on instances of the class (as one would expect with include) as well as on the class itself (which would normally happen only with a call to extend). Those same methods are also available as module methods of GLoc itself. So all of the following will work.

GLoc.l(:hi)

class Greeter
  include GLoc
  
  def greet
    l(:hi)
  end
  
  def self.greet
    l(:hi)
  end
end

What caught our eye was how the library goes about making itself so available. Minus all the functionality, here's what it does.

module GLoc
  module InstanceMethods
    # ...
    # useful methods defined here
    # ...
  end
  
  include ::GLoc::InstanceMethods
  
  module ClassMethods
    include ::GLoc::InstanceMethods
  end
  
  def self.included(target)
    super
    class << target
      include ::GLoc::ClassMethods
    end
  end
  
  class << self
    include ::GLoc::InstanceMethods
  end
end

Patrick's comment was "I don't know off-hand what the cleanest way is to do that, but I know that ain't it."

So what's the cleanest way?

First let's clear away the noise of the nested modules. The convention of nesting a ClassMethods module to extend on including classes makes sense when there are distinct behaviors to add at the instance and class levels, but when you want the same methods in both places, it's unnecessary noise.
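For contrast, here's a minimal sketch of a case where the nested ClassMethods convention does earn its keep, because the class-level and instance-level behaviors really are different. Countable, Widget, and all the method names are hypothetical, made up for this illustration; none of it comes from GLoc.

```ruby
# Hypothetical module: class-level bookkeeping plus one instance-level method.
module Countable
  module ClassMethods
    # Class-level state lives in an instance variable of the class itself.
    def instances_created
      @instances_created ||= 0
    end

    def note_instance!
      @instances_created = instances_created + 1
    end
  end

  def self.included(target)
    super
    target.extend ClassMethods
  end

  # Instance-level behavior: report back to the class.
  def register!
    self.class.note_instance!
    self
  end
end

class Widget
  include Countable
end

Widget.new.register!
Widget.new.register!
Widget.instances_created # => 2
```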

Here's what we get when we just define the behavior in the top-level module.

module GLoc
  # ...
  # useful methods defined here
  # ...
  
  def self.included(target)
    super
    target.extend self
  end
  
  class << self
    include ::GLoc
  end
end

That's much easier on the eye. (Note that the hook calls extend on target rather than having target's singleton class include the module. That's important because otherwise the included hook would fire again for the singleton class, go into an (indirect) infinite recursion, and you'd end up with a SystemStackError: stack level too deep.)

Next up, we want a nicer way to expose the methods on the module itself (so you can call GLoc.l(...)).

There's a little-used method on Module for exactly that: Module#module_function. It's a visibility modifier like public, private, and protected and supports the same two usage patterns: you can pass it the names of previously defined methods as symbols or you can use it like a pseudo-keyword that affects subsequently defined methods. It's a weird visibility modifier though: it makes the methods it's applied to private (in the Ruby sense, so they're callable only with an implicit self as receiver) but it also creates public copies of the methods as singleton methods of the module. That gives us two of the three usages that GLoc wants to provide, leaving only the "pretend the including class also called extend" feature to implement in the included hook.
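Here's a small sketch of both usage patterns and of the private-instance-method, public-singleton-copy split. Shout, Whisper, and Speaker are hypothetical names invented for the example.

```ruby
# Hypothetical utility modules, used only to illustrate module_function.
module Shout
  # Pattern 1: pseudo-keyword form; affects the methods defined after it.
  module_function

  def loud(text)
    "#{text.upcase}!"
  end
end

module Whisper
  def quiet(text)
    "(#{text.downcase})"
  end

  # Pattern 2: pass the names of already-defined methods as symbols.
  module_function :quiet
end

class Speaker
  include Shout

  def speak
    loud("hi") # fine: implicit self as receiver
  end
end

Shout.loud("hi")    # => "HI!"  (public singleton copy on the module)
Whisper.quiet("HI") # => "(hi)"
Speaker.new.speak   # => "HI!"

# The mixed-in instance method is private, so an explicit receiver fails.
begin
  Speaker.new.loud("hi")
rescue NoMethodError
  # private method `loud' called
end
```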

Using module_function gets us down to this.

module GLoc
  module_function
  
  # ...
  # useful methods defined here
  # ...
  
  def self.included(target)
    super
    target.extend self
  end
end

Very nice.

Unfortunately that wouldn't actually work for GLoc.

You can't tell from these snippets, because I've hidden the useful methods, but GLoc's useful methods rely on some lower-level methods that the author didn't want to expose as module functions. Private methods (or any other methods that don't have module_function applied to them) aren't copied into the module's singleton class the way module functions are, so when a module function is called on GLoc itself, the helpers it depends on aren't there to be called: NoMethodError. Luckily, the same sort of flexible API can be created without any copying of methods from one method table to another by having the module extend itself.
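To make the failure concrete, here's a sketch with a hypothetical Greetings module (not GLoc's real code) whose one module function leans on a private helper.

```ruby
# Hypothetical stand-in for a utility module whose public function
# relies on a helper that should not be exposed as a module function.
module Greetings
  def hello(name)
    "Hello, #{tidy(name)}"
  end
  module_function :hello

  def tidy(name)
    name.strip.capitalize
  end
  private :tidy
end

class Greeter
  include Greetings

  def greet
    hello(" world ")
  end
end

# Mixed in, both hello and tidy are reachable from the instance.
Greeter.new.greet # => "Hello, World"

# Called on the module, the copied hello can't find tidy.
begin
  Greetings.hello(" world ")
rescue NoMethodError => e
  e.name # => :tidy
end
```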

module GLoc
  extend self
  
  # ...
  # useful methods defined here
  # ...
  
  def self.included(target)
    super
    target.extend self
  end
end
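As a quick check that this handles the helper problem, here's a sketch with hypothetical names (Courtesy, Host, welcome, tidy). One difference from module_function worth noticing: the mixed-in methods stay public on instances.

```ruby
# Hypothetical utility module using extend self with a private helper.
module Courtesy
  extend self

  def self.included(target)
    super
    target.extend self
  end

  def welcome(name)
    "Welcome, #{tidy(name)}"
  end

  private

  # Not part of the public API, but still reachable from welcome
  # no matter which object welcome is called on.
  def tidy(name)
    name.strip.capitalize
  end
end

class Host
  include Courtesy
end

Courtesy.welcome(" ada ") # => "Welcome, Ada"
Host.welcome(" ada ")     # => "Welcome, Ada"
Host.new.welcome(" ada ") # => "Welcome, Ada"

# The helper stays private everywhere.
begin
  Courtesy.tidy(" ada ")
rescue NoMethodError
  # private method `tidy' called
end
```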

Using extend self is a bit mind-bending. After all, it creates this strange situation.

GLoc.is_a? GLoc    # => true

But keep in mind that it's not for use in modules that actually model anything, just for modules that are bundles of functions. In other words, it's for the kinds of objects where you'd never ask what it is_a?, so the surprising line above shouldn't worry you.

GLoc is what triggered this write-up, but the most instantly grok-able example of a utility module is Ruby's Math. A class might include Math so it has convenient access to sin, cos, sqrt, or whatever. While it's far from intuitive that an instance of that class is_a?(Math), we know why Math is sitting in the class's ancestors list: it's a side-effect of how mix-ins work. (For other kinds of modules, it's quite intuitive that mixing in alters the inheritance chain: it's not surprising, for example, that [1, 2, 3].is_a?(Enumerable).)
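Here's the Math case as a short sketch. RightTriangle is a hypothetical class; everything it calls on Math is standard library.

```ruby
# Hypothetical class that mixes in Math for convenient access to sqrt.
class RightTriangle
  include Math

  def initialize(a, b)
    @a = a
    @b = b
  end

  def hypotenuse
    sqrt(@a**2 + @b**2) # implicit self as receiver
  end
end

triangle = RightTriangle.new(3, 4)
triangle.hypotenuse   # => 5.0
Math.sqrt(25)         # => 5.0  (the module-level copy)
triangle.is_a?(Math)  # => true (a side-effect of mixing in)
RightTriangle.ancestors.include?(Math) # => true

# Math's functions behave like module functions: private when mixed in.
Math.private_instance_methods(false).include?(:sqrt) # => true
```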

So in the pros column, module_function is clear and declarative. In the cons column, it falls down if you want to compose module functions out of lower-level functions and don't want to expose those the same way. It also scores below average, at best, on the principle-of-least-surprise test (at least by my intuition), because for each method you def, methods are added to two different method tables, and modifying the module later will change only one of those. If those cons trouble you (either purely on principle or because of actual requirements), you can use extend self. The only real con there is that it's a slightly weird-looking idiom.
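The two-method-tables point is easiest to see by reopening a module. In this sketch (Stamp and Seal are hypothetical names), the later def replaces only the instance method, so the module_function copy goes stale while the extend self version follows the change.

```ruby
# Hypothetical module using module_function.
module Stamp
  module_function

  def version
    "v1"
  end
end

# Reopened later; module_function is not in effect in this new body.
module Stamp
  def version
    "v2"
  end
end

Stamp.version                            # => "v1" (stale singleton copy)
Class.new { include Stamp }.new.version  # => "v2"

# The same change with extend self: there is only one method table.
module Seal
  extend self

  def version
    "v1"
  end
end

module Seal
  def version
    "v2"
  end
end

Seal.version                             # => "v2"
Class.new { include Seal }.new.version   # => "v2"
```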

I'd like to see either module_function or extend self used on every utility module to communicate about what sort of module it is. What do you think?

1 comment:

Ben said...

Thanks for the detailed explanation. This really helped me to understand the pros and cons of each of these. There really isn't much out there on the extend self technique.