Friday, September 12, 2008

Keep Backtraces Honest: Set Forwardable::debug

If you use Forwardable at all, you may have noticed that when you misspell or forget to implement something, the backtraces can be a little baffling. Take this example.

require 'forwardable'

class Foo
  extend Forwardable
  def_delegator :a, :bar # a is not defined
  def_delegator :b, :baz # b is defined but returns nil
  def b; end
end

f = Foo.new

When you call f.bar, you'll get an unsurprising NameError: undefined local variable or method 'a' for #<Foo:0x1044fec>. The backtrace will point at the line where you called bar, which is a little weird, but since that line doesn't have any mention of 'a' on it, you'll probably know to go looking for bar and discover the missing (or misspelled) delegate accessor.

When you call f.baz, you'll get an unsurprising NoMethodError: undefined method 'baz' for nil:NilClass. But, again, the backtrace will point at the line where you called baz, and here you're much more likely to go chasing down the wrong problem. If you really created your own instance of Foo just before that line, you're probably not going to worry that Foo.new returned nil. But what if you'd gotten that instance of Foo from some other call? The backtrace suggests that other call returned nil.

It's a dirty lie!

You have a perfectly good instance of Foo. The real problem is that its delegate accessor returned nil.

So why does the backtrace lie?

The first time I ran into this, I assumed Forwardable was implemented using some weird native magic that kept it from appearing in the backtrace. That seemed odd, since it doesn't do anything you couldn't do from normal Ruby code, but I didn't have time to dig into it.

When I finally did, I was surprised to find that it's ordinary Ruby code. The method it writes via module_eval, as interpolated for this example, is:

def baz(*args, &block)
  begin
    b.__send__(:baz, *args, &block)
  rescue Exception
    $@.delete_if{|s| /^\(__FORWARDABLE__\):/ =~ s} unless Forwardable::debug
    Kernel::raise
  end
end

As you'll have guessed, "(__FORWARDABLE__)" is passed as the file name to module_eval, so Forwardable's default behavior is to delete itself from backtraces, potentially making them misleading and wasting a lot of debugging time.

I don't know why it does that, but thankfully the authors realized you may not want that and made the hiding conditional on Forwardable::debug being false.
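To make the mechanism concrete, here's a minimal sketch of the same trick in isolation. It is not Forwardable's source: TinyDelegator, tiny_delegate, and FAKE_FILE are names I made up for illustration, and it calls set_backtrace instead of mutating $@ in place, which I'm assuming is the more portable way to get the same effect.

```ruby
# Hypothetical stand-in for illustration; this is not Forwardable's source.
module TinyDelegator
  FAKE_FILE = "(__TINY_DELEGATOR__)"

  class << self
    attr_accessor :debug   # plays the role of Forwardable::debug
  end

  # Defines `method` on the extending class, forwarding to `accessor`.
  def tiny_delegate(accessor, method)
    module_eval(<<-RUBY, FAKE_FILE, 1)
      def #{method}(*args, &block)
        #{accessor}.__send__(:#{method}, *args, &block)
      rescue Exception => e
        unless TinyDelegator.debug
          # Drop every frame that came from the generated code.
          e.set_backtrace(e.backtrace.reject { |s| s.start_with?("#{FAKE_FILE}:") })
        end
        raise
      end
    RUBY
  end
end

class Foo
  extend TinyDelegator
  tiny_delegate :b, :baz
  def b; end   # defined, but returns nil
end

# Counts the backtrace frames that point into the generated method.
def frames_from_generated_code
  Foo.new.baz
rescue NoMethodError => e
  e.backtrace.count { |s| s.start_with?(TinyDelegator::FAKE_FILE) }
end
```

With TinyDelegator.debug left unset, frames_from_generated_code finds no generated frames, so the backtrace starts at the caller. Set the flag to true and the generated frames stay put.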

I highly recommend that any application using Forwardable set that flag in its bootstrapping code.

Forwardable.debug = true

Cleaner Utility Modules with Module#module_function or extend self

[Disclaimer: GLoc is a nice and generally well written library, and I'd encourage anyone who needs to translate strings to consider it. Also strongly consider i18n, particularly if you're using Rails 2.x.]

My pair and I were looking at the code for GLoc this week. When a class includes the GLoc module, it makes its methods available on instances of the class (as one would expect with include) as well as on the class itself (which would normally only happen with a call to extend). Those same methods are also available as module methods of GLoc itself. So all of the following will work.

GLoc.l(:hi)

class Greeter
  include GLoc
  
  def greet
    l(:hi)
  end
  
  def self.greet
    l(:hi)
  end
end

What caught our eye was how the library goes about making itself so available. Minus all the functionality, here's what it does.

module GLoc
  module InstanceMethods
    # ...
    # useful methods defined here
    # ...
  end
  
  include ::GLoc::InstanceMethods
  
  module ClassMethods
    include ::GLoc::InstanceMethods
  end
  
  def self.included(target)
    super
    class << target
      include ::GLoc::ClassMethods
    end
  end
  
  class << self
    include ::GLoc::InstanceMethods
  end
end

Patrick's comment was "I don't know off-hand what the cleanest way is to do that, but I know that ain't it."

So what's the cleanest way?

First let's clear away the noise of the nested modules. The convention of nesting a ClassMethods module to extend on including classes makes sense when there are distinct behaviors to add at the instance and class levels, but when you want the same methods in both places, it's unnecessary noise.

Here's what we get when we just define the behavior in the top-level module.

module GLoc
  # ...
  # useful methods defined here
  # ...
  
  def self.included(target)
    super
    target.extend self
  end
  
  class << self
    include ::GLoc
  end
end

That's much easier on the eye. (Note that this calls extend on target rather than having target's singleton class include the module. That's important because otherwise the included hook goes into an (indirect) infinite recursion and you end up with a SystemStackError: stack level too deep.)
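Here's a small runnable sketch of that hook on its own. Lookup is a hypothetical stand-in for GLoc, and its l method just echoes the key rather than translating anything.

```ruby
# Hypothetical module standing in for GLoc; `l` just echoes its key.
module Lookup
  def l(key)
    "translated #{key}"
  end

  def self.included(target)
    super
    target.extend self   # adds the same methods at the class level
  end
end

class Greeter
  include Lookup

  def greet
    l(:hi)
  end

  def self.greet
    l(:hi)
  end
end

Greeter.new.greet   # => "translated hi"
Greeter.greet       # => "translated hi"
```

Because extend fires the extended hook rather than included, the hook runs once per include and stops there.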

Next up, we want a nicer way to expose the methods on the module itself (so you can call GLoc.l(...)).

There's a little-used method on Module for exactly that: Module#module_function. It's a visibility modifier like public, private, and protected and supports the same two usage patterns: you can pass it the names of previously defined methods as symbols or you can use it like a pseudo-keyword that affects subsequently defined methods. It's a weird visibility modifier though: it makes the methods it's applied to private (in the Ruby sense, so they're callable only with an implicit self as receiver) but it also creates public copies of the methods as singleton methods of the module. That gives us two of the three usages that GLoc wants to provide, leaving only the "pretend the including class also called extend" feature to implement in the included hook.
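A quick sketch of that behavior, using a made-up Geometry module and Room class (neither has anything to do with GLoc):

```ruby
# Hypothetical example module; not part of GLoc.
module Geometry
  module_function   # pseudo-keyword form: applies to the defs below

  def area(width, height)
    width * height
  end
end

class Room
  include Geometry

  def floor_area
    area(4, 5)   # implicit receiver, so the private instance copy is callable
  end
end

Geometry.area(2, 3)     # => 6, via the public singleton copy
Room.new.floor_area     # => 20, via the private instance copy
Geometry.private_instance_methods(false).include?(:area)   # => true
Room.respond_to?(:area)                                     # => false
```

Calling Room.new.area(1, 1) with an explicit receiver raises NoMethodError, since the instance copy is private.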

Using module_function gets us down to this.

module GLoc
  module_function
  
  # ...
  # useful methods defined here
  # ...
  
  def self.included(target)
    super
    target.extend self
  end
end

Very nice.

Unfortunately that wouldn't actually work for GLoc.

You can't tell from these snippets, because I've hidden the useful methods, but GLoc's useful methods rely on some lower-level methods that the author didn't want to expose as module functions. Private methods (or any other methods that don't have module_function applied to them) aren't copied into the module's singleton class like the module functions, so they're not there to be called: NoMethodError. Luckily the same sort of flexible API can be created without any copying of methods from one method table to another by having the module extend itself.

module GLoc
  extend self
  
  # ...
  # useful methods defined here
  # ...
  
  def self.included(target)
    super
    target.extend self
  end
end
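To see the difference side by side, here's a sketch with two hypothetical modules. In both, decorate stands in for a lower-level helper the author doesn't want to expose; none of these names come from GLoc.

```ruby
# Hypothetical modules; `decorate` stands in for a lower-level helper.
module LoudFunctions
  def decorate(text)   # defined before module_function, so never copied
    "#{text}!"
  end
  private :decorate

  module_function

  def shout(text)
    decorate(text.upcase)
  end
end

module LoudSelf
  extend self

  def self.included(target)
    super
    target.extend self
  end

  def shout(text)
    decorate(text.upcase)
  end

  private

  def decorate(text)
    "#{text}!"
  end
end

class Announcer
  include LoudSelf
end

LoudSelf.shout("hi")    # => "HI!"
Announcer.shout("hi")   # => "HI!"
```

LoudFunctions.shout("hi") raises NoMethodError, because decorate never made it onto the module's singleton. LoudSelf.decorate("hi") also raises NoMethodError, but for the right reason: the helper is there and private.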

Using extend self is a bit mind-bending. After all it creates this strange situation.

GLoc.is_a? GLoc    # => true

But keep in mind that it's not for use in modules that actually model anything, just for modules that are bundles of functions. In other words, it's for the kinds of objects where you'd never ask what it is_a?, so the surprising line above shouldn't worry you.

GLoc is what triggered this write-up, but the most instantly grok-able example of a utility module is Ruby's Math. A class might include Math so it has convenient access to sin, cos, sqrt, or whatever. While it's far from intuitive that an instance of that class is_a?(Math), we know why Math is sitting in the class's ancestors list: it's a side-effect of how mix-ins work. (For other kinds of modules, it's quite intuitive that mixing in alters the inheritance chain: it's not surprising, for example, that [1, 2, 3].is_a?(Enumerable).)
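For instance, with a hypothetical Triangle class:

```ruby
# Hypothetical class that mixes in Math for convenient access to sqrt.
class Triangle
  include Math

  def initialize(a, b)
    @a, @b = a, b
  end

  def hypotenuse
    sqrt(@a**2 + @b**2)   # Math's sqrt, called with an implicit receiver
  end
end

triangle = Triangle.new(3, 4)
triangle.hypotenuse                 # => 5.0
triangle.is_a?(Math)                # => true
Triangle.ancestors.include?(Math)   # => true
Math.sqrt(16)                       # => 4.0
```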

So in the pros column, module_function is clear and declarative. In the cons column, it falls down if you want to compose module functions out of lower-level functions and don't want to expose those the same way. It also scores below average at best on the principle of least surprise (at least by my intuition): each method you def is added to two different method tables, and redefining it in the module later, without reapplying module_function, changes only one of them. If those cons trouble you (either purely on principle or because of actual requirements), you can use extend self. The only real con there is that it's a slightly weird-looking idiom.
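The two-method-tables point is easiest to see by reopening a module. This is a sketch with hypothetical Units, UnitsSelf, and Gauge names:

```ruby
# Hypothetical modules showing what happens when a method is redefined later.
module Units
  module_function

  def label
    "metric"
  end
end

module Units   # reopened later, without module_function
  def label
    "imperial"
  end
end

module UnitsSelf
  extend self

  def label
    "metric"
  end
end

module UnitsSelf   # reopened later
  def label
    "imperial"
  end
end

class Gauge
  include Units

  def units_label
    label
  end
end

Units.label             # => "metric", the singleton copy is stale
Gauge.new.units_label   # => "imperial"
UnitsSelf.label         # => "imperial", there is only one method table
```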

I'd like to see either module_function or extend self used on every utility module to communicate about what sort of module it is. What do you think?