Friday, October 19, 2007

Some Rambling on xUnit Testing Style

On our drive back to Brooklyn from the client site last night, Patrick and I were talking about testing style.

I mentioned that it had bothered me for a long time that the base class we extend in (most) xUnit frameworks is called "TestCase" even though any good test class describes multiple test cases. [To help fend off semantic confusion, here's my idea of test case identity: One test case is a set of inputs and stimuli to the code under test. There may be several things to assert about what goes on during a test case or the state of things after its run, but the inputs and stimuli are constant. When you vary them, you have another test case.]

Since we try to minimize the number of assertions in a given test (ideally keeping it to one, though in contrast to Jay I'm personally fine with that one assertion being complex) we often have multiple test methods that assert on the same test case (by the definition above) and could therefore share setup and teardown code. But since we also need tests to exercise other scenarios, we can't use the framework's setup method to create the scenario unless we're willing to set up all the stuff we need in all our tests, what you might call a "superset fixture," which feels wrong. Incidentally, I think the reason not many people use the term "fixture" for the stuff you set up for your tests is that it's always been a pretty weak concept in practice: either you have a mess of objects that various tests will use in various ways or you're setting up so little that it's barely worth talking about.

It may be practical to set up a superset fixture when doing simple state-based testing, but if you're dealing with a "wide" object, it means a really noisy setup.* When using mocks it's actually impossible to fully set up more fixture than you'll need in every test: a mock set up but not used will fail the tests that don't satisfy its expectations.
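To make the mock point concrete, here's a rough sketch using a hand-rolled strict mock rather than any real mocking library. `StrictMock`, `superset_fixture`, `mailer`, and `logger` are all names invented for illustration. The only behavior assumed is the usual strict-mock rule that every expectation must be met by the time the mock is verified.

```ruby
# A deliberately tiny, hypothetical strict mock (not a real library):
# every expected call must arrive before verify, or verify raises.
class StrictMock
  def initialize
    @expected = []
    @received = []
  end

  # Record that +name+ must be called, and what the call should return.
  def expect(name, result = nil)
    @expected << name
    define_singleton_method(name) do |*_args|
      @received << name
      result
    end
    self
  end

  # Raise if any expected call never happened.
  def verify
    missing = @expected - @received
    raise "expected calls never made: #{missing.join(', ')}" unless missing.empty?
    true
  end
end

# A "superset" setup: both mocks carry expectations, whether or not
# a given test will exercise them.
def superset_fixture
  mailer = StrictMock.new.expect(:deliver, true)
  logger = StrictMock.new.expect(:info, true)
  [mailer, logger]
end

# This scenario only touches the mailer...
mailer, logger = superset_fixture
mailer.deliver('hello')
mailer.verify

# ...so verifying the logger, as a shared teardown would, raises.
begin
  logger.verify
rescue RuntimeError => e
  puts e.message
end
```

Real mocking libraries differ in the details, but the shape of the problem is the same: an expectation placed in shared setup becomes a requirement on every test that runs that setup.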

So besides a base class name, 'TestCase', that doesn't seem to make sense, we have these setup and teardown facilities that we don't leverage much.

Then I had what I thought might be an important insight right there in the car. Maybe the class was called a TestCase and had just one setup method because it was originally intended to describe just one case, with each test method just asserting something different about the scenario created in the setup. If so, the setup could even include the stimulus of the code under test, reducing test methods to nothing but assertions. Maybe what's made both the naming and the use of shared fixture-setup seem awkward all this time is that we've tied ourselves to creating one TestCase class per production class, when all along we could have had a TestCase class for each scenario we wanted to test, with most having a very small number of test methods.

Here's a super-simple example of the sort of thing I was imagining, though my imaginings were a lot more abstract.

require 'test/unit'
require 'set'

class Set::EmptyTest < Test::Unit::TestCase
  def setup
    @set = Set.new
  end
  
  def test_size_is_zero
    assert_equal 0, @set.size
  end
  
  def test_empty
    assert @set.empty?
  end
end

class Set::AdditionTest < Test::Unit::TestCase
  def setup
    @set = Set.new
    @set.add 5
  end
  
  def test_size_is_one
    assert_equal 1, @set.size
  end
  
  def test_contains_added_item
    assert @set.include?(5)
  end
  
  def test_not_empty
    assert !@set.empty?
  end
end

class Set::DeletionTest < Test::Unit::TestCase
  def setup
    @set = Set.new [:abc, 5]
    @set.delete 5
  end
  
  def test_size_is_one
    assert_equal 1, @set.size
  end
  
  def test_no_longer_contains_deleted_item
    assert !@set.include?(5)
  end
  
  def test_still_contains_other_item
    assert @set.include?(:abc)
  end
  
  def test_not_empty
    assert !@set.empty?
  end
end

All these little test cases could be a maintenance headache if they each lived in their own file, but it might not be too bad if you gave up the one-class-per-file convention. Although I'm not a good student of history, I knew xUnit frameworks started with one in Smalltalk, and it seemed like this one-TestCase-class-per-scenario approach might have been really convenient in a development environment where all code was organized hierarchically without the bother of source files that might need to be moved, renamed, etc. when changing tests. I've only run Squeak long enough to build a trivial Seaside application, so I was speculating, but I could imagine it being pretty handy to organize tests with one package per class under test, then a class per test case, each with a setup, then a test method for each assertion to be verified.

In Ruby we would also do some metaprogramming to reduce the noise and make the test code more intentional. Maybe something like this.

testcase_for 'an empty set' do
  
  setup { @set = Set.new }
  
  test('size is zero') { assert_equal 0, @set.size }
  
  test('empty') { assert @set.empty? }
end

testcase_for 'adding an item to a set' do
  setup do
    @set = Set.new
    @set.add 5
  end
  
  test('size is one') { assert_equal 1, @set.size }
  
  test('contains added item') { assert @set.include?(5) }
  
  test('not empty') { assert !@set.empty? }
end

You probably noticed this looks a lot like RSpec contexts, which gets at why I was so excited. I wondered if Kent Beck's original intent had been something much closer to BDD, and it had just taken the rest of us a long time to catch up.
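The `testcase_for` helper in that example is imaginary, so here's one rough way it might be built. Everything in this sketch is an assumption made for illustration: the `ScenarioDSL` module, the `TinyCase` stand-in base class (used only so the sketch runs without a test framework), and the little loop at the end that plays the role of a runner.

```ruby
require 'set'

# Hypothetical class-level helpers that turn block-style declarations
# into ordinary setup and test_* instance methods.
module ScenarioDSL
  def setup(&block)
    define_method(:setup, &block)
  end

  def test(description, &block)
    define_method("test_#{description.gsub(/\W+/, '_')}", &block)
  end
end

# Minimal stand-in for a framework base class, so the sketch is self-contained.
class TinyCase
  def setup; end

  def assert(condition, message = 'assertion failed')
    raise message unless condition
  end

  def assert_equal(expected, actual)
    assert expected == actual, "expected #{expected.inspect}, got #{actual.inspect}"
  end
end

# Build an anonymous subclass of +base+ and evaluate the block as its body.
def testcase_for(description, base = TinyCase, &block)
  klass = Class.new(base)
  klass.extend(ScenarioDSL)
  klass.define_singleton_method(:description) { description }
  klass.class_eval(&block)
  klass
end

AddingAnItem = testcase_for 'adding an item to a set' do
  setup do
    @set = Set.new
    @set.add 5
  end

  test('size is one') { assert_equal 1, @set.size }
  test('contains added item') { assert @set.include?(5) }
  test('not empty') { assert !@set.empty? }
end

# Stand-in runner: each generated test method gets a fresh instance.
AddingAnItem.instance_methods(false).grep(/\Atest_/).sort.each do |name|
  instance = AddingAnItem.new
  instance.setup
  instance.send(name)
  puts "#{AddingAnItem.description}: #{name} passed"
end
```

The only real trick is that the block is evaluated with the new class as `self`, so `setup` and `test` are plain class-level methods that call `define_method`. The generated methods keep the `test_` prefix, so a conventional xUnit-style runner could presumably discover them as usual if a real framework base class were passed in place of `TinyCase`.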

So when I got home I went digging around for articles about unit testing style and found surprisingly little. I also looked for anything about the original intent of the framework. (Googling these topics was a little depressing because of all the weak information, plagiarism, and content spam.)

My search stopped when I found the Kent Beck article where he originally presented the unit testing framework pattern we now know so well. (I believe that was first published in The Smalltalk Report in October 1994. Thanks to Farley for digging up that obscure nugget.) I was disappointed to find that the example in that first article conforms pretty much exactly to the classic form of unit test we've all seen before, including a setup method that sets up more fixture than any one test method uses.

In case you find the Smalltalk a little painful to read, here's my translation of Beck's example test case to Ruby. (The examples above, you'll now see, are based on Beck's.)

require 'test/unit'
require 'set'

class SetTest < Test::Unit::TestCase
  
  def setup
    @empty = Set.new
    @full = Set.new [:abc, 5]
  end
  
  def test_add
    @empty.add 5
    assert @empty.include?(5)
  end
  
  def test_delete
    @full.delete 5
    assert @full.include?(:abc)
    assert !@full.include?(5)
  end
  
  def test_illegal
    # Set instances have no index operator, so this call should raise
    assert_raise(NoMethodError) { @full[0] }
  end
end

So that was a bit of a letdown. On the other hand, I did finally learn why the base class is called a TestCase when it represents so many different test cases.

As a test writer, you tend not to think of your TestCase subclass as a normal class. All the instantiation and running is in the framework, and none of your code ever interacts with TestCase instances, so their life-cycle (their very existence as normal objects) is usually irrelevant to you as a user of the framework. From the framework's point of view, however, their life-cycle is central.

As you may or may not know, your xUnit runner creates one instance of your TestCase class for each test method, passing to the constructor the name of the test method the new instance will run. Sure, well-written setup and teardown methods would allow the runner to use one instance for all the test methods, but that would require the test writer not to accidentally leave state hanging around in the instance. Why put that burden on the framework user when the framework can just as easily start with a completely clean slate every time?

So the framework creates one TestCase instance per test method, each of which is a test case of its own. It works in the intuitive sense of "test case" as well as in OO terms. Score one for the forefathers of agile software development!
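The instance-per-test-method life-cycle is easy to sketch in plain Ruby. This is a toy under stated assumptions, not how any particular framework is actually implemented: `TinyTestCase`, its `suite` and `run` methods, and the `LeakyTest` example are all invented here.

```ruby
# Toy base class: each instance knows the single test method it will run.
class TinyTestCase
  attr_reader :method_name

  def initialize(method_name)
    @method_name = method_name
  end

  def setup; end

  def teardown; end

  # Run setup, the one named test method, then teardown; report the outcome.
  def run
    setup
    send(@method_name)
    :passed
  rescue StandardError => e
    "failed: #{e.message}"
  ensure
    teardown
  end

  # Build one fresh instance per test method.
  def self.suite
    public_instance_methods(true).grep(/\Atest_/).sort.map { |name| new(name) }
  end
end

class LeakyTest < TinyTestCase
  def test_a_adds_item
    items << 1
    raise 'expected one item' unless items.size == 1
  end

  def test_b_starts_empty
    raise 'state leaked from another test' unless items.empty?
  end

  private

  # Lazily created state that no setup or teardown ever resets.
  def items
    @items ||= []
  end
end

LeakyTest.suite.each do |test_case|
  puts "#{test_case.method_name}: #{test_case.run}"
end
```

`LeakyTest` never cleans up `@items`, yet both tests pass because each one runs on its own instance. If a single instance ran both methods in order, the second would see the item left behind by the first.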

I'm interested to hear what styles people have used or seen in unit test/spec suites. Have you tried creating multiple TestCases per class to keep individual test case classes more maintainable? Did it work out? What about BDD specs? Do you find having the structure of the test or spec map directly to the purpose of the code (as opposed to having private test helper methods scattered around your source file) advantageous? How far have you gone with keeping test setup all inline in the tests themselves? Let me know.

* Yes, wide objects that are hard to set up for test are a smell that the code under test may be poorly factored, but if there's anything we've learned driving back and forth from Brooklyn to central New Jersey every week, it's that you often have to live with bad smells, though you should always be on the lookout for a route that avoids them without making you late.
