Friday, February 06, 2009

The Ins and Outs of clj-record Validations

One of the first features I tackled in clj-record was a declarative validation API. Here I'll look at how it works, both in terms of how to use it and how it's implemented.

How to declare validations

The init-model form can include an option-group that opens with the keyword :validation, like this:

    (clj-record.core/init-model
      (:validation
        (:nickname "Nickname is required." #(not (empty? %)))))

Following :validation you provide any number of list forms, each containing the attribute name as a keyword, the message that should be reported if the validation fails, and the function that implements the validation. The function must take one argument, the value being validated, and return logical true or false. It can be defined inline, as shown above, or referenced by name if it's defined elsewhere. (That is, even though init-model is a macro, your code will be evaluated as you expect.)
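As a small illustration of the "defined elsewhere" case, here is a sketch of a named validation function. The name present? is hypothetical and not part of clj-record; the only contract taken from the text above is "one argument in, logical true or false out."

```clojure
;; Hypothetical named validation function (not part of clj-record).
;; Takes the value being validated and returns true or false.
;; Expects a string, a collection, or nil; other types would make seq throw.
(defn present?
  [value]
  (boolean (seq value)))

;; An option-form could then refer to it by name instead of using an
;; inline function, for example:
;;   (:nickname "Nickname is required." present?)
```

Because the option-form contents are evaluated normally, the symbol resolves in the model namespace like any other reference.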

Built-in validations are provided in clj-record.validation.built-ins.

    ; assuming you've required [clj-record.validation.built-ins :as valid]
    (clj-record.core/init-model
      (:validation
        (:nickname
          "Nickname must be three to fifteen letters."
          (valid/match #"^[A-Za-z]{3,15}$"))))

clj-record.validation.built-ins/match is a higher order function that takes a regular expression pattern and returns a validation function that passes only if the value matches the pattern (in whole or in part, so use ^ and $ in your pattern if you want to match the whole string).
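To make the higher-order idea concrete, here is a minimal sketch of how a function like match could be written. This is not the library's source: match-sketch is a hypothetical name, and coercing the value with str (so nil is treated as an empty string) is an assumption made for the example.

```clojure
;; Hypothetical stand-in for a match-style built-in (not clj-record's code).
;; Takes a regex pattern and returns a one-argument validation function.
(defn match-sketch
  [pattern]
  (fn [value]
    ;; re-find succeeds on a partial match, so anchoring with ^ and $
    ;; is what forces the whole string to match.
    (boolean (re-find pattern (str value)))))

(def nickname-ok? (match-sketch #"^[A-Za-z]{3,15}$"))
(def has-digit?   (match-sketch #"[0-9]"))
```

Here nickname-ok? rejects "jo" because the anchored pattern needs at least three letters, while has-digit? accepts "abc1" because an unanchored pattern only needs to match somewhere in the string.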

Some of the functions in clj-record.validation.built-ins are, like match, higher order functions that return validation functions. Others, for example numeric?, are simple validation functions that you refer to directly.

    (clj-record.core/init-model
      (:validation
        (:age "Age ain't nothing but a number." valid/numeric?)))

Since they're just normal functions, the built-ins can be combined just like any other functions.

    (clj-record.core/init-model
      (:validation
        (:age "Age ain't nothing but a number (or nil)."
          #(or (nil? %) (valid/numeric? %)))))

How to run validations

ActiveRecord's validation API is nice-looking in the normal case but conceptually a bit nasty. The valid? method looks like a pure predicate, but it mutates the model behind the scenes, clearing and re-populating its errors collection each time it's called. Since immutability is highly valued in functional languages, it was clear a similar approach wouldn't be appropriate for clj-record.

Model validations in clj-record are run using the validate function of the model namespace, which returns a validation-result. (If you can suggest a better name for that, please do.) The result can be inspected with the predicate clj-record.validation/valid? and messages for an attribute can be retrieved with clj-record.validation/messages-for, which takes the attribute name keyword and returns a collection of messages.

How validations work

The details of how validations are implemented shouldn't be relevant for normal use, but since clj-record is very young, it's pretty likely you'll need to look at the internals. If you're new to Clojure (or Lisp macros in general) perhaps you'll also find this a useful case study. I'm also interested in suggestions for improvements (to both the internals and the public-facing API).

First let's look at how option-groups in the init-model form work. The init-model macro uses the opening keyword of each option-group to look up a namespace. The validation option-group opens with :validation, so the namespace will be clj-record.validation.

It then calls a function in that namespace called expand-init-option once for each option-form in the option-group. (In the first example above, the only option-form is (:nickname "Nickname is required." #(not (empty? %))), but there can be more.) expand-init-option takes as arguments the model name (as a string) followed by whatever appeared in the option-form, and returns a form that will appear in the expansion of init-model.

As you can guess from the examples above, clj-record.validation/expand-init-option takes as arguments the model-name followed by the (unevaluated) attribute name, message, and validation function. It returns a syntax-quoted form that calls clj-record.validation/add-validation with those same arguments verbatim, which in turn adds a validation to the (mutable) model metadata of your model.

So this:

    ; in the model namespace for the model "foo"
    (clj-record.core/init-model
      (:validation
        (:nickname "Nickname is required." #(not (empty? %)))
        (:age "Age ain't nothing but a number." valid/numeric?)))

is equivalent to this:

    ; in the model namespace for the model "foo"
    (clj-record.validation/add-validation "foo"
      :nickname "Nickname is required." #(not (empty? %)))
    (clj-record.validation/add-validation "foo"
      :age "Age ain't nothing but a number." valid/numeric?)

since the init-model macro will expand to a form including those exact add-validation calls.
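The expansion described above can be sketched in a few lines of plain Clojure. This is a simplified illustration under stated assumptions, not clj-record's actual code: init-model-sketch takes the model name as an explicit argument, the expander is found in a hypothetical lookup map (option-expanders) rather than by resolving a namespace, and the shape of the stored metadata is made up for the example.

```clojure
;; Mutable registry standing in for the model metadata (hypothetical shape).
(def model-metadata (ref {}))

(defn add-validation
  "Runs at run time: appends a validation to the vector kept for model-name."
  [model-name attribute message validation-fn]
  (dosync
    (alter model-metadata update-in [model-name :validations]
           (fnil conj [])
           {:attribute attribute :message message :fn validation-fn})))

(defn expand-validation-option
  "Runs at macro-expansion time: receives the unevaluated contents of one
  option-form and returns a form that calls add-validation with them."
  [model-name attribute message validation-fn]
  `(add-validation ~model-name ~attribute ~message ~validation-fn))

;; Assumption: a map from opening keyword to expander function. The post
;; describes a namespace lookup instead.
(def option-expanders {:validation expand-validation-option})

(defmacro init-model-sketch
  "Expands each option-form into the form returned by its expander."
  [model-name & option-groups]
  `(do
     ~@(for [[group-key & option-forms] option-groups
             option-form option-forms]
         (apply (option-expanders group-key) model-name option-form))))

;; number? from clojure.core stands in for a built-in validation here.
(init-model-sketch "foo"
  (:validation
    (:nickname "Nickname is required." #(not (empty? %)))
    (:age "Age ain't nothing but a number." number?)))
```

The macro itself does nothing but splice the option-form contents into function calls, so the inline function and the number? reference are evaluated as ordinary code when the expansion runs.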

Notice that very little work is happening in the macro-expansion. Keeping the macro layer very thin yields a number of benefits.

  • First, it makes it trivial to allow references in the option-forms to work as you'd expect. In an earlier version of the implementation, model-metadata was built up at macro-expansion time, but it turned out I had to jump through extra eval hoops to get symbols to resolve to the right thing.
  • Second, it keeps things working with AOT (ahead-of-time compilation). After jumping through the eval hoops, I decided to test a pre-compiled model class and realized that the in-memory model-metadata built up at macro-expansion time didn't exist, because macros had been expanded back when I compiled. Oops!
  • Third, it makes testing much easier. I can define a funny validation for just one test and don't have to worry about keeping it passing in other tests. At the end of a test I just repoint a mutable ref back at the old value, and the model-metadata is back to its original shape. (I actually haven't yet used that technique for validation tests but will probably start soon. I do already use it in the tests for callbacks.)
  • Finally, it reduces conceptual overhead. Macros are complicated. Anything you can do to make sure macro-expansion is just a simple conversion into something unmagical helps reduce head scratching.
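The testing technique from the third point can be sketched as follows. This is only an illustration of "repoint the ref back at the old value": with-restored-metadata is a hypothetical helper, and the metadata shape is assumed for the example rather than taken from clj-record.

```clojure
;; Stand-in for the mutable model metadata (hypothetical shape).
(def model-metadata (ref {"foo" {:validations []}}))

(defn with-restored-metadata
  "Calls test-fn, then puts the ref back to the value it had beforehand,
  even if test-fn throws."
  [test-fn]
  (let [original @model-metadata]
    (try
      (test-fn)
      (finally
        (dosync (ref-set model-metadata original))))))

;; Add a funny validation for the duration of one test only.
(def validations-seen-during-test
  (with-restored-metadata
    (fn []
      (dosync
        (alter model-metadata update-in ["foo" :validations] conj
               {:attribute :nickname
                :message "Funny validation."
                :fn (constantly false)}))
      (count (get-in @model-metadata ["foo" :validations])))))
```

Since the metadata is an immutable map held in a ref, restoring it is a single ref-set and no per-validation cleanup is needed.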

Internally the validations are stored in model-metadata as a vector, since I wanted to run them in the order they're defined (for no real reason other than predictability). The validation result returned by validate is just a Clojure hash map, but I provided valid? and messages-for so the implementation is abstracted. (Currently valid? is just empty? and messages-for is just get, but perhaps they'll get fancier later.)
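Put together, the run-time side might look roughly like the sketch below. It follows the description above (a vector of validations, a plain map as the result, valid? as empty?, messages-for as get), but the map keys, the argument order, and the validate signature are assumptions for the example, and number? from clojure.core stands in for a built-in.

```clojure
;; Validations in a vector so they run in definition order (assumed shape).
(def validations
  [{:attribute :nickname
    :message "Nickname is required."
    :fn #(not (empty? %))}
   {:attribute :age
    :message "Age ain't nothing but a number."
    :fn number?}])

(defn validate
  "Returns a map from attribute keyword to a vector of failure messages.
  Nothing is mutated; an empty map means every validation passed."
  [validations record]
  (reduce (fn [errors {:keys [attribute message] validation-fn :fn}]
            (if (validation-fn (get record attribute))
              errors
              (update-in errors [attribute] (fnil conj []) message)))
          {}
          validations))

;; The two accessors described above, in their simplest form.
(def valid? empty?)

(defn messages-for
  [validation-result attribute]
  (get validation-result attribute))

(def bad-result  (validate validations {:nickname "" :age "old"}))
(def good-result (validate validations {:nickname "johnny" :age 30}))
```

Callers only ever see a value: they can ask valid? about it or pull messages out with messages-for, and validating the same record again simply produces a new map.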

So much for validations. Next week, maybe callbacks ... or maybe I'll actually do some coding.


tunde ashafa said...

Great work! I'm currently hacking away using clj-record. I noticed it only supported Apache Derby, and I've got a patch that's passing all the tests for MySQL.

John Hume said...

Cool. Please send me a pull request on github (or you can email a patch or whatever).