Thursday, May 14, 2009

The Visitor Pattern and Extensibility

I've been dealing with Dil lately. It's a compiler project for the D programming language, written in D. It makes use of the visitor pattern to provide semantic analysis. (The visitor pattern is a way of achieving dynamic dispatch via strongly typed overloads. It was a clever hack when it was first created.)

One issue with the visitor pattern is that it requires a lot of boilerplate code. If you want to have multiple semantic passes, you need to use an interface, and that interface requires a method for each type. If your visitor only cares about ten types and the interface supports fifty, you have a problem.

This doesn't much matter for dil -- very few types will not matter for any given visitor.

But let's look at a different problem. Let's say we want to log all visitor actions. Where do we do this? There are two choices: every single visitor method, or every single visited class. Similarly if we want to filter visited items, or set a breakpoint, or anything interesting like that.

I'm working on semantic analysis in Dil right now. For cleanness, I would like to split semantic analysis into possibly many phases. However, I would also like to combine semantic passes when possible for efficiency. To do this, I need to create a visitor that will coordinate between several visitors. This is an unreasonably large task with the current Visitor pattern implementation.

What else could I use, though?

In the past, I've used this concept:


void delegate(Node)[ClassInfo] handlers;
void visit(Node node)
{
if (auto ptr = node.classinfo in handlers)
{
auto dg = *ptr;
dg(node);
}
}


This design is sufficiently minimal that it's easy to do a fair bit with it:

  • Log each visited node

  • Use multiple handlers per node type

  • Use the same handler for multiple node types

  • Filter or preprocess nodes based on some criterion available in the base class

  • Ignore various node types by not writing any code for them



There is one problem with it, though: you have to do a lot of casting. In C#, you can use reflection to invoke the methods without caring about their types. In D, you can write a little wrapper:


class Invoker(T : Node)
{
void delegate(T) dg;
void invoke(Node node)
{
debug
{
// safety: cast and check
auto theNode = cast(T)node;
assert (theNode !is null, "expected: " ~ T.stringof ~
"but was " ~ node.classinfo.name);
}
else
{
// efficiency: force cast and assume
auto theNode = *cast(T*)&node;
}
return dg(theNode);
}
}
handlers[type] = &(new Invoker(dg)).invoke;


A slightly simpler version is possible with D2's closures.

There is one problem with our solution: it uses associative arrays, which could be slow.

If you know what types you support in advance, you can map each to an index to get a very fast lookup. This requires a fast means of getting this index, however. I'm generally inclined to just use the associative array; it's going to be small in any case, so it will not incur a significant penalty in most cases.

No comments: