# The Expression Problem in Java (Litterature Review)

21 Aug 2019Previously: The Visitor Pattern in Java 8

Last time I presented a way to implement the visitor pattern, by
taking advantage of Java 8's `default`

interface methods.

In the process I said this was a partial solution to the expression problem, which was defined as:

The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety (e.g., no casts).

Recall that in the context of Java, we can think of a *datatype* as an interface
or parent class, and of a *case* as a class implementing/extending the interface
or parent. When using this interpretation we will call the *cases* "*data
classes*", which is a bit less awkward.

On the other hand, some of the papers we will review will take another interpretation in order to produce an interesting solution.

The solution I presented last time is partial, because it is not strictly type-safe: it uses a cast.

Today, I want to look at the solutions that have been proposed in the litterature, and try to extract their guiding insights, and show their respective strengths and shortcomings.

## The Contenders

While the litterature on the expression problem in Java-like object-oriented languages is surprisingly rich, I want to focus specifically on three papers which I think covers the space of interesting solutions:

**The Expression Problem Revisited: Four new solutions using generics**, Mads Torgersen,*ECOOP 2004*[link with pdf]**Extensibility for the Masses: Practical Extensibility with Object Algebras**, Bruno C. d. S. Oliveira & William R. Cook,*ECOOP 2012*[link with pdf]**The expression problem, trivially!**, Yanlin Wang & Bruno C. d. S. Oliveira,*Modularity 2016*, [pdf]

Since the first paper actually presents four solutions, that gives us 6 solutions to review. I'll also throw my partial solution into the mix for comparison's sake.

## The Problem's Raison d'Être: Ambiguous Call Sites

For the expression problem to be interesting *at all*, it has to involve
ambiguous call sites: the same piece of code has to perform a method call which
could be dispatched to a specialized method for any of datatype cases.

Said otherwise, if every piece of code is statically typed and doesn't involve
any kind of polymorphism (e.g. inheritance or generics), then plain static
overloading is enough, and you don't have a *problem* in the first place.

Therefore, to build a type-safe solution to the expression problem, two big avenues are open.

The first one has to be built into the compiler: the compiler will check that implementations of operations (which can be added by anyone, not just the original author of the datatype) exist for every data class. But this doesn't seem to exist. I said as much in the previous post:

In theory, there is nothing that prevents solving the expression problem at the language level. In an ideal world, we'd just be able to add abstract extension methods that have to be implemented for all classes implementing the interface. The linker would then verify that these methods were implemented for all such classes, and generate the proper virtual method tables. But no such object-oriented language exists.

The second avenue is to somehow *carry* the specialized implementation to the
call sites. This is what every solution in the litterature does, each in its own
way.

That is also how typeclasses work in Haskell. In this case, it's the typeclass instances that carry the operation's implementation to the call site.

In light of this, the statement of the expression problem is somewhat problematic, because it doesn't specify which shape the ambiguous call sites can take.

But I can think of two very interesting examples.

The first one is to apply one of the specialized methods on a list of data class
instances, *whose exact type is not known* (said otherwise, which just know they
are instances of the datatype).

Sadly, no type-safe solution in the litterature can do that. Our solution can, but again, it isn't type safe.

Interestingly, Haskell can do this only if using a compiler extension
introducing existential types. An existential type is basically just a pair of
a type and its associated typeclass instance for a given typeclass. The
existential type just says "here is an instance of *something* that has an
instance for the given typeclass". Then you have to use a list of
existentially-typed values — which crucially you mean you can't reuse a
pre-existing list that isn't existentially-typed. There has to be a way to
(statically) retrieve the correct typeclass instance when constructing the list.

The second example is, fortunately, the one that is always used as a benchmark in the litterature: a tree structure where each node is a data class instance.

This example is easier because it is possible to inject type information while building the tree — something that is not possible with plain lists, but is exactly what we're doing when we're building an existentially-typed list.

## The Benchmark Problem

In particular, the typical example uses trees that represent arithmetic expressions.

This benchmark example was there since the beginning, and is certainly partially
responsible for the name of the *expression* problem.

Our datatype is an `Exp`

type.

The cases for the datatype are:

`Lit`

: an integer literal`Add`

: addition of two expression`Neg`

: negation of an expression — which is added as an extension

Initially, we'll have a single operation: `Print`

which prints a string
representation of the expression to standard output. Later we'll add `Eval`

,
which evaluates the expression.

## Norswap's Solution

To ease us into the problem, let's see a type-unsafe solution to the problem using my formulation of the visitor pattern.

Compared to the previous post, the solution has been simplified/crippled a
little bit for the sake of brevity and better comparison. We no longer use
*implementation interfaces*, which allow the composition of independently
developped extensions (i.e. new data classes or operations).

None of Torgersen's solutions can handle composition. This is excusable, as our
trick (using `default`

methods in interfaces) wasn't available at the time the
paper was written.

## The Choice for Data-Structure Solutions

The nature of the expression problem is that each time we add a new data class, we need to add corresponding implementations for the existing operations. Conversely, each time we add a new operation, we need to implement it for all existing data classes.

Unfortunately, it's not as simple as just writing them. The "compiler" solution that neatly composes everything for us isn't available. Therefore, we will have to take care of the plumbing ourselves.

As long as we keep one dimension fixed, everything is easy. If we have a fixed set of operations, they can be encoded as an interface which we can just implement. If we have a fixed set of data classes, the simple visitor pattern suffices, and we can just implement the visitor interface to add new operations.

Things become tricky when we need to add both new operations and new data classes.

There are fundamentally two ways to do this.

**Option 1**: replacing the data classes. Each time we add a new operation, we
need to extend all data classes so that they may handle the new operation.
Operation's implementation will live inside the data classes.

This option means we need to control/parameterize the creation of our data structure. Whenever we add a new operation, we need to swap the classes that are being instantiated!

**Option 2**: replacing the operations. Each time we add a new data class, we
need to extend all existing operations so that they may handle the new data
class. Operation's implementation will typically live in some kind of visitor
implementation.

This options means we need to control/parameterized the operation's call sites. Whenever we add a new data class, we need to swap the object that holds the operation's implementations, lest it doesn't work for the new data class.

My solution uses option 2.

## Torgersen's 1st Solution: Data-Centered

This is the first solution in the "*The Expression Problem Revisited: Four new
solutions using generics*" (link) paper by Mads Torgersen. Discussion of
the other three solutions will follow.

This is a solution that takes *option 1* from the last section: replacing the
data classes. When adding a new operation, we subclass all existing data
classes. The code that create data strutures needs to be parameterized.

There are two difficulties in this solution that not readily apparent when
*option 1* is stated briefly.

First, in order to make the solution type-safe, it is necessary to know which operations the nodes in the expression tree implement. This means there needs to be someway to "carry the type" to the nodes that are down in the tree.

In this solution, this is done via generics, and in particular the use of a
F-bound: `C extends Exp<C>`

. F-bounds are a crude way to encode "self-types" in
Java. Basically it lets us use `C`

as though it meant "the type of this class"
(or, like here, the type of one of its superclasses or superinterfaces).
However, to use an F-bound, you need to "fix" `C`

. This is the role of all the
classes whose name end with `F`

, such as `class LitF extends Lit<ExpF> implements ExpF`

. Unfortunately, that makes the solution more verbose as we need
to actually add in all of these `F`

classes.

The second difficulty — which is only hinted at in the paper — is the need to carry the node creation logic to the places where you would normally call a data class constructor. Since there may be a lot of different types of nodes, it makes sense to collect the creators in a factory.

The issue with that is that each time you add a new data case you need to extend the existing factory. Each time you add a new operation, you not only need to extend every data class, but also to create a whole new factory that return instances of these new classes.

So this works, but it's quite verbose and it's relatively annoying that we actually need to change the data classes being used when we add new operations.

## Torgersen's 2nd Solution: Operation-Centered

This solution takes *option 2* outlined above. Each time we add a new data
class, we need to extend all existing operations so that they may handle the new
data class. Operation's implementation live in a visitor implementation.

This is also what my own solution does, I will explain the difference below.

However, there is a big pitfall that comes from the need to be type-safe. For the initial data classes, there are no issues. But when a new data class is added, it is necessary to add a new visitor interface. This is as expected.

However, each data class instance has to know the requirements on the visitors
it can handle. To use our previous examples, a `Neg`

node can only handle
visitors that implement `NegVisitor`

.

But it doesn't stop there. If an `Add`

node has a `Neg`

child, it too should
only accept visitors that implement `NegVisitor`

— since they can invoke the
visitor on their children.

Again, generics come to the rescue: we parameterize all data classes with `<V extends Visitor>`

(for the initial data classes) or `<V extends NegVisitor>`

(for `Neg`

— same principle would apply if we added new data classes).

This doesn't entirely fix the problem. In the `visit`

methods, it wouldnt work
to call, for instance, `add.left.accept(this)`

. Why? Because there is no
guarantee that `this`

has type `V`

.

Torgersen comes up with a really neat trick to solve this issue, which is to let
the visitor accept itself as an extra parameter of type `V`

. This parameter will
be supplied by the `accept`

methods: `visitor.visit(visitor, this)`

(where
`this`

is a an instance of a data class such as `Add<V>`

). Since `visitor`

has
type `V`

there, this type-checks ok.

The cost? Once again, we can't reuse our expression trees. They now have to be parameterized differently depending on added data classes. So creation logic has to be parameterized by the proper visitor interface (note you can't really see this is our simplistic demo code). At least, we don't need verbose factories this time.

## Torgersen's 3rd Solution: Operation-Centered with Object-Level Extensibility

This solution is an extension of the second solution. The goal is to relax the requirement on the control of all instantiation sites.

From the perspective of the previous solution, the goal here is to make it
possible to reuse old `Add<Visitor>`

and `Lit<Visitor>`

trees that were
instantiated without knowing that `Neg`

existed. These node will still work if
passed a `NegVisitor`

(which extends `Visitor`

)!

The author calls this ability to reuse old trees *object-level extensibility*.

And since the old trees were created before we added `Neg`

, they couldn't
contain `Neg`

nodes, and so using plain `Visitor`

implementations is fine as
well.

Achieving object-level extensibility is actually pretty easy. In the data
classes, just change the node's children type to `Exp<? super V>`

. Without
entering into the details, this means that `Add<NegVisitor>`

may have children
that with type `Exp<Visitor>`

or `Exp<NegVisitor>`

. On the other hand
`Add<Visitor>`

may not have a child of type `Exp<NegVisitor>`

This effectively enables reusing old trees in newer trees.

There is only one catch: your ability to rewrite the trees becomes limited.
Since `Add<Visitor>`

may not have children of type `Exp<NegVisitor>`

this may
hamper your ability to write involved tree rewrite logic that would need to
assign a newer tree as a child of an older tree.

However, as the author correctly notes, there are plenty of use-cases (maybe even most of them) that do not involve such kind of tree rewriting.

If type-safety is a must-have for you and you don't need tree rewrites, this is pretty good. You'll still pay a cost of sorts by having to carry these annoyings type parameters everywhere.

## Torgersen's 4th Solution: Type-Unsafe Hybrid

This one is interesting too.

Torgersen starts from the an operation-centered visitor solution (much like his
second solution) but pairs each operation (i.e. visitor implementation) with an
interface that defines the signature of the operation. Data classes can choose
to implement this interface. If they do so, the operation will notice (via an
`instanceof`

check) and call the implementation — otherwise it falls back on the
visitor pattern.

The solution isn't type-safe because Torgersen opts not to force the children of
each data class node to encode their visitor interface. So instead of being
typed as `Exp<V>`

or `Exp<? extends V>`

as in solution 2 and 3 respectively,
they are simply typed as `Exp`

.

The payload of foregoing type-safety here is that control over the creation logic is no longer necessary. You can finally have data classes whose types and implementations don't change depending on subsequent extensions. In that, it is similar to my solution.

If you know in advance all operations you need to implement, you can also avoid extending existing operations when you introduce a new data class, by having the data class implement the operations' associated interfaces.

Because the lack of type safety, when an expression accepts a visitor, it must verify that this visitor actually can handle the expression's data class or fall back on some general behaviour (at worse, throw a runtime exception, which is what my solution does).

To make all of this work, the solution involves some helper super-classes, which can be slightly confusing.

I'd also argue that the most use of generics in this solution is woefully
unecessary — it just saves on inlining two lines of logic into every visitor,
which you sort of have to do *anyway* because Java doesn't support `instanceof`

on generic type arguments (which are erased). Hence I made a simplified
solution that eliminates non-essential generics use, and
simplifies the scaffolding considerably, making it **much** easier to
understand, in my humble opinion.

Compared to my solution, this one is more complicated, but has the important benefit that data classes can be added without extending all operations individually, greatly reducing verbosity in that scenario.

## Object Algebras

We now discussion the solution from the "Extensibility for the Masses: Practical Extensibility with Object Algebras" paper (link). This one is quite different from those we discussed previously, and conforms to neither of our two options — because it doesn't encode data as a data structure at all!

Instead, data is encoded as a tree of method calls. Here is an example:

```
interface Algebra<E> {
E lit (int value);
E add (E left, E right);
}
public static <E> E expression (Algebra<E> a) {
return a.add(a.lit(1), a.lit(2)); // 1 + 2
}
```

The `expression`

method encodes the expression tree `1 + 2`

made of an "add
node" with two "literal node" child. Of course, there are no such nodes — it's
just a method!

To do anything useful with `expression`

, we need to supply an `Algebra<E>`

where
`E`

is an (unconstrainted) type used to represent the result of an operation on
one of our "nodes". So for instance, `a.lit(1)`

will return a value of type `E`

.
The `add`

method returns a value of type `E`

, but also takes as parameter two
values of type `E`

, corresponding to the result of "evaluating" its two
operands.

To define an operation, we need to implement `Algebra<E>`

. Here is a full
implementation that uses the same example as previously:

So there our operations are printing and evaluation. We actually use two different techniques for these two operations.

In the case of `PrintAlgebra`

, we implement `Algebra<Print>`

where `Print`

is a
functional interface we defined with a `print()`

method. Therefore, calling
`expression(new PrintAlgebra())`

will return an object that can be used to print
the expression.

This is not the most direct avenue — we could have opted to implement
`Algebra<String>`

instead and have `lit`

and `add`

return their string
representation directly. In fact, we take this approach with `EvalAlgebra`

which
implements `Algebra<Integer>`

— there, `lit`

and `add`

directly return the
integer they evaluate to.

Finally, a really neat trick not mentionned in the paper is that we can build an
actual data structure from the functional encoding. For this, simply make an
algebra that implements `Algebra<Exp>`

or `NegAlgebra<Exp>`

(depending on your
needs, and assuming `Exp`

is the parent class) and have each method return the
node it corresponds to.

Turning these data structures back into an algebra encoding is unfortunately not
possible. One could imagine that `Exp`

has a `E visit(Algebra<E>)`

method that
is overriden in data classes to simply call the corresponding algebra method.
The problem happens when you have introduced new data cases. If you added a
`neg`

method in `NegAlgebra<E>`

, now you need the signature to be `E visit(NegAlgebra<E>)`

. This is *almost* feasible, supposing we could
parameterize `Exp`

as follow: `Exp<A extends Algebra>`

and then define the
method `E visit (A<E>)`

. Unfortunately that would make `A`

a higher-order type
(i.e. a type that takes a parameter, here `E`

) and Java doesn't have those.

Of course, you could just use `Algebra<E>`

as a bound and add a cast in there,
type-unsafe but effective.

When dealing with object algebra, one may be concerned that it's no longer
possible to "build a data structure dynamically", i.e. that all data must be
statically defined like in our `expression`

method.

That is fortunately not the case. Since the algebra encoding of an expression is just method calls, any execution flow that calls algebra methods can yield expressions. And execution can contain conditions, loops, etc. One potential pitfall is that the whole construction logic needs to be re-run each time we want to run an operation of our data. If the construction logic is expensive, this can be a problem. Fortunately there is a solution: simply return a function object that encodes the expression:

```
public E expr1 (NegAlgebra<E> a) {
return expensive_predicate()
? a.add(a.lit(1), a.lit(2))
: a.add(a.lit(1), a.neg(a.lit(2)));
}
// use: expr1(my_algebra);
// slow!
public E Function<NegAlgebra<E>, E> expr2() {
return expensive_predicate()
? a -> a.add(a.lit(1), a.lit(2))
: a -> a.add(a.lit(1), a.neg(a.lit(2)));
}
// use: expr2.apply(my_algebra);
// fast!
```

When we pass an algebra to `expr2`

, `expensive_predicate()`

is not run — it is
only run once when the `expr2`

is created.

Finally, object algebra make "tree reuse" easy. You can compose an expression
built with an `Algebra<E>`

and one built with a `NegAlgebra<E>`

pretty easily:
the trick is that they only interface using `E`

, so as long as `E`

is the same,
anything goes. Of course, this means you have to use *compatible* algebras. It
could be argued that is not type-safe (or that it is another advantage): nothing
prevents you from using two algebra with different semantics together, passing
the result of one to the other... as long as `E`

is the same.

The paper mentions other interesting possibilities: multi-parameter algebra
(mimicking type families), combinators for automatic combination of multiple
algebra, as well as allowing extension of the `E`

parameter (e.g. `Eval<E extends Number> implements Algebra<E>`

), ...

There is a lot to like about object algebra, it's a really elegant technique —
in fact it's the shortest implementation, and one of the most readable. It has
many advantages, from the possibility of building a real data structure to "tree
reuse". Perhaps its main weakness is being a bit *too* alien to be integrated in
many code bases, where one will *need* to have actual data structures. For
instance, in my Autumn parsing library, the parser combinators can't be
canonically represented as function calls — while it is possible to implement
parsing that way (by using objects similar to `Print`

in our example), it would
be incredibly slow. However it is possible to use an object algebra to build the
parser combinator graph, and to reuse the encoding for visiting it. However,
this is only possible because this graph is immutable (and so will always stay
in sync with its functional encoding).

It's definitely a technique to keep in the back of your mind.

## Covariant Return Types

Finally, we look at our last solution, from the paper "The Expression Problem, Trivially!" (link).

This solution is very very close to Torgersen's first (data-driven) solution, but the essential differences between both is that this solution foregoes the use of generics in favor of covariant return types. What are covariant return types? In Java, you can override a method by a method with a different return type, but only if that type is a subtype of the original return type, for instance:

```
abstract class A {
abstract Object foo();
}
abstract class B extends A {
@Override String foo();
}
```

This works because `String`

is a subclass of `Object`

. And if you didn't know,
yes you can override an abstract method without implementing it — that's an
essential feature needed in the covariant solution.

Let's have a look at the solution:

Whereas Torgersen's solution encodes the expression type as an F-shape bounded
generic type parameter (`C extends Exp<C>`

), and subsequently types children
using this type (e.g. `C left, right;`

in the `Add`

class; the covariant
solution defines the children as abstract method whose return type is `Exp`

.

Both solution need a "fix class": in Torgersen's solution, the class fixes `C`

:
`AddF extends Add<ExpF>`

and later `EvalAddF extends EvalAdd<EvalExpF>`

. In the
covariant solution, the methods are overriden with the actual expression type:
simply `Exp`

in the base case, but `EvalExp`

in the "eval" extension.

This is rather neat, and absent some issue I didn't think of, seems strictly superior Torgersen's solution. It does however come with that solution's other pitfalls, including the need to parameterize the construction logic. You'll note we didn't include factories in our code for this solution, but we did in Torgersen's solution. Don't let this fool you: they are equally needed (or can equally be dispensed with) in both cases.

## Discussion & Recommendations

I came out of this article having learned a lot more than I expected going in. The impetus for this article was that I couldn't clearly articulate how the different solutions worked and how they related to each other.

I also wanted to make the point that they were needlessly complex and that my solution was better. Having done the research, I wouldn't say that this is necessarily true. Hence the little discussion now to be had about what should be used when.

First off, you should try to determine your requirements as precisely as possible. In particular:

- Do you need strict type safety? How do you define that? (What is not allowed to happen?)
- Do you care about independent extensibility: if two different developers extend the base framework and redistribute their exensions, can a third developer come along and compose their extensions without resorting to modifying the code written by the two first developers?
- Who is going to use the solution? What is its area of surface?

With that in mind...

First off, if you care about independent extensibility, you have no choice but
to use my solution. The others *might* be modified to accomodate it, by using
Java 8's `default`

interface methods — but you'll have to figure that yourself.
Do keep in mind that this aspect of it wasn't shown in the example
code but is explained in the previous post.

Beware that independent extensibility does add a lot of boilerplate, and likely will in other solutions too. And remember my solution isn't type-safe.

If you need perfect type safety (do you really?) **and** you're using immutable
trees, I would go for Torgensen's 3rd solution (operation-centered with
object-level extensibility).

In general, I would try to think hard about whether object algebras can be used in your use case. In a sense, they're the most elegant solution. One big caveat: I would think twice about using them to build a data structure — now you have two representations to keep in sync, and double duties.

In general, I feel like the sweet spot for them is either small localized things, or a central paradigm around which everything revolves. I'd be uneasy about making an object algebra one of many big moving parts in a program. My programmer's intuition say this way lay clunky mixed-metaphors.

I would avoid using the data-centered solutions (Torgersen's 1st and the
covariant solution) **unless** object instantiation is very tightly controlled
or centralized in your program. Playing with factories is not super fun.

A few more observations:

My solution and Torgersen's 4th (hybrid) solution are pretty much tied. Mine is guaranteed to work with independent extensibility (with the proper boilerplate), but Torgersen's will also work fairly often. But for instance, it won't work if two people introduce new data classes and implement an old visitor for these classes — there is no easy/safe way to "merge" the two implementations. However, it would more natural to implement the operations directly into the data class in this case! Torgersen's solution can also lead to less boilerplate in the case where you never have to deal with independent extensibility.

The covariant solution strictly dominates Torgersen's 1st (data-centered) solution.