Generic operators

Note: this article was (largely) written by Marc Gravell. The classes mentioned as a solution are part of MiscUtil (usage page). Marc and I are collaborating on a "push" model for LINQ, which will be described when fully implemented - Jon.

Background

.NET 2.0 introduced generics into the .NET world, which opened the door for many elegant solutions to existing problems. Generic constraints can be used to restrict the type-arguments to known interfaces etc, to ensure access to functionality - or for simple equality/inequality tests the Comparer<T>.Default and EqualityComparer<T>.Default singletons implement IComparer<T> and IEqualityComparer<T> respectively (allowing us to sort elements for instance, without having to know anything about the "T" in question).

With all this, though, there is still a big gap when it comes to operators. Because operators are declared as static methods, there is no IMath<T> or similar equivalent interface that all the numeric types implement; and indeed, the flexibility of operators would make this very hard to do in a meaningful way. Worse: many of the operators on primitive types don't even exist as operators; instead there are direct IL methods. To make the situation even more complex, Nullable<> demands the concept of "lifted operators", where the inner "T" describes the operators applicable to the nullable type - but this is implemented as a language feature, and is not provided by the runtime (making reflection even more fun).

What do we want to do?

A question seen very frequently on forums etc is "how can I do maths with generics?". Usually people just want a simple sum, average etc of data, but when people find themselves using generics this becomes an almost impassable barrier. There are some existing solutions that involve passing helper classes around (usually by some bespoke interface on T), but this involves a lot of extra coding, and pollutes the API slightly by having to include this extra passenger. Compare this to the common uses Comparer<T>.Default for finding minimum/maximum; there is no extra work required there, so why should there be for sum/average?

How are operators implemented?

Consider the following simple overloads:

// primitive operator
public static float Add(float a, float b) {
    return a + b;
}
// operator defined by the Type
public static decimal Add(decimal a, decimal b) {
    return a + b;
}
// lifted operator (defined by the T in Nullable<T>)
public static decimal? Add(decimal? a, decimal? b) {
    return a + b;
}

Conceptually they are very similar - but the implementations are very different; the first simply uses the "add" IL instruction; the second uses the op_Addition static method on System.Decimal, and the third needs to check that both arguments have values (with .HasValue), and if so uses System.Decimal.op_Addition on the inner values (GetValueOrDefault()), and finally wraps the resulting decimal in a Nullable<T> wrapper.

The different ways in which operators can be implemented makes it very undesirable to wrap (or worse: write IL for); there would be a lot of special cases. Personally I find it very strange that Microsoft didn't provide the standard op_Addition etc on the primitive types (Int32, Single, etc); for sure, the compiler doesn't need them, but it would have made generic operators far easier to implement (i.e. if you know that you just look for op_Addition etc, with Nullable<T> as a special case).

How can we do it, then?

OK; we've seen that operators are tricky; so what else can we do? As it happens, .NET 3.5 included some new classes for expressing simple operations in a very abstract way; it also included code to compile these operations into regular code at runtime. This is the Expression class (and associated subclasses), introduced for LINQ, but with many other powerful applications. Maybe we could use these to do our maths?

The following demonstrates building an Expression for adding two numbers (based on generics), and then compiling this Expression into a Func<T,T,T> (i.e. a function that accepts two arguments of T, and returns a T):

static T Add<T>(T a, T b) {
    //TODO: re-use delegate!
    // declare the parameters
    ParameterExpression paramA = Expression.Parameter(typeof(T), "a"),
        paramB = Expression.Parameter(typeof(T), "b");
    // add the parameters together
    BinaryExpression body = Expression.Add(paramA, paramB);
    // compile it
    Func<T, T, T> add = Expression.Lambda<Func<T, T, T>>(body, paramA, paramB).Compile();
    // call it
    return add(a,b);       
}

As the comment indicates, this still leaves a lot to be desired with regards to caching the compiled function, but indicates the general approach. The actual code makes use of static classes and static constructors to cache the operators efficiently, and uses some shared code in ExpressionUtil to simplify the construction of the various Add, Subtract etc operators - but the theory is the same.

Isn't this expensive?

Well, compiling the operators isn't trivial, but the static constructor ensures that we only do this once for each signature; however, once we have the function (Func<T,T,T>), the JIT-compiler is remarkably good at inlining the simple code that invokes the methods; enough that runtime performance is within a few percent of regular compiled code. To be honest this surprised me too when I saw it, so scepticism is allowed; but I urge you to test it to see for yourself. An interface-based approach wouldn't have been inlineable (due to the virtual call); and a simple object-based/GetType()/switch approach would have introduced boxing and branching (and masses of code), so would have been much slower (and harder to implement).

So can it do anything else?

Interestingly, yes! Because we haven't restricted ourselves to any knowledge of the type, it means that we can write code that operates on any type. Perhaps you have a ComplexInt32, or a DecimalMatrix, etc. As long as you have declared suitable operators, the Expression approach will find your operator and use it. This is a credit to the Microsoft team responsible for the Expression implementation; well done!

As a direct example: regular LINQ provides (via System.Linq.Enumerable) a range of overloads for Sum() and Average() to cover different datatypes - 20 of each in fact (half of those are for "selectors"). Of course, overloads don't help much when you only have a T...

Using the generic operators, it is possible to cover all of these and everything else you can consider with just 2 methods (1 with and 1 without a "selector"). And in fact, we've implemented these 2 methods so that one simply calls the other, so in fact we've only had to write a single method. Now that is code re-use.

What about compile-time checking?

Yes; good point. Put simply; it can't do this. In reality this is unlikely to be a problem; if you are throwing a type at a reusable block of generic code to do a "sum" or similar, it is likely that you know that "add" makes sense for that object. It is no different to EqualityComparer<T>.Default: this also doesn't provide compile-time safety, but that doesn't mean it isn't useful in almost every case when you would use it.

Other approaches

Marc is not the first person to tackle this problem, although his approach is the most practical that I've seen. There are plenty of thoughts around the web about this, including a CodeProject page by Rüdiger Klaehn, which is the most comprehensive resource I've seen on the topic (partly because it references plenty of other approaches).


Back to the main page.