Expressiveness vs Efficiency in C#

I do a lot of my work in C#. I love C#, especially after drowning in C for a while, because of its strong, expressive type system, managed memory, and its mixed-paradigm nature.

There are two big problems I find in C#, though. One is that it’s often verbose, but I won’t get into that today. The other is that although the language provides nice abstractions for some things (like IEnumerable, which I use all the time), there is almost always a conflict between a succinct solution and a performant solution. A succinct solution is one that uses the high-level abstractions, while a performant solution is one that avoids them because of their cost. I often find that a succinct solution uses the functional features of C#, while a performant solution uses the imperative features.

Let’s say we want to sum all the numbers between 1 and 100 that are not multiples of 3. A direct representation of this in C#, using a functional notation, is:

x = Enumerable.Range(1, 100).Where(i => i % 3 != 0).Sum();

This nice line of code reads almost exactly like the English version. Unfortunately, it’s slow. We’re using the IEnumerable interface to iterate through the numbers, calling a lambda function as the predicate to filter out the multiples of 3, and then using the interface again to iterate through the survivors and sum them. I don’t know exactly how much of this the optimizer removes, but on my machine this line averages about 1500 ns to execute.
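(For reference, a crude way to measure a line like this is a Stopwatch loop along the following lines. This is only a sketch of a measurement harness, not a rigorous benchmark; the iteration and warm-up counts are arbitrary.)

using System;
using System.Diagnostics;
using System.Linq;

static class Timing
{
    static void Main()
    {
        int x = 0;
        const int iterations = 1000000;

        // Warm up so the JIT has already compiled everything we time.
        for (int i = 0; i < 1000; i++)
            x = Enumerable.Range(1, 100).Where(n => n % 3 != 0).Sum();

        // Time many calls and report the average cost of one call.
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            x = Enumerable.Range(1, 100).Where(n => n % 3 != 0).Sum();
        sw.Stop();

        double nsPerCall = sw.Elapsed.TotalMilliseconds * 1000000.0 / iterations;
        Console.WriteLine("{0:F1} ns per call (result {1})", nsPerCall, x);
    }
}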

On the other hand, here is the equivalent imperative code:

int sum = 0;
for (int i = 1; i <= 100; i++)
    if (i % 3 != 0)
        sum += i;
x = sum;

This means essentially the same thing: for the numbers between 1 and 100 that aren’t multiples of 3, sum them. I’ve deliberately not done anything clever here, like stepping in increments of 3 instead of 1; this code is meant to stay as close to the functional code as possible. It executes in about 140 ns on my machine, more than 10 times faster than the functional equivalent.
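(Just so it’s clear what kind of cleverness I mean, here is a sketch of one such variant: sum everything, then subtract the multiples of 3, which you can visit directly by stepping in threes.)

int sum = 0;
for (int i = 1; i <= 100; i++)
    sum += i;
for (int i = 3; i <= 100; i += 3)  // subtract the multiples of 3 directly
    sum -= i;
x = sum;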

Of course there is another implementation that does the same thing:

x = 3367;

By the same timing methodology, this last implementation takes about 0.28ns on my machine, although at this level the result is pretty meaningless except to say that it is blindingly fast.
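(If you want to check that constant rather than take my word for it, the closed-form arithmetic is short:)

int all       = 100 * 101 / 2;      // sum of 1..100 = 5050
int multiples = 3 * (33 * 34 / 2);  // sum of 3, 6, ..., 99 = 1683
x = all - multiples;                // 5050 - 1683 = 3367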

Now, I’m not an expert on optimization by any means, but I would say that both of the latter implementations are almost trivially derivable from the first. The compiler or JIT optimizer could generate them just by “inlining” things that are known to hold constant for a given call.

In fact I think it’s so easy that I’m going to demonstrate it (partially). First, let me rewrite the nice concise code into what the C# compiler actually sees, by writing out the extension-method calls explicitly:

x = Enumerable.Sum(
    Enumerable.Where(
        Enumerable.Range(1, 100), 
        i => i % 3 != 0));

I don’t know exactly how the real Sum function is implemented, but let’s say that this is a reasonable approximation:

static int Sum(IEnumerable<int> nums)
{
    int sum = 0;
    foreach (int n in nums)
        sum += n;
    return sum;
}

(You see where I’m going with this?) Now we “inline” the Sum function:

IEnumerable<int> nums = 
    Enumerable.Where(Enumerable.Range(1, 100), i => i % 3 != 0);
int sum = 0;
foreach (int n in nums)
    sum += n;
x = sum;

By continuing to recursively substitute constant (“unchanging”, not “const”) arguments into functions, we end up with something very close to our original imperative code.
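To make the next substitution concrete, here is roughly what you get after also “inlining” a plausible Where implementation by hand (a sketch, not actual compiler output): the predicate gets tested directly against the numbers coming out of Range.

int sum = 0;
foreach (int n in Enumerable.Range(1, 100))
    if (n % 3 != 0)   // the predicate from the Where call
        sum += n;
x = sum;

One more substitution, replacing Range with a counting loop, lands us exactly on the imperative version from earlier.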

To go even further, since the code doesn’t access any outside state, the compiler could also just execute it at compile-time to arrive at the final answer straight away. This could even be done without any of the previous optimizations – the original expression only calls referentially transparent functions with constant arguments, and so the result must be constant and can be precomputed to just be 3367.

In my mind this is just a case of constant propagation and function inlining, two things which are already done, but obviously not to the extent they could be done.

Premature Optimization?

Who cares that it takes 1500 ns to execute the line of code? It’s still blindingly fast on a modern computer, so why does it matter?

Well, this isn’t a hypothetical scenario. Yes, it’s true that I’ve never had to sum up these numbers like that. But on the other hand, I have had to work with streams of millions of bytes, and with arrays of thousands or millions of numbers. And in those situations I feel forced by C# to write low-level imperative code, rather than using the layers of abstraction that are supposed to make my life easier.

Why doesn’t it?

So why doesn’t the C# compiler do any of these optimizations? Well, I don’t pretend to know, but I’ll put forward a few theories:

1. The specification is too restrictive

Perhaps the C# specification is too restrictive when it comes to letting the optimizer produce the most efficient code. For example, IEnumerable is a reference type, and the sequence objects behind it are allocated on the heap; it may be a breach of the C# contract to optimize a heap allocation away. This, of course, is pure speculation. It may not be about heap allocation at all; it may be about functions maintaining reference equality, or the optimizer not being able to inline across assembly boundaries, and so on. All of these boil down to the same thing: something about the language model includes details that couple it to the inflated machine code the compiler currently outputs.
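To make the heap-allocation point concrete, here is a hypothetical value-type sequence with a struct enumerator, the same trick List<T> uses with its nested struct enumerator. The type and its members are my own invention for illustration, not anything in the BCL; the point is just that enumerating it allocates nothing on the heap, which is exactly what the optimizer can’t (or won’t) do for me behind IEnumerable.

struct IntRange
{
    private readonly int start, count;

    public IntRange(int start, int count)
    {
        this.start = start;
        this.count = count;
    }

    // foreach binds to this by pattern, so no IEnumerator<int> object
    // is ever allocated; the enumerator lives on the stack.
    public Enumerator GetEnumerator()
    {
        return new Enumerator(start, count);
    }

    public struct Enumerator
    {
        private int current;
        private int remaining;

        public Enumerator(int start, int count)
        {
            current = start - 1;
            remaining = count;
        }

        public int Current { get { return current; } }

        public bool MoveNext()
        {
            current++;
            return remaining-- > 0;
        }
    }
}

Summing with foreach (int i in new IntRange(1, 100)) then looks just like the imperative loop; the question is whether the language contract would ever let the compiler substitute something like this for the Enumerable pipeline on its own.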

2. The analysis is too complicated

It might be that the performance cost analysis is too complicated. In the examples I pointed out that the second version is 10 times faster than the first, but what I didn’t say was that the second version is slightly longer than the first in terms of code space. (It may not actually be longer, but let’s say it is.) Many optimizations have this property: they trade one resource for another, rather than saving across the board. And of course resources are not independent of each other; a longer program may run slower because of poorer caching.

When you start looking into the details of how to calculate whether a particular optimization is actually beneficial, you may end up with an optimizer so heavy, complicated and error-prone that it’s just not worth running.

3. There isn’t a demand

Although I’ve run into a few situations where these sorts of optimizations would be useful, perhaps Microsoft’s market analysis came up with different answers. Perhaps the development time involved in implementing them exceeds the business value and demand. I can’t say anything further about this, because I only know about my own demand; I can’t speak for anyone else.

Conclusion

Whatever the reason, I think there’s a gap in C# that needs to be filled by the next generation of popular languages, or perhaps the next generation of C# or .NET JIT optimizer. The C# compiler or JIT compiler doesn’t optimize some pretty obvious cases.

Languages in general shouldn’t force you to choose between a nice implementation and a fast implementation. I’m sure there are already languages that address this, but none that tick all the other boxes that C# does (including that of being popular). If anyone is thinking about creating the next big language, they should seriously consider building it from the ground up to support this kind of optimization.
