Month: September 2013

Simple Dependent Types

Simple Dependent Types

What are Dependent Types?

So, you might be wondering what dependent types are in the first place, so before I go any further let me try explain what I understand them to be.

Consider the following structure in C++:

The structure Foo has a single field, which is an array of 10 integers. This is simple enough. But the value 10 is arbitrary, and so the structure is not very useful. What if we want a Foo that can hold 11 integers, or 12? As you probably already know, C++ provides a feature called templates which allows us to do just that:

Now we can use Foo<10> when we want a Foo that holds 10 values like it did before, but we can also write Foo<11> when we want a Foo that holds 11 values. For example:

Note that because the array of values is a fixed length, many compilers will actually warn you if you explicitly access value 3 in a Foo<2>. This is a really nice feature, but it doesn’t go so far in practical applications.

What happens if we don’t know at compile-time how many values we want?

This is a common problem. In fact, I would say you almost never use a fixed-length array in C++, because in most cases you don’t know how long it needs to be ahead of time. Of course in C++ the solution to this is simply to allocate the values dynamically:

This is a horrible solution in many ways. For one thing, you’re performing an extra heap allocation (what if numValues is just 3, or what if the Foo is already on the heap – what a waste!). For another thing, you need to remember to free the extra heap allocation (or assign it to a smart pointer). And now you’ve lost all compile-time safety when it comes to indexing into the array.

This is where dependent types would be useful. What if C++ could do the following:

This Foo is a dependent type: the static type depends on a runtime value. We can’t say if it’s a Foo<2>, a Foo<3>, a Foo<27>, etc, without looking first at the value of numValues. There is also another dependent type hiding in this example: the type of Foo::values, which depends on a value (n) that is now possibly a variable.

As a side note, the term template is really not applicable any more, since the structure is no longer specialized only by the compiler, but also by the runtime (conceptually).

How would this work?

In the above case, it seems simple enough for the compiler to understand what’s going on and generate the appropriate code. The binary size of the type values[n] is no longer calculated at compile-time by substituting in the constant n, it’s now a function of the run-time type parameter n.

If we recursively apply this logic, then the size of stack frame itself is a function of the local variable numValues (which happens to also be a parameter).

This really doesn’t seem that difficult. Functions that are executed anyway by the compiler, are now simply executed at runtime instead of compile time (at least conceptually anyway). Sizes, field offsets, and variables offsets, which were previously static constants, now become runtime functions.

The Problem

I previously mentioned that some of the compiler’s work would now occur at runtime – calculating binary layouts and such. But the compiler also does something else that’s really useful: static type checking. This helps remove a whole class of errors. Consider the following:

The above example seems pretty simple in concept. We have two instances of Foo that are dependent on two different variables. The question is, can we assign the one to the other?

At first glance, of course not! They have different counts, so they must be incompatible, right? But what if count1 and count2 are actually the same value?

So deciding whether or not the assignment is valid requires information about the value of the variables depended on. This is quite a conundrum. The compiler essentially has three choices here:

  • it could delay the static analysis to runtime, like we have been doing for binary layout calculations, and throw an AssignmentNotValid exception or something if the runtime type checking fails
  • or it can require that all type safety is proved at compile-time only, and require that we persuade it why it should work
  • or we can just relax the type soundness in this case, and simply the require the programmer to do a cast from one type to the other, as a “signed declaration” that he accepts all responsibility if the assignment turns out to be invalid

All of these choices seem reasonable:

  • performing the “static analysis” at runtime seems like something a .NET language would do, similar to the way that it can throw an InvalidCast exception
  • requiring that the programmer prove to the compiler that the assignment can never fail seems like something that other dependently typed languages do, requiring a proof-assisting language – and this, I think, is why dependent types seem to only be for the elite of theoretical computer science
  • ignoring the possible conflict and simply requiring the user do an explicit type-cast seems like something that C and C++ would do – you can generally cast anything to anything if you believe you know more than the compiler and don’t have the patience or tools to prove it through the type system.

C already has this, kinda

To some extent C already has this (as of C99), in the form of variable length arrays (VLAs). For example:

Unfortunately VLAs are extremely limited and you can’t use them in very many contexts. Although foo has a dependent type, there is no static analysis that I know of that actually uses this information to warn the user if he’s obviously stepping out of bounds. And the concept of VLAs doesn’t generalize to user-defined types. It shows us a taste of another world, but it stops short.

Conclusion

I think dependent types get a bad reputation from anyone who encounters them, as being only of theoretical interest and only understandable to advanced mathematicians and computer science professors. The reason, I think, is because the descriptions of dependent types are very mathematical, and the implementations of dependent typed languages are quite academic. I hope I’ve shown you that this may be an unnecessary restriction, and we can hopefully build dependent types easily into popular languages, or at least into the next generation of popular languages. It may not be a complete solution, or a sound solution, but this is no different to most of the other features of todays popular languages.

See also: Practical Dependent Types

Expressiveness vs Efficiency in C#

Expressiveness vs Efficiency in C#

I do a lot of my work in C#. I love C#, especially after drowning in C for a while, because of its strong, expressive type system, managed memory, and its mixed-paradigm nature.

There are two big problems I find in C# though. One is that it’s often verbose, but I won’t get into that today. The other is that although the language provides nice abstractions for some things (like IEnumerable which I use all the time), I find there is almost always a conflict between a succinct solution and a performant solution. A succinct solution is one that uses the high-level abstractions, while a performant solution is one that doesn’t use them because of their cost. I find often that a succinct solution uses some of the functional features of C#, while a performant solution uses the imperative features.

Lets say we want to sum all the numbers between 1 and 100 which are not multiples of 3. A direct representation of this in C# using a functional notation is this:

This nice line of code reads almost exactly like the English version. Unfortunately this method is slow. We’re using the interface IEnumerable to iterate through the numbers, and it’s calling a lambda function for the predicate to filter out the multiples of 3, and then again using the interface to iterate through the numbers to sum them. I don’t know exactly how much this of this the optimizer removes, but on my machine it averages about 1500ns to execute this line.

On the other hand, here is the equivalent imperative code:

This essentially means exactly the same thing: for the numbers between 1 and 100, for numbers that aren’t multiples of 3, sum them. I’ve explicitly not done anything clever here, like stepping in increments of 3 instead of 1. This code is meant to be as close to the functional code as possible. It executes in about 140ns on my computer – more than 10 times faster than the functional equivalent.

Of course there is another implementation that does the same thing:

By the same timing methodology, this last implementation takes about 0.28ns on my machine, although at this level the result is pretty meaningless except to say that it is blindingly fast.

Now, I’m not an expert on optimization by any means, but I would say that both of the latter implementations are almost trivially apparent from the first implementation. The compiler or JIT optimizer could generate them just by “inlining” things that are known to hold constant for certain call.

In fact I think it’s so easy that I’m going to demonstrate it (partially). First, let me rewrite the nice concise code into what the C# compiler actually sees, by writing out the extension-method calls explicitly:

I don’t know how the Sum function looks, but lets say that this is a reasonable implementation:

(You see where I’m going with this?) Now we “inline” the Sum function:

By continuing recursively substituting constant (“unchanging”, not “const“) arguments into functions, we land up with very close to our original imperative code.

To go even further, since the code doesn’t access any outside state, the compiler could also just execute it at compile-time to arrive at the final answer straight away. This could even be done without any of the previous optimizations – the original expression only calls referentially transparent functions with constant arguments, and so the result must be constant and can be precomputed to just be 3367.

In my mind this is just a case of constant propagation and function inlining, two things which are already done, but obviously not to the extent they could be done.

Premature Optimization?

Who cares that it takes 1500ns to execute the line of code? It’s still bleedingly fast on a modern computer, so why does it matter?

Well, this isn’t a hypothetical scenario. Yes, it’s true that I’ve never had so sum up these numbers like that. But on the other hand I have had to work with streams of millions of bytes, and with arrays of thousands or millions of numbers. And in these situations I feel forced by C# to write low-level imperative code, rather than using the layers of abstractions that are supposed to make my life easier.

Why doesn’t it?

So why doesn’t the C# compiler do any of these optimizations? Well I don’t pretend to know, but I put forward a few theories:

1. The specification is too restrictive

Perhaps the C# specification is too restrictive when it comes to letting the optimizer produce the most efficient code. For example, IEnumerable is a reference type, and reference types are always allocated on the heap. It may be a breach of C# contract to optimize out a heap allocation. This, of course, is pure speculation. It may not be about heap allocation at all, it may be about functions maintaining reference equality, or that the optimizer can’t inline across assembly boundaries, etc. All of these boil down to the fact that something about the language model includes details that couple it to the inflated machine code that it currently outputs.

2. The analysis is too complicated

It might be that the performance cost analysis is too complicated. In the examples I pointed out how the second version is 10 times faster than the first, but what I didn’t say was that the second version is slightly longer than the first in terms of code space. (It may not actually be longer, but lets say it is). Many optimizations have this property: they sacrifice one resource for another, rather than saving all over. And of course resources are not independent of each other. A longer program may run slower because of poorer caching.

When you start looking into the details of how to calculate whether a particular optimization is actually beneficial, you may come out with such a heavy, complicated and error-prone optimizer that it’s just not worth it to run.

3. There isn’t a demand

Although I’ve run into a few situations where these sorts of optimizations would be useful, perhaps Microsoft’s market analysis came up with different answers. Perhaps the development time involved in implementing it exceeds business value and demand. I can’t say anything further about this because I only know about my own demand, I can’t speak for anyone else.

Conclusion

Whatever the reason, I think there’s a gap in C# that needs to be filled by the next generation of popular languages, or perhaps the next generation of C# or .NET JIT optimizer. The C# compiler or JIT compiler doesn’t optimize some pretty obvious cases.

Languages in general shouldn’t force you chose between a nice implementation and a fast implementation. I’m sure there are already languages that address this, but none that checks all the other boxes that C# does (including that of being popular). If anyone is thinking about creating the next big language, they should seriously consider building the language from the ground up to support this kind of optimization.