Simple Dependent Types

What are Dependent Types?

You might be wondering what dependent types are in the first place, so before I go any further let me try to explain what I understand them to be.

Consider the following structure in C++:

struct Foo
{
    int values[10];
};

The structure Foo has a single field, which is an array of 10 integers. This is simple enough. But the value 10 is arbitrary, and so the structure is not very useful. What if we want a Foo that can hold 11 integers, or 12? As you probably already know, C++ provides a feature called templates which allows us to do just that:

template <int NUM_VALUES>
struct Foo
{
    int values[NUM_VALUES];
};

Now we can use Foo<10> when we want a Foo that holds 10 values like it did before, but we can also write Foo<11> when we want a Foo that holds 11 values. For example:

Foo<2> test()
{
    Foo<2> myTwoValues;

    myTwoValues.values[0] = 25;
    myTwoValues.values[1] = 26;

    return myTwoValues;
}

Note that because the array of values has a fixed length, many compilers will actually warn you if you explicitly access value 3 in a Foo<2>. This is a really nice feature, but it only goes so far in practical applications.
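
For example, something like the following will typically be flagged at compile time (the function name is just for illustration, and the exact diagnostic wording varies between compilers):

void warningDemo()
{
    Foo<2> f;
    f.values[3] = 99; // many compilers warn here, e.g. "array index 3 is past the end of the array"
}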

What happens if we don’t know at compile-time how many values we want?

Foo test(int numValues)
{
    Foo myFoo;

    for (int i = 0; i < numValues; i++)
        myFoo.values[i] = i + 25;

    return myFoo;
}

This is a common problem. In fact, I would say you almost never use a fixed-length array in C++, because in most cases you don't know how long it needs to be ahead of time. Of course in C++ the solution to this is simply to allocate the values dynamically:

struct Foo
{
    Foo(int n) : numValues(n), values(new int[n]) {  }
    ~Foo() { delete[] values; }
    int numValues;
    int* values;
};

This is a horrible solution in many ways. For one thing, you're performing an extra heap allocation (what if numValues is just 3, or what if the Foo is already on the heap - what a waste!). For another thing, you need to remember to free the extra heap allocation (or assign it to a smart pointer). And now you've lost all compile-time safety when it comes to indexing into the array.
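
To make that last point concrete, here's the kind of out-of-bounds access that now compiles without complaint (a small illustration of mine, using the heap-allocating Foo above):

void unsafeDemo()
{
    Foo f(2);
    f.values[5] = 99; // no warning possible: the length is now just runtime data
}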

This is where dependent types would be useful. What if C++ could do the following:

template <int n>
struct Foo
{
    int values[n];
};

Foo<numValues> test(int numValues)
{
    Foo<numValues> myFoo;

    for (int i = 0; i < numValues; i++)
        myFoo.values[i] = i + 25;

    return myFoo;
}

This Foo is a dependent type: the static type depends on a runtime value. We can't say if it's a Foo<2>, a Foo<3>, a Foo<27>, etc, without looking first at the value of numValues. There is also another dependent type hiding in this example: the type of Foo::values, which depends on a value (n) that is now possibly a variable.

As a side note, the term template is really not applicable any more, since the structure is no longer specialized only by the compiler, but also by the runtime (conceptually).

How would this work?

In the above case, it seems simple enough for the compiler to understand what's going on and generate the appropriate code. The binary size of the type values[n] is no longer calculated at compile-time by substituting in the constant n; it's now a function of the run-time parameter n.

If we recursively apply this logic, then the size of the stack frame itself is a function of the local variable numValues (which happens to also be a parameter).

This really doesn't seem that difficult. Computations that the compiler was executing anyway are now simply executed at runtime instead of at compile time (at least conceptually). Sizes, field offsets, and variable offsets, which were previously static constants, now become runtime functions.
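
As a rough sketch of what I mean (my own illustration of how a compiler might lower this, not a real implementation - names like Foo_size and test_lowered are made up), the hypothetical dependent Foo could compile down to something like the following, where the size is computed at runtime and the storage is carved out of the stack frame much like a C99 VLA (ignoring, for now, how such a value would be returned from a function):

#include <alloca.h> // alloca; on some platforms it lives in <malloc.h> instead
#include <cstddef>

// The size of Foo<n> is no longer a compile-time constant; it's a runtime
// function of n.
static std::size_t Foo_size(int n)
{
    return n * sizeof(int);
}

void test_lowered(int numValues)
{
    // Conceptually the compiler reserves Foo_size(numValues) bytes in the
    // current stack frame instead of a fixed amount.
    int* myFoo_values = static_cast<int*>(alloca(Foo_size(numValues)));

    for (int i = 0; i < numValues; i++)
        myFoo_values[i] = i + 25;
}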

The Problem

I previously mentioned that some of the compiler's work would now occur at runtime - calculating binary layouts and such. But the compiler also does something else that's really useful: static type checking. This helps remove a whole class of errors. Consider the following:

void test(int count1, int count2)
{
    Foo<count1> foo1;
    Foo<count2> foo2 = foo1; // Is this valid?
}

The above example seems pretty simple in concept. We have two instances of Foo that are dependent on two different variables. The question is, can we assign the one to the other?

At first glance, of course not! They have different counts, so they must be incompatible, right? But what if count1 and count2 are actually the same value?

So deciding whether or not the assignment is valid requires information about the value of the variables depended on. This is quite a conundrum. The compiler essentially has three choices here:

  • it could delay the static analysis to runtime, like we have been doing for binary layout calculations, and throw an AssignmentNotValid exception or something if the runtime type checking fails
  • or it can require that all type safety is proved at compile-time only, and require that we persuade it why it should work
  • or we can just relax the type soundness in this case, and simply require the programmer to do a cast from one type to the other, as a "signed declaration" that he accepts all responsibility if the assignment turns out to be invalid

All of these choices seem reasonable:

  • performing the "static analysis" at runtime seems like something a .NET language would do, similar to the way that it can throw an InvalidCastException (a hand-written sketch of this option follows after this list)
  • requiring that the programmer prove to the compiler that the assignment can never fail seems like something that other dependently typed languages do, requiring a proof-assisting language - and this, I think, is why dependent types seem to only be for the elite of theoretical computer science
  • ignoring the possible conflict and simply requiring the user to do an explicit type-cast seems like something that C and C++ would do - you can generally cast anything to anything if you believe you know more than the compiler and don't have the patience or tools to prove it through the type system.
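
To make the first option a little more concrete, here's roughly what the inserted runtime check could amount to if we wrote it by hand today against the dynamically-sized Foo from earlier (a sketch of mine, not proposed syntax; assignFoo and the exception message are invented for illustration):

#include <stdexcept>

void assignFoo(Foo& dst, const Foo& src)
{
    // The compiler would conceptually insert a check like this wherever it
    // can't prove at compile time that the two counts are equal.
    if (dst.numValues != src.numValues)
        throw std::runtime_error("AssignmentNotValid: counts differ");

    for (int i = 0; i < src.numValues; i++)
        dst.values[i] = src.values[i];
}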

C already has this, kinda

To some extent C already has this (as of C99), in the form of variable length arrays (VLAs). For example:

void test(int numValues)
{
    int foo[numValues];

    for (int i = 0; i < numValues; i++)
        foo[i] = i + 25;
}

Unfortunately VLAs are extremely limited and you can't use them in very many contexts. Although foo has a dependent type, there is no static analysis that I know of that actually uses this information to warn the user if he's obviously stepping out of bounds. And the concept of VLAs doesn't generalize to user-defined types. It shows us a taste of another world, but it stops short.

Conclusion

I think dependent types get a bad reputation among those who encounter them, as being only of theoretical interest and only understandable to advanced mathematicians and computer science professors. The reason, I think, is that the descriptions of dependent types are very mathematical, and the implementations of dependently typed languages are quite academic. I hope I've shown you that this may be an unnecessary restriction, and that we can hopefully build dependent types easily into popular languages, or at least into the next generation of popular languages. It may not be a complete solution, or a sound solution, but that's no different from most of the other features of today's popular languages.

See also: Practical Dependent Types
