Saying more than you mean

Last week I brought up the seemingly simple question “What is a Variable?” in a new light. I concluded by saying that variables are actually a bad thing from the compiler’s perspective, because it’s difficult for the compiler to reason about the overall behavior of the program based on how variables change.

I’d like to continue this line of thought by looking at an example. How would you write some C code to sum up a bunch of numbers? You might do it like this:

int sum(int* numbers, int count)
{
    int accumulator = 0;
    for (int i = 0; i < count; i++)
        accumulator += numbers[i];
    return accumulator;
}

The objective of the function is simply to add the numbers together - but what does the code actually semantically say? What the compiler sees is basically equivalent to this:

int a(int* b, int c)
{
    int d, e;
    d = 0;
    e = 0;
f:
    if (e < c)
        d = d + *(b + e);
    else
        goto g;
    e = e + 1;
    goto f;
g:
    return d;
}

All I've done here is remove some of the things that the compiler doesn't understand in any deeper way, such as names, and the fact that the commonly seen pattern for (i = 0; i < count; i++) means that i is meant to take every value from 0 to count - 1 in turn. The code above is semantically equivalent to the previous code, and the compiler doesn't understand one better than the other[1]. What do you think when you read it? Did you even read it? It's incredibly hard to figure out what this code is doing just by looking at how variables are assigned and in what order they're assigned. The only reason the first piece of code makes more sense is because it uses humanly recognizable patterns and constructs (including the names of things).

What did you mean to say?

One of the problems with the above code (either version - they mean the same thing) is that it says too much. For example, it says that count is a 32-bit integer [2], as opposed to, say, a byte. This means it's saying to the caller "this function can add up to 2147483647 items together". Did we mean to say that?

It's also saying that the accumulator is a 32-bit integer, which means that if the mathematical sum is more than 2147483647 then the function will return some number other than the sum. Did we mean to say that? And if this was indeed intentional, do we mean this behavior to be different on a 64-bit platform rather than a 32-bit platform? If we call the function with only a few small numbers, do we still intend to use an accumulator that goes to 2 billion? What did we mean to say?

The function can only add a list that is specifically a memory-contiguous array of integers. It can't add a linked list of integers, and it can't add a sequence of integers as they arrive from a network source.

Apart from the types, there is also an apparent order to the statements. The code reads like a recipe - a sequence of instructions to be followed to get to the final answer (especially the latter version, where a goto is like "skip to step 7" or "go back to step 3"). Did we mean to only start at the first element and not the last? Is it important that we sum the second element only after the first element[3]? Did we mean to initialize the accumulator before the loop variable?

Saying too much

So it seems it's possible the code actually says too much. That is, it says more than we intended it to say when we designed it, and it says more than it needs to say to get the job done. There are two problems with saying too much: one relates to the humans who interact with the code, and the other to the computers that interact with it.

Those who write the code may be saying too much because they're filling in details that are actually irrelevant to their intended outcome. Those who read the code need to first translate it back into the original author's actual intention, leaving out details that they think aren't intended or relevant to the overall picture.

A compiler has the same problem, except that it can't simply ignore what the code actually says and guess at what the author meant instead. It can't guess which details are relevant and which aren't. This means the compiler is incredibly restricted in what it can do when it implements the compiled output. With all of these unintended restrictions on the sum function, pretty much the only way to implement it is to imitate the recipe step for step in machine code. The optimizer may be able to omit a step if it's obviously not needed, but nothing major.

Isn't this just life?

It doesn't need to be this way. For example, let's look at the equivalent sum function in Haskell [4]:

s = foldl (+) 0 numbers

For those who don't know Haskell, the foldl does the "loop through all the numbers" part, the (+) means that we're adding each one as we go, and the 0 means we start the accumulator with 0. What I'd like to point out is how much this code doesn't say[5]. Almost everything it doesn't say is something that the function is not restricted by. For example, since we didn't say anywhere that the input had to be 32-bit integers, we can actually pass in an array[6] of anything that can be added together - for example floats. We also didn't say the accumulator must be 32-bit [7], or put any restrictions on the length of the array.
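
The same point can be made in C# (which I use a lot elsewhere). Here is a rough sketch of a sum with most of those commitments removed - it assumes modern C#'s generic-math interface INumber<T>, so take it as illustrative rather than definitive:

using System.Collections.Generic;
using System.Numerics;

// Neither the element type, the accumulator type, nor the container is
// pinned down: T can be int, double, decimal, BigInteger, and the source
// can be any sequence of values, not just a contiguous array.
static T Sum<T>(IEnumerable<T> numbers) where T : INumber<T>
{
    T accumulator = T.Zero;
    foreach (T n in numbers)
        accumulator += n;
    return accumulator;
}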

The Point

If we don't say what we don't mean, the code is better, cleaner, more flexible, and easier to use. It leaves more time and space to say more of the things that we actually do mean, and it leaves the compiler more room to be intelligent about how it actually implements the code at a machine level. We need to design languages that don't coerce you into saying things you don't mean, and that help our fellow programmers understand what they're actually saying with the code they write.


[1] From a look at the C standard, I think there's also an issue of variable scope, but this isn't relevant to the point

[2] Obviously this is compiler dependent

[3] I don't know enough about C to say that the order of the summation in this particular case is strictly part of the semantics of the language without the array contents being marked volatile. For example, how would our function be expected to behave if another thread were to start changing items in the array in some particular order? I know that C historically has not been well defined for multi-threading situations, so it's likely the behavior is undefined.

[4] Taken from Rosetta Code

[5] Haskell isn't perfect either, and this code also implicitly says more than it should, but it's a good example.

[6] Actually it works on lists and not arrays, but let's pretend

[7] I think Haskell lets us down here because the accumulator size is probably defined by the output of the + operator and not some intelligent idea about how big the result really is

What is a Variable?

The other day I was thinking about how variables should be represented in a language. This got me down the strange path of wondering, “What, actually, is a variable?”

Most people get taught the concept of a variable very early on in programming. The analogy I hear used a lot is that of a box, or pigeon hole – it is a placeholder in which values can be put. Only a single value can be put in the placeholder at a time, so new values overwrite older values.

int x; // placeholder x
x = 5; // x has a value of 5
x = 6; // x has a value of 6
print(x);

Accessing a variable is accessing the value that was most recently put in it.

When looking at a low level language like C or C++, we can literally think of variables as memory locations. We can use the syntax &x to find out the memory location. We could literally open up our computer, and if we had a sufficiently good microscope (and one which can detect electric charges), and some crazy surgery tools, and a really good map, we could actually see the electrons in this memory location that make up the value. Changing the value of the variable will literally change the charges of the electrons or currents making that physical memory location. Of course, in a modern computer this isn’t quite as simple as I’ve made it sound – but certainly in simple embedded microcontrollers this is literally the case.

So a variable is a memory location?

The problem I have is that I think a program can be fully defined in terms of input and output. That is, a program can be defined by how it interacts with its environment. A memory location is, generally speaking, unobservable from the "outside" – it is neither an input nor an output. Typically when a program changes a variable, I (the user outside the program) can't tell that it changed. When you download Photoshop, its description doesn't say "Photoshop is a program that changes the following variables…".

So a program is defined by its input and output, and a variable is neither input nor output. So a variable is not part of the program definition. So what is it then?

A variable is a language feature. It is a part of the special language that we use to describe the program definition to the compiler.

Consider the following program:

#include <stdio.h>

int main()
{
    int x;
    x = 5;
    x = 6;
    printf("x: %i\n", x); // Outputs "x: 6"
    return 0;
}

I would say that the program definition is to output “x: 6”. Whether or not the compiler uses a memory location to do this is irrelevant – in fact it probably shouldn’t. The statement x = 6 is not part of the definition, but rather a description to the compiler to “remember for next time, that x is 6”.

So here’s a question: what happens to the statement x = 5? Is it optimized out?

No. I think it's incorrect to think that x = 5 is optimized out. It's better to say that x = 5 is always there in the code, but has no influence on the definition of the program behavior, and so won't affect the outputted binary. x = 5 was never in the specification to start with. The description of the behavior (the code) says the program must output the last assigned value of x to the console. The last assigned value of x is 6, so "6" must be outputted to the console. The value 5 doesn't even feature here. It doesn't contribute to the program description, and so is as good as a blank line.

In this case there is no program input, so we're only looking at the program output. The compiler doesn't even need to consider x = 5 when it decides what the program must do. For the sake of making the compiler itself easier to implement, it may indeed initially treat x = 5 as setting a memory location, and then optimize it out later. But this is a detail of how the compiler is implemented, and it need not be that way at all.

Why does it matter?

To the average programmer, this distinction shouldn’t matter at all. Whether a variable is a physical memory location, or whether a variable is a tool for a describing program behavior to the compiler, makes no difference most of the time.

To me it makes a difference in how you think about programs. It makes a difference in how upset you are when the compiler doesn't generate exactly what you expected, or "optimizes" out something that you intended it not to. These cases are not the compiler being stupid because it isn't doing what you wrote; they're the compiler being intelligent and implementing your description exactly as you put it. It's actually all the other times that the compiler is being "stupid" – when it can't think of how to implement your program except by applying the very basic fall-back rules like "make a memory location for each variable".

This is more and more important as compilers get more and more "intelligent". It's also important when thinking about the design of new programming languages. One needs to think about what language features to provide that make it easiest for the compiler to be intelligent. Variables are not one of these features – it's very difficult to reason about the behavior of a program by looking at how variables change, and very easy to instead dumbly map variables to memory locations.

Subscribing Member Functions to C Callbacks

Today I’m going to talk about another complaint I have with C/C++, regarding the incompatibility between member functions and standard functions.

The synopsis goes like this. For compatibility with C, I have some C++ callback functions that look like this:

void foo(void* state, int eventCode);

That is, they have a state argument which represents any persistent state that the subscriber wishes to have passed back to it later when the event is fired. For illustration, the subscription function might look something like this:

typedef void (*CallbackFunc)(void* state, int eventCode);

void subscribe(CallbackFunc callback, void* state)
{
    // add the callback to a list of functions to be called
}

And we can subscribe to the event like this:

void* myState = ...; // Some state we want passed to the callback
subscribe(&foo, myState);

I won’t even get into the problem of state “ownership” in this post – that’s a whole different problem that drives me up the wall in C and C++. When it comes to function pointers, there is an incompatibility between the above function and this member function:

class State
{
public:
    void foo(int eventCode);
};

In other words we can’t subscribe the member function to the same event:

// Lets say we have some state here
State* myState = new State();

// We can't subscribe like this:
subscribe(State::foo, myState);

// Or like this
subscribe(myState.foo);

If you’re a C++ programmer, which you probably are (at least a little I’m sure), you’re probably thinking I’m crazy right now – “Of course you can’t – why would you even try?”

The reason it could work is simply because of the underlying mechanism of member functions. Everyone knows that there is a “hidden” argument passed to a (non-static) member function invocation – the value for the this parameter. Exactly how this invisible argument gets passed is not part of the standard, so it doesn’t have to use the same mechanism as a normal parameter, but for the sake of allowing this kind of compatibility I would have suggested that it be standardized to be passed the same way as a normal parameter.

Another way of approaching the problem is to look at C# for inspiration – a “function pointer” (a delegate in C#) can have any state attached to it. So a member function is the same as a static function and the same as a closure or multi-cast delegate etc, provided that the parameter types fit. To me this is the opposite way of looking at it: instead of saying that a member function has an additional invisible this parameter, we now say that actually every function has a this parameter which provides the attached state, but in the case of a non-member function it is just ignored by the callee.
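
For illustration, here's roughly what that looks like in C# (the names mirror my C++ example and are just for this sketch):

using System;

delegate void CallbackFunc(int eventCode);

class State
{
    public void Foo(int eventCode) { Console.WriteLine(eventCode); }
}

class Example
{
    static void Main()
    {
        State myState = new State();
        CallbackFunc callback = myState.Foo; // the receiver is bundled into the delegate
        callback(42);                        // invokes myState.Foo(42)
    }
}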

This is probably not strictly how C# implements it, but it is a reasonable way of looking at it. If we were to apply this to C++, then the compiler could compile a hidden this pointer into all functions (including non-member functions). The optimizer would remove it again for all functions where it isn't needed for whatever reason (such as for those functions whose addresses aren't taken in the program). This would make it possible for the expression myState.foo to actually be a valid expression – it would evaluate to a function pointer (along with its implicit state myState).

Of course this second option adds overhead to every function pointer, because it now has to point to both the function itself and to the attached state. I think that if I were to redesign C++ from scratch, I would probably accept this overhead and do it anyway. Perhaps I would include a back-door option to create a "no-state" function and its corresponding pointer, which would be compatible with C functions. And I would specify that the calling convention used to pass the this parameter is the same as that used for the first parameter of a "no-state" function. This would make C and C++ function pointers seamlessly interoperable.

Why does it bug me?

It bugs me because stateful events, callbacks, and continuations are part of any good design, and C++ makes them incredibly difficult to implement. C++ claims to be object oriented, yet its member function pointers don't refer to the object they're a member of. So C++ member function pointers are really just C function pointers – except that they're type-incompatible with C function pointers. So really they aren't useful on their own in either context! It seems to me this is a massive oversight of the original language.

Simple Dependent Types

What are Dependent Types?

So, you might be wondering what dependent types are in the first place, so before I go any further let me try explain what I understand them to be.

Consider the following structure in C++:

struct Foo
{
    int values[10];
};

The structure Foo has a single field, which is an array of 10 integers. This is simple enough. But the value 10 is arbitrary, and so the structure is not very useful. What if we want a Foo that can hold 11 integers, or 12? As you probably already know, C++ provides a feature called templates which allows us to do just that:

template <int NUM_VALUES>
struct Foo
{
    int values[NUM_VALUES];
};

Now we can use Foo<10> when we want a Foo that holds 10 values like it did before, but we can also write Foo<11> when we want a Foo that holds 11 values. For example:

Foo<2> test()
{
    Foo<2> myTwoValues;

    myTwoValues.values[0] = 25;
    myTwoValues.values[1] = 26;

    return myTwoValues;
}

Note that because the array of values is a fixed length, many compilers will actually warn you if you explicitly access element 3 of a Foo<2>. This is a really nice feature, but it doesn't get you very far in practical applications.

What happens if we don’t know at compile-time how many values we want?

Foo test(int numValues)
{
    Foo myFoo;

    for (int i = 0; i < numValues; i++)
        myFoo.values[i] = i + 25;

    return myFoo;
}

This is a common problem. In fact, I would say you almost never use a fixed-length array in C++, because in most cases you don't know how long it needs to be ahead of time. Of course in C++ the solution to this is simply to allocate the values dynamically:

struct Foo
{
    Foo(int n) : numValues(n), values(new int[n]) {  }
    ~Foo() { delete[] values; }
    int numValues;
    int* values;
};

This is a horrible solution in many ways. For one thing, you're performing an extra heap allocation (what if numValues is just 3, or what if the Foo is already on the heap - what a waste!). For another thing, you need to remember to free the extra heap allocation (or assign it to a smart pointer). And now you've lost all compile-time safety when it comes to indexing into the array.

This is where dependent types would be useful. What if C++ could do the following:

template <int n>
struct Foo
{
    int values[n];
};

Foo<numValues> test(int numValues)
{
    Foo<numValues> myFoo;

    for (int i = 0; i < numValues; i++)
        myFoo.values[i] = i + 25;

    return myFoo;
}

This Foo is a dependent type: the static type depends on a runtime value. We can't say if it's a Foo<2>, a Foo<3>, a Foo<27>, etc, without looking first at the value of numValues. There is also another dependent type hiding in this example: the type of Foo::values, which depends on a value (n) that is now possibly a variable.

As a side note, the term template is really not applicable any more, since the structure is no longer specialized only by the compiler, but also by the runtime (conceptually).

How would this work?

In the above case, it seems simple enough for the compiler to understand what's going on and generate the appropriate code. The binary size of the type values[n] is no longer calculated at compile-time by substituting in the constant n; it's now a function of the run-time type parameter n.

If we recursively apply this logic, then the size of the stack frame itself is a function of the local variable numValues (which happens to also be a parameter).

This really doesn't seem that difficult. Functions that were executed by the compiler anyway are now simply executed at runtime instead of compile time (at least conceptually). Sizes, field offsets, and variable offsets, which were previously static constants, now become runtime functions.
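
Concretely, a layout computation that used to be a compile-time constant becomes a small runtime function. A sketch in C# (FooSize is a hypothetical helper, just to show the shape of the idea):

// What would have been sizeof(Foo<n>) at compile time is now an ordinary
// computation performed at runtime: n values, laid out contiguously.
static int FooSize(int n)
{
    return sizeof(int) * n;
}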

The Problem

I previously mentioned that some of the compiler's work would now occur at runtime - calculating binary layouts and such. But the compiler also does something else that's really useful: static type checking. This helps remove a whole class of errors. Consider the following:

void test(int count1, int count2)
{
    Foo<count1> foo1;
    Foo<count2> foo2 = foo1; // Is this valid?
}

The above example seems pretty simple in concept. We have two instances of Foo that are dependent on two different variables. The question is, can we assign the one to the other?

At first glance, of course not! They have different counts, so they must be incompatible, right? But what if count1 and count2 are actually the same value?

So deciding whether or not the assignment is valid requires information about the value of the variables depended on. This is quite a conundrum. The compiler essentially has three choices here:

  • it could delay the static analysis to runtime, like we have been doing for binary layout calculations, and throw an AssignmentNotValid exception or something if the runtime type checking fails
  • or it can require that all type safety is proved at compile-time only, and require that we persuade it why it should work
  • or we can just relax the type soundness in this case, and simply require the programmer to do a cast from one type to the other, as a "signed declaration" that he accepts all responsibility if the assignment turns out to be invalid

All of these choices seem reasonable:

  • performing the "static analysis" at runtime seems like something a .NET language would do, similar to the way that it can throw an InvalidCastException (a sketch of this option follows after this list)
  • requiring that the programmer prove to the compiler that the assignment can never fail seems like something that other dependently typed languages do, requiring a proof-assisting language - and this, I think, is why dependent types seem to only be for the elite of theoretical computer science
  • ignoring the possible conflict and simply requiring the user to do an explicit type-cast seems like something that C and C++ would do - you can generally cast anything to anything if you believe you know more than the compiler and don't have the patience or tools to prove it through the type system.
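
To make the first option concrete, here is a rough sketch of what the runtime check could look like, modeled in present-day C# (the class shape and exception message are my own invention for illustration):

using System;

// The "count" part of the type is carried as a runtime value, and the
// compatibility check that a dependent type system would do statically
// happens at assignment time instead.
class Foo
{
    public readonly int Count;
    public readonly int[] Values;

    public Foo(int count)
    {
        Count = count;
        Values = new int[count];
    }

    public void Assign(Foo other)
    {
        if (other.Count != Count)
            throw new InvalidOperationException("AssignmentNotValid: counts differ");
        other.Values.CopyTo(Values, 0);
    }
}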

C already has this, kinda

To some extent C already has this (as of C99), in the form of variable length arrays (VLAs). For example:

void test(int numValues)
{
    int foo[numValues];

    for (int i = 0; i < numValues; i++)
        foo[i] = i + 25;
}

Unfortunately VLAs are extremely limited and you can't use them in very many contexts. Although foo has a dependent type, there is no static analysis that I know of that actually uses this information to warn the user if he's obviously stepping out of bounds. And the concept of VLAs doesn't generalize to user-defined types. It shows us a taste of another world, but it stops short.

Conclusion

I think dependent types have gotten a bad reputation as being only of theoretical interest, understandable only to advanced mathematicians and computer science professors. The reason, I think, is that the descriptions of dependent types are very mathematical, and the implementations of dependently typed languages are quite academic. I hope I've shown that this may be an unnecessary restriction, and that we can build dependent types into popular languages, or at least into the next generation of popular languages. It may not be a complete solution, or a sound solution, but this is no different to most of the other features of today's popular languages.

See also: Practical Dependent Types

Expressiveness vs Efficiency in C#

I do a lot of my work in C#. I love C#, especially after drowning in C for a while, because of its strong, expressive type system, managed memory, and its mixed-paradigm nature.

There are two big problems I find in C# though. One is that it’s often verbose, but I won’t get into that today. The other is that although the language provides nice abstractions for some things (like IEnumerable which I use all the time), I find there is almost always a conflict between a succinct solution and a performant solution. A succinct solution is one that uses the high-level abstractions, while a performant solution is one that doesn’t use them because of their cost. I find often that a succinct solution uses some of the functional features of C#, while a performant solution uses the imperative features.

Let's say we want to sum all the numbers between 1 and 100 which are not multiples of 3. A direct representation of this in C# using a functional notation is this:

x = Enumerable.Range(1, 100).Where(i => i % 3 != 0).Sum();

This nice line of code reads almost exactly like the English version. Unfortunately this method is slow. We're using the IEnumerable interface to iterate through the numbers, calling a lambda function as the predicate to filter out the multiples of 3, and then again using the interface to iterate through the numbers to sum them. I don't know exactly how much of this the optimizer removes, but on my machine it averages about 1500ns to execute this line.
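
(For reference, a minimal sketch of one way to take such a measurement - this is illustrative, not necessarily the exact harness behind these numbers:)

using System;
using System.Diagnostics;
using System.Linq;

class Timing
{
    static void Main()
    {
        const int iterations = 1000000;
        int x = 0;
        var sw = Stopwatch.StartNew();
        for (int iter = 0; iter < iterations; iter++)
            x = Enumerable.Range(1, 100).Where(i => i % 3 != 0).Sum();
        sw.Stop();
        // Convert total elapsed milliseconds to nanoseconds per call
        Console.WriteLine("x = {0}, {1:F1}ns per call",
            x, sw.Elapsed.TotalMilliseconds * 1000000.0 / iterations);
    }
}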

On the other hand, here is the equivalent imperative code:

int sum = 0;
for (int i = 1; i < 101; i++)
    if (i % 3 != 0)
        sum += i;
x = sum;

This essentially means exactly the same thing: for the numbers between 1 and 100, for numbers that aren’t multiples of 3, sum them. I’ve explicitly not done anything clever here, like stepping in increments of 3 instead of 1. This code is meant to be as close to the functional code as possible. It executes in about 140ns on my computer – more than 10 times faster than the functional equivalent.

Of course there is another implementation that does the same thing:

x = 3367;

By the same timing methodology, this last implementation takes about 0.28ns on my machine, although at this level the result is pretty meaningless except to say that it is blindingly fast.

Now, I’m not an expert on optimization by any means, but I would say that both of the latter implementations are almost trivially apparent from the first implementation. The compiler or JIT optimizer could generate them just by “inlining” things that are known to hold constant for certain call.

In fact I think it’s so easy that I’m going to demonstrate it (partially). First, let me rewrite the nice concise code into what the C# compiler actually sees, by writing out the extension-method calls explicitly:

x = Enumerable.Sum(
    Enumerable.Where(
        Enumerable.Range(1, 100), 
        i => i % 3 != 0));

I don’t know how the Sum function looks, but lets say that this is a reasonable implementation:

static int Sum(IEnumerable<int> nums)
{
    int sum = 0;
    foreach (int n in nums)
        sum += n;
    return sum;
}

(You see where I’m going with this?) Now we “inline” the Sum function:

IEnumerable<int> nums =
    Enumerable.Where(Enumerable.Range(1, 100), i => i % 3 != 0);
int sum = 0;
foreach (int n in nums)
    sum += n;
x = sum;

By continuing recursively substituting constant ("unchanging", not "const") arguments into functions, we land up with something very close to our original imperative code.

To go even further, since the code doesn’t access any outside state, the compiler could also just execute it at compile-time to arrive at the final answer straight away. This could even be done without any of the previous optimizations – the original expression only calls referentially transparent functions with constant arguments, and so the result must be constant and can be precomputed to just be 3367.

In my mind this is just a case of constant propagation and function inlining, two things which are already done, but obviously not to the extent they could be done.

Premature Optimization?

Who cares that it takes 1500ns to execute the line of code? It’s still bleedingly fast on a modern computer, so why does it matter?

Well, this isn’t a hypothetical scenario. Yes, it’s true that I’ve never had so sum up these numbers like that. But on the other hand I have had to work with streams of millions of bytes, and with arrays of thousands or millions of numbers. And in these situations I feel forced by C# to write low-level imperative code, rather than using the layers of abstractions that are supposed to make my life easier.

Why doesn’t it?

So why doesn’t the C# compiler do any of these optimizations? Well I don’t pretend to know, but I put forward a few theories:

1. The specification is too restrictive

Perhaps the C# specification is too restrictive when it comes to letting the optimizer produce the most efficient code. For example, IEnumerable is a reference type, and reference types are always allocated on the heap. It may be a breach of the C# contract to optimize out a heap allocation. This, of course, is pure speculation. It may not be about heap allocation at all; it may be about functions maintaining reference equality, or the optimizer not being allowed to inline across assembly boundaries, etc. All of these boil down to the fact that something about the language model includes details that couple it to the inflated machine code that it currently outputs.

2. The analysis is too complicated

It might be that the performance cost analysis is too complicated. In the examples I pointed out how the second version is 10 times faster than the first, but what I didn't say was that the second version is slightly longer than the first in terms of code space. (It may not actually be longer, but let's say it is.) Many optimizations have this property: they sacrifice one resource for another, rather than saving across the board. And of course resources are not independent of each other. A longer program may run slower because of poorer caching.

When you start looking into the details of how to calculate whether a particular optimization is actually beneficial, you may come out with such a heavy, complicated and error-prone optimizer that it’s just not worth it to run.

3. There isn’t a demand

Although I’ve run into a few situations where these sorts of optimizations would be useful, perhaps Microsoft’s market analysis came up with different answers. Perhaps the development time involved in implementing it exceeds business value and demand. I can’t say anything further about this because I only know about my own demand, I can’t speak for anyone else.

Conclusion

Whatever the reason, I think there’s a gap in C# that needs to be filled by the next generation of popular languages, or perhaps the next generation of C# or .NET JIT optimizer. The C# compiler or JIT compiler doesn’t optimize some pretty obvious cases.

Languages in general shouldn't force you to choose between a nice implementation and a fast implementation. I'm sure there are already languages that address this, but none that checks all the other boxes that C# does (including that of being popular). If anyone is thinking about creating the next big language, they should seriously consider building the language from the ground up to support this kind of optimization.

Why Cast?

Last week I was writing some C code, and as is often the case in C, I started getting frustrated with the language and thinking about how it could be better. This time my thoughts were on casting.

The problem started when I encountered some code where a function needed to be passed a pointer to a const value, but the data available was in the form const volatile. Declaring a variable as const volatile simply means that it can change (volatile), but you cannot change it (const). It is useful for embedded microcontrollers, where there are cases in which values stored in program flash ROM (const) can actually be changed, and so shouldn't really be taken as "never changing", but should instead be reread from program flash every time they are accessed.

But that is beside the point. The point is that I have to cast something like this:

const MyAwesomeType* myAwesomeVariable = 
    (const MyAwesomeType*)myOtherVariable;

myFunction(myAwesomeVariable);

One of my coworkers would never have used const for the function, and would just write it like this:

myFunc((void*)myVar);

Certainly the latter is more readable and a bit more compact, which is valuable. On the other hand, the former is more explicit and more type safe, which is also valuable. In this example I also tried to show a difference in naming convention – I prefer more explicit names, which make lines of code harder to read but individual variables easier to understand, while my coworker prefers shorter variable names, which make the syntax of a line more readable. There is value in both approaches and the choice is purely one of style and philosophy, but I'm sure you can see there's a similarity in the reasoning behind the choices of variable names and more explicit types.

But why?

Initially I was debating about which approach is better – to be more explicit or more compact – but then my thoughts turned to “why do I have to cast at all?”. I would say there are generally two reasons to explicitly cast:

  • Either I know the type beyond what the compiler knows and therefore need to persuade it that the type is correct,
  • Or I intend to convert the value to a type that it isn’t currently

These are really very different things, and I’m only really going to talk about the first case. In the latter case of converting the type, the type-casting syntax is really just a syntactic convenience for what could have been done with intrinsic functions.

Casting, in the first case, is used to persuade the compiler to treat some data as a type that the compiler doesn't otherwise think the data is. In the particular example of casting volatile const to const, we are persuading the compiler that for the purposes of myAwesomeVariable, we don't need to think about the data as volatile. Of course this is a complete lie! Either the data is volatile or it isn't. If it isn't volatile, then why was it declared that way? And if it is volatile, then why is it legal to say it isn't? The code could introduce a subtle bug: the developer of myFunction assumes that the data doesn't change, while clearly there is some good reason that we need to be able to change it. In short:

There is a conflict of expectations between the caller and callee, for which the compiler nicely produces a type error, but which we suppress with a type cast and thus introduce a subtle bug

When I say “introduces a subtle bug”, I’m not being strictly truthful. There is no bug introduced in the code, because I’ve seen the functions myself and I know that the value won’t change for the duration of the function call. But this does emphasize my point: I needed to look at the code on both sides of the function call to know that it’s safe. This is bad. I need to know that myFunction doesn’t cache the value across multiple calls (thinking it’s constant and persistent), and I need to know that the caller doesn’t change the value during the call (a threading race-condition), or during repeated calls (if myFunction also expects the value to be persistent across a longer time). The cast introduces one more piece of information that needs to be manually tracked throughout the program, coupling one part to another. In an ideal world, however, the function declaration should be enough to define the contract between the caller and the callee, when it comes to type consistency. Casting is an explicit breach of this contract, and so should never be allowed by the referee (the compiler).

Note that this situation isn’t unique to const or volatile, of course. The same thing happens in C all the time with other type casts. It happens where we know the type of something that the compiler doesn’t, or we want to keep track of the “type” of something ourselves and not give it to the compiler because it’s too stupid to let us do what we want with it.

Another simple example might be a C container type, one implementation of which would contain void*-typed items, because C doesn't have templates or generics. Once again we know more about the data than the compiler: we know what type of objects the container holds. And once again we have to perform casts to persuade the compiler of what we know. And once again, different parts of the program are coupled together by our special knowledge. That is, we have to keep track in our minds (or in the comments), so that the part that puts items into the container only puts in items that the other part of the program expects to take out. And once again this opens up a huge gaping hole for bugs to creep in.
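
The same coupling shows up outside of C. A quick illustrative sketch in C#, where the old non-generic collections play exactly this role, with object standing in for void*:

using System;
using System.Collections;

class Example
{
    static void Main()
    {
        var queue = new Queue();        // non-generic: items are just object
        queue.Enqueue("not a number");  // one part of the program inserts a string...

        // ...while another part "knows" the queue holds ints. The cast
        // compiles fine and only fails at runtime with an InvalidCastException.
        int n = (int)queue.Dequeue();
    }
}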

What I’m saying is this: casting is not feature of the language, it’s a code-smell that says that either your code is badly designed, or that the language is conceding defeat and needs to give you a kludge because its type system has failed you.

So, what's the solution?

I think the obvious solution is to add more power to the type system. In the end, the compiler should know almost as much as we do about the code, so that if a cast is actually valid then it isn't necessary. That is, if it's valid to cast from volatile const to const, then the compiler already knew that it wasn't volatile at that point of the program, so we don't need to cast it.

This is what happens with generics in C# – if we read an item from a List<int>, then the compiler already knows that it's valid to cast the item to int, so a cast becomes redundant. That is, the item already is an int, so there's no point in casting it. Indeed in C# I find that I never need casts (except for integer conversions, which aren't the same thing). When I find myself needing a cast, it's generally because I've done something wrong with the design – I've coded something that needs to act generically, but without writing generic code.
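
A small sketch of the two styles side by side:

using System.Collections;
using System.Collections.Generic;

class Example
{
    static void Main()
    {
        var ints = new List<int> { 1, 2, 3 };
        int a = ints[0];          // no cast: the compiler knows it's an int

        var olds = new ArrayList { 1, 2, 3 };
        int b = (int)olds[0];     // cast required: the type information was erased
    }
}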

You might be wondering how generics could possibly help in the example of const volatile above. Well, it wouldn’t be sufficient for C to have generics like C# (although it would certainly help). C++ templates are half a solution, but they have many shortfalls, which I won’t go into right now.

What would such a type system look like? Well if there was an obvious answer to that then C would probably already have it. Computer science research is constantly improving the situation at a theoretical level, and I think soon we’ll have the perfect low-level type system, just in time to not need it anymore.

Aren’t Subtype and Parametric Polymorphism the Same?

Aren’t Subtype and Parametric Polymorphism the Same?

There are apparently two common types of polymorphism in software: subtype polymorphism and parametric polymorphism. I think that these are really the same thing, and I can't see any good reason why modern languages separate the two. I'm going to explain my reasoning in this post.

Firstly, what are subtype polymorphism and parametric polymorphism? Briefly, subtype polymorphism is when you use class or interface inheritance to write more generic code [1]. The generic code works with the interface or base class, and therefore implicitly works for all subtypes or interface implementations. Parametric polymorphism is when you code using generics or templates – the code has a type parameter which can be substituted with a concrete type for each use of the code.

But what are the claimed differences between the two? Well, I'm going to demonstrate with an example. Let's say we have an abstract type, IPet, which has two implementations, Cat and Mouse, each of which implements IPet's Stroke function (I'm using C# for the examples):

interface IPet 
{
    void Stroke();
}

class Cat : IPet
{
    void Stroke() { Console.WriteLine("Purr"); }
}

class Mouse : IPet
{
    void Stroke() { Console.WriteLine("Squeak"); }
}

Now let's say we want a function to repeatedly stroke a pet a given number of times, but abstracted from what type of pet it actually is. We can implement this using subtype polymorphism like this:

void StrokePet1(IPet pet, int count)
{
    for (int i = 0; i < count; i++)
        pet.Stroke();
}

And here is an implementation using parametric polymorphism:

void StrokePet2<T>(T pet, int count) where T: IPet
{
    for (int i = 0; i < count; i++)
        pet.Stroke();
}

So what’s the difference? Well in terms of the implementation of StrokePet1 and StrokePet2 I see no difference. Both function implementations read like this:

Given a pet of any type:
    Loop count times:
        Stroke the pet

Okay, well, in C# there is a nontrivial difference, particularly in the way you call the function. With parametric polymorphism, you need to know the type T upfront when you call StrokePet2 [2]. The same thing applies generally. For example, when you declare a List<T>, the type T needs to be known upfront when the declaration is made (or it needs to be a type parameter in itself).
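
For instance, nothing stops us supplying the abstract type itself as the type argument (GetPet here is a hypothetical function that returns some pet):

IPet somePet = GetPet();  // the concrete type isn't known here
StrokePet2(somePet, 3);   // T is inferred as IPet - effectively subtype polymorphism again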

There is another difference in C#: the parametric type T can be used in multiple locations to ensure that each declaration has the same type. Say we have a function that evaluates two pets of the same species, and returns the "winner" (by some unknown criterion):

T ChooseWinner<T>(T left, T right) where T: IPet;

And now let's try to implement this using subtype polymorphism:

IPet ChooseWinner(IPet left, IPet right);

Oh dear… there's no way to specify that left is the same pet type as right or the return type. This is the same problem we would have using non-generic container classes: there's no way to specify at compile time that some abstract queue should only contain elements of a certain type:

// This queue is supposed to only contain IPets, but how do we specify it?
Queue myQueue = new Queue(); 

myQueue.Enqueue(new Mouse());
myQueue.Enqueue(new Cat());
myQueue.Enqueue(new object()); // Oops. Why can't the compiler catch this?

IPet first = myQueue.Dequeue(); // why won't it allow this?
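
With generics – that is, with the parametric machinery – the missing static information can be supplied, and both complaints disappear:

Queue<IPet> myQueue = new Queue<IPet>();

myQueue.Enqueue(new Mouse());
myQueue.Enqueue(new Cat());
// myQueue.Enqueue(new object()); // now a compile-time error

IPet first = myQueue.Dequeue(); // and no cast needed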

So wait, am I saying they’re actually different? No. I’m saying that in C#, subtyping polymorphism is just parametric polymorphism but with less static information.

I don’t think this is a shortcoming of subtype polymorphism – it’s a shortcoming of the language to not be allow us to fully express the relationships between between types used in a class or function. There’s no way to specify that a C# Queue container has to be homogeneous without using generics, but that just means that C# subtyping polymorphism isn’t as expressive as its parametric polymorphism. Conversely, parametric polymorphism in C# is more strict because we need to supply more static information. Consider the following:

Queue<Mouse> myQueue = new Queue<Mouse>();

// Here myPet is a mouse, but we haven't told the compiler it must be
object myPet = new Mouse();

myQueue.Enqueue(myPet); // Why won't it allow this?

So really, it’s all about the balance between information we keep in our heads vs information we give to the compiler for static type checking.

So What?

What’s the point? Does anyone care that there may be some airy-fairy overarching concept that ties parametric and subtype polymorphism? I would say yes, and the reason is about code re-usability and performance.

Regarding performance, in C# there is a difference in the performance of the two kinds of polymorphism. In the case of parametric polymorphism, the underlying code can be specialized for the provided type argument, making it more efficient. Let's say we have an INumber interface (like Java's Number class). Should we really have to decide between the following two implementations:

INumber Add(INumber first, INumber second)
{
    return first + second;
}

TResult Add<TFirst,TSecond,TResult>(TFirst first, TSecond second)
    where TFirst: INumber
    where TSecond: INumber
    where TResult: INumber
{
    return first + second;
}

This seems like the sort of thing the compiler should sort out: akin to inline optimization, there should be a specialization optimization. Inlining a function involves taking its parameters and substituting the concrete arguments to specialize it for a specific call site – sound familiar?

Regarding reusability, note that functions with generic parameters are not interchangeable with functions that have virtual parameters, and so to some extent the designer of a function or class needs to think ahead about how much of it to make generic. But why? Why can't the language make generics seamless? The coder should specify exactly the required details for the code, and no more. Everything else should just be "generic" by default. Need a number variable…? Just use INumber, not int or short. The compiler can "inline" (specialize) specific types as they become known. Whether the compiler generates an actual abstract object with dynamic dispatch at runtime, or just specializes the code to the concrete type, should be just an optimization issue and not the concern of the programmer.
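
(As it happens, the .NET runtime already does a limited form of this: generic code instantiated with a value type gets its own specialized native body, so the interface call can be resolved without dynamic dispatch. A sketch reusing the pet types from above; FastCat is hypothetical:)

struct FastCat : IPet
{
    public void Stroke() { Console.WriteLine("Purr"); }
}

// StrokePet2<FastCat> gets a body specialized for FastCat, so Stroke need
// not go through virtual dispatch; StrokePet1 would box the struct into an
// IPet and always dispatch through the interface.
StrokePet2(new FastCat(), 3);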

As a side note, there are languages which make generics much less invasive. For example, F# is a language where I think it really is feasible to make all code generic (except at the root). But I think we need this capability in more popular OOP languages like C# and Java.

In conclusion, I think that parametric and subtype polymorphism are actually the same thing in a statically typed language. In terms of program operation, subtype polymorphism is just parametric polymorphism missing a few static features. Popular modern languages should use this to make it easier to write reusable, high-performance code.


[1] The exact details actually depend on the language, so this is a bit of an oversimplification.

[2] Of course, if you don't know the derived type upfront, C# will happily substitute the base type into the generic, in which case you're then using subtype polymorphism instead.