# Continuations in C

There are times when you need to call a function, but you want to say “call me back when you’re done” rather than blocking the current thread.

An example might be when you’re reading a file. If you imagine for a moment that every CPU cycle is on the scale of 1 second, then disk access is on the order of days to months (take a look at the Coding Horror post about it). When you call a simple C function like fread, you could be blocking the current thread for millions of CPU cycles. You don’t want to block the thread, because threads are a valuable resource and multithreading is a difficult skill.

## The Typical Solution

The typical way to solve this in C is to use a callback function. I’m not going to explain callback functions here, since there’s an abundance of information about them on the internet. Instead I would like to point out a convenient pattern of how to store state for the callback function.

Let’s use a concrete example. Say we have some function bar, which is expected to take a long time to execute, and a function foo which needs to call bar. The synchronous way of writing the code (non-callback way) might look like this:
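A minimal sketch of the synchronous version, assuming a stand-in body for bar (the real slow work is not shown) and an illustrative printf in foo:

```c
#include <stdio.h>

/* Hypothetical long-running operation; the body is only a stand-in. */
int bar(void)
{
    /* ... imagine slow work here (disk, network, ...) ... */
    return 42;
}

/* foo blocks until bar returns, then carries on with the result. */
int foo(void)
{
    int result = bar();
    printf("bar returned %d\n", result);
    return result;
}
```

Everything foo needs after the call (here just `result`) lives in its stack frame for the whole time bar is running.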

The task finishes by returning some result of the long process. For the purposes of this example, we’ll say that the result is 42.

If we convert it to the asynchronous form (the callback form) it might look like this:
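A minimal sketch of the callback form. The typedef name `bar_callback` and the `last_result` variable are assumptions made for illustration; `last_result` exists only so that the sketch has an observable effect.

```c
#include <stdio.h>

/* Signature of the function bar calls once it has a result. */
typedef void (*bar_callback)(int result);

void bar(bar_callback callback)
{
    /* Stand-in for the slow work; the result is handed to the callback. */
    callback(42);
}

/* Hypothetical: records what the continuation received. */
static int last_result;

/* The second half of foo: runs when bar has finished. */
void continue_foo(int result)
{
    last_result = result;
    printf("bar finished with %d\n", result);
}

/* The first half of foo: starts bar and says where to continue. */
void foo(void)
{
    bar(continue_foo);
}
```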

Note that normally bar would not call the callback itself, but instead save the callback to be called later. I’ve only called it directly from bar as a convenience in the example.

## The Problem

I’ve seen this pattern many times. But it’s flawed in a major way: if foo has some state that must be persisted across the call to bar, how does it get that state to the continue_foo function? For example, if I declare a variable in foo, how do I access the variable in continue_foo? Typically what I see is that people will simply use global variables. But that’s an awful solution for many reasons[1].

## Slightly Better

A better pattern, which I’ve used myself quite often, is for foo to tell bar, “please hold the value of XYZ for me, and when you call me back, please give XYZ back to me to remind me why I called you in the first place and help me remember where I left off”. It might look like this:
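A minimal sketch of this version. The fields `x` and `total` in Foo_state are hypothetical, chosen only to show a value surviving across the call to bar:

```c
/* bar hands the opaque state pointer back, untouched, with the result. */
typedef void (*bar_callback)(void *state, int result);

void bar(bar_callback callback, void *state)
{
    /* Stand-in for the slow work. */
    callback(state, 42);
}

typedef struct Foo_state {
    int x;      /* hypothetical value foo needs after bar finishes */
    int total;  /* hypothetical place for continue_foo to leave its answer */
} Foo_state;

static void continue_foo(void *opaque, int result)
{
    Foo_state *state = opaque;  /* foo knows the real type; bar does not */
    state->total = state->x + result;
}

/* The caller only supplies storage; foo fills it in. */
void foo(Foo_state *state)
{
    state->x = 1;
    bar(continue_foo, state);
}
```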

A few quick points I’d like to draw your attention to:

• bar only sees the type void*, and not something more specific like Foo_state, because bar may be called by other functions as well, not just foo.
• Rather than allocating foo’s state on the heap, foo just accepts the state as a parameter, leaving it up to the caller to decide where it must be allocated. This parameter is only to say where the state should be stored, and is not expected to have any values populated by foo’s caller.

Let me emphasize that last point again: there is no heap allocation involved in this example. The state could very easily be statically allocated, or pool-allocated, or even stack-allocated[2]. In particular, consider that foo’s caller is likely to face the same state-management problems that foo has faced, and so might already have its own state structure, which would provide the perfect home for foo’s state structure without incurring an additional heap allocation.

## The Best Solution

But we can do even better. The problem with the above example is that we’re passing two things around: the callback function pointer, and the callback state pointer[3]. Let’s take a look at a way of doing this while only passing one pointer:
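A minimal sketch of the single-pointer version. The typedef name `Callback` and the fields `x` and `total` are assumptions for illustration; the field name `call` and the call syntax follow the description in the text.

```c
/* The callback is handed a pointer to itself, which is also a pointer
   to the state structure that begins with it. */
typedef void (*Callback)(void *state, int result);

void bar(Callback *callback)
{
    /* Stand-in for the slow work; dereference to get the function,
       then pass the same pointer back as the state. */
    (*callback)(callback, 42);
}

typedef struct Foo_state {
    Callback call;  /* must be the first member */
    int x;          /* hypothetical persistent value */
    int total;      /* hypothetical result slot */
} Foo_state;

static void continue_foo(void *opaque, int result)
{
    /* Valid because a pointer to a struct's first member, suitably
       converted, points to the struct itself. */
    Foo_state *state = opaque;
    state->total = state->x + result;
}

void foo(Foo_state *state)
{
    state->call = continue_foo;
    state->x = 1;
    bar(&state->call);  /* one pointer carries both function and state */
}
```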

I’ll draw your attention to the differences:

• Foo_state now contains a field called call which holds the callback function pointer. It’s important that this field is the first field in the structure so that a pointer to this field is also a pointer to the whole structure.
• The callback function signature still accepts the state as a parameter, as before.
• The call to bar no longer takes two parameters but now only takes a pointer to the callback function pointer (note the double-pointer).
• When bar needs to call the callback function, it needs to dereference it first. It also needs to pass the callback state. But since, by design, we’ve said that a pointer to the callback function [pointer] is also a pointer to the callback state, we can simply pass that pointer as the argument. This gives us the interesting syntax (*callback)(callback, result). In a sense, this is saying “call the callback, and tell it which callback we called”.

Those who are familiar with how object-oriented programming works under the hood may recognize this pattern. Typically objects are laid out in memory such that the first field in the object state is a pointer to the class vtable. When you call a virtual member function on the object, the pointer-to-the-object is treated as a pointer-to-the-vtable-pointer and is used to resolve the virtual dispatch. In our example above there is actually less complexity and overhead, since we don’t need a whole vtable but can point directly to the function.
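For comparison, here is a rough sketch of what a vtable-based object might look like if written by hand in C. The names Shape, Shape_vtable, rect_init and shape_area are made up for illustration, and real compilers’ object layouts differ in detail:

```c
typedef struct Shape Shape;

/* One table of function pointers shared by all objects of a "class". */
typedef struct Shape_vtable {
    int (*area)(const Shape *self);
} Shape_vtable;

struct Shape {
    const Shape_vtable *vtable;  /* first field: pointer to the vtable */
    int width;
    int height;
};

static int rect_area(const Shape *self)
{
    return self->width * self->height;
}

static const Shape_vtable rect_vtable = { rect_area };

void rect_init(Shape *shape, int width, int height)
{
    shape->vtable = &rect_vtable;
    shape->width = width;
    shape->height = height;
}

/* Virtual call: object -> vtable -> function, one hop more than
   (*callback)(callback, result). */
int shape_area(const Shape *shape)
{
    return shape->vtable->area(shape);
}
```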

I love this pattern because it’s really clean and quite simple. The whole callback, including the function and the state, is neatly represented by a single pointer[4].

The callback pointer can be called using a very self-contained syntax. That is, it only depends on one variable, not two. This is actually not just a matter of syntax: a single variable means better use of CPU registers, and fewer accesses to memory.

The fact that the callback is represented by one small value also makes it easier to manage. There’s much less risk of calling the callback with the wrong state. It’s also lighter to pass around.
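To illustrate how little there is to store, here is a sketch in which bar saves the callback and something else fires it later, which is closer to how a real asynchronous bar would behave. The one-slot `pending` variable and the `bar_complete` function are hypothetical stand-ins for a driver or event loop:

```c
#include <stddef.h>

typedef void (*Callback)(void *state, int result);

/* Hypothetical one-slot record of who is waiting for bar. */
static Callback *pending = NULL;

/* Starts the operation: remember the callback and return immediately. */
void bar(Callback *callback)
{
    pending = callback;
}

/* Called later (say, from an event loop) when the slow work is done. */
void bar_complete(int result)
{
    Callback *callback = pending;
    pending = NULL;
    if (callback != NULL)
        (*callback)(callback, result);
}

typedef struct Foo_state {
    Callback call;  /* first member, so &state->call identifies the state */
    int done;
    int value;
} Foo_state;

static void continue_foo(void *opaque, int result)
{
    Foo_state *state = opaque;
    state->value = result;
    state->done = 1;
}

void foo(Foo_state *state)
{
    state->call = continue_foo;
    state->done = 0;
    state->value = 0;
    bar(&state->call);  /* foo returns here; it resumes in continue_foo */
}
```

Between `foo` returning and `bar_complete` being called, the only thing bar holds on to is one data pointer.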

The most obvious disadvantage to me is that it is uncommon. Someone looking at the code for the first time won’t understand what’s happening straight off the bat. It also means that there’s no language support for it. C++ is in some ways an extension to C with language support for first-class objects. But there is no common language that is an extension to C with support for this kind of first-class function (with state).

The performance of using this pattern isn’t a disadvantage in my opinion. If you’re comparing it to the performance of a “naked” function pointer, then yes, you may incur some overhead from passing the additional state argument and from double-dereferencing the function pointer. But consider that this type of function call should actually be faster than calling a virtual function in most object-oriented languages (which has a triple-dereference), because there’s no vtable lookup. And virtual function calls are in turn typically faster than interface function calls (and correspondingly virtual functions with multiple inheritance, depending on the optimizer and conditions).

I’d also like to dispel another disadvantage, not directly related to the pattern but more about using callbacks in general. At first glance it seems that there is a lot of overhead in accessing persistent variables in the state structure, because instead of saying “x” you have to say “state->x”, which implies an extra pointer dereference and possibly some pointer arithmetic. But think about this: how are variables normally accessed anyway? Variables are normally stored in the stack frame, which is essentially a structure pointed to by the stack pointer. Yes, fewer variables may end up being held in registers, which would affect the performance, but I think it may be less of a problem than you’d expect.

Likewise, at first glance it seems that there is extra space used to store the callback function pointer. But in reality, a stack frame also stores the “callback” function pointer anyway: we just normally refer to it as the “return address”. An important point to note in the last example is that the very last thing foo does is call bar. This is what’s called a tail call, and it means that any half-decent optimizer will re-use foo’s stack frame space for bar. To put it another way: while bar is active, foo doesn’t use any stack space, but it does use space in the persistent state structure (wherever that may be), and the persistent state structure has many of the same attributes as the stack frame would have had, including a pointer into code space. From this perspective, there is no extra space required to store the callback address in the state structure.

The only thing missing is hardware support. A “normal” call has hardware support for automatically populating the return address into the state structure (aka stack frame) and saving register states etc (aka saving persistent variables). And a “normal” return has built-in support for dereferencing the stack pointer to obtain the return address pointer (note the double-pointer again) and jumping to that address, all in one step. But I imagine that if this pattern became more common in usage (probably with language support), hardware support would probably follow.

Until then, I still think it’s a great pattern to use in C, and we should all add it to our toolbox of C patterns.

1. Please ask me – I’ll be happy to tell you all the reasons why it’s so horrible

2. In the less likely scenario that the caller decided to manually block the thread using thread synchronization techniques.

3. On most modern architectures this would just mean that it takes twice the space, since there are two pointers involved. But C doesn’t require function pointers to be the same size as data pointers. One embedded architecture I work with has function pointers that are twice the size of normal heap pointers – after all, RAM is more expensive per bit than ROM.

4. A RAM pointer, which in some cases is smaller than a function pointer, giving it yet another advantage over the typical callback.

# Sequences: Part 5

Last time, I talked about push and pull regarding sequences. We saw that it’s more convenient to write code that pulls from its inputs and pushes to its outputs. We took a look at C#’s generators, and how they enabled us to write sequence-processing functions in this way, without the need for intermediate buffers.

Let’s quickly recap generators. A generator in C# looks like a traditional function (with a signature that returns a sequence), but it can push values to the caller using the special syntax yield return, which essentially puts the generator function “on hold” until the consumer/caller asks for the next value[1]:

The two parties involved here are the generator and the caller (which I’ll call the consumer since the generator is a producer).

When the consumer asks for the next value in the sequence, the generator function is temporarily “resumed”, long enough to produce the next value of the sequence. Last time we drew an analogy with freezing time to explain why it’s easier to write the generator code now that it thinks it’s pushing values to the consumer.

But it’s important here to note who is being paused and who is being resumed. When the compiler is producing IL for the consumer function and the generator function, it is the generator that gets reorganized into a form where it can be paused and resumed (it gets converted into a class which implements the IEnumerable<T> "pull" interface).
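To make that reorganization concrete, here is a hand-written C sketch of roughly what has to happen to a generator that yields 1, 2, 3: its local variable moves into a structure, and its body becomes a function that can be re-entered for each value. The names Counter, counter_init, counter_move_next and sum_all are made up for illustration, and this is an analogy in C, not the code the C# compiler actually emits:

```c
#include <stdbool.h>

/* The generator's "local variable", kept alive between calls. */
typedef struct Counter {
    int next_value;
} Counter;

void counter_init(Counter *counter)
{
    counter->next_value = 1;
}

/* Pull interface: resume the generator just long enough to produce
   one value. Returns false once the sequence is exhausted. */
bool counter_move_next(Counter *counter, int *out)
{
    if (counter->next_value > 3)
        return false;
    *out = counter->next_value++;
    return true;
}

/* The consumer is an ordinary function; only the producer was rewritten. */
int sum_all(void)
{
    Counter counter;
    int value;
    int sum = 0;

    counter_init(&counter);
    while (counter_move_next(&counter, &value))
        sum += value;
    return sum;
}
```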

But what would happen if the next item in the sequence just wasn’t available yet? If we go back to last week’s C example of reading input from the user by pulling values from getchar (or Console.Read in C#), you can see that generators wouldn’t fix the conflict between push and pull in that case.

Let’s simplify things a bit to investigate further. Instead of considering a whole sequence of items, let’s say that there’s just one item. We can pull the item from somewhere by calling a function that returns that item:

When the consumer calls PullFromProducer to fetch the item (an integer), the caller is blocked until the PullFromProducer function returns (synchronously).

The generator syntax in C# still uses this pattern under the covers – the generator function still returns IEnumerable<T>, which as we know from our previous exploration is a pull-based iterator interface.

But what if PullFromProducer simply doesn’t yet have the value that it needs to return? For example, how do we implement the Pull function if it’s to pull from a network connection, which may not have received the value yet?

Just as the C# generator makes it possible to pause the producer, wouldn’t it be nice if there were a way to pause the consumer? Obviously we can do this with threads, but wouldn’t it be nice if there were a way to do this without the overhead of threads?

It turns out that in C# there is. C# 5 introduced the concept of async functions. You’ve seen async functions before on this blog, so I won’t go into too much detail. If you aren’t too familiar, I highly recommend reading up about them (here is the MSDN introduction, and I also highly recommend Jon Skeet’s Eduasync series for really getting to know what’s going on behind the scenes[2]).

Using async we can make code that looks like this:

The magic happens in the consumer this time. The consumer function is suspended at the “await” point until the producer pushes the value to the consumer.

To emphasize what’s happening here, let’s look at a slightly different example:

If you run this[3] you’ll see the output is something like this:

The interesting thing is the order of the messages. The message line “Consumed: 42” occurs directly after “Pushing value to consumer” rather than after “Consumer is about to await value”, which clearly shows that the consumer is suspended during the intermediate time. But just like with generators, it’s important to realize that the above example does not create any additional threads. Just like with generators, the async functionality is implemented by the compiler by creating a new class behind the scenes.

This solves our problem, right?

Nope.

The problem is that async only works with a single value. We can use it to push a once-off item, but not whole sequences of items.

C# is stuck with two different ways of doing things with sequences. There’s the pull-based approach with IEnumerable<T>. And there’s the push-based approach with IObservable<T> (I won’t go into IObservable, but if you’re interested take a look at Reactive Extensions – they echo many of the great features of IEnumerable, such as all their extension methods, but do it for a push-based interface instead of a pull-based one).

What we need is something more like an IAsyncEnumerable<T> interface, which combines task-based asynchrony with a sequential pull-based interface. We also need language support for IAsyncEnumerable<T>, including generators and foreach statements. The combination of generators and IAsyncEnumerable would allow us to have everything we’ve been looking for so far:

• No containers required (sequences don’t have to be in memory before you can work on them)
• Zero buffering overhead (when we can process sequences as fast as they’re produced)
• Completely abstract sequence types (a sequence of user key press events can be as much a sequence as an array of integers)
• Push/pull agnostic (IAsyncEnumerable covers both push and pull cases equally)