Continuations in C
There are times when you need to call a function, but you want to say “call me back when you’re done” rather than blocking the current thread.
An example might be when you’re reading a file. If you imagine for a moment that every CPU cycle takes 1 second, then disk access is on the order of days to months (take a look at the Coding Horror post about it). When you call a simple C function like `fread`, you could be blocking the current thread for millions of CPU cycles. You don’t want to block the thread, because threads are a valuable resource and multithreading is a difficult skill.
The Typical Solution
The typical way to solve this in C is to use a callback function. I’m not going to explain callback functions here, since there’s an abundance of information about them on the internet. Instead I would like to point out a convenient pattern of how to store state for the callback function.
Let’s use a concrete example. Say we have some function `bar`, which is expected to take a long time to execute, and a function `foo` which needs to call `bar`. The synchronous way of writing the code (the non-callback way) might look like this:
    int bar(void) {
        // Some time-consuming task
        // ...
        return 42;
    }

    void foo(void) {
        // Some code before we call bar
        // ...
        int b = bar();
        // Some code after we call bar
        // ...
    }
The task finishes by returning some result of the long process. For the purposes of this example, we’ll say that the result is `42`.
If we convert it to the asynchronous form (the callback form) it might look like this:
    typedef void (*bar_callback)(int result);

    // Declared up front so that foo can refer to it
    void continue_foo(int b);

    void bar(bar_callback callback) {
        // Start some long-running process.
        // Save the callback somewhere so that
        // when the long-running process is done
        // we can call it with the result
        callback(42);
    }

    void foo(void) {
        // Some code before we call bar
        // ...
        bar(continue_foo);
    }

    void continue_foo(int b) {
        // Some code after we call bar
        // ...
    }
Note that normally `bar` would not call the callback itself, but would instead save the callback to be called later. I’ve only called it directly from `bar` as a convenience in the example.
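To make the "save it and call it later" idea concrete, here is a minimal sketch of a deferred `bar`. The names `pending_callback` and `bar_on_complete` are hypothetical and not part of the example above; they stand in for whatever event loop or interrupt handler would signal completion in a real system. The sketch also assumes only one operation is in flight at a time.

```c
#include <stddef.h>

typedef void (*bar_callback)(int result);

/* Hypothetical storage for the callback of the one in-flight operation. */
static bar_callback pending_callback = NULL;

/* Starts the "long-running" work and returns immediately. */
void bar(bar_callback callback) {
    pending_callback = callback;
}

/* Hypothetical completion hook: something else (an event loop,
   an interrupt handler) calls this when the work is done. */
void bar_on_complete(int result) {
    bar_callback callback = pending_callback;
    pending_callback = NULL; /* clear first so the callback may call bar again */
    if (callback != NULL) {
        callback(result);
    }
}

/* Only here so the effect is observable in this sketch. */
static int last_result = -1;

static void continue_foo(int b) {
    last_result = b;
}

void foo(void) {
    bar(continue_foo); /* returns before continue_foo has run */
}
```

After `foo()` returns, `continue_foo` has not run yet; it only runs once `bar_on_complete(42)` is called.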
The Problem
I’ve seen this pattern many times. But it’s flawed in a major way: if `foo` has some state that must be persisted across the call to `bar`, how does it get that state to the `continue_foo` function? For example, if I declare a variable in `foo`, how do I access the variable in `continue_foo`? Typically what I see is that people will simply use global variables. But that’s an awful solution for many reasons[1].
Slightly Better
A better pattern, which I’ve used myself quite often, is for `foo` to tell `bar`, “please hold the value of XYZ for me, and when you call me back, please give XYZ back to me to remind me why I called you in the first place and help me remember where I left off”. It might look like this:
    typedef void (*bar_callback)(void* state, int result);

    struct Foo_State {
        int x; // Some "variable" that foo wants to have preserved
    };

    // Declared up front so that foo can refer to it
    void continue_foo(void* state_, int b);

    void bar(bar_callback callback, void* state) {
        // Start some long-running process.
        // Save the callback and state somewhere so that
        // when the long-running process is done we can
        // call it with the result and the state
        callback(state, 42);
    }

    void foo(struct Foo_State* state) {
        // Some code before we call bar
        // This code can save things to
        // the `state`.
        // ...
        bar(continue_foo, state);
    }

    void continue_foo(void* state_, int b) {
        struct Foo_State* state = (struct Foo_State*)state_;
        // Some code after we call bar.
        // This code also has access to variables in `state`
        // ...
    }
A few quick points I’d like to draw your attention to:
- `bar` only sees the type `void*`, and not something more specific like `struct Foo_State*`, because `bar` may be called by other functions as well, not just `foo`.
- Rather than allocating foo’s `state` on the heap, foo just accepts the state as a parameter, leaving it up to the caller to decide where it must be allocated. This parameter is only to say where the state should be stored, and is not expected to have any values populated by foo’s caller.
Let me emphasize that last point again: there is no heap allocation involved in this example. The state could very easily be statically allocated, or pool-allocated, or even stack-allocated[2]. In particular, consider that foo’s caller is likely to face the same problems foo has faced with state management, and so might already have its own state structure, which would provide the perfect home for foo’s state structure without incurring an additional heap allocation.
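As a rough sketch of that last idea, the caller's own state structure can simply embed `struct Foo_State`. The caller `baz`, its `struct Baz_State`, and the extra `sum` field are hypothetical names added for illustration; `bar` calls back immediately, as in the example above.

```c
typedef void (*bar_callback)(void *state, int result);

struct Foo_State {
    int x;   /* value foo wants preserved across the call to bar */
    int sum; /* hypothetical field, filled in by continue_foo */
};

/* As in the article's example, bar calls back immediately for convenience. */
void bar(bar_callback callback, void *state) {
    callback(state, 42);
}

static void continue_foo(void *state_, int b) {
    struct Foo_State *state = state_;
    state->sum = state->x + b; /* x survived the call to bar */
}

void foo(struct Foo_State *state) {
    state->x = 8;
    bar(continue_foo, state);
}

/* Hypothetical caller of foo: its own state embeds foo's state. */
struct Baz_State {
    int y;
    struct Foo_State foo_state;
};

/* Statically allocated, so no heap allocation is involved anywhere. */
static struct Baz_State baz_state;

void baz(void) {
    baz_state.y = 1;
    foo(&baz_state.foo_state);
}
```

Here `foo` never knows or cares that its state lives inside another structure; it just receives a pointer to storage it may use.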
The Best Solution
But we can do even better. The problem with the above example is that we’re passing two things around: the callback function pointer, and the callback state pointer[3]. Let’s take a look at a way of doing this while only passing one pointer:
    typedef void (*bar_callback)(void* state, int result);

    struct Foo_State {
        bar_callback call; // Must be the first field
        int x; // Some "variable" that foo wants to have preserved
    };

    // Declared up front so that foo can refer to it
    void continue_foo(void* state_, int b);

    void bar(bar_callback* callback) {
        // Start some long-running process.
        // Save the callback and state somewhere so that
        // when the long-running process is done we can
        // call it with the result and the state
        (*callback)(callback, 42);
    }

    void foo(struct Foo_State* state) {
        // Some code before we call bar
        // This code can save things to
        // the `state`.
        // ...
        state->call = continue_foo;
        bar(&state->call);
    }

    void continue_foo(void* state_, int b) {
        struct Foo_State* state = (struct Foo_State*)state_;
        // Some code after we call bar.
        // This code also has access to variables in `state`
        // ...
    }
I’ll draw your attention to the differences:
- `Foo_State` now contains a field called `call` which holds the callback function pointer. It’s important that this field is the first field in the structure so that a pointer to this field is also a pointer to the whole structure.
- The callback function signature still accepts the state as a parameter, as before.
- The call to `bar` no longer takes two parameters but now only takes a pointer to the callback function pointer (note the double-pointer).
- When `bar` needs to call the callback function, it needs to dereference it first. It also needs to pass the callback state. But since, by design, we’ve said that a pointer to the callback function [pointer] is also a pointer to the callback state, we can simply pass that pointer as the argument. This gives us the interesting syntax `(*callback)(callback, result)`. In a sense, this is saying “call the callback, and tell it which callback we called”.
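The points above can be sketched as a version in which `bar` really does defer the call. The names `pending`, `bar_on_complete`, and the `sum` field are hypothetical additions for illustration, and the sketch assumes a single operation in flight. The `_Static_assert` (C11) is one possible way to guard the "first field" requirement at compile time.

```c
#include <stddef.h>

typedef void (*bar_callback)(void *state, int result);

/* Hypothetical storage: the single pointer is all bar has to remember. */
static bar_callback *pending = NULL;

void bar(bar_callback *callback) {
    pending = callback;
}

/* Hypothetical completion hook, called when the work is done. */
void bar_on_complete(int result) {
    bar_callback *callback = pending;
    pending = NULL;
    if (callback != NULL) {
        /* Call the callback, and tell it which callback we called. */
        (*callback)(callback, result);
    }
}

struct Foo_State {
    bar_callback call; /* must be the first field */
    int x;
    int sum;           /* hypothetical field, filled in by continue_foo */
};

/* Fails to compile if someone moves `call` away from offset 0. */
_Static_assert(offsetof(struct Foo_State, call) == 0,
               "call must be the first field of Foo_State");

static void continue_foo(void *state_, int b) {
    /* A pointer to the first field is also a pointer to the struct. */
    struct Foo_State *state = state_;
    state->sum = state->x + b;
}

void foo(struct Foo_State *state) {
    state->x = 8;
    state->call = continue_foo;
    bar(&state->call);
}
```

A caller would allocate a `struct Foo_State` wherever it likes, call `foo(&state)`, and see `state.sum` filled in only after `bar_on_complete` fires.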
Those who are familiar with how object-oriented programming works under the hood may recognize this pattern. Typically objects are laid out in memory such that the first field in the object state is a pointer to the class vtable. When you call a virtual member function on the object, the pointer-to-the-object is treated as a pointer-to-the-vtable-pointer and is used to resolve the virtual dispatch. In our example above there is actually less complexity and overhead, since we don’t need a whole vtable but can point directly to the function.
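For comparison, here is a hand-written approximation of both layouts in C. The vtable side is only a sketch of what a compiler might generate for a class with one virtual function; `Object`, `Object_VTable`, `Callback_State`, and the dispatch helpers are all hypothetical names.

```c
typedef void (*bar_callback)(void *state, int result);

/* Hypothetical vtable with a single slot. */
struct Object_VTable {
    void (*on_result)(void *self, int result);
};

/* Object layout: vtable pointer first, then the fields. */
struct Object {
    const struct Object_VTable *vtable;
    int total;
};

/* Callback layout from the article: function pointer first, then the fields. */
struct Callback_State {
    bar_callback call;
    int total;
};

static void object_on_result(void *self, int result) {
    ((struct Object *)self)->total += result;
}

static void callback_on_result(void *state, int result) {
    ((struct Callback_State *)state)->total += result;
}

static const struct Object_VTable object_vtable = { object_on_result };

/* Virtual dispatch: load the vtable pointer, load the slot, then call. */
void dispatch_virtual(struct Object *obj, int result) {
    obj->vtable->on_result(obj, result);
}

/* Direct dispatch: load the function pointer, then call. */
void dispatch_direct(bar_callback *callback, int result) {
    (*callback)(callback, result);
}
```

The only structural difference is the extra hop through the vtable in `dispatch_virtual`.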
Advantages
I love this pattern because it’s really clean and quite simple. The whole callback, including the function and the state, is neatly represented by a single pointer[4].
The callback pointer can be called using a very self-contained syntax. That is, it only depends on one variable, not two. This is actually not just a matter of syntax: a single variable means better use of CPU registers, and fewer accesses to memory.
The fact that the callback is represented by one small value also makes it easier to manage. There’s much less risk of calling the callback with the wrong state. It’s also lighter to pass around.
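One way to see the "easier to manage" point is a queue of pending callbacks: each entry is a single pointer, so a function can never be paired with the wrong state. This is only a sketch; `enqueue`, `complete_all`, `MAX_PENDING`, and the two state structures are hypothetical, and it assumes callbacks do not enqueue further work while they are being run.

```c
#include <stddef.h>

typedef void (*bar_callback)(void *state, int result);

#define MAX_PENDING 4

/* Each pending callback is one pointer: function and state together. */
static bar_callback *pending[MAX_PENDING];
static size_t pending_count = 0;

/* Returns 1 on success, 0 if the queue is full. */
int enqueue(bar_callback *callback) {
    if (pending_count == MAX_PENDING) {
        return 0;
    }
    pending[pending_count++] = callback;
    return 1;
}

/* Runs every pending callback with the same result, then empties the queue.
   Assumes callbacks do not call enqueue while being run. */
void complete_all(int result) {
    for (size_t i = 0; i < pending_count; i++) {
        bar_callback *callback = pending[i];
        (*callback)(callback, result);
    }
    pending_count = 0;
}

/* Two unrelated state types can share the queue. */
struct Sum_State {
    bar_callback call; /* must be first */
    int sum;
};

struct Last_State {
    bar_callback call; /* must be first */
    int last;
    int calls;
};

static void on_sum(void *state_, int result) {
    struct Sum_State *state = state_;
    state->sum += result;
}

static void on_last(void *state_, int result) {
    struct Last_State *state = state_;
    state->last = result;
    state->calls++;
}
```

With the two-pointer form, the queue would need to store function/state pairs and keep them in sync; here the array element type is just `bar_callback *`.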
Disadvantages
The most obvious disadvantage to me is that it is uncommon. Someone looking at the code for the first time won’t understand what’s happening right off the bat. It also means that there’s no language support for it. C++ is in some ways an extension to C with language support for first-class objects. But there is no common language that is an extension to C with support for this kind of first-class function (with state).
The performance of using this pattern isn’t a disadvantage in my opinion. If you’re comparing it to the performance of a “naked” function pointer, then yes, you may incur some overhead from passing the additional state argument and from double-dereferencing the function pointer. But consider that this type of function call should actually be faster than calling a virtual function in most object-oriented languages (which has a triple-dereference), because there’s no vtable lookup. And virtual function calls are in turn typically faster than interface function calls (and correspondingly virtual functions with multiple inheritance, depending on the optimizer and conditions).
I’d also like to dispel another disadvantage, not directly related to the pattern but more about using callbacks in general. At first glance it seems that there is a lot of overhead in accessing persistent variables in the state structure, because instead of saying “x” you have to say “state->x”, which implies an extra pointer dereference and possibly some pointer arithmetic. But think about this: how are variables normally accessed anyway? Variables are normally stored in the stack frame, which is essentially a structure pointed to by the stack pointer. Yes, fewer variables may be promoted to registers, which would affect performance, but I think it may be less of a problem than you’d expect.
Likewise, at first glance it seems that there is extra space used to store the callback function pointer. But in reality, a stack frame also stores the “callback” function pointer anyway: we just normally refer to it as the “return address”. An important point to note in the last example is that the very last thing `foo` does is call `bar`. This is what’s called a tail call, and it means that any half-decent optimizer will re-use `foo`’s stack frame space for `bar`. To put it another way: while `bar` is active, `foo` doesn’t use any stack space, but it does use space in the persistent state structure (wherever that may be), and the persistent state structure has many of the same attributes as the stack frame would have had, including a pointer into code space. From this perspective, there is no extra space required to store the callback address in the state structure.
The only thing missing is hardware support. A “normal” call has hardware support for automatically populating the return address into the state structure (aka stack frame) and saving register states etc. (aka saving persistent variables). And a “normal” return has built-in support for dereferencing the stack pointer to obtain the return address pointer (note the double-pointer again) and jumping to that address, all in one step. But I imagine that if this pattern became more common (probably with language support), hardware support would follow.
Until then, I still think it’s a great pattern to use in C, and we should all add it to our toolbox of C patterns.
[1] Please ask me – I’ll be happy to tell you all the reasons why it’s so horrible.
[2] In the less likely scenario that the caller decided to manually block the thread using thread synchronization techniques.
[3] On most modern architectures this would just mean that it takes twice the space, since there are two pointers involved. But C doesn’t require function pointers to be the same size as data pointers. One embedded architecture I work with has function pointers that are twice the size of normal heap pointers – after all, RAM is more expensive per bit than ROM.
[4] A RAM pointer, which in some cases is smaller than a function pointer, giving it yet another advantage over the typical callback.