Month: May 2014

C#’s async is the wrong way around

C#’s async is the wrong way around

C# recently (in the last few years) introduced the asyncawait language feature. If you aren’t familiar with it, I’ll wait here while you go check it out.

Welcome back. Async is awesome right?

So lets say we have some asynchronous method, Foo, and we want to call it from the asynchronous method Bar. The syntax looks like this:

Task Foo(); // Defined somewhere else

async void Bar()
{
    await Foo();
}

So what does await Foo() mean? To quote from MSDN:

The await operator is applied to a task in an asynchronous method to suspend the execution of the method until the awaited task completes

Seems simple enough. The statement await Foo() calls Foo, suspends Bar, and resumes Bar when Foo comes back with a result. Now lets consider what the synchronousversion of the code would have looked like (assuming Foo2 is the synchronous version of whatever Foo would have done):

void Foo2(); // Defined somewhere else

void Bar2()
{
    Foo2();
}

So, what does Foo2() mean in this synchronous example? Well, to me it means, “suspend (block) execution of Bar2 while you execute Foo2, and then resume it when Foo2 comes back with the result”. This is obvious: when you call a function you don’t expect the caller to continue before the called function returned. The caller “awaits” the called function. Right?

The natural expectation is for the calling function to wait for the called function to complete. The asynchronous code should actually look like this1:

Task Foo(); // Defined somewhere else

async void Bar()
{
    Foo(); // Implicitly awaits Foo
}

If we actually intended to use the result as a task instead of awaiting it, we could do something like this:

Task Foo(); // Defined somewhere else

void Bar()
{
    Task fooTask = async Foo(); // Does not await foo
}

In the above example, the async call can be thought of as something like a “fork”, because the called function can head off asynchronously to the calling function. Not in the thread or process sense of the word “fork”, since async has very little to do with threads directly, but in the sense that control is in some way branched.

I’m not actually saying that the C# async feature was designed incorrectly. The choice to do it this way in C# is very reasonable considering backwards compatibility with previous versions, compatibility with .NET, and especially backwards compatibility with what people think a “call” actually is. Rather, what I’m demonstrating is that “async” is actually a feature that’s always existed in some way since there were first function calls – the caller always has “awaited” the callee. When people tell you what C#’s await does, they’re in a way really telling you about things that the compiler and runtime do to implement the waiting differently to normal. The way in which the caller is suspended is different between the asynchronous and synchronous code, but that shouldn’t have to change the model that we use to reason about our code.

It strikes me then, that using async and await in C# is much like using inline code in C. They’re both special syntaxes that prompt the compiler to perform specific optimizations. If you write “synchronous” code then the entire stack (and thread context) is preserved and “suspended”, while if you put in a few special words “async”, “await”, and “Task<>” then you cue the compiler to only preserve the current stack frame.

Perhaps for a while this is the only way it will be, but I imagine that slowly it will change so that async vs sync merely becomes an optimization detail. All functions, and property accessors, will work equally with sync or async, so you won’t have to choose between them 99% of the time. You won’t need two separate functions to GET from a URL asynchronously vs synchronously. You simply call the function, and the compiler figures out the best way calling convention (synchronous vs asynchronous).


  1. Actually, if I was designing the language from scratch, the asynchronous code would look identical to the synchronous code above, since they both do the same thing 

Rethinking goto, Part 2

Rethinking goto, Part 2

A while back I discussed some ideas for new branching constructs to compliment the if-statement and switch-statement in a low-level language. Today I’m going to tell you about another pattern which I’ve seen commonly used to represent complicated control flows without `goto`s.

First, lets come up with some example of a control flow that doesn’t fit well into the standard branching constructs of `switch`, `if`, `while` and `for`. The example I’ve devised is deserializing fields from a binary stream. Let me explain the hypothetical situation before I go into the solution. In the stream we have a byte that represents the field type: perhaps a 0x01 represents a “short string” type, a 0x02 represents a “long string” type, and a 0x03 represents a 32bit integer type. If the field type is a short string, then we’ll say the next byte in the stream is a length of the string field, and the bytes after that are the characters in the string field. If the field type is a long string, then the next 4 bytes are a 32bit length followed by that number of characters. If the field type is an integer, then the next 4 bytes are simply the integer value. Here’s an example stream that has a short string “AB”, followed by a long string “C”, followed by the integer 421:

Example-stream

If the field is a string, then after we know how long it is we need to check that all the characters are valid string characters (by some specification we don’t care much about right now), and if the string passes validation then we copy it into wherever it is going. Here is a reasonable flowchart illustrating these steps as directly as possible:

ReadFieldFlowchart

First, lets represent this as directly as we can in C code. The first branch is clearly a switch statement, but  the most direct way to represent the remaining control flow is using `goto`s.

void readField()
{
    switch (readFieldType()) {
        case SHORT_STRING:
            // ...
            goto validate_chars;
        case LONG_STRING: 
            // ...
            goto validate_chars;
        case INT: 
            // ...
            goto assign_to_field;
    }

validate_chars:
    //...
    if (isValid)
        goto skip_n_bytes;

skip_n_bytes:
    // ...
    goto next_field;

assign_to_field:
    // ...
    goto next_field;

next_field:
    // ...
}

I would argue that this is actually the most readable way of representing it, particularly if people maintaining the code can refer back to the flow diagram. It’s easy to reason about, since each block starts with a label and ends in a `goto`. I’ve chosen to order it in a similar way to the flow chart, so that control only  flows downward. This code is no more “spaghetti-like” than the diagram is. In fact, it reads very much like a document or book would: each label is a heading marking the section as a stand-alone unit.

But goto’s are bad, right? Well this is an endless debate, but let’s assume that they are and that we need an alternative. How about this:

void readField() {
    switch (readFieldType()) {
        case SHORT_STRING:
            // ...
            validate_chars();
        case LONG_STRING: 
            // ...
            validate_chars();
        case INT: 
            // ...
            assign_to_field();
    }
}

void validate_chars() {
    // ...
    if (isValid)
        skip_n_bytes();
}
void skip_n_bytes() {
    // ...
    next_field();
}
void assign_to_field() {
    // ...
    next_field();
}
void next_field() {
    // ...
}

It satisfies our requirement of not using `goto`s. Of course now we need to elevate some of our local variables to global variables, but nobody complains about global variables as much as they do about `goto`s, so this is an improvement, right?

Well, no. From a readability perspective nothing much has changed. The “headings” are now function names rather than labels, and the `goto`s are now function calls, but the code otherwise looks the same.

Bad code using `goto`s can trivially be represented as equally bad code without `goto`s – or possibly worse code because it’s now more spread out and pollutes the global namespace with extra functions and persistent variables. I’m not saying this is bad code – it may or may not be, but that’s beside the point. The point is that it’s not just possible, but trivially easy to write “spaghetti code” without `goto`s, since I believe any goto-based code can be similarly converted to function calls2.

Goto-Call

This brings me to another, more subtle point. The above example is an abuse of the idea of a “function call”. This may be subjective, but I think that the idea of a call (at least in an imperative language) is to perform a sub-task and then come back to the calling task to complete whatever operation it was on. There is an implicit hierarchy here: the called function is a “sub-function” in terms of its contribution of the caller. This is physically manifested in the call stack, where you can see that the calling function still has an activation record on the stack while the called function is executing, in anticipation of re-acquiring its temporarily delegated control.

This is not the way I’m using the “function call” feature in the above example. I’m instead intending it as a “fire-and-forget kinda call”. The caller isn’t saying “I need your help, can you please do this for me and get back to me later”, it’s saying “I’m done, you’re next, I grant you control from now on”. The latter idea sounds familiar – permanently passing control one way from one part of the program to another. It’s called a `goto`. And I’ll use the term “goto-call” to mean a call to a function in such a way.

An example that comes to mind of where I’ve seen this intention in a function call is in raising events in C#. I’ll take an example from the MSDN guidelines for raising an event in C#:

protected virtual void OnThresholdReached(EventArgs e)
{
    EventHandler handler = ThresholdReached;
    if (handler != null)
    {
        handler(this, e);
    }
}

What does the call `handler(this, e)` on line 6 mean? Does it mean “I need your help, get back to me when you’ve got an answer”, or does it mean “goto the place I’ve called `handler` and I don’t care if you come back to me”3? It means the latter. It’s a “goto” in disguise.4

In a high level language this doesn’t matter. Using the feature of “calling a function” for “going to another part of the program” is fine. We waste a little stack space keeping the caller state when we don’t need it, incur some minor overhead preserving registers and allocating the new frame, and make it harder to read the call stack when we’re debugging, but the language is less complicated than it would be if we had to distinguish a “goto call” from a “normal call”.5

In an embedded microcontroller environment, and in a very imperative, stateful environment, I don’t think this is the case any more. I really think that low level languages like C should support a “goto-call” feature which is like a call (control moves to another function and you can pass arguments) but is intended never to return the caller, or is intended to return to the caller’s caller.

From the programmer’s perspective the “goto-call” would be a mechanism of communicating the intent of the code to other programmers. It tells other people reading the code that this is not a call “stack” operation, but a call “queue” operation – it’s one thing happening after another rather than one thing happening during another. It also tells the compiler “I don’t intend control to come back here”, so the compiler can helpfully produce an error if the “goto call” is not a valid tail call.6

Conclusion

I’ve shown that the problems with using `goto` are not unique to the `goto` feature, since there is a relatively trivial translation from goto-style programming to a form that uses function calls instead. I’ve used this as another argument as to why goto’s are not intrinsically bad, since you can write code that is at least as bad using function calls, and we do not consider function calls to be intrinsically bad.

I’ve also suggested that since calls can be used in this way, that sometimes we conflate the idea of a “call” with the idea of a “goto-call”, and suggested that if some imperative languages distinguished between the two by supporting a new “goto-call” feature then it would not only make the intent of code clearer to its readers, but also enable additional static checking and performance optimizations. I’ve given two concrete examples of where this would be useful: the example of reading a hand-crafted format from a binary stream in C using functions, and the example of event dispatching in C#.


  1. Assuming little-endian integers and ASCII character encoding 

  2. I’ve glossed over some other differences between the two ways of doing it. If you use function calls you don’t have control “falling through” to another function by default if you forget to call the next function. Also, it’s much more difficult to combine traditional control structures such as while loops with this “function call” form. I think neither of these factors decrease the “spaghetti-ness” of the function-based code. However functions have some additional flexibility: we can call functions indirectly through pointers, we can put them physically wherever we want, and we have more access to the individual states from “outside”. Whether or not these are good things depends on the situation. 

  3. Of course it does need to come back to somewhere, but it could come back to the caller of `OnThesholdReached`, like a tail-optimized call 

  4. Another example that comes to mind is in continuation-passing style calls, where you typically execute a continuation as “I’m done, you’re next” and not “do this and get back to me”. Keeping the caller of the continuation on the stack is the moral equivalent of keeping the “returner” of a non-CPS function on the stack 

  5. For those who’ve worked with C# async, wouldn’t it be wonderful if the continuation of an “awaiting” function didn’t have a gazillion frames in the call stack, especially with recursive algorithms like “await the previous item in a long list” 

  6. Perhaps ironically, in attempting to justify a more imperative style of programming using goto’s, we’re actually encouraging a more functional style of programming using mutual tail recursion to “loop” through a stream.