Tag: Ideas

Rethinking goto, Part 2

Rethinking goto, Part 2

A while back I discussed some ideas for new branching constructs to compliment the if-statement and switch-statement in a low-level language. Today I’m going to tell you about another pattern which I’ve seen commonly used to represent complicated control flows without `goto`s.

First, lets come up with some example of a control flow that doesn’t fit well into the standard branching constructs of `switch`, `if`, `while` and `for`. The example I’ve devised is deserializing fields from a binary stream. Let me explain the hypothetical situation before I go into the solution. In the stream we have a byte that represents the field type: perhaps a 0x01 represents a “short string” type, a 0x02 represents a “long string” type, and a 0x03 represents a 32bit integer type. If the field type is a short string, then we’ll say the next byte in the stream is a length of the string field, and the bytes after that are the characters in the string field. If the field type is a long string, then the next 4 bytes are a 32bit length followed by that number of characters. If the field type is an integer, then the next 4 bytes are simply the integer value. Here’s an example stream that has a short string “AB”, followed by a long string “C”, followed by the integer 421:

Example-stream

If the field is a string, then after we know how long it is we need to check that all the characters are valid string characters (by some specification we don’t care much about right now), and if the string passes validation then we copy it into wherever it is going. Here is a reasonable flowchart illustrating these steps as directly as possible:

ReadFieldFlowchart

First, lets represent this as directly as we can in C code. The first branch is clearly a switch statement, but  the most direct way to represent the remaining control flow is using `goto`s.

void readField()
{
    switch (readFieldType()) {
        case SHORT_STRING:
            // ...
            goto validate_chars;
        case LONG_STRING: 
            // ...
            goto validate_chars;
        case INT: 
            // ...
            goto assign_to_field;
    }

validate_chars:
    //...
    if (isValid)
        goto skip_n_bytes;

skip_n_bytes:
    // ...
    goto next_field;

assign_to_field:
    // ...
    goto next_field;

next_field:
    // ...
}

I would argue that this is actually the most readable way of representing it, particularly if people maintaining the code can refer back to the flow diagram. It’s easy to reason about, since each block starts with a label and ends in a `goto`. I’ve chosen to order it in a similar way to the flow chart, so that control only  flows downward. This code is no more “spaghetti-like” than the diagram is. In fact, it reads very much like a document or book would: each label is a heading marking the section as a stand-alone unit.

But goto’s are bad, right? Well this is an endless debate, but let’s assume that they are and that we need an alternative. How about this:

void readField() {
    switch (readFieldType()) {
        case SHORT_STRING:
            // ...
            validate_chars();
        case LONG_STRING: 
            // ...
            validate_chars();
        case INT: 
            // ...
            assign_to_field();
    }
}

void validate_chars() {
    // ...
    if (isValid)
        skip_n_bytes();
}
void skip_n_bytes() {
    // ...
    next_field();
}
void assign_to_field() {
    // ...
    next_field();
}
void next_field() {
    // ...
}

It satisfies our requirement of not using `goto`s. Of course now we need to elevate some of our local variables to global variables, but nobody complains about global variables as much as they do about `goto`s, so this is an improvement, right?

Well, no. From a readability perspective nothing much has changed. The “headings” are now function names rather than labels, and the `goto`s are now function calls, but the code otherwise looks the same.

Bad code using `goto`s can trivially be represented as equally bad code without `goto`s – or possibly worse code because it’s now more spread out and pollutes the global namespace with extra functions and persistent variables. I’m not saying this is bad code – it may or may not be, but that’s beside the point. The point is that it’s not just possible, but trivially easy to write “spaghetti code” without `goto`s, since I believe any goto-based code can be similarly converted to function calls2.

Goto-Call

This brings me to another, more subtle point. The above example is an abuse of the idea of a “function call”. This may be subjective, but I think that the idea of a call (at least in an imperative language) is to perform a sub-task and then come back to the calling task to complete whatever operation it was on. There is an implicit hierarchy here: the called function is a “sub-function” in terms of its contribution of the caller. This is physically manifested in the call stack, where you can see that the calling function still has an activation record on the stack while the called function is executing, in anticipation of re-acquiring its temporarily delegated control.

This is not the way I’m using the “function call” feature in the above example. I’m instead intending it as a “fire-and-forget kinda call”. The caller isn’t saying “I need your help, can you please do this for me and get back to me later”, it’s saying “I’m done, you’re next, I grant you control from now on”. The latter idea sounds familiar – permanently passing control one way from one part of the program to another. It’s called a `goto`. And I’ll use the term “goto-call” to mean a call to a function in such a way.

An example that comes to mind of where I’ve seen this intention in a function call is in raising events in C#. I’ll take an example from the MSDN guidelines for raising an event in C#:

protected virtual void OnThresholdReached(EventArgs e)
{
    EventHandler handler = ThresholdReached;
    if (handler != null)
    {
        handler(this, e);
    }
}

What does the call `handler(this, e)` on line 6 mean? Does it mean “I need your help, get back to me when you’ve got an answer”, or does it mean “goto the place I’ve called `handler` and I don’t care if you come back to me”3? It means the latter. It’s a “goto” in disguise.4

In a high level language this doesn’t matter. Using the feature of “calling a function” for “going to another part of the program” is fine. We waste a little stack space keeping the caller state when we don’t need it, incur some minor overhead preserving registers and allocating the new frame, and make it harder to read the call stack when we’re debugging, but the language is less complicated than it would be if we had to distinguish a “goto call” from a “normal call”.5

In an embedded microcontroller environment, and in a very imperative, stateful environment, I don’t think this is the case any more. I really think that low level languages like C should support a “goto-call” feature which is like a call (control moves to another function and you can pass arguments) but is intended never to return the caller, or is intended to return to the caller’s caller.

From the programmer’s perspective the “goto-call” would be a mechanism of communicating the intent of the code to other programmers. It tells other people reading the code that this is not a call “stack” operation, but a call “queue” operation – it’s one thing happening after another rather than one thing happening during another. It also tells the compiler “I don’t intend control to come back here”, so the compiler can helpfully produce an error if the “goto call” is not a valid tail call.6

Conclusion

I’ve shown that the problems with using `goto` are not unique to the `goto` feature, since there is a relatively trivial translation from goto-style programming to a form that uses function calls instead. I’ve used this as another argument as to why goto’s are not intrinsically bad, since you can write code that is at least as bad using function calls, and we do not consider function calls to be intrinsically bad.

I’ve also suggested that since calls can be used in this way, that sometimes we conflate the idea of a “call” with the idea of a “goto-call”, and suggested that if some imperative languages distinguished between the two by supporting a new “goto-call” feature then it would not only make the intent of code clearer to its readers, but also enable additional static checking and performance optimizations. I’ve given two concrete examples of where this would be useful: the example of reading a hand-crafted format from a binary stream in C using functions, and the example of event dispatching in C#.


  1. Assuming little-endian integers and ASCII character encoding 

  2. I’ve glossed over some other differences between the two ways of doing it. If you use function calls you don’t have control “falling through” to another function by default if you forget to call the next function. Also, it’s much more difficult to combine traditional control structures such as while loops with this “function call” form. I think neither of these factors decrease the “spaghetti-ness” of the function-based code. However functions have some additional flexibility: we can call functions indirectly through pointers, we can put them physically wherever we want, and we have more access to the individual states from “outside”. Whether or not these are good things depends on the situation. 

  3. Of course it does need to come back to somewhere, but it could come back to the caller of `OnThesholdReached`, like a tail-optimized call 

  4. Another example that comes to mind is in continuation-passing style calls, where you typically execute a continuation as “I’m done, you’re next” and not “do this and get back to me”. Keeping the caller of the continuation on the stack is the moral equivalent of keeping the “returner” of a non-CPS function on the stack 

  5. For those who’ve worked with C# async, wouldn’t it be wonderful if the continuation of an “awaiting” function didn’t have a gazillion frames in the call stack, especially with recursive algorithms like “await the previous item in a long list” 

  6. Perhaps ironically, in attempting to justify a more imperative style of programming using goto’s, we’re actually encouraging a more functional style of programming using mutual tail recursion to “loop” through a stream.