C++ vs JS
Binary Type Coupling

I was chatting to some of my colleagues at work recently about the potential benefits of a JavaScript compiler vs a C compiler when targeting an embedded MCU. A concern that came up multiple times, regarding different features, is:

…but you can already do that in C, so how is JS better?

This is a good point. If you’re given a problem, you can solve it in JS, or you can solve it in C (or C++). Is JavaScript really any better? Of course, you know my answer, which is probably pretty biased. But let’s dig into a few of the details.

Firstly, let’s put aside the question of performance for a moment. I argue that JS is going to yield a more performant program1, while I’m sure that most C programmers will argue that C will perform better. Probably both are right, under different circumstances. Let’s just sidestep this issue for the moment.

There is at least one very significant reason why I think JS has a lot more to offer than C or C++, and that’s the ability to write reusable code. This is the gateway feature that opens the door to a 10x improvement in productivity, because it means that you can leverage highly customizable libraries to do most of the heavy lifting, rather than writing the code yourself. Yes, in C and C++ you can use third party libraries. But in JS you can do it an order of magnitude faster. There have been many, many times when it’s taken me literally less than 5 minutes to go from “I wonder if there’s a library that already does this” to using the library and moving on to the next task.

The same cannot be said for any third party C or C++ library or code file that I’ve ever used, which typically takes days or weeks of integration work, reading documentation, figuring out why it doesn’t compile2, and then debugging cryptic issues.

So why is this the case? Is it the package managers? Is it the community? Is it the language?

I’m guessing it’s multiple reasons, but I’m going to focus on one in particular in this post: binary coupling (at least that’s what I’m calling it).

An Example: a Sum Function

Let’s take a dummy example. Consider a function that adds together a collection of numbers (sound familiar?). In JavaScript, you can be almost certain it will have a signature like this:
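
(A minimal sketch:)

    // Sketch: takes an array of numbers and returns their sum
    function sum(numbers) {
      return numbers.reduce((total, n) => total + n, 0);
    }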

We can use TypeScript to make things even more unambiguous, although this doesn’t change the meaning of the code:
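
(The same sketch with type annotations:)

    // Sketch: the same function with the types spelled out
    function sum(numbers: number[]): number {
      return numbers.reduce((total, n) => total + n, 0);
    }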

This function logically takes a collection of numbers. In JS, collections of this nature are represented as arrays. The output is the sum, which is obviously also a number.

What might this look like in C or C++? Here are some options:
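
(A few sketches of plausible shapes; illustrative, not exhaustive:)

    // Pointer plus length
    int sum(const int* values, size_t count);

    // A specific container
    int sum(const std::vector<int>& values);

    // An iterator pair
    template <typename Iterator>
    int sum(Iterator begin, Iterator end);

    // Any container
    template <typename Container>
    int sum(const Container& values);

    // A fixed-size array by reference
    template <size_t N>
    int sum(const int (&values)[N]);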

Forgive me if I have some or all of these wrong. In my life, I’ve programmed much more in C++ than in JavaScript, but C++ always remains a little bit too complicated for me to remember all the subtleties.

What’s the difference between all of these options? In my mind, they are all logically the same thing. They represent a function to which you pass an “array” of numbers, and it gives you the result.

The difference comes mainly down to how a client interacts with these functions at a binary level. What calling convention is used? How is the array represented in memory? What parameters or aspects of the function are available at compile time vs runtime?

If sum was a third party library, which of these forms would it take?

We have at least one good answer to this question, since the functionality is already part of the STL in C++. It takes the following form:
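
(std::accumulate, from the <numeric> header:)

    template <class InputIt, class T>
    T accumulate(InputIt first, InputIt last, T init);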

There’s nothing surprising about this. Since we’re talking about the standard template library, you expect this to be a function template. It’s a fairly generic solution, but certainly it won’t suit everyone. Because it’s a template, you can’t take its address, you can’t pass it around as an argument to non-template code, you can’t export it directly as a library function from an object file, etc. For example, try to translate the following JavaScript code into C++:
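
(A sketch; assume useSum and arr are defined elsewhere and may or may not be compile-time constants:)

    // Sketch: useSum and arr might be constants, or might only be known at runtime
    const result = useSum ? sum(arr) : 0;
    console.log(result);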

The useSum and arr values might or might not be available at compile time — a reusable piece of code shouldn’t assume one or the other.

How C++ Usually Solves This

The solution to this in C++ is normally to shunt everything to runtime when you need reusability. As a case in point, take a look at the GPIO methods that the mbed library provides. To set a single bit that represents a GPIO pin, you would use a function with the following signature:
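
(Roughly, from mbed’s DigitalOut class; simplified from memory, so check the mbed headers for the exact form:)

    // Sketch of mbed's DigitalOut interface (simplified)
    class DigitalOut {
    public:
        void write(int value);   // drive the pin to 0 or 1, decided at runtime
        // ...
    };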

This makes for fairly reusable code. But it comes at a cost. It invokes a function call3, using a runtime value. The function then needs to use indirect accesses and masks to figure out what bit to set, internally using a structure that looks like this:
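
(Something along these lines; the exact fields differ from target to target, so treat this as illustrative only:)

    // Illustrative per-pin state, loosely based on one mbed target
    typedef struct {
        PinName        pin;
        uint32_t       mask;       // which bit within the port
        __IO uint32_t *reg_set;    // register used to set the bit
        __IO uint32_t *reg_clr;    // register used to clear the bit
        __I  uint32_t *reg_in;     // register used to read the port
    } gpio_t;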

The absolute cost here is low – a function call is not a big deal, masking and indirection are quick, and passing an integer argument at runtime is not a big deal. But the relative cost is high — if value and the specific port happened to be known at compile time, this whole thing could have been one machine instruction. Now it has to be many more instructions, plus extra memory overhead.

But surely in JavaScript this is even worse? Isn’t everything at runtime in JavaScript?

I would argue “no”. In JavaScript you don’t specify what gets done at compile time vs runtime — you leave it up to the JavaScript engine to decide. This is why I use the term “binary coupling”. In C++, your code is coupled to the binary representation of the resulting machine code — you are forced to make certain decisions, and those decisions force the C++ compiler to use certain representations. This makes C++ a glorified assembly language — it’s a way for you to write machine code in a somewhat abstracted way.

In JavaScript, by contrast, you don’t write machine code. You specify the behavior of the program in unambiguous terms, and leave it up to the engine or compiler to implement the specification in terms of machine code.

You get a small taste of this in C++ with micro-optimizations and inlining. You write a function, and in some circumstances the compiler is free to choose not to represent that as an actual CALL instruction in the output binary. This is an example that breaks the usual binary coupling, since the code does not dictate the binary interface. But this kind of behavior is necessarily very limited in C++.

The discussion here is not primarily about performance, it’s about reusability. What I’m demonstrating is that in C++, it’s impossible to have a library API that doesn’t also dictate the binary ABI that is used to access that library. As a result you’re much less likely to find an existing library that serves your specific purposes, since you need to match not only the logical interface, but the binary interface as well (and the type interface, which is another story).


  1. Not referring to interpreted or JIT-compiled JavaScript of today, but a hypothetical bare-metal compiler 

  2. Did you leave out a macro configuration flag somewhere? Did you use the right make file for your platform? Are the include files out of date? Were you supposed to use the .dll files, or the .o files, or the .a files? The instructions are for GCC, so how do I do this in VC++? The instructions are for version 1.34, but only version 1.32b is ported to my system, and for some reason the make file doesn’t run

  3. The class itself is a small inline wrapper around the HAL, but the HAL is implemented as separate object files 

Coding on Paper in an Interview

My company is in the process of hiring a back end or full stack software engineer. During the interview process, I often get complaints from candidates when I bring out an old fashioned pad of paper and pencil and ask them to write some code for me (or I offer for them to code on the whiteboard).

This is 2017, and we have syntax highlighters, auto-complete, Stack Overflow, etc. Why would I want to test a candidate’s ability to code on paper?? This seems barbaric and absurd. Like asking a racing car driver to ride around a track on a bicycle to see how well he handles the corners. Surely we’re past this antiquated form of testing in this day and age?

Well, no. Let me share my side of the story.

I need people who can code

The position is for a software engineer. I need to know that you can code1. It’s quite simple.

We can have a good chat about Azure, about how you parted the red sea, or how you helped get a man on the moon. But in the end, you’re here to code, so you need to show me that you can.

Hiring a coder without seeing them code is like hiring a chef without tasting their food.

The best way to test your real-world coding ability would be to have you work with me for 3 months, but that’s not a practical way to screen people for a new position.

Another way for me to establish your level of skill would be for me to look at work you’ve done in the past. AKA your public contributions to GitHub, Stack Overflow, blogs, etc. I am perhaps one of the few managers who will actually clone your GitHub repos and look through your code, your commit history, your documentation, your blog posts2, your answers on forums, etc.

A quick side note on this: remember that I’m also a software engineer, like you. I understand perfectly well that there are times when you need to throw a heap of “poor” code together, just to get something done quickly, or because it’s just an experimental throw-away project that you were working on in your spare time. I won’t judge you on this. The signature of good style and craftsmanship will still shine through, even in your darkest code, so I still read it.

But, I can’t just look at past work you’ve already done. I also need to look inside your head. I need to see those cogwheels spinning when you’re working on a problem. I don’t just want to see the final outcome, I want to know what thoughts went through your mind on the journey to get there. The best way to do this is for us to work on a piece of code together, in an interview. Here you can explain to me what you’re thinking, and ask questions3.

The code itself is the easy part

In the past, I used to ask two coding questions in an interview. I started with an easy one, and then would progress to the harder question. But I became tired of the emotional distress of watching people struggle through the harder question, and often eventually give up.

So, now I just ask the easy question4.

In fact, here it is. I’ll give it to you.

Write a function that adds up a collection of numbers.

That’s it.

That’s it?

Yes. That’s it.

In fact, I’ll give you the5 answer too6:
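
(A sketch of a typical answer, in C#:)

    // Sketch of a typical answer (as it might be written on paper)
    int Sum(IEnumerable<int> numbers)
    {
        int total = 0;
        foreach (var number in numbers)
            total += number;
        return total;
    }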

Here’s another one:
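
(A sketch of a more functional-flavoured answer:)

    // Sketch: folding the collection with LINQ's Aggregate (needs using System.Linq)
    int Sum(IEnumerable<int> numbers) => numbers.Aggregate(0, (total, n) => total + n);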

Side note: One person has pointed out that they would just use the Sum function that .net provides. Well done. Ten brownie points for you7. But then I will follow up by saying something like “Somebody on the .net development team had to write that built-in Sum function. Pretend it was you. How would you write it?”

There are obviously many different ways to write some code that satisfies the initial requirements statement of this test question. This is true in the real world as well — feature requests come in many different forms, to varying degrees of specification, and you need to be able to dig in and understand the problem that the user is trying to solve, and explore different solutions and the effectiveness of each.

Be resourceful. Dig in.

I don’t care whether or not you remember the syntax for generics or extension methods in C#. I don’t care if you remember what the C# equivalent to a fold/reduce function is. These are questions that Google or Stack Overflow can answer in the real world.

I don’t care about your spelling. This is something a spell checker would check in the real world.

I want to see how you engage a problem. When you’re actually working on the job, you’re going to have all sorts of weird and wonderful requests thrown your way. How do you deal with them? Do you blindly start pecking at the keyboard, or do you dig into what’s actually being asked?

If you need some tool to help you, do you say “this company is so stupid, they don’t even provide xyz!”, or do you say “how do we get xyz?”.

The same is true in an interview. Do what you need to do to get the job done. If you need to understand the context more, then ask me. If you need auto-complete and an IDE, or Google, ask for a laptop (actually I typically offer one).

At the very least, I expect you to ask me what kind of numbers we’re adding together.

If you need help remembering the way something works, or the syntax for something, ask me. Let’s work through it together. I want to know that you can work effectively with me and others on the team. I want to know that you feel comfortable asking for advice, that you’re not the kind of person to suffer in silence.

If you think that the whole thing is stupid, let me know. I want to know that the people on my team can challenge me and can stand up to being challenged themselves.

It’s about why

It’s not about what your answer looks like. It’s about why you made certain choices. Were there performance concerns you were worried about? What usage scenarios were you planning for — and did you ask? Is one language construct better than another from a maintainability standpoint? What approaches did you consider, and how did you weigh up each?

If you’re [un]lucky enough to interview with me, keep in mind that the main thing I’m looking for is your reasoning and approach when encountering a problem. Talk me through your thinking.

It’s a toy

Yes, this coding exercise is a toy example. And yes, I’ll make stuff up as I go. If you ask whether the numbers need to be integers or floats, I’ll make the answer up as I go, and it might be different every time I do an interview.

But put yourself in my shoes for a moment. In an interview setting, we only have time to play with a toy example. We don’t have time for me to explain a real problem and have you craft a real solution for me. I need to know how you code, and a toy example is the only tool in the box that can get the job done effectively in a short span of time.

Hopefully by now you understand that it really won’t help you to use a laptop in my interview. The things I’m looking for are things where a laptop won’t help you. Sorry. But do whatever makes you feel comfortable.


  1. …and that you can communicate effectively, and work well with my team, and have the relevant experience. But all of this is pointless if you can’t code. 

  2. I don’t care what the blog posts are about actually — I will evaluate you based on passion, ability to communicate clearly, etc. Bonus points if your blog is about coding, and I can get a two-for-one evaluation on communication and coding. 

  3. Like maybe “WTF are we doing this for?” 

  4. [Edit:] Since writing this post, I’ve changed my mind about the second, “hard” question. I now ask the hard question as well, depending on how much time is available. I might also ask a “time-boxed” version of the hard question — saying something like “do as much as you can of this in 10 minutes”. Part of the reason is that I want to give you the opportunity to present your skills on structurally different questions. 

  5. *an 

  6. Since the position is for an Azure developer, the most common language of choice for this question is C# 

  7. Even if you didn’t know that it existed, I’d give you points for saying “I’m sure this function already exists in some library somewhere. I would probably first work out if I can use that, rather than reinventing the wheel” 

Why Types Introduce Coupling

Last time I wrote a long post about why C is complicated and how JavaScript can make your code simpler and less buggy. Today I want to touch on another idea: that statically typed languages introduce a level of coupling in your program that’s unnecessary and makes maintenance harder.

Imagine that I ask you to write a function to simply iterate through a list of integers and output them to the console. In C, you might write the following code:
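
(A sketch, assuming a contiguous array and an explicit count:)

    #include <stdio.h>

    // Sketch: print each integer in a contiguous array
    void printList(const int* list, int count)
    {
        for (int i = 0; i < count; i++)
            printf("%d\n", list[i]);
    }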

But wait a second. I didn’t tell you that the list was going to be presented as a contiguous array! What if it was a linked list? Then the code might look like the following, assuming that we had previously declared the types LinkedList and LinkedListNode appropriately:
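
(A sketch; the field names first, next and value are illustrative:)

    // Sketch: print each integer in a linked list
    // (assumes LinkedList has a first field, and LinkedListNode has value and next)
    void printList(const LinkedList* list)
    {
        for (const LinkedListNode* node = list->first; node != NULL; node = node->next)
            printf("%d\n", node->value);
    }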

The problem in both of these is that the function we’ve written is intimately coupled to the type of the parameters it accepts. There’s no way in C to express the idea of “I want my function to accept any list-like type”. This means that we cannot reuse the above code in other circumstances where the type is even slightly different.

C++

In C++, we can address this problem to some extent by using templates. We could write the above function something like this in C++:
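
(A sketch, passing the whole container:)

    #include <iostream>

    // Sketch: print every value in any container that supports range-for
    template <typename Container>
    void printList(const Container& list)
    {
        for (const auto& value : list)
            std::cout << value << "\n";
    }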

There are still some of the same issues in this code. For example, do we pass an iterator pair to the function (as many STL functions do), or do we pass the container itself?

To me this is a non-solution. For one thing, it’s saying that if you want to write reusable code that’s decoupled from the types of its parameters, you need to write everything in your program as templates. This is clearly not an option for many reasons, including compilation time, error messages, complexity, etc.

All that C++ templates give us is a taste of a world without explicit types. The above C++ function is decoupled from the caller simply because it does not state what type it expects.

It’s true that there are languages that perform type inference, and allow you to write pretty generic code without losing static type safety, but as with C++, the error messages and language complexity associated with this type of coding makes it difficult for beginner-to-intermediate level programmers.

C#

At first glance, you may think that higher-level statically-typed languages solve this with better abstraction mechanisms. Consider the C# code for the above function:
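
(A sketch:)

    // Sketch: accepts anything enumerable
    void PrintList(IEnumerable<int> list)
    {
        foreach (var value in list)
            Console.WriteLine(value);
    }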

The above function says that it accepts any argument that is enumerable, meaning anything that can be iterated. This is fantastic from a reusability and code maintenance perspective, so it does address one side of the issue. But there is another issue lurking behind the scenes. One that may be more noticeable to people working in very resource-constrained environments like in embedded firmware.

Although the above C# code does not couple itself to the specific type of integer list, and so can be used throughout the program, it does actually still have a level of coupling at the binary level: the function accepts something that has a binary signature of IEnumerable<int>. This is unlike a C++ template, which is decoupled by being instantiated separately for each different binary signature.

This machine-level coupling is invisible to the programmer, which is convenient and important when it comes to managing large and complex codebases. But remember that interfaces as a language feature are normally implemented at the binary level by passing control through several levels of indirection. I won’t go into detail – you can look it up yourself if you’re interested – but here is a brief summary to illustrate my point.

The machine code must first look up a virtual dispatch table from the interface pointer, and then look up the correct entry in the dispatch table corresponding to the method you want to call, and then perform an indirect call through that entry in the dispatch table. The indirect call doesn’t go straight to the function implementation, but first through a thunk that adjusts the this  pointer from the interface’s this  pointer to the underlying object’s this  pointer (which are different because objects can implement multiple interfaces), and then the thunk redirects to the actual function. This is to perform a single function call on an abstract interface.

Although the C# language doesn’t dictate these details, you have to be aware of them anyway. The logical types are still coupled to the binary types, and so the choices you make about how generic you want your code to be really do still affect the performance and overheads involved in your program.

JavaScript

So what’s different about JavaScript? Simply put, JavaScript doesn’t require you to specify these type details at all. This not only brings a level of decoupling to the logical program, where you gain benefits of reusability and maintainability, but also unshackles the compiler to theoretically apply a whole new range of optimizations. As compilers get better and better, I think this will be more and more the case, and soon the performance of a program will depend more on how few details you dictate than on how many, and the performance of JavaScript programs will exceed that of C and C++ programs.

Why JavaScript

People who have known me for a while, and long-time readers of this blog, will know that I never used to be a fan of dynamically typed languages. The static type system available in C# and C++ can help to prevent many bugs and runtime errors, and allows the programmer to add extra clarity to the code.

But in recent years, I’ve come to acquire a taste for JavaScript. So much so that I’m actually on a mission to write a JavaScript compiler for embedded firmware, so that I can bring the bliss of JavaScript to the nightmare of embedded programming.

For those who don’t already know, JavaScript is a completely different language to Java. Please, do not use the word Java to refer to JavaScript, or vice-versa. They are as unrelated to each other as C++ is to Java. I know this is stated many times, but I’m expecting that some of the readers of this post are coming from a purely C/C++ background, and might not be aware of this fact.

Many people have written about the pros and cons of JavaScript, so what I’m saying here is not unique, but I’m going to approach it from the perspective of an embedded firmware programmer, who typically works in C or C++ on a daily basis. Some of what I say will also apply to those of you in other arenas.

Familiarity

One thing that JavaScript has going for it is that the syntax and structure will be familiar to most embedded programmers (and many other programmers). It uses curly braces to denote blocks of code. Function calls are done using parentheses with comma-separated arguments, as in foo(1,2,3). Most of the C/C++ operators work as expected, such as the logical operators && and ||, and the bitwise operators & and |.

These may seem like obvious things, if you’re from a background of C/C++, C#, or Java, or some related language. But bear in mind that not all languages are like this, and so the familiarity of JavaScript eases the learning curve associated when coming to JavaScript from languages like C/C++. Compare these two snippets of code, one in C, and the other in JavaScript, that implement the Fizz-Buzz challenge:
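
(Sketches of both:)

    /* C */
    #include <stdio.h>

    int main(void)
    {
        for (int i = 1; i <= 100; i++) {
            if (i % 15 == 0)      printf("FizzBuzz\n");
            else if (i % 3 == 0)  printf("Fizz\n");
            else if (i % 5 == 0)  printf("Buzz\n");
            else                  printf("%d\n", i);
        }
        return 0;
    }

    // JavaScript
    for (let i = 1; i <= 100; i++) {
      if (i % 15 === 0)      console.log('FizzBuzz');
      else if (i % 3 === 0)  console.log('Fizz');
      else if (i % 5 === 0)  console.log('Buzz');
      else                   console.log(i);
    }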

If you’re a C programmer, then hopefully the above similarity appeals to you.

Simplicity

Perhaps the number one thing I like about JavaScript is simplicity. With JavaScript you can often write very clean and readable code, because you’re focusing on the algorithm and behavior, rather than adding bloat associated with types and ABIs and memory concerns.

In C++, you’re writing code to appease three different rulers: the type system, the machine, and the reader/maintainer. Most C++ code seems to be a compromise between these, but in JavaScript your code can focus on the reader/maintainer, and let the compiler worry about the machine. You worry more about clearly describing, in code, the behavior of the program, rather than worrying about how the compiler/interpreter will implement that behavior in terms of instructions and memory access, or worrying about constructing elaborate types to describe the meta-rules about what the program is allowed to do.

As a thumb-suck estimate, I probably spend at least half my time in C++ trying to appease the static type checker. And I would say that 90% of that work is spent on false positives – cases where the type checker notes an inconsistency, but where the same code in a dynamically typed language would not have had a bug in it. Inconsistencies in the type system do not always equate to real program inconsistencies.

At the risk of making this post far too long, I’m going to give you an example, albeit somewhat contrived. Let’s say that we have a binary tree structure: a node in the tree is either a leaf-node with a value, or an internal node with no value but with left and right subnodes/children. Now let’s say that we want a function that flattens the tree, returning an ordered sequence of only the leaf values.

In JavaScript

In JavaScript, I can imagine a function that looks like this:
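
(A sketch; the node fields isLeaf, value, left and right are illustrative:)

    // Sketch: return the leaf values of a binary tree, left to right, using an
    // explicit stack rather than recursion
    function flattenTree(root) {
      const result = [];
      const stack = [root];
      while (stack.length > 0) {
        const node = stack.pop();
        if (node.isLeaf) {
          result.push(node.value);
        } else {
          // push right first so the left subtree is visited first
          stack.push(node.right);
          stack.push(node.left);
        }
      }
      return result;
    }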

It uses a stack to iterate the tree. I could have made a much more succinct solution using generators and recursion, but I’m appealing to those of you who are coming from strictly imperative, procedural programming backgrounds, so that’s why I chose this approach.

In C

Let’s write the equivalent code in C.

Firstly, a node either holds a single value, or branches into two subnodes. How do we represent this?

Here is one way. A node can be either a leaf value or an internal node, and to tell the difference, we probably need some kind of flag (or tag). It also needs something to store the contents of the node, which is a union between the two different options:
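
(A sketch; the isLeaf field name is illustrative, and the two contents types are filled in below:)

    // Sketch: a node is a tag plus a union of the two kinds of contents
    typedef struct TreeNode {
        bool isLeaf;    // tag: is this a leaf node or an internal node?
        union {
            LeafNodeContents leafContents;              // used when isLeaf
            InternalNodeContents internalNodeContents;  // used when !isLeaf
        };  // C11 anonymous union
    } TreeNode;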

Side note: the style of the above code might not be what you’re familiar with. Or it might be. I don’t know, because C (and C++) doesn’t come with a standard style, which is one of the ways in which I think JavaScript is better: there is a single generally-accepted style for writing JavaScript code.

An internal node’s contents are simply its left and right sub-nodes:
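
(Continuing the sketch:)

    // Sketch: an internal node just points at its two children
    typedef struct InternalNodeContents {
        struct TreeNode* left;
        struct TreeNode* right;
    } InternalNodeContents;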

Hmm. There’s another question here that’s unanswered. Who owns these nodes? Does the tree own them? Could there be multiple trees that share the same nodes? Is the answer even the same from one tree to the next, or for all the nodes within a tree? It’s not specified here, but it’s yet another “bloaty” detail to figure out.

We said that a leaf node is “a value”. But what value exactly? Is it an integer? Another struct? Is the value type the same for all nodes in a tree? Should the memory for the value be embedded into the node, or should the node point to the value? If it points to the value, then who owns it? If it’s embedded, then how big is it, and are there any rules we have to follow when copying it or moving it to different locations in memory (is it pointed to by anything else, or does it have ownership of anything else)? So many questions. So many details that aren’t relevant to the problem at hand.

One way here is just to say that a leaf node has a pointer to the value, and that we don’t know anything further about what type it is:
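
(Continuing the sketch:)

    // Sketch: a leaf just points at its value; we know nothing more about it
    typedef void* LeafNodeContents;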

I could save on some of the bloat by using an anonymous union, but I’d argue that not a whole lot is gained in terms of simplifying real complexity.

How much time have we wasted so far? Let’s recap what we’ve done:

  • We’ve defined one possible implementation of a tree node
  • We’ve coupled the implementation to issues of memory layout, such as whether nodes point to their children and values or have them embedded in the same memory.
  • We’ve opened many cans of worms regarding ownership, style, the type of contents, etc.
  • We haven’t even started writing the actual function yet.

Now for the actual function (oh dear, this is going to be a long post). We need a way to pass the tree to the function, and a way to retrieve the list back. We’ve already defined a type for the tree, but there are other questions to be answered when it comes to passing it to the function:

  • Should the root node be passed by value, or by pointer?
  • Is the function expected to mutate or free the tree passed to it? For example, can the output list reuse the memory from the input tree?
  • Should the tree be passed using a shared global variable, or an actual parameter?1

There are similar questions about getting the output from the function, with all the same concerns that we’ve already discussed about representing trees. Should the result be a contiguous array or a linked list? Should it be allocated by the caller or callee? Should it be a shared global variable? And any number of other considerations.

I’m going to try my hand at a direct implementation in C, trying to sidestep issues of memory allocation by having everything passed to the function itself:
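
(A sketch of roughly what that looks like, reusing the type definitions above; the fixed limits and the example in main are illustrative:)

    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_TREE_DEPTH 10

    // Sketch: walk the tree with an explicit, fixed-size stack and write each leaf's
    // contents pointer into the caller-allocated list (at most maxListSize entries).
    // Leaves beyond maxListSize, and trees deeper than MAX_TREE_DEPTH, are silently
    // dropped -- no error reporting here, which is part of what makes it cringe-worthy.
    void flattenTree(TreeNode* root, LeafNodeContents* list, int maxListSize)
    {
        TreeNode* stack[MAX_TREE_DEPTH];   // hard limit on how deep the tree can be
        int stackSize = 0;
        int listSize = 0;

        stack[stackSize++] = root;
        while (stackSize > 0) {
            TreeNode* node = stack[--stackSize];
            if (node->isLeaf) {
                if (listSize < maxListSize) {
                    *list++ = node->leafContents;
                    listSize++;
                }
            } else if (stackSize + 2 <= MAX_TREE_DEPTH) {
                // push right first so that the left subtree comes out first
                stack[stackSize++] = node->internalNodeContents.right;
                stack[stackSize++] = node->internalNodeContents.left;
            }
        }
    }

    int main(void)
    {
        // Example: one internal node with two leaves holding 1 and 2
        int values[] = { 1, 2 };
        TreeNode tree[3];
        tree[0].isLeaf = false;
        tree[0].internalNodeContents.left = &tree[1];
        tree[0].internalNodeContents.right = &tree[2];
        tree[1].isLeaf = true;
        tree[1].leafContents = &values[0];
        tree[2].isLeaf = true;
        tree[2].leafContents = &values[1];

        LeafNodeContents list[2];
        flattenTree(&tree[0], list, 2);
        for (int i = 0; i < 2; i++)
            printf("%d\n", *(int*)list[i]);
        return 0;
    }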

This code makes me cringe.

It doesn’t really match the spec, because it has a hard limit on how deep a tree can be. To get around that we would need some dynamic memory allocation, which would add a whole lot more bloat and complexity. This code also requires that the caller have some idea of the maximum size of the resulting list, which may or may not be easy to know.

The complexity is ridiculous. How many times do you need to look at *list++ = node->leafContents before you can be sure that you’re incrementing the pointer to a list of pointers, and not incrementing the pointer in the list? Maybe we need to add some more bloat to encapsulate these: more functions for managing the list so we only have to write that kind of code once. Don’t even get me started!

So let’s see how well our beloved type system did. I’m going to compile the above code and see what errors come up.

Here is the list of mistakes I made. I’m categorizing each as either true-positive (the compile error saved my skin), false-positive (with a dynamic type system I would not have had a runtime bug), or false-negative (there was a bug that I found by inspection, but that the compiler didn’t catch).

  • I forgot to #include <stdbool.h>  … arguably a false-positive, since stdbool is only needed if you have static types.
  • I forgot to include stdio.h. True-positive: I forgot to include a module that was relevant to the program behavior.
  • In creating the example tree, in the line tree[0].internalNodeContents.left = tree[1], I was missing an & sign. I’m going to say this is also a false-positive. I was assigning one value to another, and the fact that one value is typed by pointer and the other by value is not a concern related to the algorithm or code logic.
  • To pop a value off the stack, I used stack[stackSize--] instead of stack[--stackSize]. This is a false-negative. The type system did bugger-all to protect me from accessing invalid memory. Luckily the hardware caught it and issued a seg-fault, but on embedded hardware you aren’t always so lucky! What’s more, the code that caused the issue is unrelated to the algorithm that the function was supposed to be implementing. In a sense, it’s the implementation of a completely different algorithm (the algorithm for popping a stack). So the bug in the code was not just not-noticed by the C compiler, but it was in a real sense caused by the limitations of the C language.
  • In printf("%d", list[i]), I was logically printing an integer to the console, since the list is a list of integers, but actually the integers are physically stored as references (pointers), so it should have been printf("%d", *((int*)list[i])). Pretty, ain’t it? This is a false-negative. There was a bug, but the type checker failed to find it. Instead it just printed out crap to the console. On GCC with the default settings2, there was no warning about this.
  • I’m not returning a “success” code, or checking the return code when the function is called. This caused no error in this case, but might cause strange behavior if there was something that did check the result error code, or a case where the error code was necessary (a failure). I’d call this a true-negative in this particular case. The function acts unexpectedly, but it doesn’t explicitly claim otherwise, so there’s no spec that it’s defying. What’s more, it doesn’t introduce a bug into this particular program.

So how does that compare with JavaScript?

Well, what happened when I ran the JavaScript program? Exactly what I expected to happen. It ran. It output the correct result. No errors. No bugs.

This is not because I’m an expert in JavaScript. I have many more years’ experience in C than JavaScript. It’s because simple, bloat-free code is easy to reason about, and so less likely to contain bugs.

Conclusion: please use JavaScript instead of C. You will have fewer bugs because your code is simpler. It will also cost less to develop because there are fewer concerns to worry about, and it will be easier to maintain because the code is clear and easy to understand.

In C++

I’m not going to implement the above in C++, but instead I’m going to say, in a hand-wavy way, that I don’t think it’s much better. In C++, you could write something that looks similar to the JavaScript version, using a stack from the STL to implement the stack variable. But the problem with this is similar to the problem with C: the implementations are coupled to the machine in a way that means when you bring in your favourite container, you’re forcing the compiler’s hand when it comes to implementing the code in terms of machine instructions and memory. The result is essentially bloat in a different kind of way. It gets messy, and to make a solution that is as generic as the JavaScript one would require a ton of code, and with it a ton of bugs.

That’s all I’m going to say for the moment. If you come from the land of C++ and want to hear my opinion in more detail, leave a comment or send me an email, and perhaps I’ll make another post about it. This one is well long enough that I should be moving on to my last point.

Safety

The above C example leads me to another great thing about JavaScript: safety.  What I mean by safety (or lack thereof) is:

  • How easy is it to introduce bugs?
  • How bad are the bugs, and how difficult are they to fix?

C is awful in this respect. The simple, real bug in the above code where I dereferenced a pointer that wasn’t assigned leaves the program open to the most hideous kinds of erroneous behaviors – those that are non-deterministic, and can affect anything in the program. Once you’ve crossed the line of undefined behavior, things in completely unrelated parts of your program can start failing for no apparent reason, no matter how well you wrote them. This is not just a door for bugs, but also for malicious attackers.

In JavaScript, there is no such thing as undefined behavior. There are a few things that are implementation-defined, meaning that different JavaScript engines will execute them differently, but there is nothing like C or C++’s undefined behavior, where a line of code can have literally any effect, including damaging the data or even functions in unrelated parts of the program. When you want behavior to be well-defined, use JavaScript instead of C/C++.

JavaScript is also a very good abstraction. Programs execute in isolation from the rest of the system, which is great if you have safety-critical or security-critical applications which need to guarantee some sort of behavior.

Conclusion

I could go on and on about the benefits of JavaScript, and perhaps I will in later episodes, but for the moment I hope that in this extraordinarily long post I’ve convinced you that there is some hope to JavaScript, even to the point of using it in embedded firmware development.


  1. Most people would say using a parameter is preferable, but as I’ve said before: in C you’re appeasing multiple gods. The choice of whether to use a global variable or a parameter is not just about what is easier to reason about or better for code reuse, it’s also about the function ABI and the machine instructions generated. 

  2. and the -std=c99 flag 

Continuations in C

There are times when you need to call a function, but you want to say “call me back when you’re done” rather than blocking the current thread.

An example might be when you’re reading a file. If you imagine for a moment that every CPU cycle is on the scale of 1 second, then disk access is in the order of days to months (take a look at the coding horror post about it). When you call a simple C function like fread, you could be blocking the current thread for millions of CPU cycles. You don’t want to be blocking the thread, because threads are a valuable resource and multithreading is a difficult skill.

The Typical Solution

The typical way to solve this in C is to use a callback function. I’m not going to explain callback functions here, since there’s an abundance of information about them on the internet. Instead I would like to point out a convenient pattern of how to store state for the callback function.

Let’s use a concrete example. Say we have some function bar, which is expected to take a long time to execute, and a function foo which needs to call bar. The synchronous way of writing the code (non-callback way) might look like this:
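
(A sketch of the synchronous version:)

    // Sketch: bar takes a long time and eventually produces a result
    int bar(void)
    {
        // ... a long, slow operation (e.g. reading from disk) ...
        return 42;
    }

    void foo(void)
    {
        int result = bar();   // the calling thread is blocked until bar finishes
        // ... carry on working with result ...
    }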

The task finishes by returning some result of the long process. For the purposes of this example, we’ll say that the result is 42.

If we convert it to the asynchronous form (the callback form) it might look like this:
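
(A sketch of the callback version; the continuation function is named continue_foo, as discussed below:)

    // Sketch: bar now reports its result through a callback instead of a return value
    typedef void (*BarCallback)(int result);

    void bar(BarCallback callback)
    {
        // ... kick off the long operation; when it completes: ...
        callback(42);   // called directly here only for simplicity
    }

    void continue_foo(int result)
    {
        // ... carry on working with result ...
    }

    void foo(void)
    {
        bar(continue_foo);
    }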

Note that normally bar would not call the callback itself, but instead save the callback to be called later. I’ve only called it directly from bar as a convenience in the example.

The Problem

I’ve seen this pattern many times. But it’s flawed in a major way: if foo has some state that must be persisted across the call to bar, how does it get that state to the continue_foo function? For example, if I declare a variable in foo, how do I access the variable in continue_foo? Typically what I see is that people will simply use global variables. But that’s an awful solution for many reasons1.

Slightly Better

A better pattern, which I’ve used myself quite often, is for foo to tell bar, “please hold the value of XYZ for me, and when you call me back, please give XYZ back to me to remind me why I called you in the first place and help me remember where I left off”. It might look like this:
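
(A sketch; the xyz field stands in for whatever foo needs to remember:)

    // Sketch: foo hands bar an opaque state pointer, and bar hands it back later
    typedef void (*BarCallback)(void* state, int result);

    void bar(BarCallback callback, void* callbackState)
    {
        // ... kick off the long operation; when it completes: ...
        callback(callbackState, 42);
    }

    typedef struct Foo_state {
        int xyz;   // whatever foo needs to remember across the call to bar
    } Foo_state;

    void continue_foo(void* state, int result)
    {
        Foo_state* fooState = (Foo_state*)state;
        // ... pick up where foo left off, using fooState->xyz and result ...
    }

    void foo(Foo_state* state)
    {
        state->xyz = 1;   // state that must survive until the callback
        bar(continue_foo, state);
    }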

A few quick points I’d like to draw your attention to:

  • Bar only sees the type void*, and not something more specific like Foo_state, because obviously bar may be called by other functions as well, not just foo
  • Rather than allocating foo’s state on the heap, foo just accepts the state as a parameter, leaving it up to the caller to decide where it must be allocated. This parameter is only to say where the state should be stored, and is not expected to have any values populated by foo’s caller.

Let me emphasize that last point again: there is no heap allocation involved in this example. The state could very easily be statically allocated, or pool-allocated, or even stack allocated2. Especially, consider that foo’s caller is likely to face the same problems foo has faced with state management, and so might already have its own state structure which would provide the perfect home for foo’s state structure without incurring an additional heap allocation.

The Best Solution

But we can do even better. The problem with the above example is that we’re passing two things around: the callback function pointer, and the callback state pointer3. Let’s take a look at a way of doing this while only passing one pointer:
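
(A sketch; the Callback typedef name is illustrative, and the call field is the one discussed below:)

    // Sketch: the callback function pointer lives INSIDE the state structure, as its
    // first field, so a single pointer identifies both the function and the state.
    typedef void (*Callback)(void* state, int result);

    void bar(Callback* callback)   // note: pointer to the callback function pointer
    {
        // ... kick off the long operation; when it completes: ...
        (*callback)(callback, 42);   // "call the callback, and tell it which callback we called"
    }

    typedef struct Foo_state {
        Callback call;   // MUST be first: a pointer to this field is also a pointer to the struct
        int xyz;         // whatever foo needs to remember across the call to bar
    } Foo_state;

    void continue_foo(void* state, int result)
    {
        Foo_state* fooState = (Foo_state*)state;
        // ... pick up where foo left off, using fooState->xyz and result ...
    }

    void foo(Foo_state* state)
    {
        state->call = continue_foo;
        state->xyz = 1;
        bar(&state->call);
    }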

I’ll draw your attention to the differences:

  • Foo_state now contains a field called call which holds the callback function pointer. It’s important that this field is the first field in the structure so that a pointer to this field is also a pointer to the whole structure.
  • The callback function signature still accepts the state as a parameter, as before.
  • The call to bar no longer takes two parameters but now only takes a pointer to the callback function pointer (note the double-pointer)
  • When bar needs to call the callback function, it needs to dereference it first. It also needs to pass the callback state. But since, by design, we’ve said that a pointer to the callback function [pointer] is also a pointer to the callback state, we can simply pass that pointer as the argument. This gives us the interesting syntax (*callback)(callback, result). In a sense, this is saying “call the callback, and tell it which callback we called”.

Those who are familiar with how object-orientated programming works under the hood may recognize this pattern. Typically objects are laid out in memory such that the first field in the object state is a pointer to the class vtable. When you call a virtual member function on the object, the pointer-to-the-object is treated as a pointer-to-the-vtable-pointer and is used to resolve the virtual dispatch. In our example above there is actually less complexity and overhead, since we don’t need a whole vtable but can point directly to the function.

Advantages

I love this pattern because it’s really clean and quite simple. The whole callback, including the function and the state, is neatly represented by a single pointer4.

The callback pointer can be called using a very self-contained syntax. That is, it only depends on one variable, not two. This is actually not just a matter of syntax: a single variable means better use of CPU registers, and fewer accesses to memory.

The fact that the callback is represented by one small value also makes it easier to manage. There’s much less risk of calling the callback with the wrong state. It’s also lighter to pass around.

Disadvantages

The most obvious disadvantage to me is that it’s uncommon. Someone looking at the code for the first time won’t just understand what’s happening straight off the bat. It also means that there’s no language support for it. C++ is in some ways an extension to C with language support for first-class objects. But there is no common language that is an extension to C with support for this kind of first-class function (with state).

The performance of using this pattern isn’t a disadvantage in my opinion. If you’re comparing it to the performance of a “naked” function pointer, then yes, you may incur some overhead from passing the additional state argument and from double-dereferencing the function pointer. But consider that this type of function call should actually be faster than calling a virtual function in most object orientated languages (which involves a triple-dereference), because there’s no vtable lookup. And virtual function calls are in turn typically faster than interface function calls (and correspondingly virtual functions with multiple inheritance, depending on the optimizer and conditions).

I’d also like to dispel another disadvantage, not directly related to the pattern but more about using callbacks in general. At first glance it seems that there is a lot of overhead in accessing persistent variables in the state structure, because instead of saying “x” you have to say “state->x”, which implies an extra pointer dereference and possibly some pointer arithmetic. But think about this: how are variables normally accessed anyway? Variables are normally stored in the stack frame, which is essentially a structure pointed to by the stack pointer. Yes, there may be fewer opportunities to keep variables in registers, which would affect the performance, but I think it may be less of a problem than you’d expect.

Likewise, at first glance it seems that there is extra space used to store the callback function pointer. But in reality, a stack frame also stores the “callback” function pointer anyway: we just normally refer to it as the “return address”. An important point to note in the last example, is that the very last thing foo does is call bar. This is what’s called a tail call, and it means that any half-decent optimizer will re-use foo‘s stack frame space for bar. To put it another way: while bar is active, foo doesn’t use any stack space, but it does use space in the persistent state structure (wherever that may be), and the persistent state structure has many of the same attributes as the stack frame would have had, including a pointer into code space. From this perspective, there is no extra space required to store the callback address in the state structure.

The only thing missing is hardware support. A “normal” call has hardware support for automatically populating the return address into the state structure (aka stack frame) and saving register states etc (aka saving persistent variables). And a “normal” return has built-in support for dereferencing the stack pointer to obtain the return address pointer (note the double-pointer again) and jumping to that address, all in one step. But I imagine that if this pattern became more common in usage (probably with language support), hardware support would probably follow.

Until then, I still think it’s a great pattern to use in C, and we should all add it to our toolbox of C patterns.


  1. Please ask me – I’ll be happy to tell you all the reasons why it’s so horrible 

  2. In the less likely scenario that the caller decided to manually block the thread using thread synchronization techniques. 

  3. On most modern architectures this would just mean that it takes twice the space, since there are two pointers involved. But C doesn’t require function pointers to be the same size as data pointers. One embedded architecture I work with has function pointers that are twice the size of normal heap pointers – after all, RAM is more expensive per bit than ROM 

  4. A RAM pointer, which in some cases is smaller than a function pointer, giving it yet another advantage over the typical callback 

Moonwalk debugging

How many times have you run into this situation:

You’ve just finished a large section of code. You launch your program in the debugger. You wait for it to start up, and then try it out a bit. But then… it crashes. Your debugger eagerly jumps up to tell you there’s a problem. You look through the call stack and all the program variables, and you find something strange. A value that’s just clearly wrong. Why is it 25 and not 5? Why didn’t it remove the item from the list correctly? Why is it not sorted? You look through the code to try to figure out the problem, but your conclusion is that, really, it should just be right. Your code looks fine – why did it get the wrong answer? The only way to find out is to go back in time and breakpoint it at the point where it should have been generating the correct answer, but didn’t. So you set some breakpoints, and restart the whole program. You go through all the steps to reproduce the problem again (if you can even remember them). Finally you hit the breakpoint. You step slowly through your code, and finally find the problem. Or more likely, you find some other strange thing that may have led to the first strange thing, and have to repeat the process. Or in haste you accidentally step over the most important line of code instead of stepping into it, and have to start again.

This used to happen to me all the time. In fact, it used to be my primary method of debugging: a binary search through the code, each time rerunning it to reproduce the problem. Wouldn’t it be easier to just step backwards through the code? The program breaks when it enters some invalid state. By the law of causality, the bug must have been in code that was before the point where the strange state occurred. Stepping forwards seems like the least useful thing you can do.

Nowadays I don’t have this problem as much, and I’ll tell you the secret: Write programs in a pure functional style. Write pure functions. That is, write functions that always return the same value if called with the same arguments, and that don’t have side effects that change the state of the system when the function is called. Write code that doesn’t mutate state.

Let me give you a real example. Recently I’ve been developing a graphics library for an embedded microcontroller. The previous graphics system was stateful: you had a mutable image buffer, and then you called different subroutines, such as drawLine, to affect a change to the pixels in the image buffer. I was tasked with creating a new graphics system – in part because we now needed to render graphics to an area many times larger than the size of all the microcontroller’s RAM.

The design I produced is almost completely stateless. For example, instead of calling a rotate function to rotate a graphic, I call a rotated function, which returns a completely new graphic object representing a rotated version of the original. Did you catch that? The word “rotate” is imperative: it commands the machine to do something. The result of what it does is known as its side effect1. A function like this would probably return void, or a result code to confirm its success, and the rotation would be visible as a change in the original image.

On the other hand, the word “rotated” is, in this case, an adjective2. It describes not what the function does, but what its result is. If the argument is a picture of a dog, then the result is a rotated picture of a dog. It always will be. Every time. A rotated(dog) is always a rotated dog. The unrotated dog is still there, and still unchanged.
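
To make the contrast concrete, the two styles of API might be declared something like this (a sketch; the real library is more elaborate, and the rotation parameters are omitted):

    // Imperative style: modifies the image it is given
    void rotate(Image& image);

    // Pure style: returns a new, lazily-described graphic representing the rotated
    // image; the original graphic is left untouched
    Graphic rotated(const Graphic& original);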

Then there are similar functions for creating a shifted graphic from an existing one, a composite graphic from other graphics, and a line, a polygon etc. As a quick implementation note: you may have guessed that these graphic functions can’t possibly return objects with all the pixels of the graphic they represent. The way this is resolved is with laziness – the graphic objects are only implemented as lightweight descriptions of what the resulting graphic should look like relative to their input. When it comes time to produce actual pixels, there is some short-circuiting magic that combines all the transformations into one ultra-efficient transformation, and then produces the pixels on-demand with almost no buffering required3.

Moonwalk the debugger

The great thing is that doing things this way plays really well with the debugger. These techniques work in other environments, but I’m specifically going to talk about my experience in Visual Studio. When your program stops at a breakpoint or exception, you can rewind the stack one frame at a time (in Visual Studio 2013 this is Shift+F11), and you can rewind within a particular frame by simply jumping directly to that line of code (“Set next statement”, or Ctrl+Shift+F10). If your program is stateful, this is sure to break something. Typically when rerunning the same code twice you might not expect it to do the same thing the second time, because the conditions may have changed since the first time it ran, and so the program may be in a different state, and thus produce different results. This isn’t a problem at all if you use immutable data structures and your functions have no state or side effects. Now you can step backwards!

In addition to being able to step backwards, you can also evaluate arbitrary expressions in the debugger. For example, I have a function like saveGraphic(graphic, filename), which takes a graphic and saves it to a bitmap file. This is technically not a side-effect-free function since it writes to a file, but for the purposes of debugging it has no side effects which modify the state of the program in any observable way, and so is “pure” for practical purposes. In Visual Studio, I can add a watch expression like saveGraphic(dog, "debug.bmp"), and the debugger will happily execute the function. Now I can see the value of my immutable graphic object! Since rendering doesn’t mutate program state, I can use “pure” functions like this to visualize any complex objects. I can even evaluate transformation functions in the watch window, such as saveGraphic(rotated(dog), "debug.bmp"), knowing confidently that my program state hasn’t changed as a result of evaluating the watch.

I hope you enjoyed this post, and learned something new that will help you with your bug battles, and challenge the way you think about coding.


  1. Whether or not a function’s “effect” is its primary purpose, it is still technically known as a “side effect” 

  2. I believe the term is deverbal adjective 

  3. I would probably have normally used matrices for this, but this microcontroller has no integer multiply or divide instructions, and no floating point anything

Finish before you Start

I find a lot of the coding I do is completely different, unlike anything I’ve done before. The reason is that if it’s anything like something I’ve done before then I’ve failed to recognize the pattern and abstract it out. As a software engineer I want to never be doing repetitive tasks, because that’s what software is for (repeating tasks). If you’re repeating yourself in any way then you’re not fully leveraging your time. Every pattern, every similarity, every “oh, I know how to do this because I did something similar with xyz” should be under careful scrutiny.

After a certain point you may run up against the limitations of a programming language, since many design patterns are simply language features that are missing. I come across this all the time in embedded C programming, where a “linked list” is not a container but the design pattern of having a next pointer in a structure (and all the algorithms that go with it), and a “virtual class” is not a data type but the pattern of having a vtable pointer at the beginning of a structure.

At the point where the language betrays you, your next port of call could be writing your own domain specific language to target the problems you’re working on. In C# this may include the use of reflection to interpret existing C# data structures in new ways (such as interpreting a class as a list of serializable members), using expression trees to interpret C# expressions in new ways, or using reflection.emit to dynamically generate bytecode.

But despite the huge advantages of automating the mundane programming tasks and patterns, the problem arises: if you never repeat yourself, then everything is new. Everything is uncharted territory1. How do you know how long it will take? How do you know if it will work? Will it really be useful?

The solution is simple: finish before you start. Do whatever it takes to get the results of your experimentation before you actually perform the experiment. In software this means developing proof of concepts. Hack the result together using whatever means necessary. Spare no time for optimization or good coding technique. Don’t worry about corner cases and details. Get to the finish line by cutting across the sports field. Get to the finish line by cheating. By redefining the race. By faking the results. Calculate them manually. Hand-draw the GUI. Fake the input data. Brute-force the algorithm. Anything it takes.

It helps to clearly define the interface of whatever you’re coding: How will it interact with the rest of the system? If it’s a web page, the interface is the GUI. If it’s a library, the interface is the publicly exposed API. If it’s a DSL, the interface is a syntax or spec. Once the interface is there, it doesn’t matter if the implementation is a birds-nest of cheating and killing. All that matters is that the interface offers just – just – enough to give you a little hint of what the future will look like. What will it feel like? Get a tangible product you can play with2.

Once you’ve seen the end, then you can start. A clearer picture of where you’re going will help not only in producing a better product or code, but it might change your direction completely. Once your fortune is told, it may bring news of triumph, or of peril. Perhaps when you see the end you realize it actually isn’t so great there. It isn’t as easy to use as you thought. It’s more complicated than anticipated. Perhaps the whole principle is flawed in some horrible way. There are many things that are hard to anticipate without getting a closer look. Or a hands-on experience. Perhaps you need to reconsider some of your assumptions, or abandon the whole idea altogether.

It’s better to know before you start.


  1. This is an exaggeration and oversimplification 

  2. I use the word “tangible” very loosely. A visual interface is “tangible” if you can see it, and a library interface is tangible if you can call it, etc. 

Everything you don’t say

Recently I was reading an old blog post I came across about productivity variations among software developers, where the author says that “researchers have found 10-fold differences in productivity and quality between different programmers with the same levels of experience and also between different teams working within the same industries”. My immediate instinct as I was reading it was that it makes perfect sense: it’s supported by my own experience, and by what I hear from others in the software industry.

But at the end of the post, he talks about a study which compares the productivity of the Excel and Lotus teams, and draws the comparison in terms of lines of code:

 Excel’s team produced about 13,000 lines of code per staff year. Lotus’s team produced 1,500 lines of code per staff year. 

The huge difference in lines of code per staff-year is interesting, but is that really a measure of productivity? It’s not clear from the post whether a high LOC-per-year count is meant to indicate high or low productivity, but it seems to mean high productivity in this context. If so, then I am probably the least productive person in my team: I make an effort to write as few lines of code as possible.

In fact for the last few months my average LOC per day is probably negative, if you count deletion of lines as negative lines produced1. That sounds terrible, but it’s because I not only try to write as few lines as possible, but I also simplify other people’s verbose code as I go through it. It’s not uncommon for me to rewrite existing code using 5 or 10 times fewer lines (typically simpler lines).

Let me clarify though. Writing fewer lines of code is not my goal, it’s a side effect. What I try to do is write simpler code. Code that’s easier to reason about. Code that’s more pleasant to maintain. Code that has fewer bugs just because it’s simpler. What generally results from this is fewer lines of code. I do consider fewer lines of code to be a goal, but only in service of readability.

Leverage

Another reason why fewer lines of code can be a good sign is how much leverage each line of code has. I don’t know if leverage is the right word here, so let me explain. What I mean is simply a small piece of code that has a big effect, because of some amplification mechanism. For example, a bad coder might repeat almost the same line of code 20 times, while a good coder will instead write a for-loop which amplifies his single line of code to cover 20 cases2. A good programmer will use abstractions like this (and polymorphism, generics, frameworks, code generators, DSLs, etc.) to amplify small pieces of code to great effect. A good programmer is not one who can write 1000 lines of code, but one who knows how to write 1 line of code in just the right way that it can be reused 1000 times. If he continues to write another 999 lines of this kind of code, he will have produced 1000 times more than the man who wrote the 1000 lines which had only one use34.
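
As a contrived sketch of what I mean (the function and array are made up purely for illustration):

  /* Written once per case: 20 near-identical lines, each a separate chance for a typo. */
  void reset_channels_verbose(int levels[20]) {
    levels[0] = 0;
    levels[1] = 0;
    /* ... 18 more near-identical lines ... */
    levels[19] = 0;
  }

  /* Written once for all cases: a single line amplified to cover all 20. */
  void reset_channels(int levels[20]) {
    for (int i = 0; i < 20; i++)
      levels[i] = 0;
  }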

This sort of leverage is not linear, because good software engineers can leverage the leverage. They can abstract the abstractions. Any time you find yourself repeating something, you write code to do it for you. The more abstractly you can recognize these patterns, the more generally the solutions will apply and the less code you’ll need to write to get things done (although you have to be careful not to make the code more complicated with all the abstraction).

Conclusion

I do understand that getting a quantitative measurement of programming productivity can be incredibly difficult. If it were easy, I could just give potential employers my objective productivity “score” in various areas of expertise, and they could simply hire me according to how this fits their objectives. It would also make it easier to pay software engineers what they’re individually worth. But I think “number of lines of code” is a terrible proxy for productivity, especially when you look at the extremes of the spectrum, where people are really productive or where people get very little done.

This doesn’t change the fact that, at least for me, a better solution is one with fewer lines of code. Every line of code you don’t write is valuable, and often this value outweighs the cost of planning how to not write that line.


  1. If you didn’t count deletion as negative, then the best way to be productive is simply to delete and rewrite the same function every day for the rest of your life 

  2. You might think this is a bad example because nobody would write the non-for-loop code, but it’s exactly the situation I found just the other day 

  3. It’s difficult to qualify the word “use”, since this doesn’t need to refer to the programmer himself reusing the code, but could also just refer to the number of times it’s executed, like in the example of the for-loop 

  4. You might think 1000-times reusability is ridiculous, but there are a number of places where I’ve achieved this level of “amplification”. They seem to normally manifest themselves as frameworks 

Thoughts about Random Access

Thoughts about Random Access

Recently I’ve been thinking about what it means for a data structure to support random access (aka “direct access”). This may seem so obvious that it’s not worth thinking about: an array supports “random access” because you can access any element with O(1) complexity by just calculating the element’s position and going straight there. A linked list doesn’t have this property because the only way to get to an element is through other elements that it’s linked to. But let’s explore this idea further.

Wikipedia explains random access quite well:

As a rule the assumption is that each element can be accessed roughly as easily and efficiently as any other, no matter how many elements may be in the set, nor how many coordinates may be available for addressing the data

Let me add another note to this which is perhaps obvious but not normally specified: we’re talking about runtime complexity. If we had a linked list completely defined at compile time, then I would still classify it as “random access”, even though linked lists are normally not considered to be random access1.

So what is it about an array that gives us random access? Let’s take a guess: it’s because the elements of the array are stored contiguously in memory.

[Figure: contiguous fixed-size elements in memory]

This seems like a reasonable first guess. Because “element 2” is known to be after “element 1” which is after “element 0”, we can calculate where “element 2” is. To get the position of element 2, we multiply the size of an element by 2, and move to that binary position in the array.2

But what if the elements aren’t all the same size?3

[Figure: contiguous variable-size elements in memory]

If one element in the array might be larger than another, then we can’t just multiply anymore, so we need to access the elements sequentially, right? Well, no. If the size of element 1 is known at compile time, then we can simply add its size as an offset when calculating the position of any element that comes after it.

Perhaps I deceived you when I said the elements aren’t all the same size in this example: I didn’t say whether or not there was static information about the sizes of the elements. Let’s say we have an array defined such that every even-indexed element is 4 bytes big, and every odd-indexed element is 8 bytes big. To calculate the binary position of an even element in the array we just use index / 2 * 12 + (start of array), and to calculate the position of an odd element we just use index / 2 * 12 + (start of array + 4) (assuming implicit integer truncation in the division). These are both constant-time operations.
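
As a sketch (the names are made up), both address calculations come out constant-time, whether the elements are uniform or alternate between 4 and 8 bytes as just described:

  #include <stdint.h>
  #include <stddef.h>

  /* Uniform elements: position = start + index * element size (or pitch). */
  static void *element_addr_fixed(void *start, size_t index, size_t elemSize) {
    return (uint8_t *)start + index * elemSize;
  }

  /* Alternating layout: even-indexed elements are 4 bytes, odd-indexed are 8 bytes,
     so each even/odd pair occupies 12 bytes. */
  static void *element_addr_mixed(void *start, size_t index) {
    size_t pairOffset = (index / 2) * 12;        /* integer truncation intended */
    size_t withinPair = (index % 2) ? 4 : 0;     /* odd elements sit after the 4-byte one */
    return (uint8_t *)start + pairOffset + withinPair;
  }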

So to sum up, it’s not about whether the array elements are the same size, or how they are laid out in memory. It’s about what we know at compile time vs what calculations need to be delayed until runtime, and how complicated those runtime calculations need to be. If we go back to our original definitions then this isn’t anything new – it’s just a restatement of our definition of random access. But it does make me think a bit differently about it.


  1. This would be quite difficult to do in most languages. You could do it in C++ with template metaprogramming 

  2. Actually it’s not the size of the element that counts, but the pitch of the array – there might be padding bytes between elements to help properly align them. But let’s just imagine the “size” includes the padding between the elements 

  3. I don’t know any compiler that will generate arrays of elements that don’t all have the same inline value size, but we’re not so much talking about what compilers do as about what could be done 

Rethinking goto, Part 2

Rethinking goto, Part 2

A while back I discussed some ideas for new branching constructs to complement the if-statement and switch-statement in a low-level language. Today I’m going to tell you about another pattern which I’ve seen commonly used to represent complicated control flows without gotos.

First, let’s come up with an example of a control flow that doesn’t fit well into the standard branching constructs of switch, if, while and for. The example I’ve devised is deserializing fields from a binary stream. Let me explain the hypothetical situation before I go into the solution. In the stream we have a byte that represents the field type: perhaps 0x01 represents a “short string” type, 0x02 represents a “long string” type, and 0x03 represents a 32-bit integer type. If the field type is a short string, then the next byte in the stream is the length of the string field, and the bytes after that are the characters of the string field. If the field type is a long string, then the next 4 bytes are a 32-bit length followed by that number of characters. If the field type is an integer, then the next 4 bytes are simply the integer value. Here’s an example stream that has a short string “AB”, followed by a long string “C”, followed by the integer 421:

[Figure: example stream bytes]

If the field is a string, then after we know how long it is we need to check that all the characters are valid string characters (by some specification we don’t care much about right now), and if the string passes validation then we copy it into wherever it is going. Here is a reasonable flowchart illustrating these steps as directly as possible:

[Figure: flowchart for reading a field]

First, let’s represent this as directly as we can in C code. The first branch is clearly a switch statement, but the most direct way to represent the remaining control flow is using gotos.
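
Something along these lines (a rough sketch rather than a polished listing; the stream access and the validation rule are simplified, and the helper names are mine):

  #include <stdint.h>
  #include <stdbool.h>
  #include <string.h>

  /* Reads one field from the stream at *p into out/outInt.
     Each block starts with a label and ends in a goto, mirroring the flowchart. */
  bool read_field(const uint8_t **p, char *out, size_t outSize, int32_t *outInt)
  {
    uint32_t len;

    switch (*(*p)++) {                 /* field type byte */
      case 0x01: goto short_string;
      case 0x02: goto long_string;
      case 0x03: goto read_integer;
      default:   goto failure;
    }

  short_string:
    len = *(*p)++;                     /* 1-byte length */
    goto validate_string;

  long_string:
    memcpy(&len, *p, 4);               /* 4-byte little-endian length */
    *p += 4;
    goto validate_string;

  validate_string:
    if (len >= outSize) goto failure;
    for (uint32_t i = 0; i < len; i++)
      if ((*p)[i] < 0x20 || (*p)[i] > 0x7E)   /* "valid" by some simplistic spec */
        goto failure;
    goto copy_string;

  copy_string:
    memcpy(out, *p, len);
    out[len] = '\0';
    *p += len;
    goto success;

  read_integer:
    memcpy(outInt, *p, 4);             /* 4-byte little-endian integer */
    *p += 4;
    goto success;

  success:
    return true;

  failure:
    return false;
  }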

I would argue that this is actually the most readable way of representing it, particularly if people maintaining the code can refer back to the flow diagram. It’s easy to reason about, since each block starts with a label and ends in a goto. I’ve chosen to order it in a similar way to the flow chart, so that control only  flows downward. This code is no more “spaghetti-like” than the diagram is. In fact, it reads very much like a document or book would: each label is a heading marking the section as a stand-alone unit.

But gotos are bad, right? Well, this is an endless debate, but let’s assume that they are and that we need an alternative. How about this:
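
Roughly the same logic, sketched with each goto replaced by a function call (again simplified, with made-up names):

  #include <stdint.h>
  #include <stdbool.h>
  #include <string.h>

  /* State that used to be local to read_field, now shared between the functions. */
  static const uint8_t *p;      /* stream cursor */
  static char *out;             /* output buffer */
  static size_t outSize;
  static int32_t *outInt;
  static uint32_t len;
  static bool ok;

  static void success(void) { ok = true; }
  static void failure(void) { ok = false; }

  static void copy_string(void)
  {
    memcpy(out, p, len);
    out[len] = '\0';
    p += len;
    success();
  }

  static void validate_string(void)
  {
    if (len >= outSize) { failure(); return; }
    for (uint32_t i = 0; i < len; i++)
      if (p[i] < 0x20 || p[i] > 0x7E) { failure(); return; }
    copy_string();
  }

  static void short_string(void)
  {
    len = *p++;                 /* 1-byte length */
    validate_string();
  }

  static void long_string(void)
  {
    memcpy(&len, p, 4);         /* 4-byte little-endian length */
    p += 4;
    validate_string();
  }

  static void read_integer(void)
  {
    memcpy(outInt, p, 4);       /* 4-byte little-endian integer */
    p += 4;
    success();
  }

  bool read_field(void)
  {
    switch (*p++) {             /* field type byte */
      case 0x01: short_string(); break;
      case 0x02: long_string();  break;
      case 0x03: read_integer(); break;
      default:   failure();      break;
    }
    return ok;
  }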

It satisfies our requirement of not using gotos. Of course now we need to elevate some of our local variables to global variables, but nobody complains about global variables as much as they do about gotos, so this is an improvement, right?

Well, no. From a readability perspective nothing much has changed. The “headings” are now function names rather than labels, and the gotos are now function calls, but the code otherwise looks the same.

Bad code using gotos can trivially be represented as equally bad code without gotos – or possibly worse code because it’s now more spread out and pollutes the global namespace with extra functions and persistent variables. I’m not saying this is bad code – it may or may not be, but that’s beside the point. The point is that it’s not just possible, but trivially easy to write “spaghetti code” without gotos, since I believe any goto-based code can be similarly converted to function calls2.

[Figure: goto-call]

This brings me to another, more subtle point. The above example is an abuse of the idea of a “function call”. This may be subjective, but I think that the idea of a call (at least in an imperative language) is to perform a sub-task and then come back to the calling task to complete whatever operation it was on. There is an implicit hierarchy here: the called function is a “sub-function” in terms of its contribution to the caller. This is physically manifested in the call stack, where you can see that the calling function still has an activation record on the stack while the called function is executing, in anticipation of re-acquiring its temporarily delegated control.

This is not the way I’m using the “function call” feature in the above example. I’m instead intending it as a “fire-and-forget” kind of call. The caller isn’t saying “I need your help, can you please do this for me and get back to me later”, it’s saying “I’m done, you’re next, I grant you control from now on”. The latter idea sounds familiar – permanently passing control one way from one part of the program to another. It’s called a goto. And I’ll use the term “goto-call” to mean a function call made with this intent.

An example that comes to mind of where I’ve seen this intention in a function call is raising events in C#. I’ll take the example from the MSDN guidelines for raising an event:

What does the call handler(this, e) on line 6 mean? Does it mean “I need your help, get back to me when you’ve got an answer”, or does it mean “goto the place I’ve called handler and I don’t care if you come back to me”3? It means the latter. It’s a “goto” in disguise.4

In a high level language this doesn’t matter. Using the feature of “calling a function” for “going to another part of the program” is fine. We waste a little stack space keeping the caller state when we don’t need it, incur some minor overhead preserving registers and allocating the new frame, and make it harder to read the call stack when we’re debugging, but the language is less complicated than it would be if we had to distinguish a “goto call” from a “normal call”.5

In an embedded microcontroller environment, and in a very imperative, stateful environment, I don’t think this is the case any more. I really think that low-level languages like C should support a “goto-call” feature which is like a call (control moves to another function and you can pass arguments) but is intended never to return to the caller, or to return to the caller’s caller.

From the programmer’s perspective the “goto-call” would be a mechanism of communicating the intent of the code to other programmers. It tells other people reading the code that this is not a call “stack” operation, but a call “queue” operation – it’s one thing happening after another rather than one thing happening during another. It also tells the compiler “I don’t intend control to come back here”, so the compiler can helpfully produce an error if the “goto call” is not a valid tail call.6
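
For what it’s worth, something fairly close to this already exists as a compiler extension: recent versions of Clang accept a musttail attribute on a return statement, which forces the call to be compiled as a jump and fails with a compile error if it isn’t a valid tail call. A tiny sketch (the function names are mine):

  #include <stdio.h>

  static int step_two(int x);

  static int step_one(int x)
  {
    printf("step one: %d\n", x);
    /* Control is handed to step_two and never comes back to step_one's frame.
       Requires Clang; the GNU-style spelling of the attribute is used here. */
    __attribute__((musttail)) return step_two(x + 1);
  }

  static int step_two(int x)
  {
    printf("step two: %d\n", x);
    return x;
  }

  int main(void)
  {
    return step_one(1) == 2 ? 0 : 1;
  }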

Conclusion

I’ve shown that the problems with using goto are not unique to the goto feature, since there is a relatively trivial translation from goto-style programming to a form that uses function calls instead. I’ve used this as another argument for why gotos are not intrinsically bad, since you can write code that is at least as bad using function calls, and we do not consider function calls to be intrinsically bad.

I’ve also suggested that since calls can be used in this way, that sometimes we conflate the idea of a “call” with the idea of a “goto-call”, and suggested that if some imperative languages distinguished between the two by supporting a new “goto-call” feature then it would not only make the intent of code clearer to its readers, but also enable additional static checking and performance optimizations. I’ve given two concrete examples of where this would be useful: the example of reading a hand-crafted format from a binary stream in C using functions, and the example of event dispatching in C#.


  1. Assuming little-endian integers and ASCII character encoding 

  2. I’ve glossed over some other differences between the two ways of doing it. If you use function calls you don’t have control “falling through” to another function by default if you forget to call the next function. Also, it’s much more difficult to combine traditional control structures such as while loops with this “function call” form. I think neither of these factors decreases the “spaghetti-ness” of the function-based code. However functions have some additional flexibility: we can call functions indirectly through pointers, we can put them physically wherever we want, and we have more access to the individual states from “outside”. Whether or not these are good things depends on the situation. 

  3. Of course it does need to come back to somewhere, but it could come back to the caller of OnThresholdReached, like a tail-optimized call 

  4. Another example that comes to mind is in continuation-passing style calls, where you typically execute a continuation as “I’m done, you’re next” and not “do this and get back to me”. Keeping the caller of the continuation on the stack is the moral equivalent of keeping the “returner” of a non-CPS function on the stack 

  5. For those who’ve worked with C# async, wouldn’t it be wonderful if the continuation of an “awaiting” function didn’t have a gazillion frames in the call stack, especially with recursive algorithms like “await the previous item in a long list” 

  6. Perhaps ironically, in attempting to justify a more imperative style of programming using gotos, we’re actually encouraging a more functional style of programming using mutual tail recursion to “loop” through a stream.