Why Types Introduce Coupling

Last time I gave a long post about why C is complicated and JavaScript can make your code simpler and less buggy. Today I want to touch on another idea: that statically typed languages introduce a level of coupling in your program that’s unnecessary and makes maintenance harder.

Imagine that I ask you to write a function to simply iterate through a list of integers and output them to the console. In C, you might write the following code:

But wait a second. I didn’t tell you that the list was going to be presented as a contiguous array! What if it was a linked list? Then the code might look like the following, assuming that we had previously declared the types LinkedList and LinkedListNode appropriately:

The problem in both of these is that the function we’ve written is intimately coupled to the type of the parameters it accepts. There’s no way in C to express the idea of “I want my function to accept any list-like type”. This means that we cannot reuse the above code in other circumstances where the type is even slightly different.


In C++, we can address this problem to some extent by using templates. We could write the above function something like this in C++:

There are still some of the same issues in this code. For example, do we pass an iterator pair to the function (as many STL functions do), or do we pass the container itself?

To me this is a non-solution. For one thing, it’s saying that if you want to write reusable code that’s decoupled from the types of its parameters, you need to write everything in your program as templates. This is clearly not an option for many reasons, including compilation time, error messages, complexity, etc.

All that C++ templates give us is a taste of a world without explicit types. The above C++ function is decoupled from the caller simply because it does not state what type it expects.

It’s true that there are languages that perform type inference, and allow you to write pretty generic code without losing static type safety, but as with C++, the error messages and language complexity associated with this type of coding make it difficult for beginner-to-intermediate level programmers.


At first glance, you may think that higher level statically-typed languages solve this with better abstraction mechanisms. Consider the C# code for the above function:

The above function says that it accepts any argument that is enumerable, meaning anything that can be iterated. This is fantastic from a reusability and code maintenance perspective, so it does address one side of the issue. But there is another issue lurking behind the scenes. One that may be more noticeable to people working in very resource-constrained environments like in embedded firmware.

Although the above C# code does not couple itself to the specific type of integer list, and so can be used throughout the program, it does actually still have a level of coupling at the binary level: the function accepts something that has a binary signature of IEnumerable<int>. It is not the same as a C++ template that is decoupled by having the template instantiated for multiple binary signatures.

This machine-level coupling is invisible to the programmer, which is convenient and important when it comes to managing large and complex codebases. But remember that interfaces as a language feature are normally implemented at the binary level by passing control through several levels of indirection. I won’t go into detail – you can look it up yourself if you’re interested – but here is a brief summary to illustrate my point.

The machine code must first look up a virtual dispatch table from the interface pointer, and then look up the correct entry in the dispatch table corresponding to the method you want to call, and then perform an indirect call through that entry in the dispatch table. The indirect call doesn’t go straight to the function implementation, but first through a thunk that adjusts the this pointer from the interface’s this pointer to the underlying object’s this pointer (which are different because objects can implement multiple interfaces), and then the thunk redirects to the actual function. This is to perform a single function call on an abstract interface.

Although the C# language doesn’t dictate these details, you have to be aware of them anyway. The logical types are still coupled to the binary types, and so the choices you make about how generic you want your code really do still have an effect on the performance and overheads involved in your program.


So what’s different about JavaScript? Simply put, JavaScript doesn’t require you to specify these type details at all. This not only brings a level of decoupling to the logical program, where you gain benefits of reusability and maintainability, but also unshackles the compiler to theoretically be able to apply a whole new range of optimizations. As compilers are getting better and better, I think this will be more and more the case, and soon the performance of a program will be more related to how few details you dictate, rather than how many, and the performance of JavaScript programs will exceed that of C and C++ programs.
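As a rough illustration of that decoupling (a sketch, not code from the original post), the function below never names the type of its argument. The `LinkedList` class and the `printAll` name are made up for this example; the only assumption is that whatever gets passed in is iterable.

```javascript
// Hypothetical linked list, only here to show a non-array "list-like" type.
class LinkedList {
  constructor() {
    this.head = null;
    this.tail = null;
  }

  append(value) {
    const node = { value, next: null };
    if (this.tail) this.tail.next = node;
    else this.head = node;
    this.tail = node;
    return this;
  }

  // Making the list iterable is all that for...of needs.
  *[Symbol.iterator]() {
    for (let node = this.head; node; node = node.next) {
      yield node.value;
    }
  }
}

// No parameter type is declared: any iterable will do.
function printAll(list, output = console.log) {
  for (const item of list) {
    output(item);
  }
}

printAll([1, 2, 3]); // contiguous array
printAll(new LinkedList().append(1).append(2).append(3)); // linked list
printAll(new Set([1, 2, 3])); // something else entirely
```

The same `printAll` body serves all three callers, which is the reuse the C versions could not offer without being rewritten per type.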

Why JavaScript

People who have known me for a while, and long-time readers of this blog, will know that I never used to be a fan of dynamically typed languages. The static type system available in C# and C++ can help to prevent many bugs and runtime errors, and allow the programmer to add extra clarity to the code.

But in recent years, I’ve come to acquire a taste for JavaScript. So much so that I’m actually on a mission to write a JavaScript compiler for embedded firmware, so that I can bring the bliss of JavaScript to the nightmare of embedded programming.

For those who don’t already know, JavaScript is a completely different language to Java. Please, do not use the word Java to refer to JavaScript, or vice-versa. They are as unrelated to each other as C++ is to Java. I know this is stated many times, but I’m expecting that some of the readers of this post are coming from a purely C/C++ background, and might not be aware of this fact.

Many people have written about the pros and cons of JavaScript, so what I’m saying here is not unique, but I’m going to approach it from the perspective of an embedded firmware programmer, who typically works in C or C++ on a daily basis. Some of what I say will also apply to those of you in other arenas.


One thing that JavaScript has going for it, is that the syntax and structure will be familiar to most embedded programmers (and many other programmers). It uses curly-braces to denote blocks of code. Function calls are done using parentheses with comma-separated arguments, as in foo(1,2,3). Most of the C/C++ operators work as expected, such as the logical operators && and ||, and the bitwise operators & and |.

These may seem like obvious things, if you’re from a background of C/C++, C#, or Java, or some related language. But bear in mind that not all languages are like this, and so the familiarity of JavaScript eases the learning curve when coming to JavaScript from languages like C/C++. Compare these two snippets of code, one in C, and the other in JavaScript, that implement the Fizz-Buzz challenge:
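The original listings are not reproduced here. As a stand-in for the JavaScript side only, this is a minimal sketch of Fizz-Buzz; the `fizzBuzz` helper name is my own choice for illustration.

```javascript
// Returns the Fizz-Buzz word for one number, or the number itself as text.
function fizzBuzz(i) {
  if (i % 15 === 0) {
    return 'FizzBuzz';
  } else if (i % 3 === 0) {
    return 'Fizz';
  } else if (i % 5 === 0) {
    return 'Buzz';
  } else {
    return String(i);
  }
}

for (let i = 1; i <= 100; i++) {
  console.log(fizzBuzz(i));
}
```

The `for` loop, the `%` operator, and the `if`/`else if` chain would read almost identically in C; the visible differences are `let`, `===`, and `console.log` in place of `printf`.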

If you’re a C programmer, then hopefully the above similarity appeals to you.


Perhaps the number one thing I like about JavaScript is simplicity. With JavaScript you can often write very clean and readable code, because you’re focusing on the algorithm and behavior, rather than adding bloat associated with types and ABIs and memory concerns.

In C++, you’re writing code to appease three different rulers: the type system, the machine, and the reader/maintainer. Most C++ code seems to be a compromise between these, but in JavaScript your code can focus on the reader/maintainer, and let the compiler worry about the machine. You worry more about clearly describing, in code, the behavior of the program, rather than worrying about how the compiler/interpreter will implement that behavior in terms of instructions and memory access, or worrying about constructing elaborate types to describe the meta-rules about what the program is allowed to do.

As a thumb-suck estimate, I probably spend at least half my time in C++ trying to appease the static type checker. And I would say that 90% of that work is spent on false-positives – cases where the type checker notes an inconsistency, but where the same code in a dynamically typed language would not have had a bug in it. Inconsistencies in the type system do not always equate to real program inconsistencies.

At the risk of making this post far too long, I’m going to give you an example, albeit somewhat contrived. Let’s say that we have a binary tree structure: a node in the tree is either a leaf-node with a value, or an internal node with no value but with left and right subnodes/children. Now let’s say that we want a function that flattens the tree, returning an ordered sequence of only the leaf values.

In JavaScript

In JavaScript, I can imagine a function that looks like this:

It uses a stack to iterate the tree. I could have made a much more succinct solution using generators and recursion, but I’m appealing to those of you who are coming from strictly imperative, procedural programming backgrounds, so that’s why I chose this approach.
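The original listing is not shown here, so the following is only a sketch of what such a stack-based flatten could look like. The node shapes are an assumption on my part: a leaf is an object with a `value` property, and an internal node is an object with `left` and `right` properties.

```javascript
// Assumed node shapes (not taken from the original post):
//   leaf:     { value: anything }
//   internal: { left: node, right: node }
function flatten(root) {
  const result = [];
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if ('value' in node) {
      result.push(node.value);
    } else {
      // Push right first so that left is popped (and visited) first.
      stack.push(node.right);
      stack.push(node.left);
    }
  }
  return result;
}

const tree = {
  left: { value: 1 },
  right: {
    left: { value: 2 },
    right: { value: 3 },
  },
};

console.log(flatten(tree)); // [ 1, 2, 3 ]
```

Nothing in the sketch says what a value is, who owns a node, or how deep the tree may be; the array used as a stack simply grows as needed.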

In C

Let’s write the equivalent code in C.

Firstly, a node can be either a single value, or branches into two subnodes. How do we represent this?

Here is one way. A node can be either a leaf value or an internal node, and to tell the difference, we probably need some kind of flag (or tag). It also needs something to store the contents of the node, which is a union between the two different options:

Side note: the style of the above code might not be what you’re familiar with. Or it might be. I don’t know because C (and C++) doesn’t come with a standard style, which is one of the ways in which I think JavaScript is better: there is a single generally-accepted style to writing JavaScript code.

The contents of an internal node is that it has a left and right sub-node:

Hmm. There’s another question here that’s unanswered. Who owns these nodes? Does the tree own them? Could there be multiple trees that share the same nodes? Is it even the same between one tree and the next, or within all the nodes of a tree? It’s not specified here, but it’s yet another “bloaty” detail to figure out.

We said that a leaf node is “a value”. But what value exactly? Is it an integer? Another struct? Is the value type the same for all nodes in a tree? Should the memory for the value be embedded into the node, or should the node point to the value? If it points to the value, then who owns it? If it’s embedded, then how big is it, and are there any rules we have to follow when copying it or moving it to different locations in memory (is it pointed to by anything else, or has ownership of anything else)? So many questions. So many details that aren’t relevant to the problem at hand.

One way here is just to say that a leaf node has a pointer to the value, and that we don’t know anything further about what type it is:

I could save on some of the bloat by using an anonymous union, but I’d argue that not a whole lot is gained in terms of simplifying real complexity.

How much time have we wasted so far? Let’s recap what we’ve done:

  • We’ve defined one possible implementation of a tree node
  • We’ve coupled the implementation to issues of memory layout, such as whether nodes point to their children and values or have them embedded in the same memory.
  • We’ve opened many cans of worms regarding ownership, style, the type of contents, etc.
  • We haven’t even started writing the actual function yet.

Now for the actual function (oh dear, this is going to be a long post). We need a way to pass the tree to the function, and a way to retrieve the list back. We’ve already defined a type for the tree, but there are other questions to be answered when it comes to passing it to the function:

  • Should the root node be passed by value, or by pointer?
  • Is the function expected to mutate, or free, the tree passed to it? For example, can the output list reuse the memory from the input tree?
  • Should the tree be passed using a shared global variable, or an actual parameter? [1]

There are similar questions about getting the output from the function, with all the same concerns that we’ve already discussed about representing trees. Should the result be a contiguous array or a linked list? Should it be allocated by the caller or callee? Should it be a shared global variable? And any number of other considerations.

I’m going to try my hand at a direct implementation in C, trying to sidestep issues of memory allocation by having everything passed to the function itself:

This code makes me cringe.

It doesn’t really match the spec, because it has a hard limit on how deep a tree can be. To get around that we would need some dynamic memory allocation, which would add a whole lot more bloat and complexity. This code also requires that the caller have some idea of the maximum size of the resulting list, which may or may not be easy to know.

The complexity is ridiculous. How many times do you need to look at *list++ = node->leafContents before you can be sure that you’re incrementing the pointer to a list of pointers, and not incrementing the pointer in the list? Maybe we need to add some more bloat to encapsulate these: more functions for managing the list so we only have to write that kind of code once. Don’t even get me started!

So let’s see how well our beloved type system did. I’m going to compile the above code and see what errors come up.

Here is the list of mistakes I made. I’m categorizing each as either true-positive (the compile error saved my skin), false-positive (using a dynamic type system I would not have had a runtime bug), or false-negative (I found a bug by inspection that the compiler didn’t catch).

  • I forgot to #include <stdbool.h> … arguably a false-positive, since stdbool is only needed if you have static types.
  • I forgot to include stdio.h. True-positive: I forgot to include a module that was relevant to the program behavior.
  • In creating the example tree, in the line tree[0].internalNodeContents.left = tree[1], I was missing an & sign. I’m going to say this is also a false-positive. I was assigning one value to another, and the fact that one value is typed by pointer and the other by value is not a concern related to the algorithm or code logic.
  • To pop a value off the stack, I used stack[stackSize--] instead of stack[--stackSize]. This is a false-negative. The type system did bugger-all to protect me from accessing invalid memory. Luckily the hardware caught it and issued a seg-fault, but on embedded hardware you aren’t always so lucky! What’s more, the code that caused the issue is unrelated to the algorithm that the function was supposed to be implementing. In a sense, it’s the implementation of a completely different algorithm (the algorithm for popping a stack). So the bug in the code was not just not-noticed by the C compiler, but it was in a real sense caused by the limitations of the C language.
  • In printf("%d", list[i]), I was logically printing an integer to the console, since the list is a list of integers, but actually the integers are physically stored as references (pointers), so it should have been printf("%d", *((int*)list[i])). Pretty, ain’t it? This is a false-negative. There was a bug, but the type checker failed to find it. Instead it just printed out crap to the console. On GCC with the default settings [2], there was no warning about this.
  • I’m not returning a “success” code, or checking the return code when the function is called. This caused no error in this case, but might cause strange behavior if there was something that did check the result error code, or a case where the error code was necessary (a failure). I’d call this a true-negative in this particular case. The function acts unexpectedly, but doesn’t explicitly say otherwise so actually there’s no spec that it’s defying. What’s more, it doesn’t introduce a bug into this particular program.

So how does that compare with JavaScript?

Well, what happened when I ran the JavaScript program? Exactly what I expected to happen. It ran. It output the correct result. No errors. No bugs.

This is not because I’m an expert in JavaScript. I have many more years’ experience in C than JavaScript. It’s because simple, bloat-free code is easy to reason about, and so less likely to contain bugs.

Conclusion: please use JavaScript instead of C. You will have fewer bugs because your code is simpler. It will also cost less to develop because there are fewer concerns to worry about, and it will be easier to maintain because the code is clear and easy to understand.

In C++

I’m not going to implement the above in C++, but instead I’m going to say, in a hand-wavy way, that I don’t think it’s much better. In C++, you could write something that looks similar to the JavaScript version, using a stack from the STL to implement the stack variable. But the problem with this is similar to the problem with C: the implementations are coupled to the machine in a way that means when you bring in your favourite container, you’re forcing the compiler’s hand when it comes to implementing the code in terms of machine instructions and memory. The result is essentially bloat in a different kind of way. It gets messy, and to make a solution that is as generic as the JavaScript one would require a ton of code, and with it a ton of bugs.

That’s all I’m going to say for the moment. If you come from the land of C++ and want to hear my opinion in more detail, leave a comment or send me an email, and perhaps I’ll make another post about it. This one is well long enough that I should be moving on to my last point.


The above C example leads me to another great thing about JavaScript: safety. What I mean by safety (or lack thereof) is:

  • How easy is it to introduce bugs?
  • How bad are the bugs, and how difficult are they to fix?

C is awful in this respect. The simple, real, bug in the above code where I dereferenced a pointer that wasn’t assigned, leaves the program open to the most hideous kinds of erroneous behaviors – those that are non-deterministic, and can affect anything in the program. Once you’ve crossed the line of undefined behavior, things in completely unrelated parts of your program can start failing for no apparent reason, no matter how well you wrote them. This is not just a door for bugs, but also for malicious attackers.

In JavaScript, there is no such thing as undefined behavior. There are a few things that are implementation-defined, meaning that different JavaScript engines will execute them differently, but there is nothing like C or C++’s undefined behavior, where a line of code can have literally any effect, including damaging the data or even functions in unrelated parts of the program. When you want behavior to be well-defined, use JavaScript instead of C/C++.
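To make the contrast concrete, here is a sketch of the same kind of off-by-one mistake as the stack[stackSize--] pop from the C example, written in JavaScript. The `buggyPop` function is made up for illustration and is wrong on purpose.

```javascript
// Deliberately buggy pop: reads one slot past the top of the stack,
// mirroring stack[stackSize--] from the C example.
function buggyPop(stack, size) {
  return stack[size]; // out of bounds when size === stack.length
}

const stack = [{ value: 1 }, { value: 2 }];
const node = buggyPop(stack, stack.length);

console.log(node); // undefined: a defined result, not arbitrary memory

let caught = null;
try {
  console.log(node.value); // using the bad value fails right here...
} catch (err) {
  caught = err; // ...with a TypeError, and nothing else is corrupted
}
console.log(caught instanceof TypeError); // true
```

The bug is still a bug, but its effect is specified by the language: an out-of-range read yields `undefined`, and touching a property of `undefined` throws at the point of use.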

JavaScript is also a very good abstraction. Programs execute in isolation from the rest of the system, which is great if you have safety-critical or security-critical applications which need to guarantee some sort of behavior.


I could go on and on about the benefits of JavaScript, and perhaps I will in later episodes, but for the moment I hope that in this extraordinarily long post I’ve convinced you that there is some hope to JavaScript, even to the point of using it in embedded firmware development.

  1. Most people would say using a parameter is preferable, but as I’ve said before: in C you’re appeasing multiple gods. The choice of whether to use a global variable or a parameter is not just about what is easier to reason about or better for code reuse, it’s also about the function ABI and the machine instructions generated. 

  2. and the -std=c99 flag 

JavaScript Corners – Part 1

Recently I’ve been trying to write a simple JavaScript compiler, and it’s led me to think more deeply about some JavaScript behavior that I previously would not have thought about, and I’d like to share that with my JavaScript readers.

Take a look at the following JavaScript code, and try to figure out what it outputs to the console (I’ll give you a hint: it doesn’t output any errors):

The function code first calls foo – but which foo?

Perhaps it executes foo 1. After all, there can only be one function called foo, and so the others may not bind correctly after the first one is declared, so the first remains the “real one”.

On the other hand, perhaps it’s not the first definition of foo that “wins”, but the last one. So could it be foo 5?

But the last definition of foo is a variable named “foo”, not a function. So which wins when it comes to binding a symbol: variables or functions? Or are they treated equally?

If variables are somehow considered “second prize” to functions when it comes to finding which value matches which name, then it wouldn’t be foo 5 that wins, but rather foo 4, since foo 4 actually has a function named foo, whereas foo 5 is only a variable named foo that holds an anonymous function.

But foo 4 is not declared at the outer scope of the function. Could it be that declarations at the outer scope win against declarations that are declared in some kind of nested scope? Perhaps the answer might then be foo 2, since foo 2 is the last function that is actually named foo and declared in the outer block?

The only one we’ve completely ruled out, that it can’t possibly be, is foo 3. The function foo 3 is not the first foo, and not the last. It is not declared in the outer block (and if non-outer-block declarations could win then foo 4 would win). It is also nested inside a block that is only executed on a condition, and the condition is always false so the block is clearly never executed, and thus the function is never declared anyway.

So it can’t be foo 3, but it could be any of the others. Which do you think it is?

You may have guessed it based on my harpings-on. The answer, when I execute the script using node, is indeed foo 3.

In JavaScript, the nested scope is not really a nested scope. All local var and function declarations are at the function scope of the whole parent function. If you’ve been working in JavaScript for a while, you probably already know this. The fact that it’s in a “false” condition block has nothing to do with it, since function declarations are not “statements” to be executed in the sequence of the program (and if they were, then this whole snippet would fail).

What was interesting to me is that when function and variable names clash, the functions seem to always win.
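One way to read that result, sketched below with a simpler snippet than the puzzle itself: both kinds of declaration are hoisted, but a function declaration is hoisted together with its value, whereas a var only receives its value when the assignment actually runs. The name `hoistingDemo` is made up for this sketch, and it deliberately avoids functions declared inside blocks, whose treatment differs between engines and modes.

```javascript
function hoistingDemo() {
  // Both declarations below are hoisted to the top of hoistingDemo(), but
  // only the function declaration is hoisted together with its value.
  const before = foo();

  var foo = function () { return 'var foo'; };
  function foo() { return 'function foo'; }

  // By now the assignment to the var has run and replaced the binding.
  const after = foo();
  return [before, after];
}

console.log(hoistingDemo()); // [ 'function foo', 'var foo' ]
```

So the function only "wins" for calls made before the var assignment executes; after that line, the variable's value takes over.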

The foo 4 function is a little misleading. Just because the function has a name, doesn’t mean that it’s attached to the function scope, because in the case of foo 4 the function is a function expression. These are not even part of the function’s namespace, as you can see if you execute the following snippet, and see that it gives you an error:
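The original snippet is not reproduced here. A minimal sketch of the same point, using the made-up names `expressionNameDemo` and `inner`, could look like this:

```javascript
function expressionNameDemo() {
  // "inner" is the name of a function expression, so it is only visible
  // inside that function's own body, not in expressionNameDemo()'s scope.
  var foo = function inner() {
    return typeof inner; // 'function': the name is bound in here
  };

  const insideType = foo();
  const outsideType = typeof inner; // 'undefined': no such binding out here

  let error = null;
  try {
    inner(); // throws: inner is not defined in this scope
  } catch (err) {
    error = err;
  }
  return { insideType, outsideType, error };
}

const result = expressionNameDemo();
console.log(result.insideType, result.outsideType, result.error.name);
// function undefined ReferenceError
```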

Does "use strict" cure this strange behavior, and somehow give a parse failure? On my machine when I change the function to have "use strict", it doesn’t give any errors, but it does change the output to foo 2 instead of foo 3. I found this quite unexpected.

This is all interesting behavior. These things don’t affect everyday JavaScript much, but they are fun and interesting corner cases to consider, and help us understand the language more deeply.

Let me know what you think about this in the comments. Hopefully I’ll continue this series with more of the interesting corner cases that I find along the way.


Be a multiplier

You may have heard the analogy that some software engineers add productivity, while some multiply productivity. Today I’d like to dig a little deeper into this and share my own thoughts.

What does it mean?

For those who haven’t heard the phrase before, let me try to unpack my understanding of it. Consider a tale of two programmers – let’s call them Alice and Bob, to pick two arbitrary names. Alice’s boss gives her a task to do: she is told to add a new thingamajig to the whatchamacallit code. She’s diligent, hardworking, and knows the programming language inside out. She’s had many years of experience, and especially knows how to add thingamajigs to whatchamacallits, because she’s done it many times before at this job and her last job. In fact, she was hired in part because of her extensive experience and deep knowledge with whatchamacallits, and at this company alone must have added over a hundred thingamajigs.

Because of her great skill and experience, she gives her boss an accurate time estimate: it’s going to take her one week. She knows this because she did almost exactly the same thing two weeks ago (as many times before), so she’s fully aware of the amount of work involved. She knows all the files to change, all the classes to reopen, and all the gotchas to watch out for.

One week later, she’s finished the task exactly on schedule. Satisfied with the new thingamajig, her boss is happy with the value she’s added to the system. Her boss is so grateful for hiring her, because she’s reliable, hard working, and an expert at what she’s doing.

Unfortunately for the company, Alice’s great experience gets her head-hunted by another company, where she’s offered a significantly higher salary and accepts immediately. The company mourns the loss of one of their greatest, who soon gets replaced by the new guy – Bob.

Bob is clearly wrong for the job by all standards, but some quirk of the job market and powers-that-be landed him up taking Alice’s place. He has no prior experience with whatchamacallits, let alone thingamajigs. And he doesn’t really know the programming language either (but he said he knows some-weird-list-processing-language-or-something-I-don’t-remember-what-he-said-exactly, and said that he’d catch on quickly). His new boss is very concerned and resents hiring him, but the choice was out of his hands.

In his first week, his boss asks him to add a thingamajig to the whatchamacallit code, as Alice had done many times. He asks Bob how long it will take, but Bob can’t give a solid answer – because he’s never done it before. It takes Bob an abysmal 2 weeks just to figure out what thingamajigs are exactly, and why the business needs them. He keeps asking questions that seem completely unnecessary, digging into details that are completely irrelevant to the task. Then he goes to his boss and says it will take him 3 weeks to do it properly. “3 Weeks! OMG, what have I done? Why did we hire this idiot”.

There’s not much to be done except swallow the bitter pill. “OK. 3 weeks”. It’s far too long. The customers are impatient. But, “oh well, what can you do?”

3 weeks later Bob is not finished. Why? Well again, he’s never done this before. He’s stressed. He’s missing deadlines in his first months on the job, and everyone’s frustrated with him. When all is said and done, and all the bugs are fixed, it takes him 2 months to get this done.

By now there is a backlog of 5 more thingamajigs to add. His boss is ready to fire him, but he optimistically dismisses the 2 months as a “learning curve”, and gives Bob another chance. “Please add these 5 thingamajigs. How long will it take you?”

Bob can’t give a solid answer. He swears it will be quicker, but can’t say how long.

The next day Bob is finished adding the 5 more thingamajigs. It took him 30 minutes to add each one, plus a few hours debugging some unexpected framework issues. What happened? What changed?

What happened is that during those first 10 weeks at his new job, Bob had immediately noticed a big problem. There were 150 thingamajigs in the whatchamacallit codebase, and they all had a very similar pattern. They all changed a common set of files, with common information across each file. The whole process was not only repetitive, but prone to human error because of the amount of manual work required. Bob did the same thing he’s always done: he abstracted out the repetition, producing a new library that allows you just to define the minimal essence of each thingamajig, rather than having to know or remember all the parts that need to be changed manually.
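To give a flavor of what such a library might look like, here is a purely hypothetical sketch. The story never says what a thingamajig is, so the routes, menu entries, and permissions below are invented stand-ins for "a common set of files" that used to be edited by hand, and `defineThingamajig` is a made-up name.

```javascript
// Everything here is hypothetical. Assume each thingamajig used to mean
// hand-editing a route, a menu entry, and a permissions entry. The helper
// derives all three from one small definition.
const registry = { routes: [], menu: [], permissions: {} };

function defineThingamajig({ name, title, roles = ['user'] }) {
  registry.routes.push({ path: `/thingamajigs/${name}`, handler: name });
  registry.menu.push({ label: title, target: `/thingamajigs/${name}` });
  registry.permissions[name] = roles;
}

// Adding a new thingamajig is now one short declaration.
defineThingamajig({ name: 'sprocket', title: 'Sprockets' });
defineThingamajig({ name: 'widget', title: 'Widgets', roles: ['admin'] });

console.log(registry.routes.length); // 2
```

The point is not the specific fields but that the repetitive, error-prone parts are written once, inside the helper, and each new thingamajig states only what is unique about it.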

To make things even better, another employee who was also adding thingamajigs, Charlie, can also use the same library and achieves similar results, also taking about 30 minutes to add one thingamajig. So now Charlie can actually handle the full load of thingamajig additions, leaving Bob to move on to other things.

Don’t do it again

The development of the new library took longer than expected, because Bob had never done it before. This is the key: if you’ve done something before, and so you think you have an idea of the work involved in doing it again, this may be a “smell” – a hint that something is wrong. It should light a bulb in your mind: “If I’ve done this before, then maybe I should be abstracting it rather than doing almost the same thing again!”

You could say, in a way, that the best software engineers are the ones that have no idea what they’re doing or how long it will take. If they knew what they were doing, it means they’ve done it before. And if they’ve done it before then they’re almost by definition no longer doing it – because the best software engineers will stop repeating predictable tasks and instead get the machine to repeat it for them [1].

Adding and Multiplying

In case you missed the link to adding and multiplying, let’s explore that further. Let’s assign a monetary value to the act of adding a thingamajig. As direct added value to the customer, let’s say the task is worth $10k, to pick a nice round number ($1k of that goes to Alice, and the rest goes to running expenses of the company, such as paying for advertising). Every time Alice completed the task, which took her a week, she added $10k of value. This means that Alice was adding productive value to the company at a rate of $250 per hour.

Now Bob doesn’t primarily add value by doing thingamajigs himself, but instead develops a system that reduces an otherwise 40 hour task to 30 minutes. After that, every time a thingamajig is added, by anyone, $10k of value is added in 30 minutes. Bob has multiplied the productivity of thingamajig-adders by 80 times. In a couple more weeks, Bob would be able to add more value to the company than Alice did during her entire career [2].
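The arithmetic behind those figures can be written out as a short sketch. The variable names are mine, and the one assumption beyond the text is that a working week is 40 hours.

```javascript
const valuePerThingamajig = 10000; // dollars, from the example above
const hoursBefore = 40;            // one working week, assumed to be 40 hours
const hoursAfter = 0.5;            // 30 minutes with Bob's library

const rateBefore = valuePerThingamajig / hoursBefore; // 250 dollars per hour
const rateAfter = valuePerThingamajig / hoursAfter;   // 20000 dollars per hour
const multiplier = rateAfter / rateBefore;            // 80

console.log({ rateBefore, rateAfter, multiplier });
```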

Is it unrealistic?

The short answer is “no”. Although the numbers are made up, the world is full of productivity multipliers, and you could be one of them. Perhaps most multipliers don’t add 7900% value, but even a 20% value increase is a big difference worth striving for.

The laws of compound interest also apply here. If every week you increase 10 developers’ productivity by just 1%, then after 2 years (104 weeks, and 1.01^104 ≈ 2.8) you’d be adding the equivalent value of about 18 extra developers’ work every day.

The alternative

What happens if Bob was never hired? Would the company crash?

Perhaps, but perhaps not. What might happen is that Microsoft, or some big open source community, would do the multiplying for you. They would release some fancy new framework that does thingamajigging even better than the way Bob did it, because they dedicate many more people to the task of developing the library. The company will take 5 years before they decide to start using the fancy new framework, in part because nobody on the team knew about it, and in part because they now have 250 thingamajigs to migrate and the expected risks are too high for management to accept. But in the end, most companies will catch on to new trends, even if they lag behind and get trodden on by their competitors.

Final notes

In the real world, it’s hard to tell Alice from Bob. They’re probably working on completely different projects, or different parts of the same project, so they often can’t be directly compared.

From the outside it just looks like Bob is unreliable. He doesn’t finish things on time. A significant amount of his work is a failure, because he’s pushing the boundaries on the edge of what’s possible. The work that is a success contributes to other people’s success as much as his own, so he doesn’t appear any more productive relative to the team. He also isn’t missed when he leaves the company, because multiplication happens over time. When he leaves, all his previous multiplicative tools and frameworks are still effective, still echoing his past contributions to the company by multiplying other people’s work. Whereas when an adder leaves the company, things stop happening immediately.

Who do you want to be – an adder or a multiplier?

  1. This is not entirely true, since there is indeed some pattern when it comes to abstracting code to the next level, and those who have this mindset will be able to do it better. Tools that should come to mind are those such as the use of generics, template metaprogramming, proper macro programming, reflection, code generators, and domain specific languages 

  2. How much more do you think Bob should be paid for this? 

Async in C

C# has an amazing feature called async, which we’ve talked about many times before on this blog, which allows a programmer to write functions that are non-blocking, without needing to use threads. What would it look like to have async functionality in C?

I’ve been working on an experimental “mini-programming-language” which does just that. It doesn’t work the same way as in C#, because the needs in C are completely different. In C you don’t want all the hidden overhead costs that exist in C# related to setting up tasks and delegates and execution context. In C, things should be more or less as they seem.

What does it look like?

In this experimental language, you can declare functions as async, to say that they don’t complete immediately in a blocking fashion, but instead may complete asynchronously. When functions are declared async, some interesting things happen. One thing is that, in the context of the async function, the word return actually refers to a function, which can be called and saved like any other function. For example, here is an async function foo which returns the value 1.

Of course this function actually returns synchronously, even though it’s declared async. The only difference is that it’s returning using continuation passing style instead of a direct return. But using this feature we could actually delay the return to another point in time:

Now we’ve saved the return continuation and only triggered it when some event returns. We discussed last time what a continuation might actually look like in C, so this week we’ll just elide the type details and say that a continuation is of type Continuation<T>, where T is the return type of the function calling the continuation. Values of this type are each physically the size of a single pointer, and can be executed using the same syntax as a function.

Now comes the interesting bit. Say we have a function, bar, which calls foo. In this experimental language, you can simply define bar like this:

Now clearly bar must be asynchronous as well, since it calls foo, and depends on the result of foo before it can continue. But the magic is that we don’t need to declare the asynchrony explicitly. The experimental language compiler not only infers the asynchrony, but does the corresponding conversion to CPS automatically.

This is more than just a minor convenience. Imagine code like this:

This is a relatively simple function, but if we had to write the asynchrony out explicitly in C we would have code like the following[1]:

This is beginning to look like unstructured code. The for-loop construct is now completely hidden, and looks more like the old days of conditional-branch-and-jump. We’re also worrying about things that the compiler really should be sorting out for you. Like passing in the return address, passing in a space for storage of local variables, etc. These are all things you generally don’t have to worry about, so why now?
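To make the point concrete, here is a rough sketch of the kind of hand-written code being described. It assumes a loop that calls an async foo three times and sums the results, and it uses the pointer-to-function-pointer continuation from the Continuations post. The names Sum_state, sum_three, sum_loop, sum_after_foo, Result_box and store_result are all made up for the illustration, and foo completes immediately only to keep the sketch short.

```c
/* A continuation is a pointer to a function pointer; the function is
   handed that same pointer back so it can find its surrounding state. */
typedef void (*ContinuationFn)(void *self, int result);

/* Hypothetical async operation: "returns" 1 by invoking its continuation. */
static void foo(ContinuationFn *on_done)
{
    (*on_done)(on_done, 1);
}

/* The locals of the loop, hoisted into caller-provided storage. */
typedef struct Sum_state {
    ContinuationFn call;      /* resumes the loop after foo; must be first */
    ContinuationFn *on_done;  /* the "return address" of sum_three */
    int i;
    int total;
} Sum_state;

static void sum_loop(Sum_state *s);

/* The loop body after the call to foo: accumulate, increment, jump back. */
static void sum_after_foo(void *self, int result)
{
    Sum_state *s = self;
    s->total += result;
    s->i++;
    sum_loop(s);
}

/* The loop condition, now an explicit conditional branch. */
static void sum_loop(Sum_state *s)
{
    if (s->i < 3)
        foo(&s->call);
    else
        (*s->on_done)(s->on_done, s->total);  /* the hand-written "return" */
}

/* Equivalent of: total = 0; for (i = 0; i < 3; i++) total += foo(); */
static void sum_three(Sum_state *s, ContinuationFn *on_done)
{
    s->call = sum_after_foo;
    s->on_done = on_done;
    s->i = 0;
    s->total = 0;
    sum_loop(s);
}

/* Illustrative final continuation that just stores the result. */
typedef struct Result_box {
    ContinuationFn call;  /* must be first */
    int value;
} Result_box;

static void store_result(void *self, int result)
{
    ((Result_box *)self)->value = result;
}
```

A caller would set up a Result_box with store_result in its call field, then call sum_three with a Sum_state and the address of that field. A three-line loop has turned into three functions and a struct.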

The experimental language I’m working on handles all of this asynchrony automatically. If the above examples are anything to go by, then certain types of code will be reduced to a quarter of their size and be orders of magnitude easier to read, if they were written in this experimental language instead of C. I would perhaps go as far as saying that a fair amount of multithreaded code could instead be written in this async style to run on a single thread, and would as a result be much easier to reason about.

  1. again, using the continuation style defined in my last post 

Continuations in C

There are times when you need to call a function, but you want to say “call me back when you’re done” rather than blocking the current thread.

An example might be when you’re reading a file. If you imagine for a moment that every CPU cycle is on the scale of 1 second, then disk access is in the order of days to months (take a look at the coding horror post about it). When you call a simple C function like fread, you could be blocking the current thread for millions of CPU cycles. You don’t want to be blocking the thread, because threads are a valuable resource and multithreading is a difficult skill.

The Typical Solution

The typical way to solve this in C is to use a callback function. I’m not going to explain callback functions here, since there’s an abundance of information about them on the internet. Instead I would like to point out a convenient pattern of how to store state for the callback function.

Let’s use a concrete example. Say we have some function bar, which is expected to take a long time to execute, and a function foo which needs to call bar. The synchronous way of writing the code (non-callback way) might look like this:

The task finishes by returning some result of the long process. For the purposes of this example, we’ll say that the result is 42.

If we convert it to the asynchronous form (the callback form) it might look like this:

Note that normally bar would not call the callback itself, but instead save the callback to be called later. I’ve only called it directly from bar as a convenience in the example.
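As a minimal sketch of the callback form, assuming bar simply takes a function pointer: the type name Callback and the variable last_result are hypothetical, added only so the example is complete and checkable.

```c
#include <stdio.h>

/* Hypothetical callback type: receives the result of the long-running task. */
typedef void (*Callback)(int result);

/* Illustrative only: records what continue_foo received. */
static int last_result = 0;

/* The second half of foo, run once bar has produced its result. */
static void continue_foo(int result)
{
    last_result = result;
    printf("bar finished with %d\n", result);
}

/* Stand-in for the long-running task. A real bar would save the callback
   and invoke it later; here it is called directly for convenience. */
static void bar(Callback callback)
{
    callback(42);
}

/* The first half of foo: start bar and return without waiting. */
static void foo(void)
{
    bar(continue_foo);
}
```

Notice that continue_foo has no way to see anything foo knew before the call, which is the problem discussed next.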

The Problem

I’ve seen this pattern many times. But it’s flawed in a major way: if foo has some state that must be persisted across the call to bar, how does it get that state to the continue_foo function? For example, if I declare a variable in foo, how do I access the variable in continue_foo? Typically what I see is that people will simply use global variables. But that’s an awful solution for many reasons[1].

Slightly Better

A better pattern, which I’ve used myself quite often, is for foo to tell bar, “please hold the value of XYZ for me, and when you call me back, please give XYZ back to me to remind me why I called you in the first place and help me remember where I left off”. It might look like this:

A few quick points I’d like to draw your attention to:

  • Bar only sees the type void*, and not something more specific like Foo_state, because obviously bar may be called by other functions as well, not just foo
  • Rather than allocating foo’s state on the heap, foo just accepts the state as a parameter, leaving it up to the caller to decide where it must be allocated. This parameter is only to say where the state should be stored, and is not expected to have any values populated by foo’s caller.

Let me emphasize that last point again: there is no heap allocation involved in this example. The state could very easily be statically allocated, or pool-allocated, or even stack allocated[2]. Especially, consider that foo’s caller is likely to face the same problems foo has faced with state management, and so might already have its own state structure which would provide the perfect home for foo’s state structure without incurring an additional heap allocation.
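A minimal sketch of this pattern might look as follows. The fields x and total, and the Callback type name, are assumptions made for the illustration. As before, bar calls the callback directly only to keep the example short.

```c
#include <stdio.h>

/* Hypothetical callback type: the callee hands back the opaque state
   pointer it was given, along with the result. */
typedef void (*Callback)(void *state, int result);

/* Everything foo needs to remember across the call to bar. */
typedef struct Foo_state {
    int x;      /* a "local variable" of foo that must survive the call */
    int total;  /* written by continue_foo; illustrative only */
} Foo_state;

/* Second half of foo: recovers its state from the opaque pointer. */
static void continue_foo(void *state, int result)
{
    Foo_state *s = state;
    s->total = s->x + result;
    printf("x = %d, result = %d, total = %d\n", s->x, result, s->total);
}

/* bar only sees void*, since it may be called by functions other than foo. */
static void bar(Callback callback, void *state)
{
    callback(state, 42);
}

/* foo does not allocate: the caller decides where the state lives. */
static void foo(Foo_state *state)
{
    state->x = 8;
    bar(continue_foo, state);
}
```

The caller can pass the address of a static, pooled, or (with care) stack-allocated Foo_state, as long as it outlives the callback.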

The Best Solution

But we can do even better. The problem with the above example is that we’re passing two things around: the callback function pointer, and the callback state pointer[3]. Let’s take a look at a way of doing this while only passing one pointer:

I’ll draw your attention to the differences:

  • Foo_state now contains a field called call which holds the callback function pointer. It’s important that this field is the first field in the structure so that a pointer to this field is also a pointer to the whole structure.
  • The callback function signature still accepts the state as a parameter, as before.
  • The call to bar no longer takes two parameters but now only takes a pointer to the callback function pointer (note the double-pointer)
  • When bar needs to call the callback function, it needs to dereference it first. It also needs to pass the callback state. But since, by design, we’ve said that a pointer to the callback function [pointer] is also a pointer to the callback state, we can simply pass that pointer as the argument. This gives us the interesting syntax (*callback)(callback, result). In a sense, this is saying “call the callback, and tell it which callback we called”.
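The points above can be sketched roughly like this. Foo_state, call, and the (*callback)(callback, result) syntax come from the description; the CallbackFn type name and the fields x and total are assumptions for the illustration.

```c
#include <stdio.h>

/* Hypothetical name for the callback function type. */
typedef void (*CallbackFn)(void *state, int result);

typedef struct Foo_state {
    CallbackFn call;  /* must be the first field, so that a pointer to it
                         is also a pointer to the whole structure */
    int x;
    int total;        /* illustrative only */
} Foo_state;

/* Receives the pointer-to-function-pointer and treats it as its state. */
static void continue_foo(void *state, int result)
{
    Foo_state *s = state;
    s->total = s->x + result;
    printf("x = %d, result = %d, total = %d\n", s->x, result, s->total);
}

/* bar takes a single pointer: a pointer to the callback function pointer. */
static void bar(CallbackFn *callback)
{
    /* "Call the callback, and tell it which callback we called." */
    (*callback)(callback, 42);
}

static void foo(Foo_state *state)
{
    state->call = continue_foo;
    state->x = 8;
    bar(&state->call);
}
```

The conversion in continue_foo relies on C’s guarantee that a pointer to a structure, suitably converted, points to its first member and vice versa.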

Those who are familiar with how object-orientated programming works under the hood may recognize this pattern. Typically objects are laid out in memory such that the first field in the object state is a pointer to the class vtable. When you call a virtual member function on the object, the pointer-to-the-object is treated as a pointer-to-the-vtable-pointer and is used to resolve the virtual dispatch. In our example above there is actually less complexity and overhead, since we don’t need a whole vtable but can point directly to the function.


I love this pattern because it’s really clean and quite simple. The whole callback, including the function and the state, is neatly represented by a single pointer[4].

The callback pointer can be called using a very self-contained syntax. That is, it only depends on one variable, not two. This is actually not just a matter of syntax: a single variable means better use of CPU registers, and fewer accesses to memory.

The fact that the callback is represented by one small value also makes it easier to manage. There’s much less risk of calling the callback with the wrong state. It’s also lighter to pass around.


The most obvious disadvantage to me is that it is uncommon. Someone looking at the code for the first time won’t just understand what’s happening straight off the bat. It also means that there’s no language support for it. C++ is in some ways an extension to C with language support for first-class objects. But there is no common language that is an extension to C with support for this kind of first-class-function (with state).

The performance of using this pattern isn’t a disadvantage in my opinion. If you’re comparing it to the performance of a “naked” function pointer, then yes, you may incur some overhead from passing the additional state argument and from double-dereferencing the function pointer. But consider that this type of function call should actually be faster than calling a virtual function in most object-orientated languages (which has a triple-dereference), because there’s no vtable lookup. And virtual function calls are in turn typically faster than interface function calls (and correspondingly virtual functions with multiple inheritance, depending on the optimizer and conditions).

I’d also like to dispel another disadvantage, not directly related to the pattern but more about using callbacks in general. At first glance it seems that there is a lot of overhead in accessing persistent variables in the state structure, because instead of saying “x” you have to say “state->x”, which implies an extra pointer dereference and possibly some pointer arithmetic. But think about this: how are variables normally accessed anyway? Variables are normally stored in the stack frame, which is essentially a structure pointed to by the stack-pointer. Yes, there may be less register elevation which would affect the performance, but I think it may be less of a problem than you’d expect.

Likewise, at first glance it seems that there is extra space used to store the callback function pointer. But in reality, a stack frame also stores the “callback” function pointer anyway: we just normally refer to it as the “return address”. An important point to note in the last example, is that the very last thing foo does is call bar. This is what’s called a tail call, and it means that any half-decent optimizer will re-use foo’s stack frame space for bar. To put it another way: while bar is active, foo doesn’t use any stack space, but it does use space in the persistent state structure (wherever that may be), and the persistent state structure has many of the same attributes as the stack frame would have had, including a pointer into code space. From this perspective, there is no extra space required to store the callback address in the state structure.

The only thing missing is hardware support. A “normal” call has hardware support for automatically populating the return address into the state structure (aka stack frame) and saving register states etc (aka saving persistent variables). And a “normal” return has built-in support for dereferencing the stack pointer to obtain the return address pointer (note the double-pointer again) and jumping to that address, all in one step. But I imagine that if this pattern became more common in usage (probably with language support), hardware support would probably follow.

Until then, I still think it’s a great pattern to use in C, and we should all add it to our toolbox of C patterns.

  1. Please ask me – I’ll be happy to tell you all the reasons why it’s so horrible 

  2. In the less likely scenario that the caller decided to manually block the thread using thread synchronization techniques. 

  3. On most modern architectures this would just mean that it takes twice the space, since there are two pointers involved. But C doesn’t require function pointers to be the same size as data pointers. One embedded architecture I work with has function pointers that are twice the size of normal heap pointers – after all, RAM is more expensive per bit than ROM 

  4. A RAM pointer, which in some cases is smaller than a function pointer, giving it yet another advantage over the typical callback 

Sequences: Part 5

Last time, I talked about push and pull regarding sequences. We saw that it’s more convenient to write code that pulls from its inputs and pushes to its outputs. We took a look at C#’s generators, and how they enabled us to write sequence-processing functions in this way, without the need for intermediate buffers.

Let’s quickly recap generators. A generator in C# looks like a traditional function (with a signature that returns a sequence), but it can push values to the caller using the special syntax yield return, which essentially puts the generator function “on hold” until the consumer/caller asks for the next value[1]:

The two parties involved here are the generator and the caller (which I’ll call the consumer since the generator is a producer).

[Figure: the generator and the consumer]

When the consumer asks for the next value in the sequence, the generator function is temporarily “resumed”, long enough to produce the next value of the sequence. Last time we drew an analogy with freezing time to explain why it’s easier to write the generator code now that it thinks it’s pushing values to the consumer.

But it’s important here to note who is being paused and who is being resumed. When the compiler is producing IL for the consumer function and the generator function, it is the generator that gets reorganized into a form where it can be paused and resumed (it gets converted into a class which implements the IEnumerable<T> "pull" interface).

But what would happen if the next item in the sequence just wasn’t available yet? If we go back to last week’s C example of reading input from the user by pulling values from getchar (or Console.Read in C#), you can see that generators wouldn’t fix the conflict between push and pull in that case.

Let’s simplify things a bit to investigate further. Instead of considering a whole sequence of items, let’s say that there’s just one item. We can pull the item from somewhere by calling a function that returns that item:

When the consumer calls PullFromProducer to fetch the item (an integer), the caller is blocked until the PullFromProducer function returns (synchronously).

The generator syntax in C# still uses this pattern under the covers – the generator function still returns IEnumerable<T>, which as we know from our previous exploration is a pull-based iterator interface.

But what if PullFromProducer simply doesn’t yet have the value that it needs to return? For example, how do we implement the Pull function if it’s to pull from a network connection, which may not have received the value yet?

Just as the C# generator makes it possible to pause the producer, wouldn’t it be nice if there was a way to pause the consumer? Obviously we can do this with threads, but wouldn’t it be nice if there was a way to do this without the overhead of threads?

It turns out that in C# there is. C# 5 introduced the concept of async functions. You’ve seen async functions before on this blog, so I won’t go into too much detail. If you aren’t too familiar, I highly recommend reading up about them (here is the MSDN introduction, and I also highly recommend Jon Skeet’s Eduasync series for really getting to know what’s going on behind the scenes[2]).

Using async we can make code that looks like this:

The magic happens in the consumer this time. The consumer function is suspended at the “await” point until the producer pushes the value to the consumer.

To emphasize what’s happening here, let’s look at a slightly different example:

If you run this[3] you’ll see the output is something like this:

The interesting thing is the order of the messages. The message line “Consumed: 42” occurs directly after “Pushing value to consumer” rather than after “Consumer is about to await value”, which clearly shows that the consumer is suspended during the intermediate time. But just like with generators, it’s important to realize that the above example does not create any additional threads. Just like with generators, the async functionality is implemented by the compiler by creating a new class behind the scenes.

This solves our problem, right?


The problem is that async only works with a single value. We can use it to push a once-off item, but not whole sequences of items.

C# is stuck with two different ways of doing things with sequences. There’s the pull-based approach with IEnumerable<T>. And there’s the push-based approach with IObservable<T> (I won’t go into IObservable, but if you’re interested take a look at reactive extensions – they echo many of the great features of IEnumerable, such as all their extension methods, but do it for a push-based interface instead of a pull-based one).

What we need is something more like an IAsyncEnumerable<T> interface, which combines task-based asynchrony with a sequential pull-based interface. We also need language support for IAsyncEnumerable<T>, including generators and foreach statements. The combination of generators and IAsyncEnumerable would allow us to have everything we’ve been looking for so far:

  • No containers required (sequences don’t have to be in memory before you can work on them)
  • Zero buffering overhead (when we can process sequences as fast as they’re produced)
  • Completely abstract sequence types (a sequence of user key press events can be as much a sequence as an array of integers)
  • Push/pull agnostic (IAsyncEnumerable covers both push and pull cases equally)
  • No thread blocking
  • All functions can be written in a form where they both pull input and push output

I apologize to those who aren’t comfortable in C#, since I did originally say that this was going to be a language-agnostic investigation but we landed up in C# anyway. Unfortunately this is because it seems that C# is the only popular language that’s made it this far in providing a solution that fits all these criteria. It just needs to take the last step (although I’ve mentioned in the past that I think that async is flawed in a way that only a new language can cure). C and C++ are simply not well suited to this kind of coding at all.

This brings us to the end of our series on sequences. We started with the most simple C example, which required a buffer on both the input and output side, could not be suspended at all, and provided no abstraction on what form the input and output could be. We considered ways to improve it, and in doing so investigated how sequence-processing functions can be composed/layered, and the differences between push and pull. At each stage of improvement we ruled out newer and newer old-languages, until we landed up with only a theoretical research-based extension to the latest C#, which seems not to have made it into the mainstream despite it being investigated more than 4 years ago.

  1. This is a very limited description. For more detail take a look at my previous post and read up about it online 

  2. Although his series is a bit old now, much of what he said still applies, and it is the most insightful writing I’ve seen on the topic 

  3. I admit, I don’t have a C# compiler installed right now, so I can’t actually confirm this. If you see a mistake please let me know. 

Sequences: Part 4 – Push and Pull

Welcome back! This week in our exploration of sequences we’re going to do something a little different. Up until now we’ve been looking at a single running example of a function that interacts with sequences, and progressively developing its interface and considering the consequences of each choice. But this week I’d like to delve a little deeper into the meaning of these strange concepts of push and pull, as they apply to sequences.

What is push and pull?

Wikipedia doesn’t talk directly about “push” and “pull” as coding patterns, but does talk about the related concept of a push technology, which it says is a

…style of Internet-based communication where the request for a given transaction is initiated by the publisher or central server.

There is also a corresponding Wikipedia entry for pull technology, where requests are initiated by the client.

So here it seems that these terms are, in a sense, used to say which side of the communication interface has control, or initiates an action. If the server has control, and is the one to initiate or “cause” the transaction, then the server is pushing to the client.

We can apply the same principles to software design at a smaller scale. When your code calls a function, it could be said to be a client of that function. When your code uses an object, it could be said to be a client of that object. So our server-client relationship applies in these cases as well.


Consider the following simple piece of C code which might be used to print a sequence of values to the console:

[Figure: Loop Pull]

This code uses the values from the sequence, so it’s a client of the sequence. It accesses items from the sequence by calling produceValue, which pulls the next item from the sequence. The pulled value is available to be used on the very next line of code, where in turn it’s pushed to the console output. Whatever is producing the sequence responds to the pull by providing the pulled values as they’re requested. This is the same iterator pattern that we’ve been looking at in past posts[1].
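A minimal sketch of such a pulling consumer might look like this. Only the name produceValue comes from the text. The END_OF_SEQUENCE sentinel, the sample data, and the printAll wrapper are assumptions added so that the loop terminates and the example is complete.

```c
#include <stdio.h>

/* Hypothetical sentinel that marks the end of the sequence. */
#define END_OF_SEQUENCE (-1)

/* Producer state, statically allocated to keep the example simple. */
static const int data[] = {10, 20, 30};
static size_t position = 0;

/* The producer responds to pulls: each call hands out the next value. */
static int produceValue(void)
{
    if (position >= sizeof data / sizeof data[0])
        return END_OF_SEQUENCE;
    return data[position++];
}

/* The consumer is in control: it pulls each value when it wants one and
   can use it on the very next line. Returns how many values it printed. */
static int printAll(void)
{
    int i = 0;
    int value;
    while ((value = produceValue()) != END_OF_SEQUENCE) {
        printf("Value %d: %d\n", i, value);
        i++;
    }
    return i;
}
```

All of the consumer’s state (i and value) lives in ordinary local variables, which is what makes the pulling side pleasant to write.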


What happens if the producer of the sequence needs to provide these values by pushing them to the consumer? Now our consumer code might look like this:


All the same elements of the code are here: the declaration of int i, the initialization of i to 0, printing the values to console by printf, and incrementing i after every value is consumed. The producer just has to call initConsumer before it starts producing values, and it has to call consumeValue every time it produces a new value. So here the producer is in control, and decides when new values are available.
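A minimal sketch of this pushed-to consumer could look like the following. The names initConsumer, consumeValue and i come from the description. The runProducer function and its sample data are hypothetical, included only to show who is now doing the calling.

```c
#include <stdio.h>

/* The consumer's counter can no longer be a local variable, because the
   consumer returns to the producer between values. */
static int i;

/* Called once by the producer before it starts producing values. */
static void initConsumer(void)
{
    i = 0;
}

/* Called by the producer every time a new value is available. */
static void consumeValue(int value)
{
    printf("Value %d: %d\n", i, value);
    i++;
}

/* Hypothetical producer: it is in control and decides when values arrive. */
static void runProducer(void)
{
    static const int data[] = {10, 20, 30};
    initConsumer();
    for (size_t n = 0; n < sizeof data / sizeof data[0]; n++)
        consumeValue(data[n]);
}
```

The loop has moved out of the consumer and into the producer, and the consumer’s state has moved from the stack into a static variable.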

Strangely enough, by writing the consumer in such a way that it doesn’t pull values, the producer code might look much like our original consumer code:

It’s clearly much easier to write producer code that pushes, and consumer code that pulls. But these two things seem to be at odds with each other – either the producer or the consumer is “in charge”, and the other one loses out. Many sequence-manipulating functions both produce and consume sequences, since they have an input and an output. It’s much easier to write these kinds of functions if they can pull from a data source, and push to a data sink.

C++ seems to deal with this by dividing “sequence processors” broadly into 2 categories: on the one hand we have containers (and their corresponding iterators) which are happy to be both pushed-to and pulled-from. On the other hand we have algorithms and application-specific functions, which both push and pull from containers in order to get their job done. These application-specific functions can be written more simply because now they don’t have to worry about being pushed-to or pulled-from.

[Figure: C++ Pattern]

It’s as if the containers act as peace-makers, mediating between all these functions that all want to be in charge, that push and pull the data whenever they find it most convenient. But as I’ve mentioned before, containers almost always come at a cost, and it would be better if they weren’t the default solution to the problem of manipulating sequences.

Harmony between Push and Pull

But consider what happens if we take our first example, and replace produceValue with the getchar from the stdio library:

This code instead reads the values from the console[2]. This is still a pull, since the code appears to be actively causing the value to be acquired, and just like before, the value is available to be used on the very next line of code.

But yet we also know where these values ultimately come from: they come when the user pushes keys on the keyboard (this is a “push” in two senses of the word). How can it be that both sides of the interface think that they initiated the transaction?

[Figure: Push and Pull Keys]

In many ways I think it’s equally true that both sides did really initiate the transfer. This example shows that it’s possible for the parties on both sides of the interface to feel like they’re both in complete control, calling the shots and deciding when the next value should appear. In this particular case what’s happening is that the thread is suspended or blocked until the value becomes available.

Let’s step out of reality for a moment, and imagine a strange world with different laws of physics and time. Imagine that I decide I want to talk to my friend, and after dialing his number on my phone, the phone freezes me in time, without forwarding the call to my friend. There I am, suspended in time in my room for days, not knowing that the world is continuing around me. A few weeks later, of his own accord, my friend also spontaneously decides it would be a good time to give me a call. After dialing my number, time unfreezes for me, and I hear his voice as if no time had passed at all for me. To me, I feel like I called him, but to him, he feels like he called me. In fact both are true: we both initiated the call.

This is very convenient for both my friend and myself, since we both waited till the perfect moment to call each other. Neither of us were interrupted by “receiving” the other person’s call (possibly while we were in an inconvenient state, such as in the shower or on another call).

The same is true in the world of code. Code which pushes and pulls is much easier to write than code which is pulled-from or pushed-to. State management is much easier, because code that is pushed-to has to be prepared to be interrupted (read: race conditions and re-entrancy problems).

Just like freezing time is difficult in the real world, multithreading in software is not easy either (although it’s obviously much easier to suspend thread-time than to suspend physical time!). But let’s take a look at another example, this time in C#:

The example is simple. In our main function we have a loop that pulls values from a sequence of values returned from the Values function. The Values function, which produces the data, also has a loop, and pushes values to the consumer (Main). Don’t let the keyword “return” deceive you into thinking that the Values function is not pushing, since the act of returning is initiated by the called function, and the function progresses to the next line of code when the consumer comes back for the next value. The code would look pretty much the same if yield return i was instead produceValue(). Or to put it another way, the code would look horribly different if Values was actually just responding to pulls and not structured in such a way that it’s pushing.

Neither the Main function nor the Values function is more in control than the other – they are like a pair of coroutines, cooperatively yielding control to each other. Both are written as if they’re in control, which makes the code much simpler to write. And best of all, it does this without using any extra threads! This is the power of the generator – to be able to write functions which produce values by pushing them, while their consumers are able to pull them. The best of both worlds.

  1. Except that the producer state is statically allocated, which may not be a good thing in general but it makes for a simpler example 

  2. Or other standard input 

[Photo: Linked list of Koalas]

Sequences: Part 3

I’m taking a bit longer than usual to write up new blog posts recently since I’m in the process of moving from San Diego to Melbourne, Australia. Hence the photo of the linked-list sequence of Koalas above[1]. Things should get back to normal in a couple of posts from now, and I’ll let you know how the move goes!


But enough about my life – you’re here because software is awesome! And together we’re exploring the best ways of working with sequences. Today we’re going to try to write a function[2] that decodes UTF-8, without using the heap, and keeping the input “shape” the same as the output shape, so that multiple similar functions can be stacked together (something I touched on at the end of the previous post).

Let me quickly try to describe what I mean by having the input “shape” match the output shape. I’ll take last week’s example of having a “decompress” function which feeds bytes into our beloved “decodeUtf8” function, which feeds characters into a “parse” function. Often what we land up with is something like the following situation:


That is, the “stack” of functions doesn’t fit together on its own. One function wants to push data to the next function (it wants to be in control), while the next function wants to pull from the previous (it wants to be in control instead). What we land up needing is something in between each layer of the stack. Something that doesn’t mind being pushed to and pulled from. Something that doesn’t take any control. This is normally a container, such as a list or buffer:


Each of our two attempts so far has manifested this problem slightly differently. In our first attempt, the function pulled bytes from an array, and pushed the resulting characters to an array. In that case the buffer was built into the function itself, so this push-pull conflict was absorbed, but its memory inefficiencies and lack of asynchrony were still an issue.

Our second attempt could be said to have rightfully had a “pull-pull” shape – as we want – since it pulled out of the input array, and the caller pulled characters from it one-at-a-time by calling it. The shape mismatch in that case was simply that the caller was forced to provide an array input, while the output was definitely not an array. What was the output exactly?

This takes us into the land of iterators.


For reference, here’s last week’s function again.

And let’s consider how we might use it:
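A minimal sketch of what that might look like in C. The body of decodeUtf8_attempt2 here is only a stand-in that assumes well-formed, null-terminated input, and the names Utf8Cursor and countCharacters are hypothetical ones I've picked for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical cursor type: callers are expected to treat it as opaque. */
typedef const uint8_t* Utf8Cursor;

/* Gives the initial cursor state for a null-terminated document. */
Utf8Cursor init_decodeUtf8(const uint8_t* document)
{
    assert(document != NULL);
    return document;
}

/* Stand-in decoder: returns the character at `cursor` and stores the position
 * of the following character in `nextCursor`. Assumes well-formed UTF-8 and
 * a wchar_t wide enough for the decoded value. */
wchar_t decodeUtf8_attempt2(const uint8_t* cursor, const uint8_t** nextCursor)
{
    assert(cursor != NULL && nextCursor != NULL);

    uint32_t c = cursor[0];
    int extra; /* number of continuation bytes after the lead byte */

    if (c < 0x80)      { extra = 0; }            /* 0xxxxxxx */
    else if (c < 0xE0) { extra = 1; c &= 0x1F; } /* 110xxxxx */
    else if (c < 0xF0) { extra = 2; c &= 0x0F; } /* 1110xxxx */
    else               { extra = 3; c &= 0x07; } /* 11110xxx */

    for (int i = 1; i <= extra; i++)
        c = (c << 6) | (cursor[i] & 0x3Fu);      /* 10xxxxxx */

    *nextCursor = cursor + 1 + extra;
    return (wchar_t)c;
}

/* Example caller: counts the characters in a document. It only ever hands
 * the cursor to the two functions above and never inspects it directly. */
size_t countCharacters(const uint8_t* document)
{
    Utf8Cursor cursor = init_decodeUtf8(document);
    size_t count = 0;
    while (decodeUtf8_attempt2(cursor, &cursor) != L'\0')
        count++;
    return count;
}
```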

This code assumes that we have some function “init_decodeUtf8” that gives you the initial cursor state for some document. Notice that our code here doesn’t interact directly with the value of the cursor state, it only interacts with the functions init_decodeUtf8 and decodeUtf8_attempt2. This is intentional, and is done to encapsulate the state of the cursor. That is to say, the state of the cursor is managed only by those two functions, which limits the number of places in the code you need to consider when you think “what state can the cursor be in?”. Although in this case the encapsulation is manually enforced (we have to just “know” not to interact with the cursor state outside those two functions), if we upgrade our example to C++ we can get the compiler to enforce the encapsulation and data hiding:

This C++ class has a public interface (aka “surface area”) that exposes two functions: the constructor to initialize the object state, and a “next” function to progress the state to the next element and retrieve that element. The class encapsulates the state of the cursor, and prevents clients of the class from accidentally modifying the cursor.3

This class is an iterator. It provides a way for users of the function/class to iterate through the output sequence of characters. I would say that we haven’t made it into an iterator by making the class, but that we’ve just revealed the true nature of the original decodeUtf8_attempt2 function: it always was an iterator.

For those who are familiar with C++, you’ll probably notice the similarity between the Utf8Decoder class and a standard C++ input iterator.

For those who aren’t that familiar with C++, you may notice the similarity with Java’s Iterator<T> (and corresponding Iterable<T>), or C#’s IEnumerator<T> (and corresponding IEnumerable<T>). These are each codifications of the pattern that we described in part 2.

Attempt 3

So, we said that we wanted to try to get the input and output “shapes” to be the same. Since we’ve now said that the output surface area is an iterator, we can be more specific and say that we want the input to also be accessed via an iterator, rather than directly passing it a whole array of input bytes.

This is actually quite a challenging problem, and I’m going to first choose C# as my tool of choice to represent the solution:

This is perhaps the most elegant solution we’ve encountered so far. It’s the most direct representation of the original problem statement, and works entirely using iterators. The input and output are the same “shape”, and it would be very easy to pipe the result of one function into another that accepts a sequence of that type.

Now let’s also take a look at how we might do this in C. What we want to do is abstract the input to the decodeUtf8 function, so that while the input could be an array, it could also be another iterator. We also want this function itself to be an iterator of the same shape. What about this:
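Here is a rough sketch of that shape in C, under some assumptions of mine: iterator_fn, SEQUENCE_END, arrayBytes_state, init_arrayBytes and arrayBytes_next are hypothetical names, the sequence is ended by a sentinel value rather than a null character, and the decoder assumes well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEQUENCE_END (-1) /* hypothetical "no more elements" marker */

/* The one shape shared by every layer: each call yields the next element,
 * or SEQUENCE_END. The state is void* because the caller does not know
 * which concrete iterator it is pulling from. */
typedef int32_t (*iterator_fn)(void* state);

/* Bottom layer (hypothetical): an iterator over a plain byte array. */
typedef struct {
    const uint8_t* cursor;
    const uint8_t* end;
} arrayBytes_state;

void init_arrayBytes(arrayBytes_state* state, const uint8_t* data, size_t length)
{
    state->cursor = data;
    state->end = data + length;
}

int32_t arrayBytes_next(void* state)
{
    arrayBytes_state* s = state;
    return (s->cursor < s->end) ? *s->cursor++ : SEQUENCE_END;
}

/* Decoder layer: its state only knows where to pull bytes from. */
typedef struct {
    iterator_fn input; /* function that yields the next input byte */
    void* input_state; /* state to hand to that function */
} decodeUtf8_state;

void init_decodeUtf8(decodeUtf8_state* state, iterator_fn input, void* input_state)
{
    state->input = input;
    state->input_state = input_state;
}

/* Has the same shape as arrayBytes_next, so it can itself be the input to
 * another layer. `state` must point to a decodeUtf8_state. */
int32_t decodeUtf8_attempt3(void* state)
{
    assert(state != NULL);
    decodeUtf8_state* s = state;

    int32_t c = s->input(s->input_state);
    if (c == SEQUENCE_END)
        return SEQUENCE_END;

    int extra; /* number of continuation bytes after the lead byte */
    if (c < 0x80)      { extra = 0; }
    else if (c < 0xE0) { extra = 1; c &= 0x1F; }
    else if (c < 0xF0) { extra = 2; c &= 0x0F; }
    else               { extra = 3; c &= 0x07; }

    for (int i = 0; i < extra; i++) {
        int32_t b = s->input(s->input_state);
        if (b == SEQUENCE_END)
            return SEQUENCE_END; /* input ended in the middle of a character */
        c = (c << 6) | (b & 0x3F);
    }
    return c;
}
```

To use it, you would initialize an arrayBytes_state over the input bytes, pass arrayBytes_next and that state to init_decodeUtf8, and then call decodeUtf8_attempt3 with the decoder state until it returns SEQUENCE_END.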

This is quite awful, and requires some explanation. Firstly, decodeUtf8_attempt3 looks very much the same as it did in attempt 2. This new decodeUtf8 function is expected to yield a new character every time it’s called, the same as before. The significant difference is that now the cursor state isn’t statically typed (it just uses void* to represent “any type”), and that it holds some sort of abstracted state (the input_state field). The state does have a runtime type, and for this to work the state must be of type decodeUtf8_state. Why is it typed void* if it must be decodeUtf8_state? It’s because the caller of decodeUtf8_attempt3 doesn’t know that it’s calling this specific function, but instead could be calling any function that produces characters while maintaining state.

The input to the iterator is provided when we initialize it, by calling init_decodeUtf8. We tell it what state to initialize, and where it must get its input data from. It must get its input data from another iterator function, and that function itself requires some iterator state which decodeUtf8_attempt3 needs to provide, so we pass that in.

This is quite awful, and if it doesn’t make complete sense to you, don’t worry. The point is that it gets incredibly difficult to write code in C that has abstract dependencies. Not only is the abstraction apparent at runtime, since every byte needs to be read through an indirection function call accessing indirect state data, but it’s also just less readable and really hard to get right.

C++ is only marginally better. It provides standard containers with iterators, but this doesn’t solve the problem of chaining functions together since most functions that act on sequences must pull from an input iterator and push to an output iterator. Most often you then need to have a container as a buffer to be able to “fit” these functions together. This can be good, but if you’re operating under tight memory constraints or dealing with asynchronous data then this typical approach can be a problem4.

C++ also provides template programming, which could allow you to have an abstract iterator input to a function, without the runtime overhead. But this is not easy to do, and although I would always suggest having functions that depend on abstractions, I would never recommend writing all your functions using C++ template programming to get those abstractions.

C# provided a much better solution to the eye, although at runtime there are many similarities between our C implementation and the C# one. For example, both will be using indirect function calls, and both provide a level of runtime abstraction.

We may have run out of options on this one. The languages have just let us down. There seems to be no way to get the efficiency, abstraction, and syntactic simplicity in the same package.

But that’s not the end of our journey. This pull-pull pattern is only one answer to dealing with sequences. Next time, we’ll turn the problem on its head and consider how to deal with sequences that are asynchronous. That is, sequences where you can’t pull data from the source, but instead the source pushes data to your function. For example, when you’re processing data from a network stream, you don’t want to have to wait for all the data to be present before starting to operate on it.

  1. Which, by the way, I do not have rights to, and could not track down its original source. No copyright infringement is intended, so if the photo is yours, please let me know. 

  2. As before, we don’t care about the implementation of the function, but more about writing a good interface to the function 

  3. You’ll note that the class is a little bit more verbose than it needs to be, because I’ve intentionally kept the init_decodeUtf8 and decodeUtf8_attempt2 functions as similar as possible to the original forms to show the equivalence between the object orientated way of looking at it and the functional way of looking at it. 

  4. Newer versions of C++ may be starting to deal with these problems, but it still isn’t nearly as neat as it could be 


Sequences: Part 2

In this series we’re looking at different ways of designing interfaces that interact with sequences. To investigate different interface design choices we’re using an example function which decodes UTF-8 encoded text – one that consumes a sequence of bytes, and produces the corresponding sequence of Unicode characters. Last time we considered a very simple design where the function interface simply accepted a null-terminated, heap-allocated byte array as an input argument, and returned a null-terminated, heap-allocated character array as output. Here it is again for reference:
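As a rough reminder of that shape, here is a minimal sketch in C. The name decodeUtf8_attempt1 and the helper sequenceLength are my assumptions for this example, and the body assumes well-formed UTF-8 and a wchar_t wide enough for the decoded values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

/* Number of bytes in the sequence that starts with `lead`
 * (well-formed input assumed, so `lead` is never a continuation byte). */
static size_t sequenceLength(uint8_t lead)
{
    if (lead < 0x80) return 1;
    if (lead < 0xE0) return 2;
    if (lead < 0xF0) return 3;
    return 4;
}

/* Takes a null-terminated byte array and returns a newly allocated,
 * null-terminated character array. The caller owns the result and must
 * free() it. Returns NULL if the allocation fails. */
wchar_t* decodeUtf8_attempt1(const uint8_t* data)
{
    assert(data != NULL);

    /* First pass: count the characters so we know how much to allocate. */
    size_t count = 0;
    for (const uint8_t* p = data; *p != 0; p += sequenceLength(*p))
        count++;

    wchar_t* result = malloc((count + 1) * sizeof *result);
    if (result == NULL)
        return NULL;

    /* Second pass: decode each character into the output array. */
    const uint8_t* p = data;
    for (size_t i = 0; i < count; i++) {
        size_t length = sequenceLength(*p);
        /* Keep only the payload bits of the lead byte. */
        uint32_t c = (length == 1) ? *p : (*p & (0xFFu >> (length + 1)));
        for (size_t k = 1; k < length; k++)
            c = (c << 6) | (p[k] & 0x3Fu);
        result[i] = (wchar_t)c;
        p += length;
    }
    result[count] = L'\0';
    return result;
}
```

Note how the sketch walks the input twice and leaves the caller responsible for freeing the result.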


Remember that we’re only looking at the interface of the function, since that’s the most important part when it comes to modularity and maintainability. Last week we considered some of the problems with the design of this function’s interface. One of the things we said was a problem is that the output sequence is passed as a fully populated heap-allocated array. This meant that our function would probably have to use the heap, which would add inefficiencies and possibly duplicated code for a double-pass over the input data. It also raises the concern of pointer ownership, and coupling the function caller to unnecessary implementation details.

So let’s try again with our second attempt.

Attempt 2

What happens if, instead of returning the whole output sequence at once in an array1, we return the output sequence one element at a time? For example, we might do this:
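A minimal sketch of such an interface in C. The body is only one possible implementation: it assumes well-formed UTF-8 and a wchar_t wide enough for the decoded value, and does no validation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Decodes the single character that starts at `cursor` and reports, through
 * `nextCursor`, where the following character begins. No heap is used. */
wchar_t decodeUtf8_attempt2(const uint8_t* cursor, const uint8_t** nextCursor)
{
    assert(cursor != NULL && nextCursor != NULL);

    uint32_t c = cursor[0];
    int extra; /* number of continuation bytes after the lead byte */

    if (c < 0x80)      { extra = 0; }            /* 0xxxxxxx */
    else if (c < 0xE0) { extra = 1; c &= 0x1F; } /* 110xxxxx */
    else if (c < 0xF0) { extra = 2; c &= 0x0F; } /* 1110xxxx */
    else               { extra = 3; c &= 0x07; } /* 11110xxx */

    for (int i = 1; i <= extra; i++)
        c = (c << 6) | (cursor[i] & 0x3Fu);      /* 10xxxxxx */

    *nextCursor = cursor + 1 + extra;
    return (wchar_t)c;
}
```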

Again, since this is a language-agnostic investigation, I’d like to just clarify some points for those who might be a little rusty with C/C++. The double asterisk in const uint8_t** means that nextCursor is an output parameter2. Both consts still mean that the input data is unchanged by the function.

So the function essentially accepts one argument: a pointer to the first byte of the UTF-8 data we wish to decode. It returns two outputs: the Unicode character represented by a wchar_t, and a const uint8_t* pointer to where the next character begins. To decode a whole document or stream of data we would call the function multiple times – once for each Unicode character.

Although this function has changed a little since now it returns only one character at a time, it hasn’t really changed in essence. The new function interface itself is still just a particular implementation of our overarching conceptual interface:

sequence of bytes -> sequence of Unicode characters

That is to say, we can still think of it as a function that accepts a sequence of inputs and returns a sequence of outputs – because that was our original requirement and this function fulfills that requirement. The state of the iteration is now contained outside the function itself, which is why we have the extra parameter, but the function still manages that state (calculating the next cursor and moving through the input bytes).

For those of you who are unconvinced about the idea of it still returning a sequence when it appears to return only one item, consider how this function could be seen as a generator. Each time it’s called, it will generate the next item in the output sequence. The parameters it requires are simply for persisting state between generator calls, and could be seen as “private” to the generator.

We could say that the data representing the sequence is no longer associated with a sequence of contiguous memory, but is instead “stored” in a more mysterious form. Something like a chronological sequence of return values.

So, is it better?

This function now doesn’t need to do any heap allocation at all, which could improve its performance. It also alleviates the problem of pointer ownership for the returned sequence, since there is no pointer because there isn’t any heap allocation.

But now the function is called many more times for the same sequence. Will this be a problem? Well, function calls on their own aren’t a problem, since the optimizer can inline many calls that aren’t necessary. For example if the caller was indeed outputting directly into some container or array in a tight loop, then the optimizer might inline the whole decodeUtf8 function. Of course it might not, so it may be a consideration for you. But word on the street is that most modern compilers are probably better than us humans at figuring out when a call should be inlined, so I think of this as a win.

There’s also a nice separation of concerns with this implementation. Since the function doesn’t loop, the number of test cases required to verify its behavior is much smaller. If it operates correctly on one character, and sets up the state correctly for the next character, then by induction it must work correctly for all following characters in the sequence.

So, we’re done?

Nope. This second attempt is much better than the first. But it leaves a lot to be desired.

For one thing, the input sequence must still be represented by a contiguous block of memory, which gives similar problems to what we thought we just solved.

Another problem with the input being a solid block of memory, which may not be immediately evident, is that the input and output sequences use inconsistent representations. The output is pulled by the caller “on demand”, while the input must already be there and waiting for use. This would be a problem if we wanted to stack multiple such functions together.

What if the input bytes come from a decompression function, while the output characters go to some parser function?


Now we have a problem. Since the output of one function doesn’t match the representation of the input to the next function (assuming that each layer looks a lot like our decodeUtf8_attempt2), we will again need containers to act as buffers between the functions.

What we need is a way to get the input and output to use the same philosophy, but without forcing the implementation of the function to use the heap as in our first attempt. This is what we’ll be looking at next time.

  1. Or, in other languages, most other container types such as lists, queues, vectors, etc 

  2. The exact details are more complicated if you aren’t familiar with pointers, and in different situations it will mean different things.