Category: Uncategorized

Immutable.js vs Immer

Immutable.js vs Immer

This week I came across the library Immer as a convenient way of manipulating immutable data in JavaScript. After reading this Reddit thread that raves about how much better Immer is than Immutable.Js, I was worried I’d made the wrong decision to use Immutable.js in Microvium. But some performance tests quickly cleared up my concerns.

Immer certainly seems convenient to use. It uses a combination of proxies and copy-on-write to allow you to use the normal mutating syntax in JavaScript and it handles the underlying immutability automatically.

But “copy-on-write” raised a red flag in my mind. When dealing with small changes to large collections, as Microvium does, I can’t possibly imagine how copying the whole collection on every write can be efficient. And indeed, a quick performance test shows how bad the situation is:

import { produce } from "immer";
import immutable from 'immutable';

const count = 100;

// Test Immutable.JS
let map1 = immutable.Map();
for (let i = 0; i < count; i++)
  map1 = map1.set(i, i);

// Test Immer
let map2 = new Map();
for (let i = 0; i < count; i++)
  map2 = produce(map2, m => m.set(i, i));

For different values of count, here are my results:

100100010k100k1M10M
Immer11 ms94 ms8.8 sec17 min 47 sec??????
Immutable.JS3 ms9 ms47 ms257 ms2.4 s41 s
Builtin Map37 µs180 µs2 ms18 ms170 ms3.4 s

Insertion into the map using Immer is unsurprisingly O(n) relative to the size of the map, making the whole test O(n²). Even though I had a hunch that this was going to be the case, it was worth checking that Immer wasn’t doing some other clever tricks under the hood to improve the performance.

I’ve never actually tested the performance of Immutable.JS before, so I’m quite pleased to see how well it scales. It seems only slightly worse than O(1) insertion time.

Both libraries are quite a bit slower than using the built-in Map class, so my conclusions are as follows:

  • Use Immer if readability and ease-of-use is more important than performance, which is often the case. But be aware that it’s not a simple process to switch to Immutable.JS later because of how different the API is.
  • Use Immutable.JS if you need high-performance persistent collections that scale well to large sizes, if it’s worth the decrease in readability, type safety, and increased verbosity of the code.
  • Use the mutable built-in collections for performance-critical code that doesn’t strictly need immutability for the algorithm.

TC39 is looking to add built-in immutable collections to the JavaScript standard (see the records and tuples proposal). I’m excited to see how those perform.

Node’s require doesn’t always return the same value

Node’s require doesn’t always return the same value

This is just a curious edge case in node.js I came across while looking at creating the module system for microvium.

The node.js documentation says:

… every call to require('foo') will get exactly the same object returned, if it would resolve to the same file.

But this doesn’t seem entirely true. In the following code, a module imports itself twice, getting a different object each time:

// script.js 

const a = {};
const b = {};

module.exports = a;
const a2 = require('./script');
module.exports = b;
const a3 = require('./script');
console.log(a2 === a3); // false

Hughes List

Hughes List

Inspired by Eric Lippert’s recent post on a data structure called a Hughes list, I thought I’d play around with it by writing my own equivalent in JavaScript.

Side note before I get started. For my usual readers who are following the progress of my JavaScript-to-native compiler (MetalScript), I’ll hopefully continue blogging about it soon. I’ve just been in the middle of an international move, along with some other projects, and haven’t had much time to work on MetalScript. My flight out of the country leaves in a few days, and then hopefully my schedule will open up and I’ll continue on it. For those who don’t know what I’m talking about but want to know more, also check out my last post where I link to a presentation on what MetalScript is, or any other posts in the category.

Getting back to the point of this post…

The key benefit I see about Hughes list structure is that it is a persistent (immutable) list that allows you to concatenate, prepend, and append to the list in O(1) time. It does so by lazily accumulating the operations to be performed until the list contents are actually needed, when it then finally performs a single O(n) pass to build the final “real” list (at least this is my interpretation of how it works).

For those who want to cut right to the chase, see this GitHub repo for the final code. It’s not a lot of code. test.mjs is the entry point, and hughes-list.mjs is the implementation of the data structure.

I suggest reading Eric’s series first for a full explanation of the underlying principles before comparing it to mine. I’m not here to explain it (which Eric has done perfectly) but rather to provide a twist and an implementation in a different programming language.

I deviated quite a lot from Eric’s implementation, in the following ways:

  • My Hughes list is built up by composing imperative procedures that each mutate a JavaScript array, rather than working with an underlying immutable SimpleList like Eric uses. This produces the same logically-immutable list structure and achieves similar performance characteristics but saves on implementing an additional SimpleList type.
  • I omitted the wrapper class. If you inspect the list in a debugger, the list will actually be the function, rather than containing the function. I wouldn’t do this in production code but it leads to a nice simple implementation here.
  • As a superficial detail, I used the traditional JavaScript function names push and unshift instead of append and push respectively.

Before jumping into the detail of the implementation, it’s probably best to take a look at how this list structure is used. These are the unit tests I have for it (see the full file for more context).

checkList(HL.empty                               , []);
checkList(HL.push(list1, 4)                      , [1, 2, 3,   4]);
checkList(HL.concat(list1, list2)                , [1, 2, 3,   4, 5, 6]);
checkList(HL.unshift(list1, 0)                   , [0,   1, 2, 3]);
checkList(HL.concat(HL.push(list1, 10), list2)   , [1, 2, 3,   10,   4, 5, 6]);
checkList(HL.concat(HL.unshift(list1, 10), list2), [10,   1, 2, 3,   4, 5, 6]);
checkList(HL.concat(list1, HL.push(list2, 10))   , [1, 2, 3,   4, 5, 6,   10]);
checkList(HL.concat(list1, HL.unshift(list2, 10)), [1, 2, 3,   10,   4, 5, 6]);

These should all look pretty obvious. If you’re not familiar with persistent data structures, the most important thing to point out here is that a call like HL.unshift(list1, 0) does not change list1, but rather returns a new list that is the like list1 but with 0 at the front. This is also good for me to highlight because I said that my implementation of Hughes list composes imperative procedures that mutate the underlying JavaScript array – so it’s worth getting your head around how this immutable list structure is implemented on top of array mutations.

My implementation of the Hughes list data structure comes out to something like the following:

export const empty = () => {};
export const single = x => xs => xs.push(x);
export const concat = (ls1, ls2) => xs => { ls1(xs); ls2(xs) };
export const push = (ls, x) => xs => { ls(xs); xs.push(x) };
export const unshift = (ls, x) => xs => { xs.push(x); ls(xs) };

The things to highlight above are:

  1. The only operation used on JavaScript arrays here is push, even in the implementation of unshift. This is the key benefit of the Hughes list. Depending on the implementation of arrays in the JavaScript engine, the native unshift on arrays may be O(n) (on an array of length n), while push is typically O(1).
  2. As previously noted, this implementation performs mutations, so rather than ls2(ls1(xs)), we do ls1(xs); ls2(xs). The former might have created a new array/list from ls1 that is passed to ls2, while the latter simply invokes both mutations one after the other on the same mutable array.
  3. You’ll notice that the implementation is backwards from Eric’s implementation, in that the definition of a list is a function that appends its items to the end of a given array, not the beginning. This again is just because appending an item to the end of an array in JavaScript is the most efficient.

I personally think this implementation still captures the essence of what makes the Hughes list structure useful, while also interplaying well with the king of JavaScript list-like structures, the array. The key effect is that it allows us to concatenate, prepend, or append items in any order with almost no cost, and let the list methods build the compound procedure such that the operations are performed in the optimal order for performance.

C++ vs JS
Binary Type Coupling

C++ vs JS
Binary Type Coupling

I was chatting to some of my colleagues at work recently about the potential benefits of a JavaScript compiler vs a C compiler, when targeting an embedded MCU. A concern that came up multiple times regarding different features, is

…but you can already do that in C, so how is JS better?

This is a good point. If you’re given a problem, you can solve it in JS, or you can solve it in C (or C++). Is JavaScript really any better? Of course, you know my answer, which is probably pretty biased. But let’s dig into a few of the details.

Firstly, let’s put aside the question of performance for a moment. I argue that JS is going to yield a more performant program1, while I’m sure that most C programmers will argue that C will perform better. Probably both are right, under different circumstances. Let’s just side step this issue for the moment.

There is at least one very significant reason why I think JS has a lot more to offer than C or C++, and that’s the ability to write reusable code. This is the gateway feature that opens the door to a 10x improvement in productivity, because it means that you can leverage highly customizable libraries to do most of the heavy lifting, rather writing the code yourself. Yes, in C and C++ you can use third party libraries. But in JS you can do it an order of magnitude faster. There are many, many times, where it’s taken me literally less than 5 minutes to go from “I wonder if there’s a library that already does this”, to using the library and moving on to the next task.

The same cannot be said for any third party C or C++ library or code file that I’ve ever used, which typically take days or weeks of integration work, reading documentation, figuring out why it doesn’t compile2, and then debugging cryptic issues.

So why is this the case? Is it the package managers? Is it the community? Is it the language?

I’m guessing it’s multiple reasons, but I’m going to focus on one in particular in this post: binary coupling (at least that’s what I’m calling it).

An Example: a Sum Function

Let’s take a dummy example. Consider a function that adds together an collection of numbers (sound familiar?). In JavaScript, you can be certain it will almost definitely have a signature like this:

function sum(numbers);

We can use TypeScript to make things even more unambiguous, although this doesn’t change the meaning of the code:

function sum(numbers: number[]): number;

This function logically takes a collection of numbers. In JS, collections of this nature are represented as arrays. The output is the sum, which is obviously also a number.

What might this look like in C or C++? Here are some options:

int sum(const int* numbers, size_t count);

int sum(const int* numbers, size_t count) __attribute__3;

// Null terminated
uint32_t sum(const uint_32_t** numbers);

int sum(const std::vector<int>& numbers);

template <int Length>
int sum(int numbers[Length]);

template <int Length>
constexpr int sum(const int &numbers[Length]);

// Using an STL-like input iterator (see std::accumulate)
template <typename Iterator>
int sum(Iterator begin, Iterator end);

// Is this even possible? (passing values at compile time)
template<int ...values>
int sum();

Forgive me if I have some or all of these wrong. In my life, I’ve programmed much more in C++ than in JavaScript, but C++ always remains a little bit too complicated for me to remember all the subtleties.

What’s the difference between all of these options? In my mind, they are all logically the same thing. They represent a function to which you pass an “array” of numbers, and it give you the result.

The difference comes mainly down to how a client interacts with these functions at a binary level. What calling convention is used? How is the array represented in memory? What parameters or aspects of the function are available at compile time vs runtime?

If sum was a third party library, which of these forms would it take?

We have at least one good answer to this question, since the functionality is already part of the STL in C++. It takes the following form:

template< class InputIt, class T >
T accumulate( InputIt first, InputIt last, T init );

There’s nothing surprising about this. Since we’re talking about the standard template library, you expect this to be a function template. It’s a fairly generic solution, but certainly it won’t suit everyone. Because it’s a template, you can’t take the pointer of it, you can’t pass it around as an argument to non-template code, you can’t export it directly as a library function from an object file, etc. For example, try to translate the following JavaScript code into C++:

function aggregate(useSum, arr) {
  return (useSum ? sum : average)(arr);
}

The useSum and arr value might or might not be available at compile time — a reusable piece of code shouldn’t assume one or the other.

How C++ Usually Solves This

The solution to this in C++ is normally to shunt everything to runtime when you need reusability. As a case in point, take a look at the GPIO methods that the mbed library provides. To set a single bit that represents a GPIO pin, you would use a function with the following signature:

// Set the output, specified as 0 or 1 (int) 
void DigitalOut::write(int value);

This makes for fairly reusable code. But it comes at a cost. It invokes a function call4, using a runtime value. The function then needs to use indirect accesses and masks to figure out what bit to set, internally using a structure that looks like this:

typedef struct {
    PinName  pin;
    uint32_t mask;

    __IO uint32_t *reg_dir;
    __IO uint32_t *reg_set;
    __IO uint32_t *reg_clr;
    __I  uint32_t *reg_in;
} gpio_t;

The absolute cost here is a low – a function call is not a big deal, masking and indirection are quick, and passing an integer argument at runtime is not a big deal. But the relative cost is high — if value and the specific port happened to be known at compile time, this whole thing could have been one machine instruction. Now it has to be many more, and extra memory overhead.

But surely in JavaScript this is even worse? Isn’t everything at runtime in JavaScript?

I would argue “no”. In JavaScript you don’t specify what gets done at compile time vs runtime — you leave it up to the JavaScript engine to decide. This is why I use the term “binary coupling”. In C++, your code is coupled to the binary representation of the resulting machine code — you are forced to make certain decisions, and those decisions force the C++ compiler to use certain representations. This makes C++ a glorified assembly language — it’s a way for you to write machine code in a somewhat abstracted way.

While in JavaScript, you don’t write machine code. You specify the behavior of the program in unambiguous terms, and leave it up to the engine or compiler to implement the specification in terms of machine code.

You get a small taste of this in C++ with micro-optimizations and inlining. You write a function, and in some circumstances the compiler is free to chose not to represent that as an actuall CALL instruction in the output binary. This is an example that breaks the usual binary coupling, since the code does not dictate the binary interface. But this kind of behavior is necessarily very limited in C++.

The discussion here is not primarily about performance, it’s about reusuability. What I’m demonstrating is that in C++, it’s impossible to have a library API that doesn’t also dictate the binary ABI that is used to access that library. As a result you’re much less likely to find an existing library that serves your specific purposes, since you need to match not only the logical interface, but the binary interface as well (and the type interface, which is another story).


  1. Not referring to interpreted or JIT-compiled JavaScript of today, but a hypothetical bare-metal compiler 

  2. Did you leave out a macro configuration flag somewhere? did you use the right make file for your platform? are the include files out of date? were you supposed to use the .dll files, or the .o files, or the .a files? the instructions are for GCC, how do I do this in VC++? the instructions are for version 1.34 but only version 1.32b is ported to my system, and for some reason the make file doesn’t run 

  3. stdcall 

  4. The class itself is a small inline wrapper around the HAL, but the HAL is implemented as separate object files 

Coding on Paper in an Interview

Coding on Paper in an Interview

My company is in the process of hiring a back end or full stack software engineer. During the interview process, I often get complaints from candidates when I bring out an old fashioned pad of paper and pencil and ask them to write some code for me (or I offer for them to code on the whiteboard).

This is 2017, and we have syntax highlighters, auto-complete, Stack Overflow, etc. Why would I want to test a candidate’s ability to code on paper?? This seems barbaric and absurd. Like asking a racing car driver to ride around a track on a bicycle to see how well he handles the corners. Surely we’re past this antiquated form of testing in this day and age?

Well, no. Let me share my side of the story.

I need people who can code

The position is for a software engineer. I need to know that you can code1. It’s quite simple.

We can have a good chat about Azure, about how you parted the red sea, or how you helped get a man on the moon. But in the end, you’re here to code, so you need to show me that you can.

Hiring a coder without seeing them code, is like hiring a chef without tasting their food.

The number one best way to test your real-world coding ability would be to have you work with me for 3 months, but that’s not a practical way to screen people for a new position.

Another way for me to establish your level of skill would be for me to look at work you’ve done in the past. AKA your public contributions to GitHub, Stack Overflow, blogs, etc. I am perhaps one of the few managers who will actually clone your GitHub repos and look through your code, your commit history, your documentation, your blog posts2, your answers on forums, etc.

A quick side note on this: remember that I’m also a software engineer, like you. I understand perfectly well that there are times when you need to throw a heap of “poor” code together, just to get something done quickly, or because it’s just an experimental throw-away project that you were working on in your spare time. I won’t judge you on this. The signature of good style and craftsman ship will still shine through, even in your darkest code, so I still read it.

But, I can’t just look at past work you’ve already done. I also need to look inside your head. I need to see those cogwheels spinning when you’re working on a problem. I don’t just want to see the final outcome, I want to know what thoughts went through your mind on the journey to get there. The best way to do this is for us to work on a piece of code together, in an interview. Here you can explain to me what you’re thinking, and ask questions3

The code itself is the easy part

In the past, I used to ask two coding questions in an interview. I started with an easy one, and then would progress to the harder question. But I became tired of the emotional distress of watching people struggle through the harder question, and often eventually give up.

So, now I just ask the easy question4.

In fact, here it is. I’ll give it to you.

Write a function that adds up a collection of numbers.

That it.

That’s it?

Yes. That’s it.

In fact, I’ll give you the5 answer too6 :

int Sum(int[] numbers)
{
    int counter = 0;
    foreach (int number in numbers)
        counter += number;
    return counter;
}

Here’s another one:

public static double MySum(this IEnumerable<double> numbers)
{
    return numbers.Aggregate(0.0, (acc, x) => acc + x);
}

Side note: One person has pointed out that they would just use the Sum function that .net provides. Well done. Ten brownie points for you7. But I then I will follow on by saying something like “Somebody on the .net development team had to write that builtin Sum function. Pretend it was you. How would you write it?”

There are obviously many different ways to write some code that satisfies the initial requirements statement of this test question. This is true in the real world as well — feature requests come in many different forms, to varying degrees of specification, and you need to be able to dig in and understand the problem that the user is trying to solve, and explore different solutions and the effectiveness of each.

Be resourceful. Dig in.

I don’t care whether or not you remember the syntax for generics or extension methods in C#. I don’t care if you remember what the C# equivalent to a fold/reduce function is. These are questions that Google or Stack Overflow can answer in the real world.

I don’t care about your spelling. This is something a spell checker would check in the real world.

I want to see how you engage a problem. When you’re actually working on the job, you’re going to have all sorts of weird and wonderful requests thrown your way. How do you deal with them? Do you blindly start pecking at the keyboard, or do you dig into what’s actually being asked?

If you need some tool to help you, do you say “this company is so stupid, they don’t even provide xyz!”, or do you say “how do we get xyz?”.

The same is true in an interview. Do what you need to do to get the job done. If you need to understand the context more, then ask me. If you need auto-complete and an IDE, or Google, ask for a laptop (actually I typically offer one).

At the very least, I expect you to ask me what kind of numbers we’re adding together.

If you need help remembering the way something works, or the syntax for something, ask me. Let’s work through it together. I want to know that you can work effectively with me and others on the team. I want to know that you feel comfortable asking for advice, that you’re not the kind of person to suffer in silence.

If you think that the whole thing is stupid, let me know. I want to know that the people on my team can challenge me and can stand up to being challenged themselves.

It’s about why

It’s not about what your answer looks like. It’s about why you made certain choices. Were there performance concerns you were worried about? What usage scenarios were you planning for — and did you ask? Is one language construct better than another from a maintainability standpoint? What approaches did you consider, and how did you weigh up each?

If you’re [un]lucky enough to interview with me, keep in mind that the main thing I’m looking for is your reasoning and approach when encountering a problem. Talk me through your thinking.

It’s a toy

Yes, this coding exercise is a toy example. And yes, I’ll make stuff up as I go. If you ask whether the numbers need to be integers or floats, I’ll make the answer up as a I go, and it might be different every time I do an interview.

But put yourself in my shoes for a moment. In an interview setting, we only have time to play with a toy example. We don’t have time for me to explain a real problem and have you craft a real solution for me. I need to know how you code, and a toy example is the only tool in the box that can get the job done effectively in a short span of time.

Hopefully by now you understand that it really won’t help you to use a laptop in my interview. The things I’m looking for are things where a laptop won’t help you. Sorry. But do whatever makes you feel comfortable.


  1. …and that you can communicate effectively, and work well with my team, and have the relevant experience. But all of this is pointless if you can’t code. 

  2. I don’t care what the blog posts are about actually — I will evaluate you based on passion, ability to communicate clearly, etc. Bonus points if your blog is about coding, and I can get a two-for-one evaluation on communication and coding. 

  3. Like maybe “WTF are we doing this for?” 

  4. [Edit:] Since writing this post, I’ve changed my mind about the second, “hard” question. I now ask the hard question as well, depending on how much time is available. I might also ask a “time-boxed” version of the hard question — saying something like “do as much as you can of this in 10 minutes”. Part of the reason is that I want to give you the opportunity to present your skills on structurally different questions. 

  5. *an 

  6. Since the position is for an Azure developer, the most common language of choice for this question is C# 

  7. Even if you didn’t know that it existed, I’d give you points for saying “I’m sure this function already exists in some library somewhere. I would probably first work out if I can use that, rather than reinventing the wheel” 

Why Types Introduce Coupling

Why Types Introduce Coupling

Last time I gave a long post about why C is complicated and JavaScript can make your code simpler and less buggy. Today I want to touch on another idea: that statically typed languages introduce a level of coupling in your program that’s unnecessary and makes maintenance harder.

Imagine that I ask you to write a function to simply iterate through a list of integers and output them to the console. In C, you might write the following code:

void printListToConsole(int* list, int count) {
    for (int i = 0; i < count; i++) {
        printf("%i\n", list[i]);
    }
}

But wait a second. I didn’t tell you that the list was going to be presented a contiguous array! What if it was a linked list? Then the code might look like the following, assuming that we had previously declared the types LinkedList and LinkedListNode appropriately:

void printListToConsole(LinkedList* list) {
    LinkedListNode* node = list->first;
    while (node) {
        printf("%i\n", node->value);
        node = node->next;
    }
}

The problem in both of these is that the function we’ve written is intimately coupled to the type of the parameters it accepts. There’s no way in C to express the idea of “I want my function to accept any list-like type”. This means that we cannot reuse the above code in other circumstances where the type is even slightly different.

C++

In C++, we can address this problem to some extent by using templates. We could write the above function something like this in C++:

template <typename TList>
void printListToConsole(TList& list) {
    for (auto i = list.begin(); i != list.end(); i++) {
        printf("%i\n", *i);
    }
}

There are still some of the same issues in this code. For example, do we pass an iterator pair to the function (as many STL functions do), or do we pass the container itself?

To me this is a non-solution. For one thing, it’s saying that if you want to write reusable code that’s decoupled from the types of its parameters, you need to write everything in your program as templates. This is clearly not an option for many reasons, including compilation time, error messages, complexity, etc.

All that C++ templates give us is a taste of a world without explicit types. The above C++ function is decoupled from the caller simply because it does not state what type it expects.

It’s true that there are languages that perform type inference, and allow you to write pretty generic code without losing static type safety, but as with C++, the error messages and language complexity associated with this type of coding makes it difficult for beginner-to-intermediate level programmers.

C#

At first glance, you may think that higher level statically-typed languages solve this with better abstraction mechanisms. Consider the C# code for the above function:

public void printListToConsole(IEnumerable<int> list) {
    foreach (int i in list) {
        Console.WriteLine(i);
    } 
}

The above function says that it accepts any argument that is enumerable, meaning anything that can be iterated. This is fantastic from a reusability and code maintenance perspective, so it does address one side of the issue. But there is another issue lurking behind the scenes. One that may be more noticeable to people working in very resource-constrained environments like in embedded firmware.

Although the above C# code does not couple itself to the specific type of integer list, and so can be used throughout the program, it does actually still have a level of coupling at the binary level: the function accepts something that has a binary signature of IEnumerable<int> . It is not the same as a C++ template that is decoupled by having the template instantiated for multiple binary signatures.

This machine-level coupling is invisible to the programmer, which is convenient and important when it comes to managing large and complex codebases. But remember that interfaces as a language feature are normally implemented at the binary level by passing control through several levels of indirection. I won’t go into detail – you can look it up yourself if you’re interested – but here is a brief summary to illustrate my point.

The machine code must first look up a virtual dispatch table from the interface pointer, and then look up the correct entry in the dispatch table corresponding to the method you want to call, and then perform an indirect call through that entry in the dispatch table. The indirect call doesn’t go straight to the function implementation, but first through a thunk that adjusts the this  pointer from the interface’s this  pointer to the underlying object’s this  pointer (which are different because objects can implement multiple interfaces), and then the thunk redirects to the actual function. This is to perform a single function call on an abstract interface.

Although the C# language doesn’t dictate these details, you have to be aware of them anyway. The logical types are still coupled to the binary types, and so the choices you make about how generic you want your code really do still have an effect on the performance and overheads involved in your program.

JavaScript

So what’s different about JavaScript? Simply put, JavaScript doesn’t require you to specify these type details at all.  This not only brings a level of decoupling to the logical program, where you gain benefits of reusability and maintainability, but also unshackles the compiler to theoretically be able to apply a whole new range of optimizations. As compilers are getting better and better, I think this will be more and more the case, and soon the performance of a program will be more related to how few details you dictate, rather than how many, and the performance of JavaScript programs will exceed those of the C and C++.

Why JavaScript

Why JavaScript

People who have known me for a while, and long-time readers of this blog, will know that I never used to be a fan dynamically typed languages. The static type system available in C# and C++ can help to prevent many bugs and runtime errors, and allow the programmer to add extra clarity to the code.

But in recent years, I’ve come to acquire a taste for JavaScript. So much so that I’m actually on a mission to write a JavaScript compiler for embedded firmware, so that I can bring the bliss of JavaScript to the nightmare of embedded programming.

For those who don’t already know, JavaScript is a completely different language to Java. Please, do not use the word Java to refer to JavaScript, or vice-versa. They are as unrelated to each other as C++ is to Java. I know this is stated many times, but I’m expecting that some of the readers of this post are coming from a purely C/C++ background, and might not be aware of this fact.

Many people have written about the pros and cons of JavaScript, so what I’m saying here is not unique, but I’m going to approach it from the perspective of an embedded firmware programmer, who typically works in C or C++ on a daily basis. Some of what I say will also apply to those of you in other arenas.

Familiarity

One thing that JavaScript has going for it, is that the syntax and structure will be familiar to most embedded programmers (and many other programmers). It uses curly-braces to denote blocks of code. Function calls are done using parentheses with comma-separated arguments, as in foo(1,2,3). Most of the C/C++ operators work as expected, such as the logical operators &amp;&amp; and ||, and the bitwise operators &amp; and |.

These may seem like obvious things, if you’re from a background of C/C++, C#, or Java, or some related language. But bear in mind that not all languages are like this, and so the familiarity of JavaScript eases the learning curve associated when coming to JavaScript from languages like C/C++. Compare these two snippets of code, one in C, and the other in JavaScript, that implement the Fizz-Buzz challenge:

// In C
void fizzBuzz() {
  for (int i = 1; i < 101; i++) {
    if (!(i % 15)) { 
        printf("FizzBuzz\n"); 
    } else if (!(i % 3)) { 
        printf("Fizz\n"); 
    } else if (!(i % 5)) { 
        printf("Buzz\n"); 
    } else {
        printf("%d\n", i);
    }
  }
}
// In JavaScript
function fizzBuzz() {
  for (var i = 1; i < 101; i++) {
    if (!(i % 15)) { 
        console.log("FizzBuzz"); 
    } else if (!(i % 3)) { 
        console.log("Fizz"); 
    } else if (!(i % 5)) { 
        console.log("Buzz"); 
    } else {
        console.log(i);
    }
  }
}

If you’re a C programmer, then hopefully the above similarity appeals to you.

Simplicity

Perhaps the number one thing I like about JavaScript is simplicity. With JavaScript you can often write very clean and readable code, because you’re focusing on the algorithm and behavior, rather than adding bloat associated with types and ABI’s and memory concerns.

In C++, you’re writing code to appease three different rulers: the type system, the machine, and the reader/maintainer. Most C++ code seems to be a compromise between these, but in JavaScript your code can focus on the reader/maintainer, and let the compiler worry about the machine. You worry more about clearly describing, in code, the behavior of the program, rather than worrying about how the compiler/interpreter will implement that behavior in terms of instructions and memory access, or worrying about constructing elaborate types to describe the meta-rules about what the program is allowed to do.

As a thumb-suck estimate, I probably spend at least half my time in C++ trying to appease the static type checker. And I would say that 90% of that work is spent on false-positives – cases where the type checker notes an inconsistency, but where the same code code in a dynamically typed language would not have had a bug in it. Inconsistencies in the type system do not always equate to real program inconsistencies.

At the risk of making this post far too long, I’m going to give you an example, albeit somewhat contrived. Let’s say that we have a binary tree structure: a node in the tree is either a leaf-node with a value, or an internal node with no value but with left and right subnodes/children. Now let’s say that we want a function that flattens the tree, returning an ordered sequence of only the leaf values.

In JavaScript

In JavaScript, I can imagine a function that looks like this:

function flattenTree(root) {
    var items = [];
    var stack = [];
    stack.push(root);
    while (stack.length > 0) {
        var node = stack.pop();
        if (node.left && node.right) {
            stack.push(node.right);
            stack.push(node.left);
        } else {
            items.push(node);
        }
    }
    return items;
}

var t = {
    left: {
        left: 1,
        right: 2,
    },
    right: {
        left: 3,
        right: 4
    }
};
var l = flattenTree(t);
console.log(l);

It uses a stack to iterate the tree. I could have made a much more succinct solution using generators and recursion, but I’m appealing to those of you who are coming from strictly imperative, procedural programming backgrounds, so that’s why I chose this approach.

In C

Let’s write the equivalent code in C.

Firstly, a node can be either a single value, or branches into two subnodes. How do we represent this?

Here is one way. A node can be either a leaf value or an internal node, and to tell the difference, we probably need some kind of flag (or tag). It also needs something to store the contents of the node, which is a union between the two different options:

struct Node {
    bool isLeaf;
    union NodeContents contents;
};

union NodeContents {
    struct InternalNodeContents internalContents;
    struct LeafNodeContents leafContents;
};

Side note: the style of the above code might not be what you’re familiar with. Or or it might be. I don’t know because C (and C++) doesn’t come with a standard style, which is one of the ways in which I think JavaScript is better: there is a single generally-accepted style to writing JavaScript code.

The contents of an internal node is that it has a left and right sub-node:

struct InternalNodeContents {
    struct Node* left;
    struct Node* right;
};

Hmm. There’s another question here that’s unanswered. Who owns these nodes? Does the tree own them? Could there be multiple trees that share the same nodes? Is it even the same between one tree and the next, or within all the nodes of a tree? It’s not specified here, but it’s yet another “bloaty” detail to figure out.

We said that a leaf node is “a value”. But what value exactly? Is it an integer? Another struct? Is the value type the same for all nodes in a tree? Should the memory for the value embedded into the node, or should the node point to the value? If it points to the value, then who owns it? If it’s embedded, then how big is it, and are there any rules we have to follow when copying it or moving it to different locations in memory (is it pointed to by anything else, or has ownership of anything else)? So many questions. So many details that aren’t relevant to the problem at hand.

One way here is just to say that a leaf node has a pointer to the value, and that we don’t know anything further about what type it is:

struct LeafNodeContents {
    void* value;
};

I could save on some of the bloat by using an anonymous union, but I’d argue that not a whole is gained in terms of simplifying real complexity.

struct Node {
    bool isLeaf;
    union {
        void* leafContents;
        struct {
            struct Node* left;
            struct Node* right;
        } internalNodeContents;
    };
};

How much time have we wasted so far? Let’s recap what we’ve done:

  • We’ve defined one possible implementation of a tree node
  • We’ve coupled the implementation to issues of memory layout, such as whether nodes point to their children and values or have them embedded in the same memory.
  • We’ve opened many cans of worms regarding ownership, style, the type of contents, etc.
  • We haven’t even started writing the actual function yet.

Now for the actual function (oh dear, this is going to be a long post). We need a way to pass the tree to the function, and a way to retrieve the list back. We’ve already defined a type for the tree, there are other questions to be answered when it comes to passing it to the function:

  • Should the root node be passed by value, or by pointer?
  • Is the function be expected to mutate, or free the tree passed to it? For example, can the output list reuse the memory from the input tree?
  • Should the tree be passed using a shared global variable, or an actual parameter?1

There are similar questions about getting the output from the function, with all the same concerns that we’ve already discussed about representing trees. Should the result be a contiguous array or a linked list? Should it be allocated by the caller or callee? Should it be a shared global variable? And any number of other considerations.

I’m going to try my hand at a direct implementation in C, trying to sidestep issues of memory allocation by having everything passed to the function itself:

int flattenTree(struct Node* root, void** list, int* listSize, int maxListSize) {
    *listSize = 0;
    struct Node* stack[MAX_TREE_DEPTH];
    int stackSize = 1;
    stack[0] = root;

    while (stackSize > 0) {
        struct Node* node = stack[stackSize--];
        if (node->isLeaf) {
            if (*listSize >= maxListSize)
                return ERR_LIST_TOO_SMALL;
            *list++ = node->leafContents;
            (*listSize)++;
        } else {
            if (stackSize >= MAX_TREE_DEPTH)
                return ERR_TREE_TOO_DEEP;
            stack[stackSize++] = node->internalNodeContents.right;
            stack[stackSize++] = node->internalNodeContents.left;
        }
    }
}

int main() {
    int values[4];
    values[0] = 1;
    values[1] = 2;
    values[2] = 3;
    values[3] = 4;
    struct Node tree[7];
    tree[0].isLeaf = false;
    tree[0].internalNodeContents.left = tree[1];
    tree[0].internalNodeContents.right = tree[4];
        tree[1].isLeaf = false;
        tree[1].internalNodeContents.left = tree[2];
        tree[1].internalNodeContents.right = tree[3];
            tree[2].isLeaf = true;
            tree[2].leafContents = &values[0];
            tree[3].isLeaf = true;
            tree[3].leafContents = &values[1];
        tree[4].isLeaf = false;
        tree[4].internalNodeContents.left = tree[5];
        tree[4].internalNodeContents.right = tree[6];
            tree[5].isLeaf = true;
            tree[5].leafContents = &values[2];
            tree[6].isLeaf = true;
            tree[6].leafContents = &values[3];
    struct Node* treeRoot = &tree[0];

    void* list[4];
    int listSize;
    flattenTree(treeRoot, &list[0], &listSize, 4);

    for (int i = 0; i < listSize; i++) {
        printf("%d", list[i]);
        if (i != listSize)
            printf(", ");
    }
}

This code makes me cringe.

It doesn’t really match the spec, because it has a hard limit on how deep a tree can be. To get around that we would need some dynamic memory allocation, which would add a whole lot more bloat and complexity. This code also requires that the caller have some idea of the maximum size of the resulting list, which may or may not be easy to know.

The complexity is ridiculous. How many times do you need to look at *list++ = node->leafContents  before you can be sure that you’re incrementing the pointer to a list of pointers, and not incrementing the pointer in the list. Maybe we need to add some more bloat to encapsulate these: more functions for managing the list so we only have to write that kind of code once. Don’t even get me started!

So let’s see how well our beloved type system did. I’m going to compile the above code and see what errors come up.

Here are the list of mistakes I made. I’m categorizing each as either true-positive (the compile error saved my skin), false-positive (using a dynamic type system I would not have had a runtime  bug), or false-negative (I found a bug by looking actually the compiler didn’t catch it).

  • I forgot to #include <stdbool.h>   … arguably a false-positive, since stdbool is only needed if you have static types.
  • I forgot to include `stdio.h`. True-positive: I forgot to include a module that was relevant to the program behavior.
  • In creating the example tree,  the line tree[0].internalNodeContents.left = tree[1] , I was missing an `&` sign. I’m going to say this is also a false-positive. I was assigning one value to another, and the fact that one value is typed by pointer and the other by value is not a concern related to the algorithm or code logic.
  • To pop a value off the stack, I used `stack[stackSize–]` instead of `stack[–stackSize]`. This is a false-negative. The type system did bugger-all to protect me from accessing invalid memory. Luckily the hardware caught it and issued a seg-fault, but on embedded hardware you aren’t always so lucky! What’s more is that code that caused the issue is unrelated to the algorithm that the function was supposed to be implementing. In a sense, it’s the implementation of a completely different algorithm (the algorithm for pop stacks). So the bug in the code was not just not-noticed by the C compiler, but it was in a real sense caused by the limitations of the C language.
  • In `printf(“%d”, list[i])`, I was logically printing an integer to the console, since the list is a list of integers, but actually the integers are physically stored as references (pointers), so it should have been `printf(“%d”, *2`. Pretty, ain’t it? This is a false-negative. There was a bug, but the type checker failed to find it. Instead it just printed out crap to the console. On GCC with the default settings3, there was no warning about this.
  • I’m not returning a “success” code, or checking the return code when the function is called. This caused no error in this case, but might cause strange behavior if there was something that did check the result error code, or a case where the error code was necessary (a failure). I’d call this a true-negative in this particular case. The function acts unexpectedly, but doesn’t explicitly say otherwise so actually there’s no spec that it’s defying. What’s more is that it doesn’t introduce a bug into this particular program.

So how does that compare with JavaScript?

Well, what happened when I ran the JavaScript program? Exactly what I expected to happen. It ran. It output the correct result. No errors. No bugs.

This is not because I’m an expert in JavaScript. I have many more years’ experience in C than JavaScript. It’s because simple, bloat-free code is easy to reason about, and so less likely to contain bugs.

Conclusion: please use JavaScript instead of C. You will have fewer bugs because your code is simpler. It will also cost less to develop because there are fewer concerns to worry about, and it will be easier to maintain because the code is clear and easy to understand.

In C++

I’m not going to implement the above in C++, but instead I’m going to say, in a hand-wavy way, that I don’t think it’s much better. In C++, you could write something that looks similar to the JavaScript version, using a stack from the STL to implement the stack variable. But the problem with this is similar to the problem with C: the implementations are coupled to the machine in a way that means when you bring in your favourite container, you’re forcing the compiler’s hand when it comes to implementing the code in terms of machine instructions and memory. The result is essentially bloat in a different kind of way. It get’s messy, and to make a solution that is as generic as the JavaScript one would require a ton of code, and with it a ton of bugs.

That’s all I’m going to say for the moment. If you come from the land of C++ and want to hear my opinion in more detail, leave a comment or send me an email, and perhaps I’ll make another post about it. This one is well long enough that I should be moving on to my last point.

Safety

The above C example leads me to another great thing about JavaScript: safety.  What I mean by safety (or lack thereof) is:

  • How easy is it to introduce bugs?
  • How bad are the bugs, and how difficult are they to fix?

C is awful in this respect. The simple, real, bug in the above code where I dereferenced a pointer that wasn’t assigned, leaves the program open to the most hideous kinds of erroneous behaviors – those that are non-deterministic, and can affect anything in the program. Once you’ve crossed the line of undefined behavior, things in completely unrelated parts of your program can start failing for no apparent reason, not matter how well you wrote them. This is not just a door for bugs, but also for malicious attackers.

In JavaScript , there is no such thing as undefined behavior. There are a few things that are implementation-defined, meaning that different JavaScript engines will execute them differently, but there is nothing like C or C++’s undefined behavior, where a line of code can have literally any effect, including damaging the data or even functions in unrelated parts of the program. When you want behavior to be well-defined, use JavaScript instead of C/C++.

JavaScript is also a very good abstraction. Programs execute in isolation from the rest of the system, which is great if you have safety-critical or security-critical applications which need to guarantee some sort of behavior.

Conclusion

I could go on and on about the benefits of JavaScript, and perhaps I will in later episodes, but for the moment I hope that in this extraordinarily long post I’ve convinced you that there is some hope to JavaScript, even to the point of using it in embedded firmware development.


  1. Most people would say using a parameter is preferable, but as I’ve said before: in C you’re appeasing multiple gods. The choice of whether to use a global variable or a parameter is not just about what is easier to reason about or better for code reuse, it’s also about the function ABI and the machine instructions generated. 

  2. int*)list[i] 

  3. and the `-std=c99` flag 

Continuations in C

Continuations in C

There are times when you need to call a function, but you want to say “call me back when you’re done” rather than blocking the current thread.

An example might be when you’re reading a file. If you imagine for a moment that every CPU cycle is on the scale of 1 second, then disk access is in the order of days to months (take a look at the coding horror post about it). When you call a simple C function like fread, you could be blocking the current thread for millions of CPU cycles. You don’t want to be blocking the thread, because threads are a valuable resource and multithreading is a difficult skill.

The Typical Solution

The typical way to solve this in C is to use a callback function. I’m not going to explain callback functions here, since there’s an abundance of information about them on the internet. Instead I would like to point out a convenient pattern of how to store state for the callback function.

Let’s use a concrete example. Say we have some function bar, which is expected to take a long time to execute, and a function foo which needs to call bar. The synchronous way of writing the code (non-callback way) might look like this:

int bar() {
    // Some time-consuming task
    // ...

    return 42;
}

void foo() {
    // Some code before we call bar
    // ...

    int b = bar();

    // Some code after we call bar
    // ...
}

The task finishes by returning some result of the long process. For the purposes of this example, we’ll say that the result is 42.

If we convert it to the asynchronous form (the callback form) it might look like this:

typedef void (*bar_callback)(int result);

void bar(bar_callback callback) {
    // Start some long-running process.
    // Save the callback somewhere so that
    // when the long-running process is done
    // we can call it with the result

    callback(42);
}

void foo() {
    // Some code before we call bar
    // ...

    bar(continue_foo);
}

void continue_foo(int b) {
    // Some code after we call bar
    // ...
}

Note that normally bar would not call the callback itself, but instead save the callback to be called later. I’ve only called it directly from bar as a convenience in the example.

The Problem

I’ve seen this pattern many times. But it’s flawed in a major way: if foo has some state that must be persisted across the call to bar, how does it get that state to the continue_foo function? For example, if I declare a variable in foo, how do I access the variable in continue_foo? Typically what I see is that people will simply use global variables. But that’s an awful solution for many reasons1.

Slightly Better

A better pattern, which I’ve used myself quite often, is to for foo to tell bar, “please hold the value of XYZ for me, and when you call me back, please give XYZ back to me to remind me why I called you in the first place and help me remember where I left off”. It might look like this:

typedef void (*bar_callback)(void* state, int result);

struct Foo_State {
    int x; // Some "variable" that foo wants to have preserved
};

void bar(bar_callback callback, void* state) {
    // Start some long-running process.
    // Save the callback and state somewhere so that
    // when the long-running process is done we can 
    // call it with the result and the state

    callback(state, 42);
}

void foo(struct Foo_State* state) {
    // Some code before we call bar
    // This code can save things to
    // the `state`. 
    // ...

    bar(continue_foo, state);
}

void continue_foo(void* state_, int b) {
    struct Foo_State* state = (struct Foo_State*)state_;

    // Some code after we call bar.
    // This code also has access to variables in `state`
    // ...
}

A few quick points I’d like to draw your attention to:

  • Bar only sees the type `void*`, and not something more specific like `Foo_state`, because obviously `bar` may be called by other functions as well, not just `foo`
  • Rather than allocating foo’s `state` on the heap, foo just accepts the state as a parameter, leaving it up to the caller to decide where it must be allocated. This parameter is only to say where the state should be stored, and is not expected to have any values populated by foo’s caller.

Let me emphasize that last point again: there is no heap allocation involved in this example. The state could very easily be statically allocated, or pool-allocated, or even stack allocated2. Especially, consider that foo’s caller is likely to face the same problems foo has faced with state management, and so might already have it’s own state structure which would provide the perfect home for foo’s state structure without incurring an additional heap allocation.

The Best Solution

But we can do even better. The problem with the above example is that we’re passing two things around: the callback function pointer, and the callback state pointer3. Let’s take a look at a way of doing this while only passing one pointer:

typedef void (*bar_callback)(void* state, int result);

struct Foo_State {
    bar_callback call;
    int x; // Some "variable" that foo wants to have preserved
};

void bar(bar_callback* callback) {
    // Start some long-running process.
    // Save the callback and state somewhere so that
    // when the long-running process is done we can 
    // call it with the result and the state

    (*callback)(callback, 42);
}

void foo(struct Foo_State* state) {
    // Some code before we call bar
    // This code can save things to
    // the `state`. 
    // ...

    state->call = continue_foo;
    bar(&state->call);
}

void continue_foo(void* state_, int b) {
    struct Foo_State* state = (struct Foo_State*)state_;

    // Some code after we call bar.
    // This code also has access to variables in `state`
    // ...
}

I’ll draw your attention to the differences:

  • `Foo_state` now contains a field called `call` which holds the callback function pointer. It’s important that this field is the first field in the structure so that a pointer to this field is also a pointer to the whole structure.
  • The callback function signature still accepts the state as a parameter, as before.
  • The call to `bar` no longer takes two parameters but now only takes a pointer to the callback function pointer (note the double-pointer)
  • When `bar` needs to call the callback function, it needs to dereference it first. It also needs to pass the callback state. But since, by design, we’ve said that a pointer to the callback function [pointer] is also a pointer to the callback state, we can simply pass that pointer as the argument. This gives us the interesting syntax `(*callback)(callback, result)`. In a sense, this is saying “call the callback, and tell it which callback we called”.

Those who are familiar with how object-orientated programming works under the hood may recognize this pattern. Typically objects are laid out in memory such that the first field in the object state is a pointer to the class vtable. When you call a virtual member function on the object, the pointer-to-the-object is treated as a pointer-to-the-vtable-pointer and is used to resolve the virtual dispatch. In our example above there is actually less complexity and overhead, since we don’t need a whole vtable but can point directly to the function.

Advantages

I love this pattern because it’s really clean and quite simple. The whole callback, including the function and the state, is neatly represented by a single pointer4.

The callback pointer can be called using a very self-contained syntax. That is, it only depends on one variable, not two. This is actually not just a matter of syntax: a single variable means better use of CPU registers, and fewer accesses to memory.

The fact that the callback is represented by one small value also makes it easier to manage. There’s much less risk of calling the callback with the wrong state. It’s also lighter to pass around.

Disadvantages

The most obvious disadvantage to me is that it uncommon. Someone looking at the code for the first time won’t just understand what’s happening straight off the bat. It also means that there’s no language support for it. C++ is in some ways an extension to C with language support for first-class objects. But there is no common language that is an extension to C with support for this kind of first-class-function (with state).

The performance of using this pattern isn’t a disadvantage in my opinion. If you’re comparing it to the performance of a “naked” function pointer, then yes, you may incur some overhead from passing the additional state argument and from double-dereferencing the function pointer. But consider that this type of function call should actually be faster than calling a virtual function in a most object orientated languages (which has a triple-dereference), because there’s no vtable lookup. And virtual function calls are in turn typically faster than interface function calls (and correspondingly virtual functions with multiple inheritance, depending on the optimizer and conditions).

I’d also like to dispel another disadvantage, not directly related to the pattern but more about using callbacks in general. At first glance it seems that there is a lot of overhead in accessing persistent variables in the state structure, because instead of saying “x” you have to say “state->x”, which implies an extra pointer deference and possibly some pointer arithmetic. But think about this: how are variables normally accessed anyway? Variables are normally stored in the stack frame, which is essentially a structure pointed to by the stack-pointer. Yes, there may be less register elevation which would affect the performance, but I think it may be less of a problem than you’d expect.

Likewise, at first glance it seems that there is extra space used to store the callback function pointer. But in reality, a stack frame also stores the “callback” function pointer anyway: we just normally refer to it as the “return address”. An important point to note in the last example, is that the very last thing foo does is call bar. This is what’s called a tail call, and it means that any half-decent optimizer will re-use foo‘s stack frame space for bar. To put it another way: while bar is active, foo doesn’t use any stack space, but it does use space in the persistent state structure (wherever that may be), and the persistent state structure has many of the same attributes as the stack frame would have had, including a pointer into code space. From this perspective, there is no extra space required to store the callback address in the state structure.

The only thing missing is hardware support. A “normal” call has hardware support for automatically populating the return address into the state structure (aka stack frame) and saving register states etc (aka saving persistent variables). And a “normal” return has built-in support for dereferencing the stack pointer to obtain the return address pointer (note the double-pointer again) and jumping to that address, all in one step. But I imagine that if this pattern became more common in usage (probably with language support), hardware support would probably follow.

Until then, I still think it’s a great pattern to use in C, and we should all add it to our toolbox of C patterns.


  1. Please ask me – I’ll be happy to tell you all the reasons why it’s so horrible 

  2. In the less likely scenario that the caller decided to manually block the thread using thread synchronization techniques. 

  3. On most modern architectures this would just mean that it takes twice the space, since there are two pointers involved. But C doesn’t require function pointers to be the same size as data pointers. One embedded architecture I work with has function pointers that are twice the size of normal heap pointers – after all, RAM is more expensive per bit than ROM 

  4. A RAM pointer, which in some cases is smaller than a function pointer, giving it yet another advantage over the typical callback 

Moonwalk debugging

Moonwalk debugging

How many times have you run into this situation:

You’ve just finished a large section of code. You launch your program in the debugger. You wait for it to start up, and then try it out a bit. But then… it crashes. Your debugger eagerly jumps up tell you there’s a problem. You look through the call stack and all the program variables, and you find something strange. A value that’s just clearly wrong. Why is it 25 and not 5? Why didn’t it remove the item from the list correctly? Why is it not sorted? You look through the code to try figure out the problem, but your conclusion is that, really, it should just be right. Your code looks fine – why did it get the wrong answer?  The only way to find out is to go back in time and breakpoint it at the point where it should have been generating the correct answer, but didn’t. So you set some breakpoints, and restart the whole program. You go through all the steps to reproduce the problem again (if you can even remember them). Finally you hit the breakpoint. You step slowly through your code, and finally find the problem. Or more likely, you find some other strange thing that may have lead to the first strange thing, and have to repeat the process. Or in haste you accidentally step over the most important line of code instead of stepping into it, and have to start again.

This used to happen to me all the time. In fact, it used to be my primary method of debugging: a binary search through the code, each time rerunning it to reproduce the problem. Wouldn’t it be easier to just step backwards through the code? The program breaks when it enters some invalid state. By the law of causality, the bug must have been in code that was before the point where the strange state occurred. Stepping forwards seems like the least useful thing you can do.

Nowadays I don’t have this problem as much, and I’ll tell you the secret: Write programs in a pure functional style. Write pure functions. That is, write functions that always return the same value if called with the same arguments, and that don’t have side effects that change the state of the system when the function is called. Write code that doesn’t mutate state.

Let me give you a real example. Recently I’ve been developing a graphics library for an embedded microcontroller. The previous graphics system was stateful: you had a mutable image buffer, and then you called different subroutines, such as drawLine, to affect a change to the pixels in the image buffer. I was tasked with creating a new graphics system – in part because we now needed to render graphics to an area many times larger than the size of all the microcontroller’s RAM.

The design I produced is almost completely stateless. For example, instead of calling a rotate function to rotate a graphic, I call a rotated function , which returns a completely new graphic object representing a rotated version of the original. Did you catch that? The word “rotate” is imperative: it commands the machine to do something. The result of what it does is known its side effect1. A function like this would probably return void, or a result code to confirm its success, and the rotation would be visible as a change in the original image.

On the other hand, the word “rotated” is, in this case, an adjective2. It describes not what the function does, but what its result is. If the argument is a picture of a dog, then the result is a rotated picture of a dog. It always will be. Every time. A rotated(dog) is always a rotated dog. The unrotated dog is still there, and still unchanged.

Then there are similar functions for creating a shifted graphic from an existing one, a composite graphic from other graphics, and a line, a polygon etc. As a quick implementation note: you may have guessed that these graphic functions can’t possibly return objects with all the pixels of the graphic they represent. The way this is resolved is with laziness – the graphic objects are only implemented as lightweight descriptions of what the resulting graphic should look like relative to their input. When it comes time to produce actual pixels, there is some short-circuiting magic that combines all the transformations into one ultra-efficient transformation, and then produces the pixels on-demand with almost no buffering required3.

Moonwalk the debugger

MoonwalkThe great thing is that doing things this way plays really well with the debugger. These techniques work in other environments, but I’m specifically going to talk about my experience in Visual Studio. When your program stops at a breakpoint or exception, you can rewind the stack one frame at a time (in Visual Studio 2013 this is Shift+F11), and you can rewind within a particular frame by simply jumping directly to that line of code (“Set next statement”, or Ctrl+Shift+F10). If your program is stateful, this is sure to break something. Typically when rerunning the same code twice you might not expect it to do the same thing the second time because the conditions may have changed since the first time it, and so the program may be in a different state, and thus produce different results. This isn’t a problem at all if you use immutable data structures and your functions have no state or side effects. Now you can step backwards! 

In addition to being able to step backwards, you can also evaluate arbitrary expressions in the debugger. For example, I have a function like saveGraphic(graphic, filename), which takes a graphic and saves it to a bitmap file. This is technically not a side-effect-free function since it writes to file, but for the purposes of debugging it has no side-effect which modify the state of the program in any observable way, and so is “pure” for practical purposes. In Visual  Studio, I can add a watch expression like saveGraphic(dog, "debug.bmp"), and the debugger will happily execute the function. Now I can see the value of my immutable graphic object! Since rendering doesn’t mutate program state, I can use “pure” functions like this to visualize any complex objects. I can even evaluate transformation functions in the watch window, such as saveGraphic(rotated(dog), "debug.bmp"), knowing confidently that my program state hasn’t changed as a result of evaluating the watch.

I hope you enjoyed this post, and learned something new that will help you with your bug battles, and challenge the way you think about coding.


  1. Whether or not a function’s “effect” is its primary purpose, it is still technically known as a “side effect” 

  2. I believe the term is deverbal adjective 

  3. I would probably have normally used matrices for this, but this microcontroller has no integer multiply or divide instructions, and no floating point anything

Finish before you Start

Finish before you Start

I find a lot of the coding I do is completely different, unlike anything I’ve done before. The reason is that if it’s anything like something I’ve done before then I’ve failed to recognize the pattern and abstract it out. As a software engineer I want to never be doing repetitive tasks, because that’s what software is for (repeating tasks). If you’re repeating yourself in any way then you’re not fully leveraging your time. Every pattern, every similarity, every “oh, I know how to do this because I did something similar with xyz” should be under careful scrutiny.

After a certain point you may run up against the limitations of a programming language, since many design patterns are simply language features that are missing. I come across this all the time in embedded C programming, where a “linked list” is not a container but the design pattern of having a next pointer in a structure (and all the algorithms that go with it), and a “virtual class” is not a data type but the pattern having a vtable pointer at the beginning of a structure.

At the point where the language betrays you, your next port of call could be writing your own domain specific language to target the problems you’re working on. In C# this may include the use of reflection to interpret existing C# data structures in new ways (such as interpreting a class as a list of serializable members), using expression trees to interpret C# expressions in new ways, or using reflection.emit to dynamically generate bytecode.

But despite the huge advantages of automating the mundane programming tasks and patterns, the problem arises: if you never repeat yourself, then everything is new. Everything is uncharted territory1. How do you know how long it will take? How do you know if it will work? Will it really be useful?

The solution is simple: finish before you start. Do whatever it takes to get the results of your experimentation before you actually perform the experiment. In software this means developing proof of concepts. Hack the result together using whatever means necessary. Spare no time for optimization or good coding technique. Don’t worry about corner cases and details. Get to the finish line by cutting across the sports field. Get to the finish line by cheating. By redefining the race. By faking the results. Calculate them manually. Hand-draw the GUI. Fake the input data. Brute-force the algorithm. Anything it takes.

It helps to clearly define the interface of whatever you’re coding: How will it interact with the rest of the system? If it’s a web page, the interface is the GUI. If it’s a library, the interface is the publicly exposed API. If it’s a DSL, the interface is a syntax or spec. Once the interface is there, it doesn’t matter if implementation is a birds-nest of cheating and killing. All that matters is that the interface offers just – just – enough to give you a little hint of what the future will look like. What it will feel like? Get a tangible product you can play with2.

Once you’ve seen the end, then you can start. A clearer picture of where you’re going will help not only in producing a better product or code, but it might change your direction completely. Once your fortune is told, it may bring news of triumph, or of peril. Perhaps when you see the end you realize it actually isn’t so great there. It isn’t as easy to use as you thought. It’s more complicated than anticipated. Perhaps the whole principle is flawed in some horrible way. There are many things that are hard to anticipate without getting a closer look. Or a hands-on experience. Perhaps you need to reconsider some of your assumptions, or abandon the whole idea altogether.

It’s better to know before you start.


  1. This is an exaggeration and oversimplification 

  2. I use the word “tangible” very loosely. A visual interface is “tangible” if you can see it, and a library interface is tangible if you can call it, etc.