Category: Software Design

Quality vs Quantity

Quality vs Quantity

It’s an age-old debate, but as I get older, my perspective is changing.

I feel increasingly frustrated by tools that don’t work smoothly; tools that weren’t thought out very well. And I ask myself, wouldn’t my life be better without these?

TL;DR The economic incentive for product development is not always aligned with what makes the world a better place. Tools that make people more productive do not necessarily make them happier. As creators, we should first-and-foremost create things that are pleasant to use, even when it’s not economically optimal for us or the companies we work for — economics should serve humankind, not be our master.

I’d be remiss creating such a post without whining about some concrete examples:

My Washing Machine

My washing machine1 has a “quick wash” setting that can do a load of washing in about 30min — great for those in a rush. And it allows you to further customize the quick wash with things like temperature and spin speed.

But for some unknown reason, it doesn’t allow you to select the maximum spin speed while in the “quick wash” setting!

If you want your clothes well-spun, you need to select another setting, the quickest of which is “whites”, and takes three-and-half-hours!

There’s a work-around, which I use most times — you run the machine on a quick-wash cycle (30 min), and then afterward change the setting to “whites” and then select the sub-setting “spin only”.

There are many other confusing and frustrating choices made for the interface and control of this machine, but it would take a whole post to go into them all!

How many extra man-hours would it have taken that company to properly think through the HMI2 design of the washing machine? How much extra would that have added to the price of each machine?

And here’s a deeper question: does the existence of this washing machine design make the world a better or worse place? If we could magically remove all washing machines with this design from history, would we be better or worse off?

What does it even mean to be better or worse off? This depends heavily on your world view. If you are a hedonist, then happiness/pleasure is the ultimate goal, and you can mentally compare the global happiness with or without this washing machine design in it, and make a value judgement about whether the washing machine is a net win or loss.

The difference between the two hypothetical worlds is subtle and complicated. The annoyance I feel when I do the washing is brief and small, but it’s multiplied by the thousands of people who have a similarly-designed machine.

And if we magically removed the design from the world timeline, we’d also be removing the satisfaction I felt when I discovered the work-around, and the catharsis of complaining about it in this blog post, or the social bonding it provides between people with the same issue who can enthusiastically affirm each other’s miserable experiences of such a damned machine.

But even with all these “benefits” of bad design, I honestly believe that we’d all be better off, in at least a small way, if this design didn’t exist.

So, what does this mean? Should humankind band together to create the perfect washing machine — a design which can be used for hundreds of years into the future, making people all of the world happy or even excited to do their washing?

In principle, I’d say “yes”. If we as humankind had collectively spent our efforts creating just one “great” washing machine design instead of a mix of good and bad designs across many companies, we’d probably all be better off overall.

The reality, of course, is more complicated. Different people’s ideas of the “perfect washing machine” are different. Technology makes this a moving target — the best washing machine 1000 years from now will be much better than the best we can do now, but we don’t expect a washing machine company to internally do a thousand years of technological research before creating their first product!

Adobe After Effects

Adobe After Effects is software for creating and editing video and animation. I learnt how to use After Effects this year so I could create little animated diagrams like that in my snapshot-vs-bundler post.

I find After Effects to be a really interesting example when it comes to armchair-philosophy about product quality. After Effects is, in my opinion, incredibly poorly thought out and frustrating to use. But it is also the only software that can do everything that it does, and using it makes you 100x more productive than if you were to try to do the same animations/effects manually.

I won’t go into a lot of specifics of what I think is bad design in After Effects, but I can’t help but mention one example: you can’t copy-paste something from one project to another (in fact, you can’t even open two projects at the same time). If you designed something great in one project and want to use it in the next (reuse, DRY), it’s a complicated and time-consuming process to export parts of one project into a temporary project and then import the temporary project as a nested child of the second project, from which you can now copy and paste.

You’re welcome to disagree with me and think that it’s a great design. The thought-experiment remains valid: in cases where a tool achieves something that was previously unachievable, or boosts their user’s productivity by some factor, is it better for the tool to have more capability or to be more pleasant to use?

The answer is not completely obvious, and it really got me thinking. My mind is in the weird paradox of simultaneously thinking that my life is worse off with After Effects while also I choose to use it because nothing else is available to do the job.

I wonder how many other people might also be in a similar position, perhaps frustrated by tools they use, but pushed to use them because the tools make their competitors more productive. It’s a prisoner’s dilemma situation (wiki).

50 years ago, nobody was complaining about the lack of After Effects software — they were fine with its absence — but today, we complain about its flaws. We’re “forced” to use it, either because our competitors in the space are using it or, in my case, because the mere knowledge of its existence makes work without it feel much slower, which itself is frustrating.

I honestly think the world would be a better place if After Effects didn’t exist and if knowledge of its absence also didn’t exist, nor a craving for something like it to exist. Or even better, the world would be a better place if After Effects was created with more design thought, making common workflows easy and pleasant, even at the expense of its variety of features.


As a software engineer, I’m one of the lucky few who are actually contributing to the development and design of these kinds of tools. So what kind of lessons can I learn from some of my frustrations with other tools, like the washing machine or After Effects?

For me personally, the lesson is that quality is far more important than quantity. I think that humankind will be better off with fewer tools and less productivity, in exchange for tools that are more pleasant to work with. It’s better if technological development is slower, maybe 20 years behind where it would otherwise be, but the technologies developed are robust, dependable, and a pleasure to use.

In some ways, this goes against modern business practice. In our capitalistic world, there’s a temporary advantage gained for using a tool that makes the business more productive, even if it makes its employees less happy — the benefit is negated when the competitors do the same thing, removing the competitive advantage of the productivity boost and driving down overall prices of the product (but leaving employees still unhappy with the status quo).

Likewise, on the other side, it’s advantageous for a software company to sell tools that deliver the maximum economic value to their users, not necessarily maximum pleasure and peace. Remember that automating the repetitive parts of a user’s job does not necessarily make their life better, it just changes the nature of their work (now their job is about running the automation software). If a new piece of software allows users to complete their work in half the time, they don’t take the other half of their time to sit in the park and watch the birds play in the fountain!

If you create a tool that delivers more productivity in a way that’s more frustrating to use than the less-productive way, the machine of capitalism forces people to use your more-productive tool (which may make you rich in the process!) at the expense of increasing net frustration in the world.


(Non-technical audiences may want to skip this section)

Microvium has parallels with After Effects, in that it’s a new tool that achieves something that was previously unachievable (it brings JavaScript to devices with only 16kB of flash). It boosts productivity significantly over alternatives such as EmbedVM.

So would it be ok if Microvium brought these benefits but was frustrating to use?

I think not. I’ve spent considerable effort in making Microvium as easy and pleasant to use as possible. For example:

  • I take great care in the full experience of integrating and using Microvium.
    • The one-liner install for node hosts, or the two-file copy-and-paste for C hosts (one h file for the interface, and one c file for the library).
    • The getting-started guide demonstrates this to the extreme, leading the user incrementally through all the concepts, building up to a full native host example.
    • The getting-started guide is even part of the regression tests — the automated test infrastructure literally extracts the example code from the getting-started markdown file and make sure it all still works. There’s little more frustrating than an outdated or incorrect introduction to a new tool.
  • I spent a great deal of effort making the FFI3 design super simple.
    • I hope that the way I designed it seems “obvious” and “intuitive” to people looking at it in hindsight — this is the way that designs should be. When you read through the getting-started guide, you probably don’t think at all about how good or bad the FFI system is. It’s hopefully invisible to the user, in the same way that a good washing machine should be invisible. It should just do its job, in the least-frustrating way possible.
    • Compare the FFI design of Microvium with that of Node.js for example, where one requires complicated binding files, additional configuration files (e.g. binding.gyp), the need to understand N-API vs NAN vs the V8 API, etc. In Microvium, you can export a JavaScript function to C code using a single line of JavaScript code and no configuration file is needed:
      vmExport(1234, sayHello);
  • Even for something like the module API for Microvium (for node.js hosts), which many people won’t encounter directly, I put a great deal of thought into making the model as conceptually simple as possible.
    • Compare to something like the Compartment API for JavaScript (under development), where the explanation of its module API necessitates the definition of 6 different kinds of “module specifier”, and the distinction between “resolving”, “loading”, “binding”, and “importing”. While Microvium does all of these, it doesn’t require the user to understand these concepts in order to use it.

This is to say that I’m practicing what I preach. However much I may fall short of the ideal, I put a lot of time and energy into making Microvium a pleasure to use and simple to understand, at the expense of not delivering as many features and it taking longer to develop.

I believe this was the right choice, and I’ll definitely apply it to other projects moving forward.

What are you optimizing for?

I’ll conclude by asking the question: what are you optimizing for?

If you are a manager or leader at a company, you may be inclined optimize the company to produce profits. You may even have a fiduciary duty to do so on behalf of your investors.

Sometimes the profit incentive is aligned with making good products: after all, happy customers are returning customers.

But clearly that’s not always the case — it’s common practice to release buggy software early, so you can iterate on it and use your users as a testbed to weed out the bugs. It’s common practice to “fail fast” in small startups — getting new ideas out the door quickly so you can establish their efficacy in the market, rather than spending significant time designing something that might be pleasant to use but nobody wants it because the idea was flawed. It’s common practice to aim for a first-mover advantage, especially in markets with a network effect, to trap people in your ecosystem before competitors (who may have a more pleasant product but were slower to get there) have a chance to penetrate the market.

Should fiduciary duty or duty as an employee transcend the betterment of mankind?

What goals are you optimizing your products for? What are you optimizing your company for? What are you optimizing your life for?

  1. Actually, my housemate’s washing machine 

  2. Human-machine interface 

  3. Foreign Function Interface — the way in which script code communicates host code and vice versa 

Be a multiplier

Be a multiplier

You may have heard the analogy that some software engineers add productivity, while some multiply productivity. Today I’d like to dig a little deeper into this and share my own thoughts.

What does it mean?

For those who haven’t heard the phrase before, let me try to unpack my understanding of it. Consider a tale of two programmers – let’s call them Alice and Bob, to pick two arbitrary names. Alice’s boss gives her a task to do: she is told to add a new thingamajig to the whatchamacallit code. She’s diligent, hardworking, and knows the programming language inside out. She’s had many years of experience, and especially knows how to add thingamajigs to whatchamacallits, because she’s done it many times before at this job and her last job. In fact, she was hired in part because of her extensive experience and deep knowledge with whatchamacallits, and at this company alone must have added over a hundred thingamajigs.

Because of her great skill and experience, she gives her boss an accurate time estimate: it’s going to take her one week. She knows this because she did almost exactly the same thing two weeks ago (as many times before), so she’s fully aware of the amount of work involved. She knows all the files to change, all the classes to reopen, and all the gotcha’s to watch out for.

One week later, she’s finished the task exactly on schedule. Satisfied with the new thingamajig, her boss is happy with the value she’s added to the system. Her boss is so grateful for hiring her, because she’s reliable, hard working, and an expert at what she’s doing.

Unfortunately for the company, Alice’s great experience gets her head-hunted by another company, where she’s offered a significantly higher salary and accepts immediately. The company mourns the loss of one of their greatest, who soon gets replaced by the new guy – Bob.

Bob is clearly wrong for the job by all standards, but some quirk of the job market and powers-that-be landed him up taking Alice’s place. He has no prior experience with whatchamacallits, let alone thingamajigs. And he doesn’t really know the programming language either (but he said he knows some-weird-list-processing-language-or-something-I-don’t-remember-what-he-said-exactly, and said that he’d catch on quickly). His new boss is very concerned and resents hiring him, but the choice was out of his hands.

On his first week, his boss asks him to add a thingamajig to the whatchamacallit code, as Alice had done many times. He asks Bob how long it will take, but Bob can’t give a solid answer – because he’s never done it before. It takes bob an abysmal 2 weeks just to figure out what thingamajigs are exactly, and why the business needs them. He keeps asking questions that seem completely unnecessary, digging into details that are completely irrelevant to the task. Then he goes to his boss and says it will take him 3 weeks to do it properly. “3 Weeks! OMG, what I have I done? Why did we hire this idiot”.

There’s not much to be done except swallow the bitter pill. “OK. 3 weeks”. It’s far too long. The customers are impatient. But, “oh well, what can you do?”

3 weeks later Bob is not finished. Why? Well again, he’s never done this before. He’s stressed. He’s missing deadlines in his first months on the job, and everyone’s frustrated with him. When all is said and done, and all the bugs are fixed, it takes him 2 months to get this done.

By now there is a backlog of 5 more thingamajigs to add. His boss is ready to fire him, but he optimistically dismisses the 2 months as a “learning curve”, and gives Bob another chance. “Please add these 5 thingamajigs. How long will it take you?”

Bob can’t give a solid answer. He swears it will be quicker, but can’t say how long.

The next day Bob is finished adding the 5 more thingamajigs. It took him 30 minutes to add each one, plus a few hours debugging some unexpected framework issues. What happened? What changed?

What happened is that the first 10 weeks that Bob was spending at his new job, he immediately noticed a big problem. There were 150 thingamajigs in the whatchamacallit codebase, and they all had a very similar pattern. They all changed a common set of files, with common information across each file. The whole process was not only repetitive, but prone to human error because of the amount of manual work required. Bob did the same thing he’s always done: he abstracted out the repetition, producing a new library that allows you just to define the minimal essence of each thingamajig, rather than having to know or remember all the parts that need to be changed manually.

To make things even better, another employee who was also adding thingamajigs, Charlie, can also use the same library and achieves similar results, also taking about 30 minutes to add one thingamajig. So now Charlie can actually handle the full load of thingamajig additions, leaving Bob to move on to other things.

Don’t do it again

The development of the new library took longer than expected, because Bob never done it before. This is the key: if you’ve done something before, and so you think you have an idea of the work involved in doing it again, this may be a “smell” – a hint that something is wrong. It should light a bulb in your mind: “If I’ve done this before, then maybe I should be abstracting it rather than doing almost the same thing again!”

You could say, in a way, that the best software engineers are the ones that have no idea what they’re doing or how long it will take. If they knew what they were doing, it means they’ve done it before. And if they’ve done it before then they’re almost by definition no longer doing it – because the best software engineers will stop repeating predictable tasks and instead get the machine to repeat it for them1.

Adding and Multiplying

In case you missed the link to adding and multiplying, let’s explore that further. Let’s assign a monetary value to the act of adding a thingamajig. As direct added value to the customer, let’s say the task is worth $10k, to pick a nice round number ($1k of that goes to Alice, and the rest goes to running expenses of the company, such as paying for advertising). Every time Alice completed the task, which took her a week, she added $10k of value. This means that Alice was adding productive value to the company at a rate of $250 per hour.

Now Bob doesn’t primarily add value by doing thingamajigs himself, but instead develops a system that reduces an otherwise 40 hour task to 30 minutes. After that, every time a thingamajig is added, by anyone, $10k of value is added in 30 minutes. Bob has multiplied the productivity of thingamajig-adders by 80 times. In a couple more weeks, Bob would be able to add more value to the company than Alice did during her entire career2.

Is it unrealistic?

The short answer is “no”. Although the numbers are made up, the world is full of productivity multipliers, and you could be one of them. Perhaps most multipliers don’t add 7900% value, but even a 20% value increase is a big difference worth striving for.

The laws of compound interest also apply here. If every week you increase 10 developers’ productivity by just 1%, then after 2 years you’d be adding the equivalent value of 6 extra developers’ work every day.

The alternative

What happens if Bob was never hired? Would the company crash?

Perhaps, but perhaps not. What might happen is that Microsoft, or some big open source community, would do the multiplying for you. They would release some fancy new framework that does thingamajigging even better than the way Bob did it, because they dedicate many more people to the task of developing the library. The company will take 5 years before they decide to start using the fancy new framework, in part because nobody on the team knew about it, and in part because they now have 250 thingamajigs to migrate and the expected risks are too high for management to accept. But in the end, most companies will catch on to new trends, even they lag behind and get trodden on by their competitors.

Final notes

In the real world, it’s hard to tell Alice from Bob. They’re probably working on completely different projects, or different parts of the same project, so they often can’t be directly compared.

From the outside it just looks like Bob is unreliable. He doesn’t finish things on time. A significant amount of his work is a failure, because he’s pushing the boundaries on the edge of what’s possible. The work that is a success contributes to other people’s success as much as his own, so he doesn’t appear any more productive relative to the team. He also isn’t missed when he leaves the company, because multiplication happens over time. When he leaves, all his previous multiplicative tools and frameworks are still effective, still echoing his past contributions to the company by multiplying other people’s work. Whereas when an adder leaves the company, things stop happening immediately.

Who do you want to be – an adder or a multiplier?

  1. This is not entirely true, since there is indeed some pattern when it comes to abstracting code to the next level, and those who have this mindset will be able to do it better. Tools that should come to mind are those such as the use of generics, template metaprogramming, proper macro programming, reflection, code generators, and domain specific languages 

  2. How much more do you think Bob should be paid for this? 

Sequences: Part 4 – Push and Pull

Sequences: Part 4 – Push and Pull

Welcome back! This week in our exploration of sequences we’re going to do something a little different. Up until now we’ve been looking at a single running example of a function that interacts with sequences, and progressively developing its interface and considering the consequences of each choice. But this week I’d like to delve a little deeper into the meaning of these strange concepts of push and pull, as they apply to sequences.

What is push and pull?

Wikipedia doesn’t talk directly about “push” and “pull” as coding patterns, but does talk about the related concept of a push technology, which it says is a

…style of Internet-based communication where the request for a given transaction is initiated by the publisher or central server.

There is also a corresponding Wikipedia entry for pull technology, where requests are initiated by the client.

So here it seems that these terms are, in a sense, used to say which side of communication interface has control, or initiates an action. If the server has control, and is the one to initiate or “cause” the transaction, then the server is pushing to the client.

We can apply the same principles to software design at a smaller scale. When your code calls a function, it could be said to be a client of that function. When your code uses an object, it could be said to be a client of that object. So our server-client relationship applies in these cases as well.


Consider the following simple piece of C code which might be used to print a sequence of values to the console:

for (int i = 0; i < numValues; i++) { 
    int value = produceValue();
    printf("Value %i: %i\n", i, value);

Loop Pull

This code uses the values from the sequence, so it’s a client of the sequence. It accesses items from the sequence by calling produceValue, which pulls the next item from the sequence. The pulled value is available to be used on the very next line of code, where in turn it’s pushed to the console output. Whatever is producing the sequences responds to the pull by providing the pulled values as they’re requested. This is the same iterator pattern that we’ve been looking at in past posts1.


What happens if the producer of the sequence needs to provide these values by pushing them to the consumer?  Now our consumer code might look like this:

int i;

void initConsumer() {
    i = 0;

void consumeValue(int value) {
    printf("Value %i: %i\n", i, value);


All the same elements of the code are here: the declaration of int i, the initialization of i to 0, printing the values to console by printf, and incrementing i after every value is consumed. The producer just has to call initConsumer before it starts producing values, and it has to call consumeValue every time it produces a new value. So here the producer is in control, and decides when new values are available.

Strangely enough, by writing the consumer in such a way that it doesn’t pull values, the producer code might look much like our original consumer code:

for (int i = 0; i < numValues; i++) { 
    consumeValue(i * 10); // produces values 0, 10, 20, 30, ...

It’s clearly much easier to write producer code that pushes, and consumer code that pulls. But these two things seem to be at odds with each other – either the producer or the consumer is “in charge”, and the other one loses out. Many sequence-manipulating functions both produce and consume sequences, since they have an input and an output. It’s much easier to write these kinds of functions if they can pull from a data source, and push to a data sink.

C++ seems to deal with this by dividing “sequence processors” broadly into 2 categories: on the one hand we have containers (and their corresponding iterators) which are happy to be both pushed-to and pulled-from. On the other hand we have algorithms and application-specific functions, which both push and pull from containers in order to get their job done. These application-specific functions can be written more simply because now they don’t have to worry about being pushed-to or pulled-from.

C++ Pattern

It’s as if the containers act as peace-makers, mediating between all these functions that all want to be in charge, that push and pull the data whenever they find it most convenient. But as I’ve mentioned before, containers almost always come at a cost, and it would be better if they weren’t the default solution to the problem of manipulating sequences.

Harmony between Push and Pull

But consider what happens if we take our first example, and replace getValue with the getchar from the stdio library:

for (int i = 0; i < numValues; i++) {
    int value = getchar();
    printf("%i\n", value);

This code instead reads the values from the console2. This is still a pull, since the code appears to be actively causing the value to be acquired, and just like before, the value is available to be used on the very next line of code.

But yet we also know where these values ultimately come from: they come when the user pushes keys on the keyboard (this is a “push” in two senses of the word). How can it be that both sides of the interface think that they initiated the transaction?

Push and Pull Keys

In many ways I think it’s equally true that both sides did really initiate the transfer. This example shows that it’s possible for the parties on both sides of the interface to feel like they’re both in complete control, calling the shots and deciding when the next value should appear. In this particular case what’s happening is that the thread is suspended or blocked until the value becomes available.

Let’s step out of reality for a moment, and imagine a strange world with different laws of physics and time. Imagine that I decide I want to talk to my friend, and after dialing his number on my phone, the phone freezes me in time, without forwarding the call to my friend. There I am, suspended in time in my room for days, not knowing that the world is continuing around me. A few weeks later, of his own accord, my friend also spontaneously decides it would be a good time to give me a call. After dialing my number, time unfreezes for me, and I hear his voice as if no time had passed at all for me. To me, I feel like I called him, but to him, he feels like he called me. In fact both are true: we both initiated the call.

This is very convenient for both my friend and myself, since we both waited till the perfect moment to call each other. Neither of us were interrupted by “receiving” the other person’s call (possibly while we were in an inconvenient state, such as in the shower or on another call).

The same is true in the world of code. Code which pushes and pulls is much easier to write than code which is pulled-from or pushed-to. State management is much easier, because code that is pushed-to has to be prepared to be interrupted (hear: race-conditions and re-entrancy problems).

Just like freezing time is difficult in the real world, multithreading in the software is not easy either (although it’s obviously much easier to suspend thread-time than to suspend physical time!). But lets take a look at another example, this time in C#:

static void Main()
    foreach (int value in Values())

static IEnumerable<int> Values()
    for (int i = 42; i < 52; i++)
        yield return i;

The example is simple. In our main function we have a loop that pulls values from a sequence of values returned from the Values function. The Values function which produces the data also has a loop, and pushes values the consumer (Main). Don’t let the keyword “return” deceive you into thinking that that the Values function is not pushing, since the act of returning is initiated by the called function, and the function progresses to the next line of code when the consumer comes back for the next value. The code would look pretty much the same if yield return i was instead produceValue(). Or to put it another way, the code would look horribly different if Values was actually just responding to pulls and not structured in such a way that it’s pushing.

Neither the Main function nor the Values function is more in control than the other – they are like a pair of coroutines, cooperatively yielding control to each other. Both are written as if they’re in control, which makes the code much simpler to write. And best of all, it does this without using any extra threads! This is the power of the generator – to be able to write functions  which produce values by pushing them, while their consumers are able to pull them. The best of both worlds.

  1. Except that the producer state is statically allocated, which may not be a good thing in general but it makes for a simpler example 

  2. Or other standard input 

Sequences: Part 3

Sequences: Part 3

I’m taking a bit longer than usual to write up new blog posts recently since I’m in the process of moving from San Diego to Melbourne, Australia. Hence the photo of the linked-list sequence of Koalas above1. Things should get back to normal in a couple of posts from now, and I’ll let you know how the move goes!


But enough about my life – you’re here because software is awesome! And together we’re exploring the best ways of working with sequences. Today we’re going to try to write a function2 that decodes UTF-8, without using the heap, and keeping the input “shape” the same as the output shape, so that multiple similar functions can be stacked together (something I touched on at the end of the previous post).

Let me quickly try to describe what I mean having the input “shape” match the the output. I’ll take last week’s example of having a “decompress” function which feeds bytes into our beloved “decodeUtf8” function, which feeds characters into a “parse” function. Often what we land up with something like the following situation:


That is, the “stack” of functions doesn’t fit together on it’s own. One function wants to push data to the next function (it wants to be in control), while the next function wants to pull from the previous (it wants to be in control instead). What we land up needing is something in between each layer of the stack. Something that doesn’t mind being pushed to and pulled from. Something doesn’t doesn’t take any control. This is normally a container, such as a list or buffer:


Each of our two attempts so far has manifested this problem slightly differently. In our first attempt, the function pulled bytes from an array, and pushed the resulting characters to an array. In that case the buffer was built into the function itself, so this push-pull conflict was absorbed, but its memory inefficiencies and lack of asynchrony were still an issue.

Our second attempt could be said to have rightfully had a “pull-pull” shape – as we want – since it pulled out of the input array, and the caller pulled characters from it one-at-a-time by calling it. The shape mismatch in that case was simply that the caller was forced to provide an array input, while the output was definitely not array. What was the output exactly?

This takes us into the land of iterators.


For reference, here’s last week’s function again.

// Takes null-terminated UTF-8 encoded string at `cursor`
// Returns first code character at the cursor
// Outputs cursor of the next character at `nextCursor`
wchar_t decodeUtf8_attempt2(const uint8_t* cursor, const uint8_t** nextCursor);

And let’s consider how we might use it:

const uint8_t* cursor = init_decodeUtf8(document);

const uint8_t* nextCursor;
wchar_t c = decodeUtf8_attempt2(cursor, &nextCursor);

while (c != 0)
    // Use c
    // ...

    cursor = nextCursor;
    c = decodeUtf8_attempt2(cursor, &nextCursor);

This code assumes that we have some function “init_decodeUtf8” that gives you the initial cursor state for some document. Notice that our code here doesn’t interact directly with the value of the cursor state, it only interacts with the functions init_decodeUtf8 and decodeUtf8_attempt2. This is intentional, and is done to encapsulate the state of the cursor. That is to say, the state of the cursor is managed only by those two functions, which limits the number of places in the code you need to consider when you think “what state can the cursor be in?”. Although in this case the encapsulation is manually enforced (we have to just “know” not to interact with the cursor state outside those two functions), if we upgrade our example to C++ we can get the compiler to enforce the encapsulation and data hiding:

class Utf8Decoder
    Utf8Decoder(Document document)
        cursor = init_decodeUtf8(document);

    wchar_t next()
        return decodeUtf8_attempt2(this->cursor, &this->nextCursor);

    const uint8_t* init_decodeUtf8(Document document)
        // ...

    wchar_t decodeUtf8_attempt2(const uint8_t* cursor, const uint8_t** nextCursor)
        // ...

    const uint8_t* cursor; 
    const uint8_t* nextCursor;

This C++ class has a public interface (aka “surface area”) that exposes two functions: the constructor to initialize the object state, and a “next” function to progress the state to the next element and retrieve that element. The class encapsulates the state of the cursor, and prevents clients of the class from accidentally modifying the cursor.3

This class is an iterator. It provides a way for users of the function/class to iterate through the output sequence of characters. I would say that we haven’t made it into an iterator by making the class, but that we’ve just revealed the true nature of the original decodeUtf8_attempt2 function: it always was an iterator.

For those who are familiar with C++, you’ll probably notice the similarity between the Utf8Decoder class and a standard C++ input iterator.

For those who aren’t that familiar with C++, you may notice the similarity with Java’s Iterator<T> (and corresponding Iterable<T>), or C#’s IEnumerator<T> (and corresponding IEnumerable<T>). These are each codifications of the pattern that we described in part 2.

Attempt 3

So, we said that we wanted to try get the input and output “shapes” to be the same. Since we’ve now said that the output surface area is an iterator, we can be more specific and say that we want the input to also be accessed via an iterator, rather than directly passing it a whole array of input bytes.

This is actually quite a challenging problem, and I’m going to first choose C# as my tool of choice to represent the solution:

IEnumerable<char> decodeUtf8_attempt3(IEnumerable<byte> bytes);

This is perhaps the most elegant solution we’ve encountered so far. It’s the most direct representation of the original problem statement, and works entirely using iterators. The input and output are the same “shape”, and it would be very easy to pipe the result of one function into another that accepts a sequence of that type.

Now let’s also take a look at how we might do this in C. What we want to do is abstract the input to decodeUtf8 function, so that while the input could be an array, but it could also be another iterator. We also want this function itself to be an iterator of the same shape. What about this:

typedef uint8_t ByteSourceFunc(void* state);

// state must be of type decodeUtf8_state
void init_decodeUtf8(void* state, ByteSourceFunc input, void* input_state);

// state must be of type decodeUtf8_state, initialized by init_decodeUtf8
wchar_t decodeUtf8_attempt3(void* state);

struct decodeUtf8_state
    ByteSourceFunc* input;
    void* input_state;

This is quite awful, and requires some explanation. Firstly, the decodeUtf8_attempt3 looks very much the same as it did in attempt 2. This new decodeUtf8 function is expected to yield a new character every time it’s called, the same as before. The significant difference is that now the cursor state isn’t statically typed (it just uses void* to represent “any type”), and that it holds some sort of abstracted state (the input_state field). State does have a runtime type, and for this to work the state must be of type decodeUtf8_state. Why is it typed void if it must be decodeUtf8_state? It’s because the caller of decodeUtf8_attemp3 doesn’t know that it’s calling this specific function, but instead could be calling any function that produces characters while maintaining state.

The input to the iterator is provided when we initialize it, by calling init_decodeUtf8. We tell it what state to initialize, and where it must get its input data from. It must get its input data from another iterator function, and that function itself requires some iterator state which decodeUtf8_attempt3 needs to provide, so we pass that in.

This is quite awful, and if it doesn’t make complete sense to you, don’t worry. The point is that it gets incredibly difficult to write code in C that has abstract dependencies. Not only is the abstraction apparent at runtime, since every byte needs to be read through an indirection function call accessing indirect state data, but it’s also just less readable and really hard to get right.

C++ is only marginally better. It provides standard containers with iterators, but this doesn’t solve the problem of chaining functions together since most functions that act on sequences must pull from an input iterator and push to an output iterator. Most often you then need to have a container as a buffer to be able to “fit” these functions together. This can be good, but if you’re operating under tight memory constraints or dealing with asynchronous data then this typical approach can be a problem4.

C++ also provides template programming, which could allow you to have an abstract iterator input to a function, without the runtime overhead. But this is not easy to do, and although I would always suggest having functions that depend on abstractions, I would never recommend writing all your functions using C++ template programming to get those abstractions.

C# provided a much better solution to the eye, although at runtime there are many similarities between our C implementation and the C# one. For example, both will be using indirect function calls, and both provide a level of runtime abstraction.

We may have run out of options on this one. The languages have just let us down. There seems to be no way to get the efficiency, abstraction, and syntactic simplicity in the same package.

But that’s not the end of our journey. This pull-pull pattern is only one answer to dealing with sequences. Next time, we’ll turn the problem on its head and consider how to deal with sequences that are asynchronous. That is, sequences where you can’t pull data from the source, but instead the source pushes data to your function. For example, when you’re processing data from a network stream, you don’t want to have to wait for all the data to be present before starting to operate on it.

  1. Which, by the way, I do not have rights to, and could not track down its original source. No copyright infringement is intended, so if the photo is yours, please let me know. 

  2. As before, we’re don’t care about the implementation of the function, but more about writing a good interface to the function 

  3. You’ll note that the class is a little bit more verbose than it needs to be, because I’ve intentionally kept the init_decodeUtf8 and decodeUtf8_attempt2 functions as similar as possible to the original forms to show the equivalence between the object orientated way of looking at it and the functional way of looking at it. 

  4. Newer versions of C++ may be starting to deal with these problems, but it still isn’t nearly as neat as it could be 

Sequences: Part 2

Sequences: Part 2

In this series we’re looking at different ways of designing interfaces that interact with sequences. To investigate different interface design choices we’re using an example function which decodes UTF-8 encoded text – one that consumes a sequence of bytes, and produces the corresponding sequence of Unicode characters. Last time we considered a very simple design where the function interface simply accepted a null-terminated, heap-allocated byte array as an input argument, and returned a null-terminated, heap-allocated character array as output. Here it is again for reference:

wchar_t* decodeUtf8_attempt1(const uint8_t* data);


Remember that we’re only looking at the interface of the function, since that’s the most important part when it comes to modularity and maintainability. Last week we considered some of the problems with the design of this function’s interface. One of things we said was a problem is that the output sequence is passed as a fully populated heap-allocated array. This meant that our function would probably have to use the heap, which would add inefficiencies and possibly duplicated code for a double-pass over the input data. It also raises the concern of pointer ownership, and coupling the function caller to unnecessary implementation details.

So let’s try again with our second attempt.

Attempt 2

What happens if, instead of returning the whole output sequence at once in an array1, we instead return the output sequence one element at a time. For example, we might do this:

// Takes null-terminated UTF-8 encoded string at `cursor`
// Returns first code character at the cursor
// Outputs cursor of the next character at `nextCursor`
wchar_t decodeUtf8_attempt2(const uint8_t* cursor, const uint8_t** nextCursor);

Again, since this is a language-agnostic investigation, I’d like to just clarify some points for those who might be a little rusty with C/C++. The double asterisk in const uint8_t** means that nextCursor is an output parameter2. Both consts still mean that the input data is unchanged by the function.

So the function essentially accepts one argument: a pointer to the first byte of the UTF-8 data we wish to decode. It returns two outputs: the Unicode character represented by a wchar_t, and a pointer const uint8_t*. To decode a whole document or stream of data we would call the function multiple times – once for each Unicode character.

Although this function has changed a little since now it returns only one character at a time, it hasn’t really changed in essence. The new function interface itself is still just a particular implementation of our overarching conceptual interface:

sequence of bytes -> sequence of Unicode code characters

That is to say, we can still think of it as a function that accepts a sequence of inputs and returns a sequence of outputs – because that was our original requirement and this function fulfills that requirement. The state of the iteration is now contained outside the function itself, which is why we have the extra parameter, but the function still manages that state (calculating the next cursor and moving through the input bytes).

For those of you who are unconvinced about the idea of it still returning a sequence when it appears to return only one item, consider how this function could be seen as a generator. Each time it’s called, it will generate the next item in the output sequence. The parameters it requires are simply for persisting state between generator calls, and could be seen as “private” to the generator.

We could say that the data representing the sequence is no longer associated with a sequence of contiguous memory, but is instead “stored” in a more mysterious form. Something like a chronological sequence of return values.

So, is it better?

This function now doesn’t need to do any heap allocation at all, which could improve its performance. It also alleviates the problem of pointer ownership for the returned sequence, since there is no pointer because there isn’t any heap allocation.

But now the function is called many more times for the same sequence. Will this be a problem? Well, function calls on their own aren’t a problem, since the optimizer can inline many calls that aren’t necessary. For example if the caller was indeed outputting directly into some container or array in a tight loop, then the optimizer might inline the whole decodeUtf8 function. Of course it might not, so it may be a consideration for you. But word on the street is that most modern compilers are probably better than us humans at figuring out when a call should be inlined, so I think of this as a win.

There’s also a nice separation of concerns with this implementation. Since the function doesn’t loop, the number of test cases required to verify its behavior is much smaller. If it operates correctly on one character, and sets up the state correctly for the next character, then by induction it must work correctly for all following characters in the sequence.

So, we’re done?

Nope. This second attempt is much better than the first. But it leaves a lot to be desired.

For one thing, the input sequence must still be represented by a contiguous block of memory, which gives similar problems to what we thought we just solved.

Another problem with the input being a solid block of memory, which may not be immediately evident, is that the input and output sequences use inconsistent representations. The output is pulled by the caller “on demand”, while the input must already be there and waiting for use. This would be a problem if we wanted to stack multiple such functions together.

What if the input bytes come from a decompression function, while the output characters go so some parser function?


Now we have a problem. Since the output of one function doesn’t match the representation of the input to the next function (assuming that each layer looks a lot like our decodeUtf8_attempt2), we will again need containers to act as buffers between the functions.

What we need is a way to get the input and output to use the same philosophy, but without forcing the implementation of the function to use the heap as in our first attempt. This is what we’ll be looking at next time.

  1. Or, in other languages, most other container types such as lists, queues vectors, etc 

  2. The exact details are more complicated if you aren’t familiar with pointers, and in different situations it will mean different things. 

Sequences: Part 1

Sequences: Part 1

The other day I had to write a function in C which decodes Unicode characters from a sequence of bytes in UTF-8.

UTF-8 is a variable length encoding, meaning that one character (known as a “code point”) can be represented by any number of bytes – usually one or two bytes. The exact format of UTF-8 is irrelevant for this series, but you can find more information on Wikipedia if you’re interested.

So what we want is something that takes a sequence of bytes, and produces a sequence of Unicode characters.

sequence of bytes -> sequence of Unicode code points

UTF-8 Sequence Mapping

But in this series I don’t want to talk specifically about UTF-8 sequences. The concepts I want to investigate with this example aren’t unique to UTF-8 decoding. Many of the problems we solve in software development involve writing code that transforms a sequence of one thing into a sequence of another. Whether its parsing objects from a file, querying objects from a database, or presenting data to the user. In fact most programs don’t do anything except process sequences: they accept a sequence of user input, and produce a sequence of reactions to those inputs.

So, sequences are important, and figuring out how to represent and manipulate them is also important. In this series I hope look at a few different ways of representing sequences, and consider the pros and cons of each. I’m not going to be looking at any one particular programming language, so the insights I uncover should help understand sequences in any language at a deeper level. I’m going to start by looking at C, because of its simplicity and lack of abstraction. Then I’ll consider other common languages such as C++ and C#. And I’ll finish off by pointing out the common flaws across all of these languages, and consider a new hypothetical language to address these problems.

Attempt 1

Let’s start with what is perhaps the most obvious implementation in C. The function must accept a sequence of bytes, and return a sequence of characters. In C, as in many languages, a common way of representing a sequence of something is with the use of an array. So let’s have a function that accepts an array as input, and produces an array as an output:

wchar_t* decodeUtf8_attempt1(const uint8_t* data);

Since this is a language-agnostic investigation, let me explain that uint8_t is, unsurprisingly, the type representing a byte. And wchar_t is one of the types that can be used to represent a character (in this case a Unicode code point). The const modifier is good practice, and in this case means that the data passed to the function is strictly input data, and that the function promises not to modify it.

The implementation for this function is too boring to show here, but would do something like this:

  1. Run through the input data, counting the number of Unicode characters
  2. Allocate the memory for the output array of characters, probably by calling the traditional C malloc function
  3. Run through the input data again, this time storing each output character in the output array
  4. Return the output array


We can see immediately that there are problems with this implementation. Perhaps the most striking is that we’re making two passes over the input data. In terms of performance, this might not be serious: it might might make an O(n) operation into an O(2n) operation, which is still an O(n) operation, and not much slower. On the other hand if the input data is large enough then the beginning of the input might not still be in the CPU cache by the time we make the second pass, and it could land up a lot slower.

A more important problem with making two passes over the input data is that we’re breaking the DRY principle. The second pass is very likely to repeat a lot of the code used in the first pass. It wouldn’t be difficult to accidentally predict a slightly different size on the first pass than you use on the second pass, and land up with some ugly memory bugs.

But actually, we can’t do much better an implementation of the function given this particular function interface (aka function “prototype”). We could perhaps have avoided the double pass over the input data if we’d chosen to provide a conservative over-estimate of the output array size by using the fact that there will never be more Unicode characters than bytes. But this would be a clever trick that wouldn’t apply to sequences in general, so it’s not relevant to our investigation. Or we could perhaps have avoided the double-pass if we were willing to incur extra overhead of resizing the output array as we encountered more characters while parsing the input. And indeed this might have been a preferable way to go since it avoids some duplicate code, although it’s debatable whether it would be more efficient.

Heap, Memory and Ownership

Another issue with this implementation/interface is that we’re allocating memory at all. The heap is slow by most definitions. If you’re in a multithreaded environment the heap is generally also a shared resource, causing thread locking, CPU multi-core cache invalidation, and a variety of other effects that would put the breaks on our program.

Once the heap space is allocated, somebody needs to free it. There are the usual problems with this: added risk forgetting to free memory results in a leak, or freeing it twice or prematurely and having difficult-to-track-down bugs in the program. But there’s also a more subtle issue: since the function abnegates ownership of the result-array to the caller, the caller is coupled to the implementation of the decoder function. That is to say, the decoder function chose to implement the output as a heap-allocated array, but now the caller also has to be aware of this implementation detail because it needs to also access the heap to free the array. This coupling would make it difficult to change the implementation down the line to use some other dynamic storage, such as a memory pool.


But there’s another memory-related problem that’s also pertinent to our investigation that’s hiding in our function interface: both the input and the output sequences are completely “populated” before they’re passed across the interface boundary. Or to put it another way: the entire input sequence is passed to the decoder function at once, and it returns the entire output sequence back to the caller at once.

It would be a terrible thing if all sequences were manifested like this. Imagine if, in your favorite desktop application, it were necessary for you to perform all the clicks and key presses before you ran the program (so that the whole sequence of input events is “populated” before the program is invoked), and then execute the program on those clicks and presses, and it in turn skips through all the GUI sequences that correspond to those clicks and presses. Some things clearly work better if you can provide input in a streaming fashion, rather than a batched fashion. We’ll talk more about streaming implementations later in the series.

The Good Parts

So there are lots of problems with this naïve implementation. But there are also some good things, which may not be obvious amongst all the flaws I’ve highlighted.

For one it’s a very simple interface and implementation, and simplicity is often severely underestimated. It’s very easy to understand how to use it and what it’s doing. Whoever is maintaining it doesn’t need to understand any complicated patterns or language features.

Another benefit that may not be immediately apparent is that the implementation decides exactly when and where it wants to read from the input (I call this “pull“, since the function actively pulls elements out of the input array), and exactly when and where it wants to write to the output array (I call this “push“, since the function actively pushes into the output array). This makes for a simpler and more maintainable implementation.

Next time I’m going to look at another alternative function interface – one which accepts a “push” input rather than a pull one – and we’ll compare this with our first solution.

Change Engineering

Change Engineering

In software design, we’re often obsessed with perfecting the qualities software has right now, or the qualities of how the software should be under ideal conditions in the future. We don’t often focus on designing changes to software – we mostly focus on the start and end points.

Imagine different components of a software system dancing together – each step of the dance is a change to the component. In an ideal situation, when software development is complete the dance stops and players remain perfectly still in their final state of interlock. No change is required, because the software is “finished”.

But no software is like this. When a software project first commences the dance floor is empty. Gradually new players/dancers are introduced to the dance floor. Dancer’s positions and poses are change to accommodate each other. Bug fixes and refactorings change the position or pose of dancers that needed fixing.

But by today’s software design practices this dance is nothing but a crowd of players, each one rigid in his/her shape. We go to great lengths to fix solid the interface between the players (components) – their interlocking posture – and then we hope never to change it again. Instead of a dance, this is like a weird game of twister. Players have to hold their position as steadily as possible. It’s considered a design flaw if changes to one player cause changes to another player – this is software coupling.

Let me emphasize this point: todays design practices emphasize immobility and decoupling between components of the system. This is very valid way of dealing with the dancing problem: limit the movement of the dancers so that they don’t run into each other. Give them each enough space so that when they do need to move a little they won’t influence others around them.

But we know all this comes at enormous cost. The interfaces have to be clearly defined and agreed upon – never to be changed – so they better be correct right at the beginning. We have to use complicated interfacing patterns – facades, indirection, visitors, injection. No longer can we just call function foo. We now have to add foo to the interface, and inject the concrete object through the construct, saving it until we need to call it, and then call it through an adapter because it didn’t quite have the right structure1.

So these principles solve half the problem: slow down the game so that it’s easier to keep everyone separate. But there’s another solution which also deserves attention: formalize methods of change. That is, formalize the dance moves.

For example, here’s a situation I’ve encountered. Let’s say a program X calls a web service Y. So X and Y hold a single dance position with each other. Now lets say we need to add a new parameter to the service. What moves can we make?

  • We can simply change X and Y, and deploy them at the same time.
  • We can change Y first to accept the new parameter, but not require the new parameter, and deploy Y. Then we can change X to provide the new argument, and deploy it
  • We can change X first to provide the new argument, and deploy X. Then we can change Y to use the argument, and deploy Y.
  • We can create a new service Z which supports the new parameter, and then update X to use Z.

The order of footwork is important: we can’t change X to require that Y have the parameter if it doesn’t yet. We can’t change Y to require the new argument, if X isn’t supplying it yet. We can’t deploy X to require the new service Z, if the new service hasn’t been implemented yet. There are a number of steps in this part of the dance, and the order is important.

The above example is simple, and could just be solved by reasoning it out, but you can see it could get complicated quite quickly. If some data had to be passed back from the service as well, then the moves become more intricate.

Do any of these dance moves have a name? Not that I know of (but correct me if I’m wrong). We don’t have names for them, and we also don’t really have a standard notation to describe the moves. And what about tools, and what about static analysis? What if the development tools could give a compilation error if we make an illegal dance move?2

This doesn’t just apply to distributed systems. It applies equally to self contained programs, or within modules and components themselves. For one thing, things don’t get developed atomically. A developer can only write one line of code at a time, and each new line of code is a small dance move. If the move is illegal, you might get a compilation error or perhaps your unit tests fail. It’s much better to be able to develop in small increments, rather than having to go a week with failing tests. I’ve certainly had times like that: where I started a “small” refactoring but it was weeks before the code even compiled again. Instead of engineering a sequence of small legal moves, I made one multi-week leap – flying dangerously through the air. This is the same thing that I’ve seen happen with producing new major versions of software: it takes months or years to develop the new major version from scratch, and only then can it be introduced to the dance floor with the customers. It would be better to engineer a sequence of small changes to get from the old version to the new one, even if the new one has a completely different architecture.

This is also important for collaboration, since software teams don’t make changes atomically either. There are dependencies between different people’s development, and order is often important. Every time you commit code to the common repository you’re making a dance move with your team mates, and you don’t want to tread on their feet. A formal discipline of change engineering would make it easier to coordinate this dance without having to resort to tons of scaffolding or indirection code.

Coupling can be good

I’d like to make one last point and say that coupling can be a good thing. The reason is perhaps obvious: the more connected two pieces of code are, the easier it is to directly access information between them. All the rules we have for decoupling generally add layers of indirection, which in turn makes it harder to see what’s really going on (the runtime structure looks less and less like the code structure), and harder to add new functionality because you have to weave new pieces of data through all the decoupling layers and interfaces. The “god-class” anti-pattern is, practically speaking, a great pattern to use if you have a program that’s only a few hundred lines of code.

The apparent problem with coupling is that solidifies the program. You can’t change one thing because it requires a change to something else. Each coupling link is an iron beam that connects the two pieces: when one moves, so does the other. Enough iron beans and the whole system becomes completely rigid. You don’t want to change anything because you would risk breaking everything else.

Although decoupling addresses this issue, so does dancing. That is, if we can define formal rules for changing components that are coupled together, we are expanding the number of coupled pieces that we can handle in the system. We’re replacing the iron beams with hinges, and allowing the dancers to interact directly with each other without being rigid.


In software engineering and design, there is perhaps too much emphasis on the fixed state of code. You generally design software with a single endpoint in mind, rather than the sequence of atomic changes required to get there without breaking the system at each checkpoint. These are almost independent of each other, and there should be a new discipline for change engineering, so that we can build more robust systems, minimize risk, and reduce code. For large systems there should perhaps even be dedicated engineers focusing on the problem of change design, formally engineering the sequences needed to move the software towards long-term goals of the company, rather than letting the dancers to wonder undirected into a dead-end of rotted software before starting again on a new version.

  1. Don’t think to hard about this example. The point is that it’s normally a long process to do something that should be simple 

  2. Don’t underestimate the power of this possibility