Month: February 2016

Why JavaScript

People who have known me for a while, and long-time readers of this blog, will know that I never used to be a fan of dynamically typed languages. The static type system available in C# and C++ can help to prevent many bugs and runtime errors, and allow the programmer to add extra clarity to the code.

But in recent years, I’ve come to acquire a taste for JavaScript. So much so that I’m actually on a mission to write a JavaScript compiler for embedded firmware, so that I can bring the bliss of JavaScript to the nightmare of embedded programming.

For those who don’t already know, JavaScript is a completely different language to Java. Please, do not use the word Java to refer to JavaScript, or vice-versa. They are as unrelated to each other as C++ is to Java. I know this is stated many times, but I’m expecting that some of the readers of this post are coming from a purely C/C++ background, and might not be aware of this fact.

Many people have written about the pros and cons of JavaScript, so what I’m saying here is not unique, but I’m going to approach it from the perspective of an embedded firmware programmer, who typically works in C or C++ on a daily basis. Some of what I say will also apply to those of you in other arenas.

Familiarity

One thing that JavaScript has going for it is that the syntax and structure will be familiar to most embedded programmers (and many other programmers). It uses curly braces to denote blocks of code. Function calls are done using parentheses with comma-separated arguments, as in foo(1,2,3). Most of the C/C++ operators work as expected, such as the logical operators && and ||, and the bitwise operators & and |.

These may seem like obvious things if you’re from a background of C/C++, C#, Java, or some related language. But bear in mind that not all languages are like this, so the familiarity of JavaScript flattens the learning curve when coming to it from languages like C/C++. Compare these two snippets of code, one in C, and the other in JavaScript, that implement the Fizz-Buzz challenge:
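As a minimal sketch of the JavaScript side of that comparison (the function name fizzBuzz and the limit of 15 are assumptions chosen purely for illustration), note that the for loop, the if/else chain, and the % operator would read almost identically in C:

```javascript
// Hypothetical helper: builds the Fizz-Buzz output for 1..limit as an array of strings.
function fizzBuzz(limit) {
  const lines = [];
  for (let i = 1; i <= limit; i++) {
    if (i % 15 === 0) {
      lines.push('FizzBuzz');
    } else if (i % 3 === 0) {
      lines.push('Fizz');
    } else if (i % 5 === 0) {
      lines.push('Buzz');
    } else {
      lines.push(String(i));
    }
  }
  return lines;
}

console.log(fizzBuzz(15).join('\n'));
```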

If you’re a C programmer, then hopefully the above similarity appeals to you.

Simplicity

Perhaps the number one thing I like about JavaScript is simplicity. With JavaScript you can often write very clean and readable code, because you’re focusing on the algorithm and behavior, rather than adding bloat associated with types, ABIs, and memory concerns.

In C++, you’re writing code to appease three different rulers: the type system, the machine, and the reader/maintainer. Most C++ code seems to be a compromise between these, but in JavaScript your code can focus on the reader/maintainer, and let the compiler worry about the machine. You worry more about clearly describing, in code, the behavior of the program, rather than worrying about how the compiler/interpreter will implement that behavior in terms of instructions and memory access, or worrying about constructing elaborate types to describe the meta-rules about what the program is allowed to do.

As a thumb-suck estimate, I probably spend at least half my time in C++ trying to appease the static type checker. And I would say that 90% of that work is spent on false-positives – cases where the type checker notes an inconsistency, but where the same code in a dynamically typed language would not have had a bug in it. Inconsistencies in the type system do not always equate to real program inconsistencies.

At the risk of making this post far too long, I’m going to give you an example, albeit somewhat contrived. Let’s say that we have a binary tree structure: a node in the tree is either a leaf-node with a value, or an internal node with no value but with left and right subnodes/children. Now let’s say that we want a function that flattens the tree, returning an ordered sequence of only the leaf values.

In JavaScript

In JavaScript, I can imagine a function that looks like this:
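A minimal sketch of such a function, under some assumptions of mine: an internal node is taken to be an object with left and right properties, anything else is treated as a leaf value, and the names flatten and isInternal are hypothetical.

```javascript
// Assumed representation: an internal node is { left, right };
// any other value is a leaf.
function isInternal(node) {
  return node !== null && typeof node === 'object' &&
         'left' in node && 'right' in node;
}

// Returns the leaf values of the tree in left-to-right order.
function flatten(tree) {
  const result = [];
  const stack = [tree];
  while (stack.length > 0) {
    const node = stack.pop();
    if (isInternal(node)) {
      // Push the right child first so that the left child is popped first.
      stack.push(node.right);
      stack.push(node.left);
    } else {
      result.push(node);
    }
  }
  return result;
}

const exampleTree = { left: { left: 1, right: 2 }, right: 3 };
console.log(flatten(exampleTree));
```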

It uses a stack to iterate the tree. I could have made a much more succinct solution using generators and recursion, but I’m appealing to those of you who are coming from strictly imperative, procedural programming backgrounds, so that’s why I chose this approach.
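For the curious, the generator-and-recursion variant mentioned above might be sketched like this. It makes the same assumption that an internal node is an object with left and right properties and that anything else is a leaf; the name leaves is hypothetical.

```javascript
// Assumed representation: an internal node is { left, right };
// any other value is a leaf.
function* leaves(node) {
  const internal = node !== null && typeof node === 'object' &&
                   'left' in node && 'right' in node;
  if (internal) {
    // Delegate to the sub-generators, left subtree first.
    yield* leaves(node.left);
    yield* leaves(node.right);
  } else {
    yield node;
  }
}

const sampleTree = { left: { left: 1, right: 2 }, right: 3 };
console.log([...leaves(sampleTree)]);
```

The trade-off is that recursion depth now follows the depth of the tree, whereas the stack-based version keeps its working state in an ordinary array.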

In C

Let’s write the equivalent code in C.

Firstly, a node can be either a single value, or branches into two subnodes. How do we represent this?

Here is one way. A node can be either a leaf value or an internal node, and to tell the difference, we probably need some kind of flag (or tag). It also needs something to store the contents of the node, which is a union between the two different options:

Side note: the style of the above code might not be what you’re familiar with. Or it might be. I don’t know because C (and C++) doesn’t come with a standard style, which is one of the ways in which I think JavaScript is better: there is a single generally-accepted style to writing JavaScript code.

The contents of an internal node is that it has a left and right sub-node:

Hmm. There’s another question here that’s unanswered. Who owns these nodes? Does the tree own them? Could there be multiple trees that share the same nodes? Is it even the same between one tree and the next, or within all the nodes of a tree? It’s not specified here, but it’s yet another “bloaty” detail to figure out.

We said that a leaf node is “a value”. But what value exactly? Is it an integer? Another struct? Is the value type the same for all nodes in a tree? Should the memory for the value be embedded into the node, or should the node point to the value? If it points to the value, then who owns it? If it’s embedded, then how big is it, and are there any rules we have to follow when copying it or moving it to different locations in memory (is it pointed to by anything else, or does it have ownership of anything else)? So many questions. So many details that aren’t relevant to the problem at hand.

One way here is just to say that a leaf node has a pointer to the value, and that we don’t know anything further about what type it is:

I could save on some of the bloat by using an anonymous union, but I’d argue that not a whole lot is gained in terms of simplifying real complexity.

How much time have we wasted so far? Let’s recap what we’ve done:

  • We’ve defined one possible implementation of a tree node
  • We’ve coupled the implementation to issues of memory layout, such as whether nodes point to their children and values or have them embedded in the same memory.
  • We’ve opened many cans of worms regarding ownership, style, the type of contents, etc.
  • We haven’t even started writing the actual function yet.

Now for the actual function (oh dear, this is going to be a long post). We need a way to pass the tree to the function, and a way to get the list back. We’ve already defined a type for the tree, but there are other questions to be answered when it comes to passing it to the function:

  • Should the root node be passed by value, or by pointer?
  • Is the function expected to mutate or free the tree passed to it? For example, can the output list reuse the memory from the input tree?
  • Should the tree be passed using a shared global variable, or an actual parameter? [1]

There are similar questions about getting the output from the function, with all the same concerns that we’ve already discussed about representing trees. Should the result be a contiguous array or a linked list? Should it be allocated by the caller or callee? Should it be a shared global variable? And any number of other considerations.

I’m going to try my hand at a direct implementation in C, trying to sidestep issues of memory allocation by having everything passed to the function itself:

This code makes me cringe.

It doesn’t really match the spec, because it has a hard limit on how deep a tree can be. To get around that we would need some dynamic memory allocation, which would add a whole lot more bloat and complexity. This code also requires that the caller have some idea of the maximum size of the resulting list, which may or may not be easy to know.

The complexity is ridiculous. How many times do you need to look at *list++ = node->leafContents before you can be sure that you’re incrementing the pointer to a list of pointers, and not incrementing the pointer in the list? Maybe we need to add some more bloat to encapsulate these: more functions for managing the list so we only have to write that kind of code once. Don’t even get me started!

So let’s see how well our beloved type system did. I’m going to compile the above code and see what errors come up.

Here is the list of mistakes I made. I’m categorizing each as either true-positive (the compile error saved my skin), false-positive (using a dynamic type system I would not have had a runtime bug), or false-negative (I found a bug by inspection that the compiler didn’t catch).

  • I forgot to #include <stdbool.h> … arguably a false-positive, since stdbool is only needed if you have static types.
  • I forgot to include stdio.h. True-positive: I forgot to include a module that was relevant to the program behavior.
  • In creating the example tree, on the line tree[0].internalNodeContents.left = tree[1], I was missing an & sign. I’m going to say this is also a false-positive. I was assigning one value to another, and the fact that one value is typed by pointer and the other by value is not a concern related to the algorithm or code logic.
  • To pop a value off the stack, I used stack[stackSize--] instead of stack[--stackSize]. This is a false-negative. The type system did bugger-all to protect me from accessing invalid memory. Luckily the hardware caught it and issued a seg-fault, but on embedded hardware you aren’t always so lucky! What’s more, the code that caused the issue is unrelated to the algorithm that the function was supposed to be implementing. In a sense, it’s the implementation of a completely different algorithm (the algorithm for popping a stack). So the bug was not just missed by the C compiler; in a real sense it was caused by the limitations of the C language.
  • In printf("%d", list[i]), I was logically printing an integer to the console, since the list is a list of integers, but actually the integers are physically stored as references (pointers), so it should have been printf("%d", *((int*)list[i])). Pretty, ain’t it? This is a false-negative. There was a bug, but the type checker failed to find it. Instead it just printed out crap to the console. On GCC with the default settings [2], there was no warning about this.
  • I’m not returning a “success” code, or checking the return code when the function is called. This caused no error in this case, but might cause strange behavior if there was something that did check the result error code, or a case where the error code was necessary (a failure). I’d call this a true-negative in this particular case. The function acts unexpectedly, but doesn’t explicitly say otherwise so actually there’s no spec that it’s defying. What’s more is that it doesn’t introduce a bug into this particular program.

So how does that compare with JavaScript?

Well, what happened when I ran the JavaScript program? Exactly what I expected to happen. It ran. It output the correct result. No errors. No bugs.

This is not because I’m an expert in JavaScript. I have many more years’ experience in C than JavaScript. It’s because simple, bloat-free code is easy to reason about, and so less likely to contain bugs.
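One concrete instance is the stack pop that went wrong in the C version. A minimal sketch, with variable names chosen only for illustration: the built-in Array.prototype.pop does the index bookkeeping, so there is no stackSize-- versus --stackSize to get wrong, and popping an empty array simply yields undefined.

```javascript
const stack = [];
stack.push('left');
stack.push('right');

// pop removes and returns the last element; no manual index arithmetic.
const popped = stack.pop();
const remaining = stack.length;

// Popping an empty array is well-defined: it returns undefined.
const fromEmpty = [].pop();

console.log(popped, remaining, fromEmpty);
```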

Conclusion: please use JavaScript instead of C. You will have fewer bugs because your code is simpler. It will also cost less to develop because there are fewer concerns to worry about, and it will be easier to maintain because the code is clear and easy to understand.

In C++

I’m not going to implement the above in C++, but instead I’m going to say, in a hand-wavy way, that I don’t think it’s much better. In C++, you could write something that looks similar to the JavaScript version, using a stack from the STL to implement the stack variable. But the problem with this is similar to the problem with C: the implementations are coupled to the machine in a way that means when you bring in your favourite container, you’re forcing the compiler’s hand when it comes to implementing the code in terms of machine instructions and memory. The result is essentially bloat in a different kind of way. It gets messy, and to make a solution that is as generic as the JavaScript one would require a ton of code, and with it a ton of bugs.

That’s all I’m going to say for the moment. If you come from the land of C++ and want to hear my opinion in more detail, leave a comment or send me an email, and perhaps I’ll make another post about it. This one is well long enough that I should be moving on to my last point.

Safety

The above C example leads me to another great thing about JavaScript: safety. What I mean by safety (or lack thereof) is:

  • How easy is it to introduce bugs?
  • How bad are the bugs, and how difficult are they to fix?

C is awful in this respect. The simple, real bug in the above code, where I dereferenced a pointer that wasn’t assigned, leaves the program open to the most hideous kinds of erroneous behaviors – those that are non-deterministic, and can affect anything in the program. Once you’ve crossed the line of undefined behavior, things in completely unrelated parts of your program can start failing for no apparent reason, no matter how well you wrote them. This is not just a door for bugs, but also for malicious attackers.

In JavaScript, there is no such thing as undefined behavior. There are a few things that are implementation-defined, meaning that different JavaScript engines will execute them differently, but there is nothing like C or C++’s undefined behavior, where a line of code can have literally any effect, including damaging the data or even functions in unrelated parts of the program. When you want behavior to be well-defined, use JavaScript instead of C/C++.
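A small sketch of what that looks like in practice (the variable names are assumptions for illustration): reading past the end of an array yields undefined rather than whatever happens to be in memory, and using that value as if it were a number raises a TypeError that can be caught.

```javascript
const list = [10, 20, 30];

// Out-of-bounds read: defined to produce undefined, not arbitrary memory.
const pastTheEnd = list[5];

// Calling a method on undefined throws a catchable TypeError
// instead of silently corrupting unrelated state.
let caught = null;
try {
  pastTheEnd.toFixed(0);
} catch (err) {
  caught = err;
}

console.log(pastTheEnd, caught instanceof TypeError);
```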

JavaScript is also a very good abstraction. Programs execute in isolation from the rest of the system, which is great if you have safety-critical or security-critical applications which need to guarantee some sort of behavior.

Conclusion

I could go on and on about the benefits of JavaScript, and perhaps I will in later episodes, but for the moment I hope that in this extraordinarily long post I’ve convinced you that there is some hope to JavaScript, even to the point of using it in embedded firmware development.


  1. Most people would say using a parameter is preferable, but as I’ve said before: in C you’re appeasing multiple gods. The choice of whether to use a global variable or a parameter is not just about what is easier to reason about or better for code reuse, it’s also about the function ABI and the machine instructions generated. 

  2. and the -std=c99 flag