Monthly Archives: July 2014

Debugging Embedded Code

Embedded microcontroller

At first I didn’t think this would be something useful to blog about. But the more people I speak to, the more I realize that I might be one of the few who has my perspective on how to debug embedded microcontroller/firmware code on small devices1. So let me share my “secret formula” with you.

Problems

For those who don’t traditionally write embedded microcontroller applications but are still following along for interest, let me quickly highlight some of key problems we face debugging embedded code.

Problem 1: Hardware

The most obvious problem with developing embedded code is that these microchips are generally embedded into a piece of hardware, such as your washing machine, and the code’s primary purpose is often closely related to controlling the hardware. You can often use what is called an in-circuit debugger to actually connect your IDE/debugger to the firmware program running on the embedded device2.

But lets say you’re writing code for an x-ray machine, and lets say that it needs to pulse the x-ray cathode for 10 ms. You do not want to accidentally put a breakpoint during those 10 ms. The breakpoint will suspend the execution of the firmware, but will generally leave the hardware in whatever state it was in – which may be a state you really don’t want it to be in for an extended period of time3.

There are other reasons why hardware can be a problem for debugging, such as when the program outputs physical voltages but does it incorrectly. You may need to start resorting to using a multimeter or oscilloscope just to see what the program is doing. You can’t add a software “debug watch” to a piece of hardware.

Problem 2: Download time

Another problem with debugging is the time it takes to cycle through your modify-build-execute-debug process, because it has an extra “download” phase. In other words, every time you make a change, you have to download the compiled executable to the embedded device. This can take anywhere from a few seconds to minutes, depending on the size of the program. It seems a common pattern for developers to do something like this:

  1. Compile program
  2. Download program
  3. Execute in debug mode
  4. Try put it into the problem state
  5. See the problem (perhaps by a breakpoint and watches)
  6. Go back to your code
  7. Add diagnostic printf statements or more breakpoints
  8. Go back to step 1

I’ll call this the compile-debug cycle, and it can take minutes to get through it each time.

Problem 3: Non-Standard Compilers

Another problem, which perhaps isn’t quite as obvious, is that is seems that an overwhelming number of compilers targeted at embedded microcontrollers do not conform to the C or C++ standards. They often only support a subset of standard C/C++, and they include many extensions for things which are difficult or impossible to do in standard C. This can be a good thing: how do you declare an interrupt routine in standard C? But for debugging code it can cause problems. In the same way that hardware dependent code forces you debug code on the physical hardware, compiler dependent code forces you to debug with the specific debugger provided with the compiler. Why is this a problem? I’ll get to that in a moment.

Solution

Microchip solution!

So how do I solve these problems?

The answer is stupidly simple. Perhaps even insulting. It is simply:

Don’t write embedded code

Avoid writing code that accesses hardware, or that needs to be downloaded to run, or that uses non-standard extensions of C/C++. You could almost say, “Don’t write embedded code at all”. Embedded code has all of these problems, so just don’t do it!

Instead, write platform-independent code. Use all the modern techniques of dependency injection, unit testing, etc. Get your code working in isolation first, in an environment where your compile-debug cycle is in the order of seconds rather than minutes. Get it working on your local PC! This eliminates step 2 – “Download program” – from the above compile-debug cycle, and makes everything easy.

Also, ideally you should be debugging unit tests, not your fully integrated software. This eliminates step 4 in the above compile-debug cycle, because your unit tests automatically generate the problem state. If they don’t, then you may need to write more tests.

The unit tests for the module you’re working on should automatically run at the click of a button or keyboard shortcut, and should be integrated into the IDE so you don’t have to leave your code to run tests. Personally, I divide my code into small enough modules that my unit tests for the module-under-development can compile and run in under a second, and causes the IDE to jump immediately to the first failing test assertion.

Doing it this way you can easily shorten the compile-debug cycle overhead by a factor of 100 (“overhead” meaning tasks not related to finding or fixing the actual problem). If you also write the code in such a way that you can debug-step backwards then you can also reduce the number of times you need to repeat the compile-debug cycle.

But, Hardware!

Of course, you do need to access hardware at some point. And you probably need to use compiler pragmas and intrinsics to squeeze out the most performance or do unusual things. But you can isolate these cases, and confine them to very thin layers on which your main application code runs. For example, instead of having application-level code directly access an IO port, it can access it through a function which is implemented by an ultra-thin HAL (hardware abstraction layer). The HAL should do as little as possible, and its interface should be pure C (or C++). When I say “hardware”, I also mean other platform specific dependencies, such as a compiler quirks, third-party libraries, and the OS.

The rest of your application code should be platform independent. Don’t use features that aren’t standard C/C++. Obviously only use the subset of the standard that’s actually supported by your compiler. Don’t rely on unspecified “features” of the language, such as the size of a pointer, the result of signed integer overflow, the endianness of an integer, or the alignment of fields in structure (But you weren’t anyway, right?).

Use dependency injection to inject the HAL into your application code4. This is just generally good advice for any program. And how else would you do unit testing? For different build configurations you can inject mocks and interactive simulators just as easily as the HAL on your release build.

Remember that every line of HAL code might take 10 times longer to develop, so you really want to minimize how much HAL there is.

Conclusion

So, the solution to the long debug-compile cycle with embedded code is simply to avoid “embedded” code as much as possible. I do this, and it really works. I can go weeks developing code for a microcontroller without ever powering up the physical device, and then take a few hours or so to integrate it into the final compiled application. The result is that I can use all the modern techniques and practices to develop high quality firmware in a fraction of the time.

Why don’t many embedded developers seem to do this5? I really don’t know. But perhaps its related to the fact that developing for such limited systems is a lot like developing for the desktop computers of 20 years ago, before object orientated patterns and unit testing were really around. Or perhaps it’s because C and C++ make it very difficult to do zero-cost dependency injection. Perhaps its just because the embedded microcontroller industry is much slower to move, since it targets a much smaller audience.

Whatever the reason, it doesn’t have to be that way, and hopefully I’ve given you a few ideas of how to accelerate your development process. If you have any other ideas or questions, feel free to leave them in the comments below.


  1. I’m talking about anything that’s too small to run a proper operating systems like Linux – something in the order of kilobytes of RAM 

  2. Again, I’m talking about quite low level programs here. If the “firmware” is running a full OS, such as Linux, then it may have its own debugging tools such a remote GDB server 

  3. I’m not saying you would use a breakpoint at all in a real x-ray machine, but the point applies to other scenarios 

  4. I’m not going to cover how to do dependency injection for this type of application, but techniques you could consider are: using the linker to “inject” a dependency C API or mock layer into your application; using CRTP to inject dependencies into C++ code statically; and of course plain old virtual classes and/or function pointers 

  5. I may be generalizing. But realize again that I’m referring to embedded firmware programs, perhaps running in sub 100 kB of RAM, and not smartphone or embedded Linux development 

Moonwalk debugging

How many times have you run into this situation:

You’ve just finished a large section of code. You launch your program in the debugger. You wait for it to start up, and then try it out a bit. But then… it crashes. Your debugger eagerly jumps up tell you there’s a problem. You look through the call stack and all the program variables, and you find something strange. A value that’s just clearly wrong. Why is it 25 and not 5? Why didn’t it remove the item from the list correctly? Why is it not sorted? You look through the code to try figure out the problem, but your conclusion is that, really, it should just be right. Your code looks fine – why did it get the wrong answer?  The only way to find out is to go back in time and breakpoint it at the point where it should have been generating the correct answer, but didn’t. So you set some breakpoints, and restart the whole program. You go through all the steps to reproduce the problem again (if you can even remember them). Finally you hit the breakpoint. You step slowly through your code, and finally find the problem. Or more likely, you find some other strange thing that may have lead to the first strange thing, and have to repeat the process. Or in haste you accidentally step over the most important line of code instead of stepping into it, and have to start again.

This used to happen to me all the time. In fact, it used to be my primary method of debugging: a binary search through the code, each time rerunning it to reproduce the problem. Wouldn’t it be easier to just step backwards through the code? The program breaks when it enters some invalid state. By the law of causality, the bug must have been in code that was before the point where the strange state occurred. Stepping forwards seems like the least useful thing you can do.

Nowadays I don’t have this problem as much, and I’ll tell you the secret: Write programs in a pure functional style. Write pure functions. That is, write functions that always return the same value if called with the same arguments, and that don’t have side effects that change the state of the system when the function is called. Write code that doesn’t mutate state.

Let me give you a real example. Recently I’ve been developing a graphics library for an embedded microcontroller. The previous graphics system was stateful: you had a mutable image buffer, and then you called different subroutines, such as drawLine, to affect a change to the pixels in the image buffer. I was tasked with creating a new graphics system – in part because we now needed to render graphics to an area many times larger than the size of all the microcontroller’s RAM.

The design I produced is almost completely stateless. For example, instead of calling a rotate function to rotate a graphic, I call a rotated function , which returns a completely new graphic object representing a rotated version of the original. Did you catch that? The word “rotate” is imperative: it commands the machine to do something. The result of what it does is known its side effect1. A function like this would probably return void, or a result code to confirm its success, and the rotation would be visible as a change in the original image.

On the other hand, the word “rotated” is, in this case, an adjective2. It describes not what the function does, but what its result is. If the argument is a picture of a dog, then the result is a rotated picture of a dog. It always will be. Every time. A rotated(dog) is always a rotated dog. The unrotated dog is still there, and still unchanged.

Then there are similar functions for creating a shifted graphic from an existing one, a composite graphic from other graphics, and a line, a polygon etc. As a quick implementation note: you may have guessed that these graphic functions can’t possibly return objects with all the pixels of the graphic they represent. The way this is resolved is with laziness – the graphic objects are only implemented as lightweight descriptions of what the resulting graphic should look like relative to their input. When it comes time to produce actual pixels, there is some short-circuiting magic that combines all the transformations into one ultra-efficient transformation, and then produces the pixels on-demand with almost no buffering required3.

Moonwalk the debugger

MoonwalkThe great thing is that doing things this way plays really well with the debugger. These techniques work in other environments, but I’m specifically going to talk about my experience in Visual Studio. When your program stops at a breakpoint or exception, you can rewind the stack one frame at a time (in Visual Studio 2013 this is Shift+F11), and you can rewind within a particular frame by simply jumping directly to that line of code (“Set next statement”, or Ctrl+Shift+F10). If your program is stateful, this is sure to break something. Typically when rerunning the same code twice you might not expect it to do the same thing the second time because the conditions may have changed since the first time it, and so the program may be in a different state, and thus produce different results. This isn’t a problem at all if you use immutable data structures and your functions have no state or side effects. Now you can step backwards! 

In addition to being able to step backwards, you can also evaluate arbitrary expressions in the debugger. For example, I have a function like saveGraphic(graphic, filename), which takes a graphic and saves it to a bitmap file. This is technically not a side-effect-free function since it writes to file, but for the purposes of debugging it has no side-effect which modify the state of the program in any observable way, and so is “pure” for practical purposes. In Visual  Studio, I can add a watch expression like saveGraphic(dog, "debug.bmp"), and the debugger will happily execute the function. Now I can see the value of my immutable graphic object! Since rendering doesn’t mutate program state, I can use “pure” functions like this to visualize any complex objects. I can even evaluate transformation functions in the watch window, such as saveGraphic(rotated(dog), "debug.bmp"), knowing confidently that my program state hasn’t changed as a result of evaluating the watch.

I hope you enjoyed this post, and learned something new that will help you with your bug battles, and challenge the way you think about coding.


  1. Whether or not a function’s “effect” is its primary purpose, it is still technically known as a “side effect” 

  2. I believe the term is deverbal adjective 

  3. I would probably have normally used matrices for this, but this microcontroller has no integer multiply or divide instructions, and no floating point anything

Finish before you Start

I find a lot of the coding I do is completely different, unlike anything I’ve done before. The reason is that if it’s anything like something I’ve done before then I’ve failed to recognize the pattern and abstract it out. As a software engineer I want to never be doing repetitive tasks, because that’s what software is for (repeating tasks). If you’re repeating yourself in any way then you’re not fully leveraging your time. Every pattern, every similarity, every “oh, I know how to do this because I did something similar with xyz” should be under careful scrutiny.

After a certain point you may run up against the limitations of a programming language, since many design patterns are simply language features that are missing. I come across this all the time in embedded C programming, where a “linked list” is not a container but the design pattern of having a next pointer in a structure (and all the algorithms that go with it), and a “virtual class” is not a data type but the pattern having a vtable pointer at the beginning of a structure.

At the point where the language betrays you, your next port of call could be writing your own domain specific language to target the problems you’re working on. In C# this may include the use of reflection to interpret existing C# data structures in new ways (such as interpreting a class as a list of serializable members), using expression trees to interpret C# expressions in new ways, or using reflection.emit to dynamically generate bytecode.

But despite the huge advantages of automating the mundane programming tasks and patterns, the problem arises: if you never repeat yourself, then everything is new. Everything is uncharted territory1. How do you know how long it will take? How do you know if it will work? Will it really be useful?

The solution is simple: finish before you start. Do whatever it takes to get the results of your experimentation before you actually perform the experiment. In software this means developing proof of concepts. Hack the result together using whatever means necessary. Spare no time for optimization or good coding technique. Don’t worry about corner cases and details. Get to the finish line by cutting across the sports field. Get to the finish line by cheating. By redefining the race. By faking the results. Calculate them manually. Hand-draw the GUI. Fake the input data. Brute-force the algorithm. Anything it takes.

It helps to clearly define the interface of whatever you’re coding: How will it interact with the rest of the system? If it’s a web page, the interface is the GUI. If it’s a library, the interface is the publicly exposed API. If it’s a DSL, the interface is a syntax or spec. Once the interface is there, it doesn’t matter if implementation is a birds-nest of cheating and killing. All that matters is that the interface offers just – just – enough to give you a little hint of what the future will look like. What it will feel like? Get a tangible product you can play with2.

Once you’ve seen the end, then you can start. A clearer picture of where you’re going will help not only in producing a better product or code, but it might change your direction completely. Once your fortune is told, it may bring news of triumph, or of peril. Perhaps when you see the end you realize it actually isn’t so great there. It isn’t as easy to use as you thought. It’s more complicated than anticipated. Perhaps the whole principle is flawed in some horrible way. There are many things that are hard to anticipate without getting a closer look. Or a hands-on experience. Perhaps you need to reconsider some of your assumptions, or abandon the whole idea altogether.

It’s better to know before you start.


  1. This is an exaggeration and oversimplification 

  2. I use the word “tangible” very loosely. A visual interface is “tangible” if you can see it, and a library interface is tangible if you can call it, etc.