Microvium: updated memory model

Microvium: updated memory model

TL;DR Microvium can now address up to 64 kB of ROM, up from 32 kB previously, and now runs more efficiently on small 32-bit devices such as an ARM Cortex MCU.

This is a minor update regarding the data model in Microvium (previous post here), for the kind of person who’s interested in this kind of thing.

Recap: Microvium uses 16-bit slots

As covered in the previous post, Microvium uses a 16-bit slot size — variables in Microvium are 16 bits each. This is unusual among the small embedded JavaScript engines. Cesanta’s mJS and Elk use a 64-bit slot size with NaN-boxing, and Moddable’s XS uses a 128-bit slot size (4x32bit words). So, if you have a variable with the number 42 in it, it will take 2 bytes in Microvium, 8 bytes in mJS, and 16 bytes in XS (on the other hand, the number 42.5 is a float and will take 12 bytes in Microvium since it overflows the slot into the heap, but it will still take 8 bytes in mJS and 16 bytes in XS).

Pointers and Paged Memory

If the lowest bit in the slot is 0, Microvium treats the value as a pointer to heap memory.

Since heap memory in Microvium is always 2-byte aligned, the lowest bit of a pointer will always be 0, so the value in the slot exactly corresponds to the pointer value, at least on 16-bit systems (or on 8-bit systems with a 16-bit address bus).

But what about 32-bit systems? Microvium is optimized for devices with 64 kB or less of RAM, but many devices with 64 kB of RAM are actually 32-bit devices. In these devices, there is usually a 32-bit address space, and some sub-range of these addresses will map to physical RAM (and some other range of addresses will map to physical ROM). Even though only 16 bits of information are required to index every byte of RAM, pointers are still 32 bits on these devices.

Previously, Microvium worked fine on 32-bit and 64-bit devices (I do all the testing on my 64-bit PC) but it did so through an expensive mapping table that mapped 16-bit VM addresses to their 32-bit or 64-bit native counterparts (like virtual memory implemented in software). The mapping itself was still O(1) in many cases1, but it involved a function call every time a pointer needed to be mapped, which is a massive overhead to incur. I wasn’t too worried about this because I wasn’t aiming for 32-bit devices as my initial audience, but with the number of 32-bit devices out there, this would quickly become a problem.

To support this kind of scenario more elegantly, Microvium now has a port macro definition that allows you to specify the upper 16-bits of a 32-bit pointer (or upper 48-bits of a 64-bit pointer) for the platform2.

For a more concrete example, the Arduino Nano 33 IOT has up to 32kB of SRAM, starting at the address 0x20000000. So the upper 16-bits of a real pointer into RAM will always be 0x2000. Here is a snippet of the data sheet that shows this:

So, in the Microvium port file, you can now specify the high bits of a 32-bit pointer to be 0x2000, and Microvium will interpret all pointers by simply indexing into the given memory page.

In compiled ARM machine code, the conversion from a 16-bit slot value to 32-bit pointer is just one or two instructions3! This is a significant performance improvement over how it worked before, and makes pointer access in Microvium almost as efficient as native pointer access.

I didn’t actually make this change for performance reasons. I did it because it makes the development of the Microvium engine much easier.

  • On my Windows machine where I develop, I can now use VirtualAlloc to pre-allocate a single 64 kB “page” of memory where the high bits of a pointer are always 0x5555, and run Microvium in just this region of memory. So if I see the Microvium value 0x002A, I know instantly that corresponds to the address 0x5555002A.
  • The addresses are consistent across runs, so when I note down an address in my notebook while debugging, I know it will be the same if I restart the program.
  • I can also have a memory view open in the debugger and it remains consistent across runs and shows all the VM memory in one place.

64 kB ROM

Previously, if the lower 2 bits of a slot were 01b then the value was considered a pointer into ROM after shifting right by 1 bit, giving us a 15-bit address space for ROM, and requiring ROM to be 2-bit aligned to keep the remaining 0 bit.

Now, the slot value is considered a pointer into ROM after zeroing the bottom 2 bits. This doesn’t change the performance, but it means that we can now address up to 64 kB of ROM.

A side effect is that ROM must now be 4-byte aligned since the lower 2 bits of ROM pointers must be zero. This means extra padding sometimes, but I’ve found that the ROM overhead doesn’t grow substantially with this change.

Why did I do this?

  1. Debugging. Previously, if I saw the slot value 0x2A1 while stepping through the engine, I have to bring out a calculator to see that 0x2A1 corresponds to the bytecode address 0x150. Now, the value 0x2A1 corresponds to the bytecode address 0x2A0, which is much easier to follow.
  2. I was irked by the fact that I couldn’t just say that “Microvium supports up to 64kB of memory”. I previously had to qualify it every time — “it supports 64kB of RAM, and 64kB of snapshot image, but only up to 32kB of ROM”. Now I can just say “supports up to 64kB of memory” with no added asterisks or qualifications, since it supports 64 kB of each type of memory it uses. This is part of simplifying the mental model of Microvium, since Microvium is all about simplicity.

  1. The performance was actually O(n) where n is the number of memory blocks, but since Microvium uses a compacting garbage collector, memory is consolidated into a single block periodically, making it O(1) for most pointers most of the time. 

  2. More accurately, you can specify any native address as the origin of the VM address space, but I found it easier to explain this in terms of the high bits. 

  3. In the full ARM instruction set, the conversion is a single 32-bit instruction. In the ARM Thumb instruction set, it takes two 16-bit instructions. The specific number 0x20000000 is relevant here because it’s a power of 2 which is more efficient. 

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.