What is a Variable?
The other day I was thinking about how variables should be represented in a language. This sent me down the strange path of wondering, “What, actually, is a variable?”
Most people get taught the concept of a variable very early on in programming. The analogy I hear used a lot is that of a box, or pigeonhole – it is a placeholder in which values can be put. Only a single value can be put in the placeholder at a time, so new values overwrite older values.
int x; // placeholder x
x = 5; // x has a value of 5
x = 6; // x has a value of 6
print(x);
Accessing a variable is accessing the value that was most recently put in it.
When looking at a low-level language like C or C++, we can literally think of variables as memory locations. We can use the syntax &x to find out the memory location. We could literally open up our computer, and if we had a sufficiently good microscope (one which can detect electric charges), some crazy surgery tools, and a really good map, we could actually see the electrons in this memory location that make up the value. Changing the value of the variable will literally change the electric charges or currents that make up that physical memory location. Of course, in a modern computer this isn’t quite as simple as I’ve made it sound – but certainly in simple embedded microcontrollers this is literally the case.
So a variable is a memory location?
The problem I have is that I think a program can be fully defined in terms of input and output. That is, a program can be defined by how it interacts with its environment. A memory location is, generally speaking, unobservable by the “outside” – it is neither an input nor an output. Typically when a program changes a variable, I (the user outside the program) can’t tell that it changed. When you download Photoshop, its description doesn’t say “Photoshop is a program that changes the following variables…”.
So a program is defined by its input and output, and a variable is neither input nor output. So a variable is not part of the program definition. So what is it then?
A variable is a language feature. It is a part of the special language that we use to describe the program definition to the compiler.
Consider the following program:
#include <stdio.h>

int main(void)
{
    int x;
    x = 5;
    x = 6;
    printf("x: %i\n", x); // Outputs "x: 6"
    return 0;
}
I would say that the program definition is to output “x: 6”. Whether or not the compiler uses a memory location to do this is irrelevant – in fact it probably shouldn’t. The statement x = 6 is not part of the definition, but rather a description to the compiler to “remember for next time, that x is 6”.
So here’s a question: what happens to the statement x = 5? Is it optimized out?
No. I think it’s incorrect to think that x = 5 is optimized out. It’s better to say that x = 5 is always there in the code, but has no influence on the definition of the program behavior and so won’t affect the outputted binary. x = 5 is never in the specification to start with. The description of the behavior (the code) says the program must output the last assigned value of x to the console. The last assigned value of x is 6, so “6” must be outputted to the console. The value 5 doesn’t even feature here. It doesn’t contribute to the program description, and so is as good as a blank line.
In this case there is no program input, so we’re only looking at the program output. The compiler doesn’t even need to consider x = 5 when it decides what the program must do. For the sake of making the compiler itself easier to implement, it may indeed originally consider x = 5 as setting a memory location, and then optimize it out later. But this is a detail of how the compiler is implemented, and it need not be that way at all.
Why does it matter?
To the average programmer, this distinction shouldn’t matter at all. Whether a variable is a physical memory location, or whether a variable is a tool for describing program behavior to the compiler, makes no difference most of the time.
To me it makes a difference in how you think about programs. It makes a difference about how upset you are when the compiler doesn’t generate exactly what you expected, or it “optimizes” out something that you intended it not to. These cases are not the compiler being stupid because it’s not doing the code you wrote, it’s the compiler being intelligent and implementing your description exactly as you put it. It’s actually all the other times that the compiler is being “stupid” – when it can’t think of how to implement your program except by applying the very basic fall-back rules like “make a memory location for each variable”.
This is more and more important as compilers get more and more “intelligent”. It’s also important when thinking about the design of new programming languages. One needs to think about what language features to provide that make it easiest for the compiler to be intelligent. Variables are not one of these features – it’s very difficult to reason about the behavior of a program by looking at how variables change, and very easy to instead dumbly map variables to memory locations.