Month: December 2013

Navigation by Compile Error

Where is the mistake in this code:

void myPrint(int x) {
    printf("%s", x);
}

The problem is that `x` is declared as an integer, but used as if it were a string. But my question was, where is the mistake? Is the mistake that `x` is declared as an integer, or that `x` is used as a string? A compiler would probably choose to highlight line 2 as the problem (possibly as a warning in this case).

Compile errors are usually conflicts between two different assertions in the code. “Undefined field”? It’s a conflict between using a field and not declaring it. You can solve it either by adding the field, if you intended it to be there, or by not using it, perhaps because you intended to use a different field.

The problem is that compilers almost always only error on one side of the conflict. This is sensible in some ways, but inconvenient in other ways.

I have a habit of using compile errors as a means of searching through code. If I need an extra parameter in the function I’m working on, I simply add it. The compiler will automatically “find” all the places where the function is called with one fewer argument than it should be, and take me to those places to modify the code. In other words, I’m relying on the compiler to tell me where I need to change the code.

And there’s the problem with only getting an error on one side of a conflict: it only takes you to where you want to go half the time. If I want to add an argument to a function I’m calling, the error will be on the code I’ve just written rather than on the function declaration I need to add the parameter to. Likewise if I use an undefined field or function of an object, it would be nice if the error took me to the class declaration so I can change the class.

Yes, there are ambiguities in deciding where exactly the other side of the conflict is. Is it the base class? Which overload of the function do we go to? Where would the variable be declared? But I think a simple approach would be for the compiler to present a few of the top candidates and list them alongside the error. The programmer could click to go to the code location, and perhaps the IDE would even perform the obvious action at the same time, like adding in the parameter or field at that location.

Would you find this useful?

Rethinking goto

Most programming languages have support for an if-statement [1]:

[Flowchart: an if-statement — one branch for true, one for false, rejoining afterwards]

The if-statement is a branch which splits in two directions: one direction for `true` and the other direction for `false` [2]. The code after the if-statement is executed regardless of which branch was taken, so in effect the two branches come together after the if-statement [3].

Most of the languages I’ve encountered also define some variant of a switch-statement, which is the concept of an if-statement except generalized to any number of branches:

[Flowchart: a switch-statement — one condition branching to multiple alternative paths that rejoin afterwards]

Just like the if-statement, control returns to a single path after the switch-statement.

But what about this example:

[Flowchart: a branching flow whose paths do not simply rejoin after the branch]

So far as I know, there is no language construct which can directly represent this idea, except that of `goto`.

Here is another, slightly more complicated, example:

[Flowchart: a more complicated branching flow]

It’s difficult to come up with a simple program that follows this flow without repeating condition checks or some other unnecessary overhead. Again, the answer may be to use a `goto` in this case.

But practically, when would you ever run into a complicated branching construct that doesn’t fit neatly into `if` or `switch`? Well, let’s look at this often-required-but-difficult-to-implement flow:

[Flowchart: the “error-branch statement” — a sequence of operations, each of which can fail to a single common error branch]

I’ll call this the “error-branch statement”, because I don’t know of a better term for it. A sequence of operations needs to be performed, and each may fail to a common error branch. If we say that a switch-statement generalizes an if-statement to have multiple alternative branches with one condition, the error-branch statement generalizes an if-statement to have multiple conditions with a single alternative “else” branch.

A try-catch-statement could be used to model this behavior, but there are many situations where exceptions are more trouble than they’re worth. This is particularly true in embedded code running on a microcontroller that needs to run indefinitely, which is the type of code I deal with quite often. And an error-branch-statement would probably be a more idiomatic fit in a language like C than exception handling would.

Here’s another type of error handling branch I would find handy in a low-level language like C:

[Flowchart: like the error-branch statement, but each operation has its own cleanup step on the failure path]

I broke convention here and changed the physical orientation of the error handling flow path because it fits better. This is the same case as the error-branch statement before, except each operation requires different cleanup code if it fails. For example, if operation 1 is to open a file, cleanup 1 might be to close the file again, etc.

In C# this type of branching is achieved by a `using` statement and exceptions. In C++ this type of behavior is achieved using RAII and exceptions. In C there is no direct way of representing it, despite the fact that I’ve seen this pattern more often in C than in any other language. In C you have to settle: `goto`, nested ifs, function calls, error state variables, etc. None of them is quite the right fit, and the resulting code reads as more complicated than it really is.

What about the other flowcharts I presented earlier? Why can’t we represent those using structured-programming constructs? Well, you can’t define a new branching statement for every possible flowchart, so instead we need a generic mechanism to handle arbitrary cases. In C this is the `goto` statement: it allows us to construct arbitrary control paths wherever we want.

Next time I’m going to dig deeper by discussing an alternative to `goto` which I’ve seen used quite often, for better or worse.


  1. I’m using a basic flowchart representation 

  2. Obviously the else/false branch may be empty if it hasn’t been specified 

  3. I’m ignoring exceptions and returns and other such things in this basic flow 

Dynamic scoping is better

When I first discovered that there was a thing called “dynamic scope”, I was amazed that there was actually another way of doing “scope” besides lexical scope. Part of the amazement was the thought, “why would anyone want to do it that way? It seems so error-prone!” Apparently others agreed, because you don’t see dynamic scope much in modern languages anymore.

For those who don’t know what dynamic scoping is, let me give a quick demonstration in a fictitious language:

void foo()
{
    string x = "Hello World!";
    bar();
}

void bar()
{
    print(x); // Prints "Hello World!" if called from foo
}

In the above example, `x` is in scope in `bar` not because `bar` knows about `foo`, but because `bar` is called by `foo` at runtime.

At first glance dynamic scoping seems terrible. It reminds me of programming in assembly, where a called routine may refer to the “variable” EAX, whose value is dynamically determined by the most recent time EAX was set – be it the caller code or anywhere else. Surely this is a bad thing?

Wikipedia points out that,

“Dynamic scoping also voids all the benefits of referential transparency. As such, dynamic scoping can be dangerous and few modern languages use it.”

In other words, we can’t really tell what a function will evaluate to because it might use variables we didn’t pass to it, whose values depend on where the function is currently being called from.

As a side note, I think Wikipedia is wrong in this case [1]. The reason is simple: we could just as easily pass the dynamic environment into each function as an argument. Arguments are the bridge between lexical and dynamic scope: the parameter value is determined dynamically based on the context of the function’s execution rather than its lexical context, but its identifier is determined lexically by the parameter name. So functions accessing dynamically scoped variables can still be referentially transparent if the dynamic context is considered to be syntactic sugar for an additional parameter that holds the dynamic environment.

That technicality aside, though, dynamically scoped variables still behave quite differently from lexically scoped ones in practice. For example, a function can access a variable that is multiple frames down the stack, and it can be quite difficult to predict what value a particular variable will have just by looking at the program.

The flip side

I think dynamically scoped variables can actually be a good thing.

Here’s the reason: global variables.

Global variables are bad [2]. At least that’s what we’ve been taught, and I partially agree.

But what happens when we literally need a single instance of something?

One example is a program. We only need one program in our program [3], so we normally have a globally defined `main` function. How would you write a program in C# or Java if they didn’t have a way to say `public static main`?

Maybe `main` is a stupid example of a global variable. What about `std::cout` in C++? In this case a program has a console interface, and `cout` is a global variable that represents that interface. One console = one `cout`. What a pain it would be if `cout` weren’t global. We would have `void main(ostream& cout...)`.

In other words, there are places where a global variable just makes sense. These are cases where there really is just one instance of something (or any fixed number of something). All the non-member functions in the C and C++ standard libraries are examples of this. We have one heap, so we have one global `malloc` function. We have one operating system, so we have one global `fopen` function.

I can easily extend these principles to functions and classes I’ve written myself. I have a program that accesses one database, reads from one config file, interacts with one GUI. I think all of these qualify just as easily as correct places to use global variables.

Except…

What happens when there are two?

What happens when a piece of program code normally writes to the console, and now I want it to write to a log file? What happens if it uses a native GUI, and now I want to use a web interface, or have multiple client GUIs? What happens if it connects to a database, but now in my unit tests I want it to connect to a mock data source? In these cases, hard-coding access to `cout` or to `myDatabase` seems like a bad design decision.

Enter: Dynamic scoping

I think dynamic scoping is a perfect fit here, and I probably don’t even need to explain my reasons because I’m sure you can see exactly where I’m going with this. You can have whole chunks of your program just “assume” that there’s a thing called a “database” without caring where it comes from.

Using dynamic scoping has a decoupling effect. Functions right at the top of the stack can access environments near the bottom of the stack without coupling to anything in between. Nothing is coupled directly to global variables or functions. If you want to override the heap and define your own `malloc` for a particular branch of the program, it’s as simple as pie.

It could reduce code. For an entire module that’s intended for data manipulation, it makes no sense to manually “thread” the database variable as a parameter through the call tree – it’s just work for nothing. The developers implicitly know there is a database accessible from any function in the module, so why can’t the language support that?

It’s an ideal replacement for global variables. In any situation where a global variable might have been a good choice, a dynamically scoped variable will be a better one, and probably better than dependency injection with lexical scoping in these cases. In fact why not just define the whole program as having a “root dynamic scope” which contains all the “global variables” – then any function can override global variables at any time.

Final Note

I have a few final things to say. One is that dynamic scoping doesn’t mean dynamic typing. I think it’s entirely reasonable to have a static type checker enforce the type bindings of a dynamic scope environment. A function has certain environment requirements, and it implicitly adopts the requirements of the functions it calls. The environment requirements become implicitly part of the published type signature of the function.

Finally, let me say that I don’t intend a language to only support dynamic scoping. That would be crazy – lexical scoping is much more sensible 99% of the time. Dynamic scoping is better than global variables, and it’s better than lexical scoping in some situations.

I really think dynamic scoping has promise, and we shouldn’t throw the baby out with the bath water. Yes it’s dangerous, but don’t take the chainsaw away from the lumberjack.

Have you ever worked in a language with dynamic scoping? I’d be interested to hear what your experience is and if you agree with what I’m saying.

[1] I won’t change Wikipedia because I don’t have any evidence of this, it’s just my opinion.

[2] And by extension, static fields are also bad. And since we can use functions as variables (e.g. function pointers or delegates), static member functions are also bad, and non-member functions are equally bad, since they’re just globally scoped function values.

[3] Oh, I didn’t know that