Compilers, Part 2: The Back End

Last month, we began investigating how compilers actually work. Our look "under the hood" started with the front end of the compiler -- the phases that parse and tokenize the source file, verify syntax and semantics (the rules of the programming language), and translate the source code into an intermediate representation. Figure One shows an overview of the entire process. This month, we pick up the discussion at step five, "Intermediate Code Optimization".

Figure One: The compilation process


Why Optimize?

Before we continue, let’s discuss why code needs to be optimized at all. Take a look at the program in Figure Two. If we emit the code as-is, the program would assign 5 to j every time through the loop, which is completely unnecessary and costly. With a smart compiler, the redundant instruction j = 5 would be optimized away to produce code that runs more efficiently.

Figure Two: Sample code with room for improvement

void foo () {
int i, j = 5;
for (i = 0; i < 1000; i++)
j = 5;
}

Intermediate Code Optimization

As we discussed last month, the front end of the compiler transforms source code into an intermediate form. The intermediate form reflects the intent of the original source code, but is otherwise generic — a kind of pseudo-code. All optimizations are made to this pseudo-code, but the end-result is the same: your program runs more efficiently.

In the examples that follow, the intermediate form we’ll use is very much like C with two notable exceptions: it has no high-level looping constructs and no more than one operation may be performed in a single assignment. We’ll allow gotos and labels but no for, while, or do loops. You can assign the result of a calculation to a variable, but the calculation must be simple. For example, a pointer dereference is allowed, taking the address of a variable is allowed, addition is allowed, etc. Examples of calculations that are not allowed are a = b + c + d, and a = *(b + 4).

We want to simplify the intermediate representation as much as possible to decompose calculations into their most basic operations. Also, this kind of intermediate representation closely mirrors the instruction sets implemented on most processors.

To make this more concrete, let’s take a look at how the code in Figure Two would be represented in intermediate form. Figure Three shows the translated code. Notice that the for loop was boiled away by manually initializing and incrementing i, as well as turning the loop into an if statement with gotos and labels. As another example, let’s take a look at how we’d write a = b + c + d in intermediate form:

Figure Three: Sample intermediate representation for Figure Two

foo: j = 5
i = 0
label: if (i >= 1000) goto done
j = 5
i = i + 1
goto label
done: return

t1 = b + c
a = t1 + d

We introduce a new temporary variable, t1, to help perform an intermediate calculation. For the a = *(b + 4) example, the intermediate form would look like this:

t1 = b + 4
a = *t1

The next question is: What does the compiler do with the intermediate form to optimize it? Before the compiler can optimize, it must split up the intermediate code into a control flow graph. A control flow graph is a graph of instructions that represents all possible execution paths through the code. Let’s take a look at building a control flow graph by using this simple code example:

int i;
if (i) foo ();
else bar ();
exit ();

Figure Four: A control flow graph

The control flow graph for this code is in Figure Four.

There are two possible paths through this code: the path followed if i evaluates to true and the path followed if i is false. Each node in the control flow graph is called a basic block. Code is divided into basic blocks such that if the first statement in a block is executed, all the statements in that block are executed. You can think of a basic block as the smallest unit of division in a control flow graph. Typically, each basic block keeps track of the blocks that follow it and the blocks that precede it; hence, traversals of the control flow graph are simple.
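To see how those basic blocks arise, here is one way the if/else snippet above might look in our intermediate form (the label names are invented for illustration). The test, the foo() arm, the bar() arm, and the final exit() each form their own basic block:

if (i == 0) goto else_part
foo ()
goto after
else_part: bar ()
after: exit ()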

Let’s take a look at a few code optimizations given an intermediate form and a control flow graph.

Dead Code Elimination and Decidability

One of the simplest optimizations is dead code elimination, which simply removes any instructions that can never be executed, thus reducing the size of the executable. While dead code elimination might not seem to affect performance, it's possible that fewer bytes will yield better use of the processor's cache and increased overall performance.

Dead code can occur in many forms and some are easier for the compiler to find than others. In fact, some dead code is impossible to find at compile time. Let’s take a look at some of the different ways that source can lead to dead code.

Figure Five shows a trivial example of dead code. Notice that the assignment j = 7 cannot be executed because the goto statement that immediately precedes it jumps over it. You can easily determine that the statement is dead code, but how does the compiler do it? The compiler starts at the first basic block in the control flow graph and performs a depth-first traversal. Each block traversed is marked to indicate that the code has been reached (at least once). Any block never reached is left unmarked and is thus dead code.

Figure Five: Dead code

void foo () {
int i, j = 5;
for (i = 0; i < 1000; i++)
j = 5;
goto label;
j = 7;
label: return;
}

Figure Six shows a more complicated example. Since bar() always returns 0, the if statement in foo() always evaluates to false and the program prints “Hello World!” every time. Here, a sophisticated compiler may be able to determine that the “true” path is dead. But what if the bar() function were defined in a separately compiled file? If the compiler doesn’t have access to the code for bar(), it must assume that bar() can return either 1 or some other value. In this case, the compiler cannot eliminate either branch of the if statement.

Figure Six: More dead code

int bar () {
return 0;
}

void foo () {
if (bar () == 1)
printf ("Never will print!\n");
printf ("Hello world!\n");
}
In general, deciding whether a condition will evaluate to true or false is undecidable. This means it’s sometimes impossible to determine whether a certain piece of code will be executed. In those cases, the compiler must make conservative assumptions (e.g., bar() could return anything) even if they aren’t true. A compiler should never introduce errors while optimizing — the program must behave exactly the same after any optimizations are applied to it.

Common Subexpression Elimination

Another optimization is common subexpression elimination. If you use the same calculation more than once in your code, the compiler can simply perform the calculation once and use the result in multiple places. Figure Seven shows an example of code that can have common subexpressions eliminated. Notice that a + b is calculated three times. Since the values of a and b can’t change between the calculations, the result is the same every time. Therefore, the compiler can convert the original code to the modified code shown in Figure Eight. Notice that in the optimized code, the calculation is performed once and the result is used everywhere that it is needed.

Figure Seven: Code with common subexpressions

int foo () {
int a, b;
scanf ("%d", &a);
scanf ("%d", &b);
printf ("a + b = %d\n", a + b);
bar (a + b);
return a + b;
}

Figure Eight: Code with common subexpressions removed

int foo () {
int a, b, temp;
scanf ("%d", &a);
scanf ("%d", &b);
temp = a + b;
printf ("a + b = %d\n", temp);
bar (temp);
return temp;
}

A common subexpression can be eliminated only if its result does not change no matter what execution path leads up to that point in the program. The exact algorithm that a compiler uses to eliminate common subexpressions is beyond the scope of this article (and is, in fact, best left for graduate-level compiler courses).

Other Optimizations

We’ve just barely scratched the surface of compiler optimizations. However, there are a few other optimizations worth mentioning at this time.

Compilers can perform strength reduction to change multiplications into additions. For example, imagine you have a loop index, i, which is always used to calculate j, with a statement, j = i * 4. Knowing that i will increase by 1 on each iteration of the loop, the compiler can figure out that j will increase by 4 on each iteration of the loop and simply change the statement to j = j + 4 (assuming that the compiler inserts the code to initialize j correctly).

Then there’s copy propagation, which propagates the values of variables throughout the code. For example, if your code contains the statement x = 0 followed by a few uses of x (but no assignments to the variable x), the compiler can replace each occurrence of x with 0.
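In our intermediate form, copy propagation might transform code like this (the variables here are hypothetical):

x = 0
a = x + b
c = x

into:

x = 0
a = 0 + b
c = 0

Once every use of x has been replaced, the assignment x = 0 itself becomes dead and can be eliminated, and 0 + b can be folded down to b.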

Another thing compilers can do is detect when code inside a loop computes the same result on every iteration, such as the example shown in Figure Three. This kind of optimization is called loop invariant code motion. A good optimizing compiler would transform the code in Figure Three by removing the assignment statement in the loop and converting the code into:

int i, j = 5;
for (i = 0; i < 1000; i++) ;

You may be familiar with the inline keyword in C++ that automatically inserts the code from a function into its caller. inline saves the overhead of calling the function. The compiler can automatically inline other short functions (whether or not the programmer provided the inline keyword). By inlining functions, the compiler may be able to combine the code from the function with other code from the caller to optimize the function further (through common subexpression elimination, etc.).

Another technique that can induce further optimization is loop unrolling. Instead of emitting a loop, the compiler may generate separate code for multiple iterations of the loop to potentially allow cross-iteration optimizations.

Both inlining and loop unrolling may cause the size of code to grow and it may be inappropriate to inline all functions and unroll all loops. The compiler must be well tuned to effectively utilize these optimizations.

Next: Targeting The Machine

This month, we’ve seen some of the machine-independent optimizations that a compiler can perform on intermediate code to boost performance and conserve memory. Next month, we’ll conclude our discussion of compilers by looking at object code generation and object code optimizations — the final two steps in compilation that target specific hardware. Until then, happy hacking!

Benjamin Chelf is an author and engineer at CodeSourcery, LLC. He can be reached at chelf@codesourcery.com.
