Compilers, Part 3: The Back End

This month, we finish peeking under the hood of the compiler with a look at the last two steps in the compilation process (the process that turns your source code into something that a machine can actually execute). If you recall, there are seven steps in compilation. These steps are shown in Figure One. Last month, we looked at step five, "Intermediate Code Optimization", those optimizations that the compiler can perform independent of the architecture of the target machine.

Compile fig1.o
Figure One: The compilation process

This month, we finish peeking under the hood of the compiler with a look at the last two steps in the compilation process (the process that turns your source code into something that a machine can actually execute). If you recall, there are seven steps in compilation. These steps are shown in Figure One. Last month, we looked at step five, “Intermediate Code Optimization”, those optimizations that the compiler can perform independent of the architecture of the target machine.

This month we’ll look at steps six and seven: object code generation and object code optimization, also known as the machine-dependent optimizations. Machine-dependent optimizations depend heavily on the type of processor the and are primarily focused on the most important compiler optimization of all: register allocation.

Registers and Allocation

A register is a CPU’s internal “memory” — a scratch pad where the CPU stores, retrieves, and processes transient data. Typically, each CPU will have something like 32 registers imprinted in its silicon, and each register can store four bytes (four bytes is usually enough to represent an integer or a pointer). CPUs have instructions to store and load registers to and from main memory, and all computation is done on (or in) registers. For the CPU, accessing its registers is much, much faster than accessing memory.

Since all computation is performed with registers, moving values from main memory to registers is a requirement for a program to work at all. A non-optimizing compiler could simply turn every “read variable” in your program to “load from memory into some register”, every “write variable” to “write into memory from some register”, and perform computations in the registers themselves. However, this would be very slow since each read and write would go to memory.

Listing One: Simple example

void foo ()
int i, j = 0;
for (i = 0; i < 1000; i++)
j = j + i * i;

Let’s look at an example to make this more concrete. Listing One shows some C code that computes the sum of the squares of the numbers 0 to 1000. Now imagine, for the sake of our discussion, that reading/writing to a register takes 1 unit of time and reading/writing to memory takes 10 units of time. Clearly, Listing One will run 10 times slower if each read/write of i and j must go to memory.

Instead, if the compiler could simply leave i and j in registers during computation and only write the variables back to memory after the loop was completed, it’d save huge amounts of execution time.

The process of figuring out what variables to keep in which registers and for how long is called register allocation. Since register allocation greatly impacts the running time of programs, it’s certainly the most important compiler optimization.

Live Variables

Compile fig2.o
Figure Two: Control flow graph for Listing Two

To successfully allocate registers, the compiler must determine when all variables are live in your program. A variable is considered live at a point in the program if its value at that point is used at some point later in the program. Listing Two shows a somewhat contrived function with three variables that are live at various points in the code.

Let’s examine where each variable is considered live. To do this, it helps to have a control flow graph of the function. Figure Two shows the control flow graph for this function.

We’ll start with a: a is assigned in line 3 and used in lines 5 and 8 (in the assignments to b and c, respectively). So this makes a live along lines 3, 4, 5 (on the left branch) and lines 3, 4, 8 (on the right branch). We consider a not live at lines 6 and 9 because the value of a at those points is never used since it is reassigned at line 11. a is also live at lines 11 and 12. Using similar analyses, we see that b is live along lines 5, 6, 11, and 9, 11, and c is live along lines 8, 9, 11, and 6, 11.

Listing Two: Function with three variables

0: void foo ()
1: {
2: int a, b, c;
3: a = 7;
4: if (a < 10) {
5: b = a;
6: c = 2;
7: else {
8: c = a;
9: b = 1;
10: }
11: a = b + c;
12: return a;
13: }

Interference Graphs

Now that we’ve established the live “ranges” for each variable, we’ll use this information to allocate variables into registers. There are two rules for register allocation:

  1. Allocate of all the variables to as few registers as possible (there are a limited number of registers on a machine).
  2. If two variables are live at the same point in the program, the variables must be allocated to different registers.

One technique to determine if two variables are live at the same point is to build an interference graph. An interference graph has nodes for each of the variables and edges between two nodes if those variables are live at the same time.

Figure Three shows an interference graph for the variables in Listing Two. Notice that there are two different nodes for the variable a. This is because the variable is defined in two different places and the live ranges for those definitions should be treated separately. You can think of them as completely separate variables — the code would not perform any differently if a fourth variable (like d) was declared and used in lines 11 and 12 instead of a. Notice that the interference graph isn’t a connected graph (i.e., there are some nodes that are not connected to another node). Disconnected nodes imply that not every variable interferes with another variable.

Figure Three: An interference graph for the variables in Figure Two

a1 (Defined on line 3)

b — c

a2 (Defined on line 11)

And even though a and b are live in line 5, they don’t interfere as they’re not live at the same point in the program. Whether a variable in any given statement is live or not depends on the statement itself and what happens immediately before and after the statement.

Take a closer look at line 5. Notice that right before the statement in line 5 is executed, a must be live because it’s value is about to be used. Similarly, b is not live because its value has yet to be defined. At the point right after the statement in line 5 is executed, the current value of a is no longer needed so a is not live, but b is live since it has been defined and its value will be used subsequently. Since b is live only at the exit of line 5 and a is live only at the entry of line 5, they do not interfere. The only variables that interfere in this example are b and c.

Coloring the Graph

So, how do we use the interference graph to allocate variables to registers on our machine? It turns out that once we have an interference graph, the problem of allocating registers is equivalent to the problem of coloring the interference graph so that no two connected nodes (i.e., nodes that have an edge between them) have the same color. Each distinct color in the interference graph corresponds to a register on the machine.

There are many algorithms that can color an interference Graph, but each one is imperfect. (The problem of coloring an interference graph is an NP-complete problem. Explaining this mathematical concept is beyond the scope of this article, but in simple terms, it means you have a very tough problem that cannot be solved quickly.) The algorithm we’ll use will not always get us a solution, but will do well most of the time.

Given an interference graph and a number of registers, n, we perform the following steps:

  1. Pick a node in the graph whose degree (the number of edges coming out of it) is less than n and assign it a color that will not interfere with any color previously given. If there are no such nodes, give up.

  2. Remove that node and all of its edges from the graph.

  3. Repeat steps 1-2 until all nodes are colored or step 1 fails.

Let’s run through this algorithm for the interference graph in Figure Three. We’ll pretend that we have two registers, so n=2. All the nodes in this graph have degree less than 2, so we may pick any node.

We’ll start with a1 and give it the color red. Removing that node, we have a2, b, and c left. Next, we’ll pick a2. Since it has no edges, we can also color it red. Let’s move on to b. Since we haven’t colored the node that b is connected to, we can color b red as well.

Finally, we’re left with c. It must be colored differently than b, so we’ll color c blue. Notice that we succeeded in coloring the graph with only two colors. There are multiple nodes colored red, but those red nodes are not connected to each other.

Compile fig5.o
Figure Five: An interference graph that can be colored with 2 colors

Compile fig4
Figure Four: An interference graph that can’t be colored with 2 colors

However, this algorithm will not always succeed. Figure Four shows an example of a trivial interference graph where this algorithm doesn’t work. This graph cannot be colored with only two colors, so the algorithm will fail (it will actually fail on the very first step because there are no nodes with a degree of less than 2).

Now take a look at Figure Five. Clearly we can color this graph with only two colors (simply make a and d red, and b and c blue). However, our algorithm will fail on this example, too, because it can’t start on any of the nodes (again, they all have a degree of 2). When the algorithm fails, we can’t definitely say that the graph cannot be colored. We can only say, “We don’t know if the graph can be colored or not.”

Another weakness in this particular algorithm is that we lose some valuable information. All the interference graph tells us is that two variables interfere. It does not, for example, tell us how severely the variables interfere. In some cases, the live ranges of two variables may completely overlap. However, in other cases, the live ranges might only overlap by one or two instructions. In these cases, it might be worthwhile to rearrange the instructions (if possible) so the live ranges don’t overlap. Once we have converted the problem into an interference graph, we’ve lost the information necessary to perform additional analysis and additional optimizations.

Other Options

Of course, there are many algorithms that can allocate registers. But since register allocation is such a difficult problem to solve perfectly, all algorithms must make approximations that may ultimately yield “false negatives” (they may find a solution, but they may say “I don’t know” sometimes as well). Some register allocation algorithms may be suited for different types of memory access patterns, tailoring themselves to specific workloads. They may also attempt to be more general. The example algorithm we presented is a very simple introduction to register allocation and runs very quickly.

Generating Machine Code

Register allocation is by far the most important optimization a compiler can perform. However, once it’s decided how to allocate registers, there’s still some work to be done.

The compiler must know the specific Application Binary Interface (ABI) of the target architecture. An ABI defines precisely how binaries should be formed, including information about how to schedule instructions, how memory is organized, as well as the actual format of object files (where the code is placed, what headers are necessary to describe where the code is, etc.).

With knowledge of the ABI, the compiler determines how to allocate registers (based on the techniques described above) and what addresses to use for all the various data in the program (functions, global variables, etc.) as it generates machine instructions from the intermediate representation.

The compiler must also decide how to translate all of the instructions in its intermediate representation into the very best instructions that a processor has.

For example, some processors may provide a very fast instruction to increment a register while others may not. A compiler that doesn’t recognize this distinction may generate code that always adds one to a register rather than using a faster increment instruction. The code generated by this compiler would then be slower than the code generated by a smarter compiler.

This concludes our series on the internals of compilers. Hopefully, you have a better sense for just how compiler turns your source code into the 1s and 0s that run on your machine.

Benjamin Chelf is an author and engineer at CodeSourcery, LCC. He can be reached at chelf@codesourcery.com.

Comments are closed.