Squashing Bugs at the Source

Based on new research, source code analysis has been used to find thousands of bugs in the Linux 2.6.x kernel. Here's how the technology works, what it can find, and why coding may never be the same again.

There are many ways to make software better: unit testing exercises individual pieces of software; stress tests put an entire project through rigorous paces; and field testing allows customers to deploy and test pilot software releases in production-like settings.

If you’re a software developer, you’re no doubt familiar with these and a host of other techniques and methodologies designed to improve the quality of your code. But, believe it or not, you can start debugging and improving your code before your code even runs. Source code analysis attempts to find bugs by reading and analyzing the source code of your program.

Source code analysis can detect memory leaks, NULL pointers, and buffer overflows, just to name a few. Better yet, source code analysis can find bugs early — before the unit tests fail, before quality assurance finds faults, and before your customer gets irate — leaving you with more time to focus on the cool stuff. Oh, yeah.

While computer scientists and developers have been analyzing source code for decades — using everything from eyeballs to automated tools — the prescribed automatable techniques typically provided little practical benefit for working programmers. In the past, source code analysis tools were tied to specific programming languages or often only performed one or two kinds of audits.

However, recent research has solved many practical problems that older techniques failed to address, yielding real breakthroughs and technology that can be used by anyone. Modern techniques can find bugs from stylistic differences to exploitable security flaws.

For example, modern source code analysis has found thousands of bugs in the most recent versions of the Linux 2.6 kernel, a piece of software generally hailed as being of very high quality. If source code analysis can improve the kernel, it can probably help your project, too. Let’s find out how.

Basic Blocks

Before enumerating all of the different types of bugs that can be found by source code analysis, let’s discuss the building blocks for source code analysis, appropriately named basic blocks.




Listing One: A simple program to demonstrate basic blocks


1: void foo(int bar) {
2: int j;
3: j = 12;
4: printf(“decision\n”);
5: if (bar == j) {
6: printf(“bar == j\n”);
7: } else {
8: printf(“bar != j\n”);
9: }
10: printf(“decision done\n”);
11: }




Figure One: The basic blocks for the program shown in Listing One

Block One



2: int j;
3: j = 12;
4: printf(“About to make a decision\n”);
5: if (bar == j) {

Block Two



6: printf(“bar is equal to j\n”);

Block Three



8: printf(“bar is not equal to j\n”);

Block Four



10: printf(“Done with the decision\n”); |

A basic block is a section of a program that does not cross any conditional branches, loop boundaries, or other transfers of control. As an example, consider the code in Listing One. The statements within foo() can be split into four basic blocks, as shown in Figure One. The statements on lines 4 and 6 are not contained in the same basic block because it is not necessarily true that if line 4 is executed, line 6 is also executed. Line 4 always executes when foo() is called. Line 6 only executes if the value passed into the function is the same as the value of the local variable j. And while lines 4 and 10 always execute when foo() is called, the two lines belong in separate basic blocks because there are intervening statements between them that may be executed.

Once the source code is split into basic blocks, the blocks can be arranged into a control flow graph. A control flow graph provides a picture of how code may execute. Figure Two shows the control flow graph for the code in Listing One and represented by the basic blocks in Figure One.








code_01
Figure Two: The control flow graph for the code in Listing One

For example, foo() may either execute Block One followed by Block Two and Block Four, or it may execute Block One followed by Block Three and Block Four. Different execution possibilities through the code are appropriately called paths. The function foo() has two possible paths.

Source code analysis uses control flow graphs to find and examine all of the possible paths through a piece of code. This differs greatly from simply running the code, because when you run the code once, only one path is taken. Thus, to fully test a program by running it, you have to run it once for each possible path through the code.

For programs that are thousands or even millions of lines of code, there could be literally billions of paths (or more). Even if your program took only one second to run, running it a billion times would take almost 32 years! However, source code analysis can analyze billions of paths in just a few minutes.

How can source code analysis be so fast? The answer lies in the difference between the operations that source code analysis performs and the operations a running program performs. A running program must be 100% complete in what it keeps track of. Every byte of memory must be updated when the program says to update it, read when the program says to read it, and so on. However, source code analysis need not be so precise. There are many bugs that can be found without keeping track of every single byte. By giving up the precision that’s only necessary when a program actually runs, static analysis can analyze all of the paths through a program, looking for specific problems.

So Many Bugs, So Little Time

There are many different types of bugs that source code analysis can find. Let’s look at five of those types and see five real bugs discovered in the Linux 2.6.1 kernel via source code analysis. [The authors did not single out any particular kernel module or programmer for this article, but simply pulled a few random samples to use as examples.]

While Linux kernel code provides the basis of the examples, all of the bug types shown here are regularly found in common C programs (the programming language used to write Linux). Similar, additional, and other types of bugs can be found in other programming languages.

For example, Java developers and C developers introduce different types of bugs because the two languages are so different. However, source code analysis can be applied to both programming languages, albeit using different diagnostic techniques to find germane bugs.

* NULL POINTER DEREFERENCES. Programs read from and write to memory all the time. Each read and write has an associated memory address and programs must write to or read from a valid piece of memory. To read from or write to memory in C, you might perform a pointer dereference. If the address dereferenced is 0 (or NULL), you’ve performed a null pointer dereference and the program immediately crashes.

An example of a null pointer dereference comes from a device driver for a 3com Ethernet card and is shown in Listing Two. Running a static code analysis tool over this code yields:


NULL error: Dereferencing possibly NULL
pointer “pdev” drivers/net/3c509.c:1580




Listing Two: A null pointer dereference in a Linux device drive


1572: static int
1573: el3_resume(struct pm_dev *pdev)
1574: {
1575: unsigned long flags;
1576: struct net_device *dev;
1577: struct el3_private *lp;
1578: int ioaddr;
1579:
1580: if (!pdev && !pdev->data)
1581: return -EINVAL;

The error message points out that on line 1580, the variable pdev is dereferenced and may be NULL. Looking at line 1580, you can see a check that pdev isn’t NULL; however, the logic of the check is wrong. If the pointer is NULL, it’s then dereferenced in the right hand side of the &&. This programmer clearly meant to write if (!pdev || !pdev->data) on line 1580.

* MEMORY LEAKS. Besides reading and writing memory, C programs must acquire memory from the operating system. When C programs are finished using the memory, the memory must be returned to the operating system. The C functions malloc() and free() allocate and free memory, respectively. One common bug that programmers often introduce is what’s known as a memory leak. A memory leak occurs when a program allocates memory and then loses track of that memory instead of explicitly freeing it. This type of bug can be especially bad for programs that are intended to run over the course of days, weeks, or months (like a web server or an operating system), because even if the program slowly leaks memory, eventually the program will run out of memory, perhaps causing it to crash.

Listing Three shows an example of a memory leak that source code analysis found in Linux. The source code analysis error is shown above the code snippet. On line 169 the variable node is used to hold some memory that was allocated by the call to the function pnpbios_ kmalloc() (the source code analysis is smart enough to determine that this function allocates memory). However, on line 172, the function returns without calling a free()-like function on node as it should. Therefore, since there is no reference to this memory once the function returns, the program loses track of it, causing a memory leak.




Listing Three: A memory leak in Linux


LEAK error: leaking storage “node”
linux/drivers/pnp/pnpbios/proc.c:172

161: static int proc_read_node(
char *buf, char **start,
off_t pos,
162: int count, int *eof,
void *data)
163: {
164: struct pnp_bios_node *node;
165: int boot = (long)data >> 8;
166: u8 nodenum = (long)data;
167: int len;
168:
169: node = pnpbios_kmalloc(…);
170: if (!node) return -ENOMEM;
171: if (pnp_bios_get_dev_node(
&nodenum, boot, node))
172: return -EIO;

* USING FREED MEMORY. Another memory problem that can plague C programmers is using some memory after it’s been freed. Once a program tells the system that it no longer needs memory, it shouldn’t use that memory, as the memory could have been reclaimed for other purposes. Using memory that’s already been freed can cause random memory corruption.

Imagine a situation where a program writes to a piece of memory that it’s already freed. If that memory has been reallocated to another purpose, writing to it may wreak havoc at another point in the program or in the other program. When memory corruption occurs, it’s often very difficult to debug because the resulting program crash isn’t necessarily going to point back to the root of the problem.

Source code analysis can point directly to where the bug is introduced by showing the first place that freed memory is used right after it is freed. Listing Four demonstrates one such bug found in Linux. In Listing Four, lines 1057 through 1062 clean up in the case of an error. First, an error message is printed and then memory is freed to avoid memory leaks. The programmer has the right idea, but the memory is freed in the wrong order. Line 1059 frees card, but on the following two lines, this memory is then dereferenced in card->bch and card->dbuf. Once a program frees memory, it’s not allowed to use it anymore, so this code could lead to random memory corruption and a very unhappy Linux.




Listing Four: An instance of using memory after it’s been freed


FREE error: 1059-1060 Using freed “card”,
deallocated by call to “kfree”.
linux/drivers/isdn/eicon/eicon_mod.c:1060

1056: if (!(card->sbuf = kmalloc(…))) {
1057: eicon_log(card, 1,
1058: “eicon: some message”);
1059: kfree(card);
1060: kfree(card->bch);
1061: kfree(card->dbuf);
1062: return;
1063: }

* USING UNINITIALIZED VARIABLES. When you want to use a variable in C, you have to declare it first. However, simply declaring a variable does not give it a value. In fact, after you declare a variable, but before you use it, it has a random value depending on what was at that memory when you first declared the variable. A common error that programmers make is forgetting to initialize variables in all situations before they use them.

Listing Five gives an example of uninitialized variable use in Linux. Here, the variable error is declared on line 561. This variable appears to be used to keep track of some error status information. The all field of this variable is initialized on line 583 and then returned on line 596. As long as line 583 is run, the use of error.all on line 596 is fine. However, lines 583 and 596 are not in the same basic block. Line 583 happens within an if statement. If the condition of this if statement is false, error.all will not have been initialized by line 596 when it’s value is returned. Hence, the function that called this function could receive any value as a return value.




Listing Five: An example of uninitialized variable use


UNINIT error: Uninitialized “error.all”, used.
linux/drivers/ide/ide-cd.c:561:596

556: byte ide_cdrom_dump_status (
ide_drive_t *drive,
const char *msg,
byte stat)
557: {
558: unsigned long flags;
559:
560: atapi_status_t status;
561: atapi_error_t error;
562:

582: if ((status.all &
(status.b.bsy|status.b.check)) == status.b.check) {
583: error.all = …

594: }
595: local_irq_restore(flags);
596: return error.all;
597: }

Using uninitialized variables can have varying effects. Sometimes, it may be as innocuous as printing out a bogus value. However, if the uninitialized value is a pointer and that pointer is then dereferenced, it can cause your program to instantly crash.

* BUFFER OVERFLOWS. Probably the most common programming bug — and the one that gets a lot of press — is the buffer overflow. (A buffer is just another name for a piece of memory.) When you allocate memory from the system, you indicate how many bytes are needed. If you use more bytes than what you originally requested, you cause a buffer overflow. In it’s simplest form, a buffer overflow is simply a place in the code where you’re reading from or writing to memory that’s just beyond the designated boundary for a particular buffer.

Many Internet worms and viruses take advantage of buffer overflows because they’re a very particular type of memory corruption that can be exploited to overtake a program. [A future Linux Magazine feature will examine how crackers exploit buffer overflows and other faults.] However, a lot of buffer overflows can be detected before the program is even run through the use of source code analysis.

Listing Six demonstrates a case where 14 bytes are allocated to a program for a particular variable, but then a 15th byte is written to off the end of the buffer.




Listing Six: A prototypical buffer overflow


BUFFER error: Accessing buffer
“(*ins).src_scb_slots” of size “14″
at position “14″ with index variable
“(*pcm_channel).src_slot”
linux/sound/pci/cs46xx/dsp_spos_scb_lib.c:1466

1463: snd_assert(pcm_channel->src_slot >= 0 &&
pcm_channel->src_slot <=
DSP_MAX_SRC_NR,
1464: return );
1465:
1466: ins->src_scb_slots[pcm_channel->src_slot] = 0;

The programmer does check the value of pcm_channel->src_slot before using it to access the buffer ins->src_ scb_slots. However, the check on line 1463 is off by one. Rather than checking to make sure pcm_channel->src_ slot is less than 14 (DSP_MAX_SRC_NR), the code checks to see if it’s less than or equal to that value. When the buffer is accessed, the 15th byte in the buffer may be written to, causing a buffer overflow.

There are many more types of bugs that source code analysis can be used to find than those listed above. One hot topic in the software world is security holes. While it’s true that certain bugs more often lead to security holes than others, any type of bug can be considered a security hole. If input from the user can cause a null pointer dereference, or a use after a free(), or any type of behavior that’s incorrect, it should be considered a security hole.

The same static code analysis techniques used to find classic types of bugs can be extended to identify those cases in which input from the user triggers the bug in the code. Besides security holes, source code analysis can be used to find dead code (code that can never be executed because of faulty logic within a function), useless statements, places where return values shouldn’t be ignored, and so on.

How Does Source Code Analysis Work?

So, how does static source code analysis work? Remember that source code analysis keeps track of only that information necessary to find certain types of bugs. This information, or state, is the basis of the analysis. Source code analysis uses simplifying assumptions to reduce the number of possibilities of states in a program.

For example, to look for null pointer dereferences, an analysis need only keep track of the values of pointers in the program, not other types of variables. To take that even further, the analysis doesn’t really care what value the pointer has — it only cares whether or not the value is NULL. While running, the actual value of a pointer could be one of 4 billion or so values, but to source code analysis that looks for null pointer dereferences, the only two values that matter are NULL and not NULL. This is how analysis can reduce the problem to make it easy to look for bugs in the code.

Once the state space has been reduced sufficiently, making it tractable to analyze billions of paths through the code, the analysis uses a state machine to track, say, a pointer through any given point in the program on a given path. A state machine is made up of states (e.g., “NULL” or “Not NULL”), as well as transitions between those states. Different statements in the code can cause transitions between states.








code_02
Figure Three: A state machine for the source code analysis that looks for null pointer dereferences

Figure Three shows a state machine for the analysis that looks for null pointer dereferences. This state machine has four states. When the source code analysis first starts, every time it sees a pointer, it puts that pointer in the UNKNOWN state. Then, when it sees a comparison against NULL or an assignment to that variable, it makes a transition in the state machine to either the NULL or the NOT NULL state (depending on what’s appropriate for the given path). Finally, if a variable is in the NULL state for some path and a dereference of that variable is seen on the path, the state machine transitions into the ERROR state and an error message is issued that includes the line where the error occurs and the path that was taken to get to that error.

All of the types of bugs mentioned in this article are found using similar methods of defining the state space and transitions for each particular bug type and then implementing that state machine as an analysis. Using this approach, designing source code analyses is very modular and extensible. To add a new kind of analysis to the existing suite of analyses, one must only think of the state machine and encode that logic. Given a good engine and framework, this task has proven to be as easy as writing twenty or thirty lines in a scripting language.

Why Haven’t We Been Doing This For Years?

You may be wondering at this point, “Why, if source code analysis is so beneficial, haven’t I heard of it before?”

Until recently, source code analysis wasn’t feasible for everyday programming. Previous attempts at source code analysis yielded too many false positives and couldn’t scale.

A false positive is the report of a bug that’s not really a bug. In other words, the analysis is incorrect. False positives can be caused by the simplifying assumptions that the analysis must make. Until recently, the types of source code analyses that could run over millions of lines of code made so many assumptions about the code that they yielded too many false positives. In some cases, they could have up to 50-100 false reports for every real bug! No programmer wants to look at 100 different reports just to find one real bug.

Using the most recent advances in source code analysis, the types of bugs described in this article can be found in code bases the size of Linux with a false positive rate of less than 20%.

This means that for every five reports that a developer examines, four of them will probably be real bugs. Technology with this modest false positive rate is far more practical for every day use.

Scalability is also a concern for many source code analysis techniques. Many approaches try to be so precise that they’re practical only on very small projects of a few thousand lines or less. As the size of software grows each year, these techniques can only be applied to the most critical portions of the code. However, some techniques with additional precision can actually prove certain programs to be correct for certain properties. The techniques described in this article can’t make any claims like “Your code has no null pointer dereferences.”

While there is a place for tools that have more precision at the cost of only working on a few thousand lines of code, they’ll never be generally accepted since it takes so long for them to run.

More Bugs, Be Gone!

Source code analysis is not the only way to find bugs in programs, but as software grows larger and larger, the more tools and techniques you have to build and debug it, the better.

No solution can guarantee that you won’t have the occasional crash or reboot, but with source code analysis, you can find a lot more of these problems before they ever hit your desktop.

If you still have doubts about what source code analysis can do to help you, take a look at http://linuxbugs.coverity.com to see a complete database of bugs found in a recent version of the Linux 2.6 kernel using the source code analysis technique described in this article. After seeing those bugs, you’ll agree that by using source code analysis, we can get one step closer to a bug-free world.




Resources

Web site of Stanford professor Dawson Engler, original creator of much of the technology discussed in this article.

http://www.stanford.edu/~engler/

Coverity is a company that promotes source code analysis solutions based on the Stanford developed research.

http://www.coverity.com

An open source effort to implement many of the source code analysis ideas from Professor Engler’s group.

http://smatch.sourceforge.net




Benjamin Chelf is one of the original researchers of the Meta-compilation project at Stanford and is a co-founder of Coverity, Inc. He can be reached at ben@coverity.com. You can submit your code for source code analysis at http://findbugs.coverity.com.

Fatal error: Call to undefined function aa_author_bios() in /opt/apache/dms/b2b/linux-mag.com/site/www/htdocs/wp-content/themes/linuxmag/single.php on line 62