Benjamin Chelf Archive

Squashing Bugs at the Source
Thanks to new research, source code analysis has been used to find thousands of bugs in the Linux 2.6.x kernel. Here's how the technology works, what it can find, and why coding may never be the same again.
Compilers, Part 3: The Back End
This month, we finish peeking under the hood of the compiler with a look at the last two steps in the compilation process (the process that turns your source code into something that a machine can actually execute). If you recall, there are seven steps in compilation. These steps are shown in Figure One. Last month, we looked at step five, "Intermediate Code Optimization", those optimizations that the compiler can perform independent of the architecture of the target machine.
Compilers, Part 2: The Back End
Last month, we began investigating how compilers actually work. Our look "under the hood" started with the front end of the compiler -- the phases that parse and tokenize the source file, verify syntax and semantics (the rules of the programming language), and translate the source code into an intermediate representation. Figure One shows an overview of the entire process. This month, we pick up the discussion at step five, "Intermediate Code Optimization".
Compilers: The Front End
Over time, this column has discussed many programming topics and techniques. However, one subject we have never fully addressed is what actually happens at "compile time." How does a compiler take the program you write and translate it into something that a machine can understand and execute?
Linux and Executables
For the past few months, we've been learning how the compiler and linker work together to take the programs you write and convert them into executables that the operating system can run. We've followed the process from source code to object module to executable, with static and shared libraries thrown in as well.
Building and Using Shared Libraries
Over the past several months, this column has shown you how to use gcc and g++ language extensions, how to link objects and functions, and how to build executables. We will continue this month with a discussion about a very specific type of object -- a shared (or dynamic) library -- and how to take advantage of it in your programs.
Building the Perfect Executable
Your program has compiled with no errors. You type its name and watch it run. It seems so simple, but there's a lot that had to happen behind the scenes at the time the program was compiled in order to make it look so easy. For one thing, in order to make it possible for the kernel to properly load and execute your program, the compiler toolchain has to know exactly how the kernel will expect the new process's virtual address space to look. In other words, the toolchain has to be able to build the executable according to specifications that the kernel understands and expects.
Working with Language Extensions
Many of you are familiar with the C and C++ languages. You know the syntax and the semantics of the various operations and have a feel for what is allowed by the language according to its specification. However, you may (or may not) be surprised to discover that compilers for these languages deviate from the official specifications.
The Insides of Networking
In the past two months, this column has introduced some of the functions necessary for writing networked programs. We've been throwing around terms like TCP, UDP, IP, and others without any real description of what they mean. This month, you should gain a better understanding of these abbreviations and exactly what is going on when they are used to communicate between machines. With this knowledge, you will be better equipped to determine exactly which protocol is appropriate for your applications -- for instance, why UDP is often used for broadcast-style transmissions, while TCP is used for transactions.
More Network Programming
Last month we started a discussion on network programming. However, in the interest of getting through an entire example of a client and a server and how they communicate, we omitted many details. This month, we'll examine our examples more closely to gain more knowledge about network programming. Specifically, we will discuss how to get IP addresses from hostnames and hostnames from IP addresses. We will also take a look at the difference between little-endian and big-endian machines and find out why "endianness" matters in network programming.
Network Programming
This month, we are starting a series on network programming. This area of programming is enormous, not only because of the sheer amount of information that is needed to successfully develop network applications, but also because of the number of applications currently being developed with networking in mind. With network speeds increasing, more and more applications have a "network" version of some sort. For example, Quicken can automatically update your account information from some banks, computer games can be played with other people on the Internet, and so on.
A Peek Inside the Clock
Last month we introduced a few functions that allow your applications to retrieve the current time from Linux. We discussed how one might implement a simple function that causes an application to wait for a specific amount of time before continuing execution. We also looked at the alarm() function, which keeps time for you, and how you might use alarm() instead of the timing functions to allow a program to wait for a specific amount of time without needlessly executing any instructions while waiting.
Time Functions
Every now and then, you'll find that in the midst of an application, you really need to know the time from the system clock. Even more likely, you need to have your application wait for a specific amount of time. Linux's timing functions are relatively straightforward; however, most people overlook them until they need to use one in an application.
Dynamic Memory Allocation — Part II
Last month we introduced some of the concepts that are involved with memory management. We discussed how to dynamically allocate memory in your applications. We also described the various methods that allocators use to manage free memory. This month we turn our attention to memory allocation at a lower level -- the level of the operating system.
Dynamic Memory Allocation — Part I
This month we're going to examine a topic that is probably familiar to those of you who have any experience with programming -- dynamic memory allocation. In any programs of significant heft, dynamic memory management is a necessity. You are probably experienced with the standard C functions malloc() and free(), and we'll do a brief recap of those memory allocation and deallocation functions. Following that, we will look at how memory allocation in Linux actually works.
The Fibers of Threads
For the last several months in this column, we've been looking at programming with Linux's threads library, pthreads. However, we have taken for granted the work that is actually done under the covers by the pthreads libraries. So this month's Compile Time will dissect Linux pthreads themselves to discover exactly what it is that makes them tick.
More Concurrent Programming Topics
In the past two months we've introduced threads and mutexes, the locking mechanism used to prevent race conditions in threaded applications. In this month's column, we'll look at two types of concurrent programming techniques used to synchronize the execution of code in threads. Hopefully, this discussion will deepen your understanding of locks and how they are used in concurrent programming.
Threads and Mutexes — Part II
Welcome to the second part of our look at programming with threads. In last month's column we talked about the functions that allow you to create and wait on threads. This month we're going to dive deeper into the problems that often arise when using threads to write concurrent programs. Before we begin, however, we'll return to the ticket agent example we looked at last month and discuss the solution to the problem of overselling tickets.
Programming with Threads
Last month we looked at using pipes and FIFOs to communicate between concurrently running processes. However, there are occasions when you might want to run two pieces of code concurrently without the limitations of communicating through pipes. Perhaps you have two (or more) pieces of code that need to share a set of data or are constantly updating shared data structures. Pipes and FIFOs are not well-suited to handle this kind of situation because they would require each process to keep its own copy of the data and to communicate with all other processes when that data changes. Such an arrangement would cause a great many problems and would be very difficult to debug.
Pipes and FIFOs
One of the nice features of Linux (and other Unix-like operating systems) is its ability to chain together a number of small utility programs so that they act like one larger program. I'm referring, of course, to the "pipe" feature that is supported by most popular shells and is denoted by the | character. This feature allows data to flow from the standard output of one application directly into the standard input of the next application in the chain, much as if you placed a pipe from the end of the first application to the beginning of the second. For example, the four utilities cat, grep, sort, and less can all be chained together like this:
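A representative pipeline using those four utilities (the file name and search pattern below are placeholders, since the teaser's own example is cut off here):

```shell
# Each | connects one program's standard output to the next
# program's standard input: cat emits the file, grep filters it,
# sort orders the matches, and less pages through the result.
cat /etc/services | grep tcp | sort | less
```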