
Common Concurrent Programming Errors

Creating multithreaded programs can seem challenging at first. We help you sort through the problems and set you on the path to finding the solutions.

Concurrent programming has been around for a long time. Programmers have typically used threads to express concurrency. Mapping independent tasks to threads gives the operating system greater flexibility in scheduling, which helps hide program latency. While one thread waits for data, the operating system can run another. In addition, most systems are capable of doing computation and I/O simultaneously, provided that computation and I/O are independent tasks mapped to threads. With symmetric multiprocessors (SMP) becoming more common, programmers are starting to use threads to take advantage of the performance benefits of parallel computing. (For an introduction to threads and their uses, see the four-part “Compile Time” series mentioned in the Resources box.)

Concurrency adds a temporal component to coding that can be confusing at first. It also makes debugging harder because errors can be difficult to reproduce. Fortunately, the same types of errors tend to appear over and over again. This article will try to show you what not to do, hopefully avoiding bugs in the first place (or at least making it easier to spot errors in existing programs). The concurrent programming errors described in this article are divided into two categories — those causing data races and those causing deadlock. Most of the example programs sometimes execute correctly and sometimes don’t. That’s the difficulty of debugging concurrent programs. Multithreaded programs that work fine during testing often fail in the hands of the end-user.

In this article, each listing that contains an error is denoted as a Challenge. The corresponding Solutions (with the correct code) begin on pg. 30. The code examples use Pthreads because it is available on most Linux and Unix platforms. Linux has a good Pthreads implementation. It is efficient, fairly complete, and complies with the POSIX standard. Still, most of the lessons in this article are applicable to other threading methods (e.g., the Java thread class, Win32 threads, OpenMP, etc.). For brevity, error codes are not checked in the source code listings. However, checking error codes, while important in normal programs, is even more important in multithreaded programs, as one of the examples will demonstrate.

Assuming Order of Execution

Programmers often expect statements to be executed in the order in which they are written, but it is important to remember that multiple threads execute asynchronously. Relying on a particular order of execution is a fundamental flaw in concurrent programming. For example, the multithreaded “Hello, World” program (Challenge One) illustrates the point. Initially, the program looks pretty straightforward. Four threads are created, and each one prints a greeting along with its identification number. Sometimes the program behaves as expected. Sometimes, however, duplicate identification numbers are printed. How is this possible if each thread has a local copy of my_id?




Challenge One: Multithreaded “Hello, World” Program


#include <stdio.h>
#include <pthread.h>

#define N 4

void *hello (void*);

int main() {
int j;
pthread_t tid[N];

for (j = 0; j < N; j++)
pthread_create (&tid[j], NULL, hello, (void *)&j);

for (j = 0; j < N; j++)
pthread_join (tid[j], NULL);
}

void *hello (void *my_id) {
printf ("Hello World from thread %d\n", *(int *)my_id);
return NULL;
}

The programmer mistakenly assumes that threads begin execution the moment they are created. The printf() statement may indeed be executed before the next call to pthread_create(), but the operating system is free to schedule the threads in any order. It is possible that all threads are created before any of them actually begin executing. Since all copies of my_id point to the same address, which is changing in main(), the threads can print the same identification number.

This error is easy to correct once you have figured out what to look for. Simply make sure that each thread uses a unique address to hold its identification number. In a nutshell, never assume that your threads will execute in a particular order. Unless the programmer explicitly synchronizes the threads, the operating system will determine the order of their execution.

Incorrectly Scoped Mutex Variables

Threads periodically call the function shown in Challenge Two to update a global counter. A critical region of the function (i.e., the statements enclosed in the pthread_mutex_ {lock|unlock}() pair) appears to synchronize access to the global variable. However, mutual exclusion can only be enforced if the mutex variable is visible to all threads attempting to enter the critical region. The critical region is ineffective because the mutex has local scope; each thread initializes its own unique mutex. If multiple threads increment the counter concurrently, data loss will occur. Fortunately, this error is easily corrected by expanding the scope of the mutex.




Challenge Two: Ineffective Critical Section


int counter = 0; /* shared counter */

void increment_counter (void) {
pthread_mutex_t *lock;

lock = (pthread_mutex_t *)
malloc( sizeof(pthread_mutex_t) );
pthread_mutex_init (lock, NULL);

pthread_mutex_lock (lock);
counter++;
pthread_mutex_unlock (lock);
}

Reinitializing Global or Static Data

Most programs require that some actions be done once and only once. Initialization of global data is one example. Creating thread-specific data is another (Listing One A). The work being done in function init_once() is enclosed in an if-block whose predicate changes inside the block. The logic is correct for sequential or single-threaded execution, but it does not guarantee correct behavior when multiple threads call the function concurrently (Table One).




Listing One A: Error Creating Thread-Specific Data


pthread_key_t tsd;

void init_once () {
static int initialized = 0;

if (!initialized) {
pthread_key_create (&tsd, NULL);
initialized = 1;
}
}

In a multithreaded program, it doesn’t matter which thread performs the initialization. What matters is that the data is not reinitialized. (The ramifications of creating a pthread_ key_t variable twice are severe.)

For example, say that after initialization a static array is used to accumulate data. Reinitializing this array after work has begun causes data loss, which leads to incorrect results but may not crash the program. Finding these kinds of errors is very difficult. Once one is found, however, this type of error is easy to fix using a critical section (Listing One B), the pthread_once() function (Listing One C), or even easier, initializing shared data before creating the threads that will use them.




Listing One B: Protecting The Creation of Thread-Specific Data With A Critical Section


pthread_key_t tsd;
pthread_mutex_t once = PTHREAD_MUTEX_INITIALIZER;

void init_once () {
static int initialized = 0;

if (!initialized) {
pthread_mutex_lock (&once);
if (!initialized)
pthread_key_create (&tsd, NULL);
initialized = 1;
pthread_mutex_unlock (&once);
}
}




Listing One C: Protecting The Creation of Thread-Specific Data Using pthread_once


pthread_key_t tsd;
pthread_once_t once = PTHREAD_ONCE_INIT;

void caller () {
pthread_once (&once, init_once);
}

void init_once (void) {
pthread_key_create (&tsd, NULL);
}

At first glance, Listing One B appears to have the same race condition as Listing One A, only more convoluted. There is a data race on variable initialized, but it is benign in this case because the call to pthread_key_create() is protected. Even though the interleaving shown in Table One is still possible, only one thread at a time can acquire the mutex and enter the critical section. The first thread to enter the critical section initializes the thread-specific data and sets the flag (i.e., initialized = 1). This way, if any subsequent threads happen to enter the critical section, they will not call pthread_key_create() because the if-test is now false.

Lost Signals

By design, condition signaling in Pthreads is something of a free-for-all. Most of the key elements of correct signaling (i.e., tightly coupled mutex and condition variables, as well as an associated predicate) are entirely the programmer’s responsibility. The only certainty is that a thread cannot wait on a condition unless it first holds a mutex. Does this mean that it is the mutex that is associated with the condition? No, any mutex will do. Pthreads does not provide a mechanism to explicitly link condition and mutex variables; it’s left to the programmer.

So if a thread must hold a mutex to wait on a condition, does a thread have to hold the same mutex to signal that condition? No, the signaler doesn’t have to hold a mutex at all. Astute readers will be asking at this point, “What happens if a signal is sent but no threads are waiting?” Nothing. The signal is lost. Aside from additional system overhead, the program is unaffected.

Problems arise when threads wait for signals that will never come. The example program shown in Challenge Three is prone to deadlock, though it sometimes exits normally. One thread waits for data to be initialized. The other thread signals when initialization is complete.




Challenge Three: Potential For Lost Signal and Deadlock


#include <pthread.h>

void *signal (void*);
void *wait (void*);

int flag = 0;
pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cnd = PTHREAD_COND_INITIALIZER;

int main() {
pthread_t tid1, tid2;

pthread_create (&tid1, NULL, signal, NULL);
pthread_create (&tid2, NULL, wait, NULL);

pthread_join (tid1, NULL);
pthread_join (tid2, NULL);
}

void *signal (void *arg) {
initialize_data();
flag = 1;
pthread_cond_signal (&cnd);
return NULL;
}

void *wait (void *arg) {
pthread_mutex_lock (&mtx);
while (!flag)
pthread_cond_wait (&cnd, &mtx);
work_on_data();
pthread_mutex_unlock (&mtx);
return NULL;
}

Statement interleaving helps to explain how deadlock occurs (Table Two). Because the signal thread does not lock the mutex, it can signal the other thread, which is holding the mutex, before that thread is actually waiting. If this happens, the program is now irretrievably deadlocked. One thread is perpetually waiting for a signal that will never come while the main thread tries to join a thread that will never return.





Table Two: One Possible Interleaving of Challenge Three

Time  Signal Thread      Wait Thread
----  -----------------  ------------------------------------
T0    Initialize data
T1                       Lock mutex
T2                       Evaluate condition: (!flag) is true
T3    Set flag = 1
T4    Send signal
T5                       Wait for signal

The solution becomes more apparent when you realize that pthread_cond_wait() releases the mutex (mtx) while it waits for the condition signal (cnd) to be sent. Upon receiving the signal, it immediately reacquires the mutex. Therefore, if the signaling thread sets the flag and sends the signal while holding the same mutex, the waiting thread is guaranteed either to see the flag before it waits or to be waiting when the signal arrives.

Dangling Locks

Mutex locking and unlocking is a paired operation. As obvious as this sounds, it is very easy to violate this simple rule. Some programmers might assume that a lock is released when a thread terminates. It isn’t. Deadlock can occur whenever threads terminate or leave a critical region while still holding a mutex. The function in Challenge Four deadlocks because the first thread to realize that there is no more work breaks from the infinite loop without releasing the mutex. This type of error is common in sequential programs that are later threaded.




Challenge Four: Deadlock Due To A Dangling Lock


void *check_for_work (void *arg) {
int data;

while (1) {
pthread_mutex_lock (&mtx);
while (!new_data)
pthread_cond_wait (&cnd, &mtx);
data = buffer;
if (data == 0) break;
new_data = 0;
pthread_mutex_unlock (&mtx);
do_work (data);
}
return NULL;
}

Challenge Five shows a program that sometimes deadlocks and sometimes exits normally. The programmer mistakenly assumes that the threads will reach the critical section in the order in which they are created. However, threads do not necessarily begin execution the moment they are created, especially on a computer that has only one processor. If both threads are ready to execute, the operating system decides which to schedule first with no regard for the programmer’s expectations.




Challenge Five: Program Prone To Deadlock Because of Race Condition and Dangling Lock


#include <stdio.h>
#include <pthread.h>

void *race_to_lock (void*);

pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

void *race_to_lock (void *id) {
int my_id = *(int *)id;

pthread_mutex_lock (&mtx);
printf ("Thread %d acquired mutex\n", my_id);
if (my_id == 2) {
printf ("Last thread in critical section\n");
return NULL;
}
pthread_mutex_unlock (&mtx);

return NULL;
}

int main() {
int tid1 = 1, tid2 = 2;
pthread_t t1, t2;

pthread_create (&t1, NULL, race_to_lock, (void *)&tid1);
pthread_create (&t2, NULL, race_to_lock, (void *)&tid2);

pthread_exit (NULL);
}

In this example, the second thread always returns without unlocking the mutex, which is not a problem if the threads execute in the expected order. But, if this second thread ever happens to acquire the mutex first, the other thread deadlocks trying to acquire a mutex that it will never get. The best solution is to remember that mutex locking and unlocking is a paired operation and ensure that the matching unlock is always executed.

Locking Hierarchies

Most of the example codes that are in this article use only one mutex variable. In real programs, a single mutex is rarely sufficient, because programmers prefer to associate specific mutexes with particular data in some logical fashion. This isn’t a problem in and of itself, but problems can arise when mutex locking become hierarchical.

Bad locking hierarchies are a common cause of deadlock in concurrent programs. Listing Two shows a contrived, but clear, example of a bad-locking cycle. In this example, it is possible for one thread to acquire both mutexes and avoid deadlock. However, if one thread acquires lock1 and another acquires lock2, deadlock occurs because each is waiting for a resource that will never be available.




Listing Two: Bad Locking Cycle


#include <stdio.h>
#include <pthread.h>

void *thread1 (void*);
void *thread2 (void*);

pthread_mutex_t lock1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t lock2 = PTHREAD_MUTEX_INITIALIZER;

int main() {
pthread_t tid1, tid2;

pthread_create (&tid1, NULL, thread1, NULL);
pthread_create (&tid2, NULL, thread2, NULL);

pthread_join (tid1, NULL);
pthread_join (tid2, NULL);
}

void *thread1 (void *arg) {
pthread_mutex_lock (&lock1);
printf ("Thread 1 holding first lock\n");
pthread_mutex_lock (&lock2);
printf ("Thread 1 holding second lock\n");
pthread_mutex_unlock (&lock2);
pthread_mutex_unlock (&lock1);
return NULL;
}

void *thread2 (void *arg) {
pthread_mutex_lock (&lock2);
printf ("Thread 2 holding second lock\n");
pthread_mutex_lock (&lock1);
printf ("Thread 2 holding first lock\n");
pthread_mutex_unlock (&lock1);
pthread_mutex_unlock (&lock2);
return NULL;
}

It’s easy to spot the error in Listing Two, but imagine a real program which has different functions that are using different mutexes to protect different data. Now let’s say that conditions arise during execution which cause threads to call these functions simultaneously but in reverse order. Finally, imagine trying to debug this program when it executes correctly 99 times out of 100!

Lost Threads

The program in Challenge Six calculates pi by numerical integration but doesn’t always get the correct answer (results range from zero to 3.141593). The global pi variable can only be zero if the worker threads have not updated it. Can you spot the error in Challenge Six? Here’s a hint: It’s more of a flaw than an error, and every source code listing in this article is guilty of this flaw. Here’s another hint: The operating system is not required to give your program resources (e.g., dynamic memory allocation). It’s only required to notify the program when it doesn’t.




Challenge Six: Multithreaded Calculation of pi by Numerical Integration


#include <pthread.h>
#include <stdio.h>

#define N 32

void *worker (void*);

double delta, pi = 0.0;
int intervals = 100000;
pthread_mutex_t lck = PTHREAD_MUTEX_INITIALIZER;

int main() {
int j, id[N];
pthread_t tid[N];

delta = 1.0 / (double)intervals;

for (j = 0; j < N; j++) {
id[j] = j + 1;
pthread_create (&tid[j], NULL, worker, (void *)&id[j]);
}
for (j = 0; j < N; j++)
pthread_join (tid[j], NULL);

printf ("Pi equals %f\n", pi * delta);
}

void *worker (void *id) {
int j, start = *(int *)id;
double x, partial_pi = 0.0;

for (j = start; j <= intervals; j+=N) {
x = ( (double)j - 0.5) * delta;
partial_pi += 4.0 / (1.0 + (x * x));
}
pthread_mutex_lock (&lck);
pi += partial_pi;
pthread_mutex_unlock (&lck);
return NULL;
}

The program calls pthread_create() N times and assumes that N threads are created. The program expects, and the results depend, on each thread calculating an integration interval. If one or more threads are missing, results are calculated incorrectly. If you are lucky, the faulty calculation produces results that are obviously wrong. If you’re not so lucky, however, the subtle numerical drift in your complex calculation will only manifest itself when a spacecraft crashes into the planet instead of making the soft landing that you were expecting.

Lost threads can also cause deadlock. For example, multithreaded programs often use barrier synchronization. A barrier blocks threads until a minimum number reach the barrier. If the barrier requires more threads than actually exist, the result will be deadlock. What if threads are waiting for a signal but pthread_create() failed for the signaling thread? The possibilities are virtually endless. Every programmer knows to check error codes when calling system functions. It is even more important to do so in concurrent programs.

What Else Can Go Wrong?

This article focused on self-inflicted errors — namely data races and deadlock — that occur because the programmer assumes that threads will execute in a particular order without explicitly enforcing that order through synchronization. However, libraries can also be a source of concurrent programming errors, which brings us to our final topic — thread safety.

Thread-safe functions are free of side effects. Side effects include I/O and modifying variables outside of local scope. As an example of the latter, let’s consider a pseudo-random number generator that updates a static seed variable. This constitutes a side effect and makes the random number generator thread-unsafe. If multiple threads call this function concurrently, the seed value becomes indeterminate. Likewise, if multiple threads call a function that performs I/O, the order in which the threads output their data is undefined. This usually renders the output incomprehensible (e.g., multiple threads sending portions of an encoded audio stream in the wrong order).

On a larger scale, many libraries operate on globally accessible data. Image processing libraries are a good example. Images are often large, so the library functions are given pointers to the global data rather than private copies. Every function that modifies the data referenced by these pointers is thread-unsafe, which makes most of the library thread-unsafe too. Even if multiple threads call different functions, the threads can potentially operate on the same data. If one thread applies a mask while another thread applies a filter, for example, the image is corrupted.

If the programmer does not have access to the library source code (which is often the case), calls to library functions must be protected by a critical region. This effectively serializes large portions of the program, which defeats the purpose of multithreading. However, excessive synchronization is better than incorrect code.

Watch Your Step

Concurrent programming poses special difficulties. For a high profile example, see the article “The VxFiles: The Anatomy of a Glitch” by Tom Durkin (Robot Science and Technology, Issue 1, 1998), which describes a race condition that caused the computer on the Mars Pathfinder to periodically reset itself for no apparent reason.

But as you may now realize, difficult does not mean impossible. The techniques outlined in this article can help keep the thorns out of your sides (or keep your spacecraft on target).





Solutions

Solution to challenge one


#include <stdio.h>
#include <pthread.h>
#define N 4

void *hello (void*);

int main() {
int j;
+ int my_id[N];
pthread_t tid[N];
for (j = 0; j < N; j++)
- pthread_create (&tid[j], NULL, hello,
(void *)&j);
+ pthread_create (&tid[j], NULL, hello,
(void *)&my_id[j]);
for (j = 0; j < N; j++)
pthread_join (tid[j], NULL);
}

void *hello (void *my_id) {
printf ("Hello World from thread %d\n",
*(int *)my_id);
return NULL;
}

Solution to challenge two


int counter = 0; /* shared counter */
+pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
void increment_counter (void) {
- pthread_mutex_t *lock;
-
- lock = (pthread_mutex_t *)
- malloc( sizeof(pthread_mutex_t) );
- pthread_mutex_init (lock, NULL);
-
- pthread_mutex_lock (lock);
+ pthread_mutex_lock (&lock);
counter++;
- pthread_mutex_unlock (lock);
+ pthread_mutex_unlock (&lock);
}

Solution to challenge three


#include <pthread.h>
void *signal (void*);
void *wait (void*);
int flag = 0;
pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cnd = PTHREAD_COND_INITIALIZER;
int main() {
pthread_t tid1, tid2;
pthread_create (&tid1, NULL, signal, NULL);
pthread_create (&tid2, NULL, wait, NULL);
pthread_join (tid1, NULL);
pthread_join (tid2, NULL);
}
void *signal (void *arg) {
initialize_data();
+ pthread_mutex_lock (&mtx);
flag = 1;
pthread_cond_signal (&cnd);
+ pthread_mutex_unlock (&mtx);
return NULL;
}
void *wait (void *arg) {
pthread_mutex_lock (&mtx);
while (!flag)
pthread_cond_wait (&cnd, &mtx);
work_on_data();
pthread_mutex_unlock (&mtx);
return NULL;
}

Solution to challenge four


void *check_for_work (void *arg) {
int data;
while (1) {
pthread_mutex_lock (&mtx);
while (!new_data)
pthread_cond_wait (&cnd, &mtx);
data = buffer;
- if (data == 0) break;
+ if (data == 0) {
+ pthread_mutex_unlock (&mtx);
+ break;
+ }
new_data = 0;
pthread_mutex_unlock (&mtx);
do_work (data);
}
return NULL;
}

Solution to challenge five


#include <stdio.h>
#include <pthread.h>
void *race_to_lock (void*);
pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
void *race_to_lock (void *id) {
int my_id = *(int *)id;
pthread_mutex_lock (&mtx);
printf ("Thread %d acquired mutex\n", my_id);
if (my_id == 2) {
printf ("Last thread in critical section\n");
+ pthread_mutex_unlock (&mtx);
return NULL;
}
pthread_mutex_unlock (&mtx);
return NULL;
}
int main() {
int tid1 = 1, tid2 = 2;
pthread_t t1, t2;
pthread_create (&t1, NULL, race_to_lock,
(void *)&tid1);
pthread_create (&t2, NULL, race_to_lock,
(void *)&tid2);
pthread_exit (NULL);
}

Solution to challenge six


#include <pthread.h>
#include <stdio.h>
+#include <stdlib.h> /* for exit() */
#define N 32
void *worker (void*);
double delta, pi = 0.0;
int intervals = 100000;
pthread_mutex_t lck = PTHREAD_MUTEX_INITIALIZER;
int main() {
int j, id[N];
pthread_t tid[N];
delta = 1.0 / (double)intervals;
for (j = 0; j < N; j++) {
id[j] = j + 1;
- pthread_create (&tid[j], NULL, worker,
(void *)&id[j]);
+ if (pthread_create (&tid[j], NULL, worker,
(void *)&id[j])) {
+ printf ("Error: Not enough system resources\n");
+ exit (1);
+ }
}
for (j = 0; j < N; j++)
pthread_join (tid[j], NULL);
printf ("Pi equals %f\n", pi * delta);
}
void *worker (void *id) {
int j, start = *(int *)id;
double x, partial_pi = 0.0;
for (j = start; j <= intervals; j+=N) {
x = ( (double)j - 0.5) * delta;
partial_pi += 4.0 / (1.0 + (x * x));
}
pthread_mutex_lock (&lck);
pi += partial_pi;
pthread_mutex_unlock (&lck);
return NULL;
}



Henry Gabb is a parallel applications engineer at the Intel Parallel Applications Center. He can be reached at henry.gabb@intel.com.
