Linked Lists and Work Queues

The use of standard kernel helper facilities simplifies your code, weeds out redundancies from the kernel, and helps long-term maintenance. And since the helpers are bug-free and optimized, you inherit those benefits for free.
Several useful helper interfaces exist in the kernel to make life easier for developers. One example is the doubly-linked list library. Many parts of the kernel need to maintain and manipulate linked lists of data structures. The kernel’s list… interface routines eliminate the need for chasing list pointers and debugging messy problems related to list maintenance. Another example of a useful helper facility is the seq_file interface, which simplifies the task of creating user space windows into your kernel code.
This month, let’s look at helper interfaces like lists, hlists, and work queues. The next couple of columns will look at helpers like seq files, completion functions, notifier blocks, kthreads, and more.

Linked Lists

To weave doubly linked lists of data structures, you can use the functions provided in include/linux/list.h. Essentially, you embed a struct list_head inside your data structure:
#include <linux/list.h>

struct list_head {
struct list_head *next, *prev;

struct mydatastructure {
struct list_head *mylist; /* Embed */
… /* Data fields… */
The mylist field is the link that chains instances of mydatastructure. If you have multiple list_head variables embedded inside mydatastructure, each of them will constitute a link that makes mydatastructure a member of a new list. You can use list library routines to add or delete membership from individual lists.
To get a picture of how the list library can potentially simplify the implementation of complex data structures, take a peek at the definition of the block device structure (struct cfq_data) in the disk scheduler implementation (found in drivers/block/cfq-iosched.c). Each block device structure can be a member of multiple lists, such as the idle list, the free list, or the busy list, but list management is accomplished easily.
To illustrate list usage, let’s implement an interesting example. The example will also serve as a foundation to understand the concept of work queues discussed later on.
Assume that your kernel driver needs to perform heavy-duty tasks from inside its entry points (such as operations that force the calling thread to sleep wait). Naturally, your driver won’t want to block until each task finishes, since that slows its responsiveness to user applications. Instead, whenever the driver needs to perform an expensive piece of work, it defers execution by adding the corresponding routine to a linked list of work functions. The actual work will be performed by a kernel thread, which traverses the list and executes the work functions in the background.
The driver adds submitted work functions to the” tail” of the list, while the kernel thread works its way from the” head” of the list, thus ensuring that work queued first gets done first. Of course, the rest of the driver needs to be designed to suit this scheme of deferred execution.
Let’s look at the key data structures in your driver:
static struct _mydrv_wq {
struct list_head mydrv_worklist; /* Work List */
spinlock_t lock; /* Protect the list */
wait_queue_head_t todo; /* Sync submitter and worker */
} mydrv_wq;

struct _mydrv_work {
struct list_head mydrv_workitem; /* Link in the work chain */
void (* worker_func) (void *); /* Work to perform */
void * worker_data; /* Parameter to worker_func */
… /* Other fields */
} mydrv_work;
mydrv_wq is global to all work submissions. Its members include a
pointer to the head of the work list, a spinlock to protect accesses
to the list, and a wait queue to communicate between the driver
functions that submit work and the kernel thread that does the work.
Listing One shows how to submit work.
Listing One: Submitting work to execute later

submit_work (void (*func)(void * data), void * data)
struct _mydrv_work * mydrv_work;

/* Allocate the work structure */
mydrv_work = kmalloc (sizeof (struct _mydrv_work), GFP_ATOMIC);
if (!mydrv_work) return -1;

/* Populate the work structure */
mydrv_work->worker_func = func; /* Work function */
mydrv_work->worker_data = data; /* Parameter to pass */

spin_lock (&mydrv_wq.lock); /* Protect the list */
/* Add your work to the tail of the list */
list_add_tail (&mydrv_work->mydrv_workitem,

/* Command the worker thread to get out of bed */
wake_up (&mydrv_wq.todo);
spin_unlock (&mydrv_wq.lock);
return 0;

The code uses list_add_tail() to add a work function to the
tail of the list. (Look at Figure One to see the physical structure
of the work list.)
FIGURE ONE: The structure of the kernel work list

The list helper functions do not protect accesses to list members, so you need to use concurrency mechanisms to serialize simultaneous pointer references. This is done using a spinlock. But before using the spinlock, you must initialize the spinlock, the list head and the wait queue, and kick start the worker thread. That is done in Listing Two in the driver initialization routine, mydrv_init().
Listing Two: Initialize Data structures

static int __init
mydrv_init (void)
/* … */
/* Init the lock to protect concurrent list access */
spin_lock_init (&mydrv_wq.lock);

/* Init the wait queue for communication between
the submitter and the worker */
init_waitqueue_head (&mydrv_wq.todo);

/* Init the list head */
INIT_LIST_HEAD (&mydrv_wq.mydrv_worklist);

/* Start the worker thread */
kernel_thread (mydrv_worker, NULL,

return 0;

To submit a work function, void job(void*), from a driver entry point, do submit_work(job, NULL).
After submitting the work function, Listing One wakes up the actual worker thread. The general structure of the worker thread shown in Listing Three is similar to” standard” threads. The thread works its way through all nodes in the list, using the list_entry() routine. list_entry() returns the container data structure inside which the list node is embedded.
Take a closer look at the relevant line in Listing Three:
mydrv_work = list_entry (mydrv_wq.mydrv_worklist.next, 
struct _mydrv_work, mydrv_workitem);
mydrv_workitem is embedded inside mydrv_work, so list_entry() returns a pointer to the corresponding mydrv_work structure. The parameters passed to list_entry() are the address of the embedded list node, the type of the container structure, and the field name of the embedded list node.
After executing a submitted work function, the worker thread removes the corresponding node from the list using list_del(). The spinlock is released and re-acquired in the time window when the submitted work function is executed, because work functions can go to sleep resulting in potential deadlocks if newly-scheduled code tries to acquire the same lock.
Listing Three: The worker thread

static int
mydrv_worker (void * unused)
DECLARE_WAITQUEUE (wait, current);
void (* worker_func) (void *);
void * worker_data;
struct _mydrv_work * mydrv_work;

set_current_state (TASK_INTERRUPTIBLE);

/* Spin till asked to die */
while (!asked_to_die ()) {
add_wait_queue (&mydrv_wq.todo, &wait);

if (list_empty (&mydrv_wq.mydrv_worklist)) {
/* Woken up by the submitter */
} else {
set_current_state (TASK_RUNNING);
remove_wait_queue(&mydrv_wq.todo, &wait);

/* Protect concurrent access to the list */
spin_lock (&mydrv_wq.lock);

/* Traverse the list and plough through the work functions
* present in each node */
while (!list_empty(&mydrv_wq.mydrv_worklist)) {

/* Grab the first entry in the list */
mydrv_work = list_entry (mydrv_wq.mydrv_worklist.next,
struct _mydrv_work, mydrv_workitem);
worker_func = mydrv_work->worker_func;
worker_data = mydrv_work->worker_data;

/* This node has been processed. Throw it
* out of the list */
list_del (mydrv_wq.mydrv_worklist.next);

/* Execute the work function in this node */
spin_unlock (&mydrv_wq.lock); /* Release lock */
worker_func (worker_data);
spin_lock (&mydrv_wq.lock); /* Re-acquire lock */
spin_unlock (&mydrv_wq.lock);
set_current_state (TASK_INTERRUPTIBLE);

set_current_state (TASK_RUNNING);
return 0;

For simplicity, the listings don’t perform error handling. So, for example, if the call to kernel_thread() in Listing Two fails, you need to free memory allocated for the corresponding work structure. Also, the asked_to_die() function in Listing Three is left unwritten. It essentially breaks out of the loop if it either detects a delivered signal, or receives a communication from the driver release() entry point that the module is about to be unloaded from the kernel.
Before ending this section, let’s take a look at another useful list library routine, list_for_each_entry(). This macro iterates over all elements of a list. You don’t have to use list_entry() inside the loop, so the iteration becomes simpler and more readable.
Use the list_for_each_entry_safe() variant if you’ll delete list elements within the loop. You can replace the following snippet in Listing Three…
while (!list_empty(&mydrv_wq.mydrv_worklist)) {
mydrv_work = list_entry (mydrv_wq.mydrv_worklist.next,
struct _mydrv_work, mydrv_workitem);
… with:
struct _mydrv_work *temp;
list_for_each_entry_safe (mydrv_work, temp,
mydrv_workitem) {
You can’t use list_for_each_entry() in this case because you are removing the entry pointed to by mydrv_work inside the loop. list_for_each_entry_safe() works around this problem using the temporary variable passed as the second argument (temp) to save the address of the next entry in the list.

Hash Lists (hlists)

The doubly linked list implementation discussed above is not optimal for cases where you want to implement linked data structures like hash tables. This is because hash tables need only a single pointer list head. To reduce memory overhead for such applications, the kernel provides hlists, a variation of lists. Unlike lists, which use the same structure for the list head and list nodes, hlists have separate definitions:
struct hlist_head {
struct hlist_node *first;

struct hlist_node {
struct hlist_node *next, **pprev;
To suit the scheme of a single pointer hlist head, the nodes maintain the address of the pointer to the previous node, rather than the pointer itself.
Hash tables are implemented using an array of hlist_head s. Each hlist_head element sources a doubly-linked list of hlist_node s. A hash function is used to locate the array element (or bucket). Once that’s done, hlist helper routines are used to operate on the list of hlist_nodes linked to the chosen bucket. Look at the implementation of the directory cache (dcache) in fs/dcache.c for an example.

Work Queues

Work queues are a way to defer work inside the kernel. Deferred work functions are useful in innumerable situations, such as sending a command to a disk and following through with the storage protocol state machine, triggering restart of a network adapter in response to an error interrupt, and filesystem tasks like syncing disk buffers.
The capabilities of work queues are very similar to the examples described in Listing One, Two, and Three. However, work queues can accomplish the same actions in a much simpler manner.
The work queue helper library exposes two interface structures to users, a workqueue_struct and a work_struct. Follow these steps to use work queues:
1.Create a work queue (or a workqueue_struct). Each workqueue_struct has one or more kernel threads associated with it. To create a kernel thread to service a workqueue_struct, use create_singlethread_workqueue(). To create one worker thread per CPU in the system, use the create_workqueue() variant. The kernel also has default per-CPU worker threads (events/n, where n is the CPU number) that you can share instead of requesting a separate worker thread. Depending on your application, you might incur a performance hit if you don’t have a dedicated worker thread.
2.Create a work element (or a work_struct). A work_struct is initialized using INIT_WORK(), which populates it with the address of your work function and the parameters to be passed to it.
3.Submit the work element to the work queue. A work_struct can be submitted to a dedicated queue using queue_work(), or to the default kernel worker thread using schedule_work().
Listing Four rewrites the first three listings to take advantage of the work queue helper interface. The entire kernel thread, as well as the spinlock and the wait queue, vanish inside the work queue interface. Even the call to create_singlethread_workqueue() can be elided if you use the default kernel worker thread.
Listing Four: Using work queues to defer work

#include <linux/workqueue.h>

submit_work (void (*func)(void * data), void * data)
struct work_struct * hardwork;

hardwork = kmalloc (sizeof (struct work_struct), GFP_KERNEL);

/* Init the work structure */
INIT_WORK (hardwork, func, data);

/* Enqueue Work */
queue_work (wq, hardwork);
return 0;

/* Driver Init Routine */
static int __init
mydrv_init (void)
/* … */
wq = create_singlethread_workqueue (“mydrv”);
return 0;

If you’re using work queue functions, you’ll get linker errors unless you declare your module licensed under the GNU General Public License (GPL). This is because the kernel exports these functions only to GPL’ed code. If you look at the kernel work queue implementation, you’ll see the restriction expressed in statements such as, EXPORT_SYMBOL_GPL(queue_work). To announce that your module is provided under “copyleft” according to the terms of the GPL, use MODULE_LICENSE(“GPL”).

Looking at the Sources

The list and hlist library routines live in include/linux/list.h. They are used throughout the kernel, so you can find usage examples in most subdirectories. (You can also go to http://www.ussg.iu.edu/hypermail/linux/kernel/0007.3/0805.html to follow the discussion thread in the mailing list for an interesting debate between Linus Torvalds and Andi Kleen about the pros and cones of complimenting the list library with hlist helper routines.
The kernel work queue implementation resides in kernel/workqueue.c. To get a hang of the real world use of work queues, look at the PRO/Wireless 2200 network driver, drivers/net/wireless/ipw2200.c.

Sreekrishnan Venkateswaran has been working for IBM India for about ten years. His recent projects include porting Linux to a pacemaker programmer and writing firmware for a lung biopsy device. You can reach Krishnan at class="emailaddress">krishhna@gmail.com.

Comments are closed.