Notifier Chains and Completion Functions

Kernel helper interfaces make life easier for developers. Learn about notifier chains, completion functions and error handling aids.
Over the last couple of months, “Gearheads” showed how you can benefit from kernel helpers such as linked lists, work queues, and seq files. This month, let’s continue and explore more helper interfaces: notifier chains, completion functions, kernel thread helpers, and error handling aids.

Notifier Chains

Notifier chains send status change messages to code regions that request them. Unlike hard-coded mechanisms, notifiers offer a versatile technique for dynamically loadable code to get notified when events of interest occur. Notifier chains were originally added to pass network events to concerned sections of the kernel, but are now used for many other purposes.
The kernel implements a number of predefined notifiers for significant events. For example:
1. A die notification is sent when a kernel function triggers a trap or a fault, caused by an oops, page fault, or a breakpoint.
2. A network device notification is generated when a network interface goes up or down.
3. A CPU frequency notification is generated when there’s a transition in processor frequency.
4. An Internet address notification is sent when a change is detected in the IP address of a network interface.
To attach your code to a notifier chain, you register an event handler function with the corresponding chain. When the requisite event occurs, the handler is passed two arguments: an event identifier and a chain-specific data argument. To define your own (user-defined) notifier chain, you must also implement the infrastructure that fires the chain when the event is detected.
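The register/notify/unregister cycle described above can be modeled in ordinary userspace C. The following is a minimal sketch, not the kernel's implementation: the struct layout mirrors the fields used in this article, but chain_register(), chain_unregister(), chain_call(), demo_handler() and notifier_demo() are hypothetical names invented for illustration, and handler priorities and locking are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace model only: this mimics, but is not, the kernel type. */
struct notifier_block {
    int (*notifier_call)(struct notifier_block *self,
                         unsigned long val, void *data);
    struct notifier_block *next;
};

/* Append a handler to the tail of the chain. */
static void chain_register(struct notifier_block **head,
                           struct notifier_block *nb)
{
    while (*head)
        head = &(*head)->next;
    nb->next = NULL;
    *head = nb;
}

/* Unlink a handler; returns 0 on success, -1 if it was not registered. */
static int chain_unregister(struct notifier_block **head,
                            struct notifier_block *nb)
{
    for (; *head; head = &(*head)->next) {
        if (*head == nb) {
            *head = nb->next;
            return 0;
        }
    }
    return -1;
}

/* Walk the chain, handing the event id and data to every handler.
 * Returns the number of handlers that were invoked. */
static int chain_call(struct notifier_block **head,
                      unsigned long val, void *data)
{
    struct notifier_block *nb;
    int called = 0;

    for (nb = *head; nb; nb = nb->next) {
        nb->notifier_call(nb, val, data);
        called++;
    }
    return called;
}

static unsigned long last_event;   /* records what the handler saw */

static int demo_handler(struct notifier_block *self,
                        unsigned long val, void *data)
{
    (void)self;
    (void)data;
    last_event = val;
    printf("demo_handler: Val=%lu\n", val);
    return 0;
}

static struct notifier_block *demo_chain;
static struct notifier_block demo_nb = { .notifier_call = demo_handler };

/* Example run: register, fire event 100, unregister, fire event 200. */
static unsigned long notifier_demo(void)
{
    int reached;

    chain_register(&demo_chain, &demo_nb);
    reached = chain_call(&demo_chain, 100, NULL);
    assert(reached == 1);            /* our handler ran */

    chain_unregister(&demo_chain, &demo_nb);
    reached = chain_call(&demo_chain, 200, NULL);
    assert(reached == 0);            /* nobody is listening any more */

    return last_event;               /* the last event the handler saw */
}
```

Calling notifier_demo() from a main() function exercises the whole cycle. The second chain_call() reaches no handler, which is the behavior you want once a module has been unloaded.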
Listing One contains examples of using predefined and user-defined notifiers.
* my_die_event_handler() is registered with the die notifier chain, i386die_chain, using register_die_notifier().
To trigger invocation of my_die_event_handler(), introduce an invalid dereference somewhere in your code, such as: int *q = 0; *q = 1; When this snippet executes, my_die_event_handler() is called, producing a message like this:
my_die_event: OOPs! at EIP=f00350e7
The die event notifier passes the die_args structure to the registered event handler. This argument contains a pointer to the regs structure, which contains a snapshot of processor register contents when the fault occurred. my_die_event_handler() prints the contents of the instruction pointer register.
* my_dev_event_handler() is registered with the net device notifier chain, netdev_chain, using register_netdevice_notifier(). You can generate this event by changing the state of a network interface, like Ethernet (ethX) or loopback (lo):
bash$ ifconfig eth0 up
The ifconfig command causes the execution of my_dev_event_handler(). The handler is passed a pointer to struct net_device as argument, which contains the name of the network interface:
my_dev_event: Val=1, Interface=eth0
Val=1 corresponds to the NETDEV_UP event as defined in include/linux/notifier.h.
* my_inet_event_handler() is registered with the Internet address chain, inetaddr_chain, using register_inetaddr_notifier(). To produce this event, do:
bash$ ifconfig eth0 1.2.3.4
The invocation of my_inet_event_handler() follows. It’s passed a pointer to struct in_ifaddr as argument. This contains the IP address that was changed or assigned:
my_inet_event: Val=1, IP Address=1020304
* Listing One also implements a user-defined notifier chain, my_noti_chain. Assume that you want this event to be generated whenever a user reads from a file in the process filesystem (procfs). Add the following in the corresponding procfs read() routine:
notifier_call_chain (&my_noti_chain, 100, NULL);
This invokes my_event_handler() whenever you read the corresponding /proc file:
my_event: Val=100
Val contains the identity of the generated event; it’s set to 100 in this example. The data argument is left unused.
Be sure to unregister event handlers from notifier chains when your module is released from the kernel. For example, if you bring a network interface up or down after unloading the code in Listing One, you’ll be rankled by an oops unless the module’s release() entry point calls unregister_netdevice_notifier(&my_dev_notifier).
Listing One: Notifier event handlers

#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/notifier.h>
#include <asm/kdebug.h>
#include <linux/netdevice.h>
#include <linux/inetdevice.h>

/* Handler prototypes, needed by the notifier_block definitions below */
int my_die_event_handler (struct notifier_block *self,
                          unsigned long val, void *data);
int my_dev_event_handler (struct notifier_block *self,
                          unsigned long val, void *data);
int my_inet_event_handler (struct notifier_block *self,
                           unsigned long val, void *data);
int my_event_handler (struct notifier_block *self,
                      unsigned long val, void *data);

/* Die Notifier Definition */
static struct notifier_block my_die_notifier = {
    .notifier_call = my_die_event_handler,
};

/* Die Notification Event Handler */
int
my_die_event_handler (struct notifier_block *self,
                      unsigned long val, void *data)
{
    struct die_args *args = (struct die_args *)data;

    if (val == 1) { /* '1' corresponds to an OOPs */
        printk ("my_die_event: OOPs! at EIP=%lx\n", args->regs->eip);
    } /* else ignore */
    return 0;
}


/* Net Device Notifier Definition */
static struct notifier_block my_dev_notifier = {
    .notifier_call = my_dev_event_handler,
};


/* Net Device Notification Event Handler */
int my_dev_event_handler (struct notifier_block *self,
                          unsigned long val, void *data)
{
    printk ("my_dev_event: Val=%ld, Interface=%s\n", val,
            ((struct net_device *) data)->name);
    return 0;
}


/* Inet Address change Notifier Definition */
static struct notifier_block my_inet_notifier = {
    .notifier_call = my_inet_event_handler,
};

/* Inet Address Notification Event Handler */
int my_inet_event_handler (struct notifier_block *self,
                           unsigned long val, void *data)
{
    printk ("my_inet_event: Val=%ld, IP Address=%x\n", val,
            ntohl (((struct in_ifaddr *) data)->ifa_address));
    return 0;
}


/* User Defined Notifier Chain Implementation */
struct notifier_block * my_noti_chain;

static struct notifier_block my_notifier = {
    .notifier_call = my_event_handler,
};

/* User Defined Notification Event Handler */
int my_event_handler (struct notifier_block *self,
                      unsigned long val, void *data)
{
    printk ("my_event: Val=%ld\n", val);
    return 0;
}

/* Module Init */
static int __init
my_init (void)
{
    /* … */

    /* Register Die Notifier */
    register_die_notifier (&my_die_notifier);

    /* Register Net Device Notifier */
    register_netdevice_notifier (&my_dev_notifier);

    /* Register Inet Address change Notifier */
    register_inetaddr_notifier (&my_inet_notifier);

    /* Register a user defined Notifier */
    notifier_chain_register (&my_noti_chain, &my_notifier);

    /* … */
    return 0;
}

Functions that register and unregister notification handlers are protected using a spinlock (notifier_lock), but the routine that walks the notifier chain to dispatch events to registered handlers (notifier_call_chain()) is not. This is because the handler functions may go to sleep, unregister themselves while running, or get called from either process or interrupt context. The lock-less implementation, however, introduces race conditions.
There are ongoing efforts to classify notifier chains and add atomic (rwlocked) and blocking (mutex) flavors on top of the raw chains. Go to http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2 and follow the thread to find out how the implementation is evolving.

Completion Functions

Many parts of the kernel initiate certain activities as separate execution threads and then wait for those threads to complete. The completion interface is an efficient and easy way to implement such code patterns.
Some example usage scenarios include:
1. You’re writing a portion of a block device driver that queues a read request to the device. This triggers a state machine change implemented as a separate thread or work queue. The driver wishes to wait until the operation completes before proceeding with another activity. Look at drivers/block/floppy.c and drivers/mtd/devices/blkmtd.c for examples.
2. Your module is assisted by a kernel thread. If the module is rmmod-ed from the kernel, the release() entry point is invoked before removing the module code from kernel space. The release routine asks the thread to kill itself and blocks until the thread completes its exit. Listing Two implements this case.
Listing Two: Synchronizing using completion functions

#include <linux/init.h>
#include <linux/sched.h>
#include <linux/wait.h>
#include <linux/completion.h>

static DECLARE_COMPLETION (my_thread_exit);      /* Completion */
static DECLARE_WAIT_QUEUE_HEAD (my_thread_wait); /* Wait Queue */
int pink_slip = 0;                               /* Exit Flag */

/* Assistant Thread */
static int
my_thread (void * unused)
{
    DECLARE_WAITQUEUE (wait, current);

    daemonize ("my_thread");
    add_wait_queue (&my_thread_wait, &wait);

    while (1) {
        /* Relinquish processor till event occurs */
        set_current_state (TASK_INTERRUPTIBLE);
        schedule ();
        /* Control gets here when the thread is woken
           up from the my_thread_wait wait queue */

        /* Quit if let go */
        if (pink_slip) {
            break;
        }

        /* Do the real work */
        /* … */

    }

    /* Bail out of the wait queue */
    __set_current_state (TASK_RUNNING);
    remove_wait_queue (&my_thread_wait, &wait);

    /* Atomically signal completion and exit */
    complete_and_exit (&my_thread_exit, 0);
}

/* Module Init */
static int __init
my_init (void)
{
    /* … */

    /* Kick start the thread */
    kernel_thread (my_thread, NULL,
                   CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD);

    /* … */
    return 0;
}

/* Module Release */
static void __exit
my_release (void)
{
    /* … */
    pink_slip = 1;                         /* my_thread must go */
    wake_up (&my_thread_wait);             /* Activate my_thread */
    wait_for_completion (&my_thread_exit); /* Wait till it quits */
    /* … */
}

A completion can be declared statically using DECLARE_COMPLETION() or created dynamically using init_completion(). A thread can signal completion with the help of complete() or complete_all(). A caller can wait for a completion via wait_for_completion().
In Listing Two, my_release() raises an exit request flag by setting pink_slip before waking up my_thread. It then calls wait_for_completion() to wait till my_thread completes its exit. my_thread, for its part, wakes up to find pink_slip set, and must signal completion to my_release by calling complete(), and then kill itself by calling exit().
my_thread accomplishes these two steps atomically using complete_and_exit(). Using complete_and_exit() shuts the window between module exit and thread exit that can occur with the separate invocations of complete() and exit().
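To see the signal-and-wait handshake in isolation, here is a minimal userspace sketch that imitates a completion with POSIX threads. It is an analogy rather than the kernel code: the assumption is that a completion behaves like a "done" counter guarded by a lock, with waiters sleeping until the counter becomes nonzero. The names worker(), worker_exit, work_result and completion_demo() are hypothetical, and the program is assumed to be built with gcc -pthread.

```c
#include <assert.h>
#include <pthread.h>

/* Userspace stand-in for a completion: a counter of pending
 * "done" signals plus the means to sleep until one arrives. */
struct completion {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    unsigned int    done;
};

static void init_completion(struct completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = 0;
}

/* Signal one waiter that the awaited activity has finished. */
static void complete(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done++;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Block until complete() has been called, then consume one signal. */
static void wait_for_completion(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->done == 0)
        pthread_cond_wait(&c->cond, &c->lock);
    c->done--;
    pthread_mutex_unlock(&c->lock);
}

static struct completion worker_exit;
static int work_result;   /* written by the worker before it completes */

static void *worker(void *unused)
{
    (void)unused;
    work_result = 42;          /* stands in for the real work */
    complete(&worker_exit);    /* tell the waiter we are finished */
    return NULL;
}

/* Start the worker, wait for its completion, and return its result. */
static int completion_demo(void)
{
    pthread_t tid;

    init_completion(&worker_exit);
    work_result = 0;
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;

    wait_for_completion(&worker_exit);
    assert(worker_exit.done == 0);   /* the signal was consumed */

    pthread_join(tid, NULL);
    return work_result;
}
```

Because the waiter consumes a counted signal rather than testing a bare flag, the handshake works whether the worker finishes before or after the waiter starts waiting.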

Kthread Helpers

Kthread helpers add a coating over the raw thread creation routines to simplify the task of thread management.
Listing Three rewrites Listing Two using the kthread helper interface. Lines starting with - (minus) mark the code in Listing Two that’s to be replaced, while those starting with + are newly added lines. my_init() now uses kthread_create() instead of kernel_thread(). You can pass the thread’s name to kthread_create() rather than explicitly call daemonize() within the thread.
The kthread interface gives you free access to a built-in exit synchronization mechanism implemented using the completion interface. So, as shown in function my_release() in Listing Three, you can directly call kthread_stop() instead of laboriously setting pink_slip, waking my_thread, and waiting for it to complete using wait_for_completion(). Similarly, my_thread can make the neat call to kthread_should_stop() to check whether it ought to call it a day.
Listing Three: Synchronizing using kthread helpers

  #include <linux/kthread.h>

  /* Assistant Thread */
  static int
  my_thread (void * unused)
  {
      DECLARE_WAITQUEUE (wait, current);
-     daemonize ("my_thread");
      add_wait_queue (&my_thread_wait, &wait);

-     while (1) {
+     /* Continue work if no other thread has
+      * invoked kthread_stop() */
+     while (!kthread_should_stop ()) {
          /* … */
-         /* Quit if let go */
-         if (pink_slip) {
-             break;
-         }
          /* … */
      }
      __set_current_state (TASK_RUNNING);
      remove_wait_queue (&my_thread_wait, &wait);

-     complete_and_exit (&my_thread_exit, 0);
+     return 0;
  }

+ struct task_struct *my_task;

  /* Module Init */
  static int __init
  my_init (void)
  {
      /* … */
-     kernel_thread (my_thread, NULL,
-                    CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD);
+     my_task = kthread_create (my_thread, NULL, "%s", "my_thread");
+     if (my_task) wake_up_process (my_task);

      /* … */
  }

  /* Module Release */
  static void __exit
  my_release (void)
  {
      /* … */

-     pink_slip = 1;
-     wake_up (&my_thread_wait);
-     wait_for_completion (&my_thread_exit);
+     kthread_stop (my_task);

      /* … */

  }

Instead of creating the thread using kthread_create() and activating it via wake_up_process(), as done in Listing Three, you can use the single call:
kthread_run (my_thread, NULL, "%s", "my_thread");
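The stop protocol that the kthread helpers wrap up can be imitated in userspace as well. The sketch below is only an analogy built on POSIX threads and C11 atomics: struct ktask, ktask_run(), ktask_should_stop() and ktask_stop() are hypothetical names, and unlike the kernel's kthread_should_stop(), the check here takes an explicit handle. It is assumed to be built with gcc -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for a kernel thread handle. */
struct ktask {
    pthread_t  tid;
    atomic_int should_stop;          /* raised by ktask_stop() */
    int      (*fn)(struct ktask *);  /* the thread function */
    int        exit_code;            /* what the thread function returned */
};

/* Polled by the thread to learn whether it has been asked to stop. */
static int ktask_should_stop(struct ktask *t)
{
    return atomic_load(&t->should_stop);
}

static void *ktask_trampoline(void *arg)
{
    struct ktask *t = arg;

    t->exit_code = t->fn(t);
    return NULL;
}

/* Create and start the thread in one call; returns 0 on success. */
static int ktask_run(struct ktask *t, int (*fn)(struct ktask *))
{
    t->fn = fn;
    t->exit_code = 0;
    atomic_init(&t->should_stop, 0);
    return pthread_create(&t->tid, NULL, ktask_trampoline, t);
}

/* Ask the thread to stop, wait for it to exit, and hand back
 * the value its thread function returned. */
static int ktask_stop(struct ktask *t)
{
    atomic_store(&t->should_stop, 1);
    pthread_join(t->tid, NULL);
    return t->exit_code;
}

/* Assistant thread: loop until somebody calls ktask_stop(). */
static int my_thread(struct ktask *self)
{
    while (!ktask_should_stop(self)) {
        /* Do the real work, then give up the processor */
        sched_yield();
    }
    return 7;   /* arbitrary exit code for the demonstration */
}

/* Start the thread, stop it again, and return its exit code. */
static int kthread_demo(void)
{
    struct ktask t;
    int code;

    if (ktask_run(&t, my_thread) != 0)
        return -1;
    code = ktask_stop(&t);
    assert(ktask_should_stop(&t));   /* the stop request stays raised */
    return code;
}
```

The stop call both raises the request and waits for the thread to exit, which is why the release routine in Listing Three needs neither a flag nor a completion of its own.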

Error Handling Aids

Several kernel functions return pointer values. While a caller usually checks for failure by comparing the return value to NULL, each caller typically needs more information to decipher the exact nature of the error that has occurred. Since kernel addresses have redundant bits, they can be overloaded to encode error semantics with the help of a set of error handling helper routines. Listing Four implements a simple usage example.
Listing Four: Using Error Handling Aids

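#include <linux/err.h>
#include <linux/slab.h>    /* kmalloc(), kfree() */
#include <asm/uaccess.h>   /* copy_from_user() */

/* userbuffer, numbytes and too_many_bytes are assumed to be
   set up by the elided code */
char *
grab_data (void)
{
    char * buffer;

    /* … */
    buffer = kmalloc (100, GFP_KERNEL);
    if (!buffer) { /* Out of memory */
        return ERR_PTR (-ENOMEM);
    }

    /* … */
    if (copy_from_user (buffer, userbuffer, numbytes)) {
        kfree (buffer); /* Don't leak the buffer on failure */
        return ERR_PTR (-EFAULT);
    }
    /* … */

    if (too_many_bytes) {
        kfree (buffer);
        return ERR_PTR (-EOVERFLOW);
    }
    /* … */

    return (buffer);
}

int
my_function (void)
{
    char * buf;

    /* … */
    buf = grab_data ();
    if (IS_ERR (buf)) {
        /* PTR_ERR() returns a long */
        printk ("Error returned is %ld!\n", PTR_ERR (buf));
    }
    /* … */
}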

If kmalloc() fails in grab_data() in Listing Four, my_function() prints the following message:
Error returned is -12!
However, if grab_data() succeeds, it returns a valid pointer to a data buffer.
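The pointer-encoding trick itself is small enough to try out in userspace. The sketch below re-creates ERR_PTR(), PTR_ERR() and IS_ERR() under the assumption that the top 4095 values of the address space are reserved for error codes; the exact bound in any given kernel version may differ. grab_buffer() and try_grab() are hypothetical helpers written for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095   /* assumed size of the reserved error range */

/* Encode a negative errno value as a pointer. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an encoded pointer. */
static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if the pointer lies in the reserved range at the very top
 * of the address space, i.e. it encodes an error. */
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator: rejects zero-length requests with -EINVAL
 * and reports allocation failure with -ENOMEM. */
static char *grab_buffer(size_t len)
{
    char *buf;

    if (len == 0)
        return ERR_PTR(-EINVAL);

    buf = malloc(len);
    if (!buf)
        return ERR_PTR(-ENOMEM);
    return buf;
}

/* Caller side: decode the result the way my_function() does.
 * Returns 0 on success or the negative errno on failure. */
static long try_grab(size_t len)
{
    char *buf = grab_buffer(len);

    if (IS_ERR(buf))
        return PTR_ERR(buf);

    assert(buf != NULL);   /* a non-error result is a usable pointer */
    free(buf);
    return 0;
}
```

With this model, try_grab(0) yields -EINVAL while a successful allocation yields 0. Note that NULL does not count as an error under IS_ERR(), so a function should stick to one convention, either NULL or an encoded errno.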
As another example, let’s add error handling using IS_ERR() and PTR_ERR() to the thread creation code in Listing Three:
  my_task = kthread_create (my_thread, NULL, "%s", "mydrv");

+ if (!IS_ERR (my_task)) {
+     /* Success */
+     wake_up_process (my_task);
+ } else {
+     /* Failure */
+     printk ("Error value returned=%ld\n", PTR_ERR (my_task));
+ }

Looking at the Sources

Before you delve into the sources, here are a couple of tips to help your research.
You can ask gcc to generate preprocessed source code using the -E option. The include paths to specify (using the -I option) depend on the header files included by your code. Here is a usage example to preprocess drivers/char/mydrv.c and produce exploded output in mydrv.i:
bash$ gcc -E drivers/char/mydrv.c -D__KERNEL__ -Iinclude -Iinclude/asm/mach-default > mydrv.i
-E reduces the need to hop-skip through nested header files to expand multiple levels of macros.
You can also ask gcc to generate assembly listings using the -S option. To generate an assembly listing in mydrv.s for drivers/char/mydrv.c, do:
bash$ gcc -S drivers/char/mydrv.c -D__KERNEL__ -Iinclude -Ianother/include/path
The kernel notifier chain implementation lives in kernel/sys.c and include/linux/notifier.h. Look at kernel/sched.c and include/linux/completion.h for the guts of the completion interface. kernel/kthread.c contains the sources for kthread helpers, while include/linux/err.h includes definitions of error handling aids.

Sreekrishnan Venkateswaran has been with IBM India for ten years. His recent projects include porting Linux to a pacemaker programmer and writing firmware for a lung biopsy device. You can reach Krishnan at krishhna@gmail.com.
