The Design of an In-Kernel Server

In last month's column, we took a look at the practice of invoking system calls from within kernel code. This month's column will deal with how a complete network server can be implemented as a kernel thread. The sample code shown throughout this column implements a simplified TFTP server. See the Features of the Sample Code sidebar, pg. 80, for more information.

Normally, you shouldn’t bring user-space processes into kernel space, but there are times when, for performance or size reasons, this may be a good idea. The former reason is what led to kHTTPd. The latter may be relevant to avoiding the size overhead of a user-space application (based on libc, for example) in small embedded systems devoted to a single task.

This column refers to the kernel-based Web server released in version 2.4.0-test9 of the kernel, which can be found in the net/khttpd directory of the source tree for that release. This is the kHTTPd program by Arjan van de Ven, not the Tux reimplementation by Ingo Molnar. While the latter program exhibits higher performance, it is also a bit more complex.

Kernel Threads

The first step a programmer must take in running a server in kernel space is to fork a process. To create a new thread, you must call the function kernel_thread. Since this is usually performed by the initialization function of a kernel module, the programmer must also detach the thread from the context of the insmod or modprobe command that was executing the initialization function.

Listing One illustrates how module initialization forks a new thread and how the thread detaches itself from the user process that forked it.

Listing One: Forking A New Thread

/*
 * in init_module(): fork the main thread
 */
kernel_thread(ktftpd_main, NULL /* no arg */,
              0 /* no clone flags */);

/*
 * in ktftpd_main(): detach from the original process
 */
sprintf(current->comm, "ktftpd-main"); /* comm is 16 bytes */
lock_kernel(); /* This seems to be required for exit_mm */
exit_mm(current);
/* close open files too (stdin/out/err are open) */
for (i = 255; i >= 0; i--)
    if (current->files->fd[i])
        close(i);

To handle several clients at the same time, a daemon usually forks several copies of itself, each in charge of a single connection. This is accomplished by calling kernel_thread from within the main loop each time a new connection is accepted. However, there is no need to do special processing at the beginning of the thread this time, since there is no user process to detach from.

You shouldn’t be shy when forking copies of your kernel daemon, because the resources consumed by each of them are much less than those associated with forking a user-space server. A kernel thread requires no memory-management overhead because all of them share their address space with that of process 0.

An atomic_t data item is used to count the number of running threads. I called it DaemonCount (the same name used by kHTTPd). atomic_t variables can be read and modified atomically (using atomic_read, atomic_set, atomic_inc, and so on) by more than one thread; they provide efficient shared variables without requiring the use of explicit locks.
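As a rough user-space analogue (not the kernel API itself), the same counting pattern can be sketched with C11 atomics. The names thread_started, thread_finished, and running_threads are hypothetical and exist only for this illustration; in the module, the corresponding operations would be applied to DaemonCount through the atomic_t helpers.

```c
#include <stdatomic.h>

/* User-space stand-in for the kernel's atomic_t DaemonCount. */
atomic_int daemon_count;

/* Called by a thread when it starts running. */
void thread_started(void)
{
    atomic_fetch_add(&daemon_count, 1);
}

/* Called by a thread just before it exits. */
void thread_finished(void)
{
    atomic_fetch_sub(&daemon_count, 1);
}

/* Snapshot of the number of threads currently running. */
int running_threads(void)
{
    return atomic_load(&daemon_count);
}
```

Each update is a single atomic read-modify-write, so no lock is needed around the counter even when several threads start or finish at the same moment.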

Before the module is unloaded, you’ll need to stop all the threads (as the code they are executing will disappear). There are several ways to accomplish this task. The kHTTPd server uses a sysctl entry point (/proc/sys/net/khttpd/stop), so the user can tell kernel code to stop server activity before unloading the module (each thread increases the module’s usage count to prevent users from unloading it before the threads are all stopped).

Sysctl is an interesting feature, but would increase the complexity of the sample ktftpd module. Those interested in sysctl should refer to http://www.linux.it/~rubini/sysctl for more information. To keep code shorter and simpler, I chose a different approach; the individual threads don’t add to the usage count for the module, and the cleanup function sets a global flag and then waits for all the threads to terminate.

Listing Two shows the code that deals with thread termination.

Listing Two: Thread Termination

int ktftpd_shutdown = 0; /* set at unload time */

/*
 * In the code of each thread, the main loop depends
 * on the value of ktftpd_shutdown
 */
while (!signal_pending(current) && !ktftpd_shutdown) {
    /* .... */
}

/*
 * The following code is part of the cleanup function
 */
/* tell all threads to quit */
ktftpd_shutdown = 1;
/* kill the one listening (it would take too much time to exit) */
kill_proc(DaemonPid, SIGTERM, 1);
/* and wait for them to terminate (no signals accepted) */
wait_event(ktftpd_wait_threads, atomic_read(&DaemonCount) == 0);

Additionally, the user is allowed to terminate each thread by sending it a signal (as you may have imagined by looking at the condition around the main loop in Listing Two). Trivially, the thread exits when a signal is pending. This behavior is the same signal handling implemented in the kernel Web server, and boils down to the few lines of code shown in Listing Three. The instructions shown are part of the initialization code of the main thread. Other threads are created with the CLONE_SIGHAND flag, so sending a signal to any of them will kill them all.

Listing Three: Signal Handling

/* Block all signals except SIGKILL, SIGTERM */
siginitsetinv(&current->blocked, sigmask(SIGKILL) |
              sigmask(SIGTERM));

Managing Network Connections

The main task of kHTTPd (and most similar network services) consists of transferring data to/from the network and from/to the local filesystem. We will discuss network access but we won’t go into filesystem access, because system calls like open, read, and close are accessible from kernel space, and we described how to invoke them last month.

As far as network access is concerned, what a server generally should do reduces to the following few system calls:

fd = socket();
bind(fd); listen(fd);
while (1) {
    newfd = accept(fd);
    if (fork()) {
        /* .... */
    } else {
        /* child: serve the client on newfd, then exit */
    }
}
Performing the same task from kernel space reduces to similar code, with fork replaced by kernel_thread. The main difference is that we use in-kernel file descriptors and socket structures that can be manipulated directly (thus avoiding the need to look up the file descriptor table every single time).

The file net/khttpd/main.c, as found in the kernel source, shows how to handle a TCP session from kernel space. The implementation for UDP is similar. Listing Four shows how the first part of the task (preparing to receive packets) is implemented in the sample code. Next, a TCP server would sleep in the accept system call. However, unless you want to run a separate thread for every active network connection, you’ll need to multiplex operation of a single thread across several file descriptors (by calling either select or poll).

Listing Four: Getting Ready for Packets

/* Open and bind a listening socket */
error = sock_create(PF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock);
if (error < 0) {
    printk(KERN_ERR "ktftpd: can't create socket: errno == %i\n", -error);
    goto out;
}

/* Same as setsockopt(SO_REUSEADDR). Actually not needed for tftpd */
/* sock->sk->reuse = 1; -- needed for multi-thread TCP servers */
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = INADDR_ANY;
sin.sin_port = htons((unsigned short)KTFTPD_PORT);
error = sock->ops->bind(sock, (struct sockaddr *)&sin, sizeof(sin));
if (error < 0) {
    printk(KERN_ERR "ktftpd: can't bind UDP port %i\n", KTFTPD_PORT);
    goto out;
}

#if 0 /* There is no need to listen() for UDP. It would be needed for TCP */
error = sock->ops->listen(sock, 5); /* "5" is the standard value */
if (error < 0) {
    printk(KERN_ERR "ktftpd: can't listen()\n");
    goto out;
}
#endif
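To see how closely the kernel code in Listing Four follows the familiar system calls, here is a user-space counterpart of the same steps. It is a sketch for comparison only; open_udp_server is a hypothetical name, and the port is passed as a parameter instead of using KTFTPD_PORT.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a UDP socket and bind it to "port" on every local address.
 * Returns the file descriptor, or -1 on error.
 */
int open_udp_server(unsigned short port)
{
    struct sockaddr_in sin;
    int fd;

    fd = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
    if (fd < 0)
        return -1;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        close(fd);
        return -1;
    }
    /* No listen(): a UDP socket can receive as soon as it is bound. */
    return fd;
}
```

Passing 0 as the port lets the system pick a free one, which is handy for experiments; binding the standard TFTP port (69) normally requires root privileges.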

A kernel-space server should resort to neither select nor poll, as these calls feature non-negligible overhead. Instead, kHTTPd performs non-blocking calls for each pending operation and counts the successes. If no operation performed was successful, then the daemon sleeps for at least one timer tick, giving other processes the option to run before polling its file descriptors again.
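The "try everything, count the successes, sleep if nothing happened" strategy can be sketched in plain C as follows. This is a simplified user-space model, not kHTTPd's code: struct pending, run_pending, serve, and consume_one are hypothetical names, and nap stands for whatever short sleep the environment provides (one timer tick, in the kernel's case).

```c
#include <stddef.h>

/* One pending operation: returns nonzero if it made progress. */
typedef int (*pending_op)(void *ctx);

struct pending {
    pending_op op;
    void *ctx;
};

/* Try every operation once, without blocking; count the successes. */
int run_pending(struct pending *list, size_t n)
{
    size_t i;
    int successes = 0;

    for (i = 0; i < n; i++)
        if (list[i].op(list[i].ctx))
            successes++;
    return successes;
}

/* Main loop: poll again at once after progress, sleep briefly otherwise. */
void serve(struct pending *list, size_t n, volatile int *stop,
           void (*nap)(void))
{
    while (!*stop)
        if (run_pending(list, n) == 0)
            nap();
}

/* Example operation: consume one unit of work from an int counter. */
int consume_one(void *ctx)
{
    int *available = ctx;

    if (*available <= 0)
        return 0;   /* would block: nothing to do right now */
    (*available)--;
    return 1;
}
```

The loop never blocks inside a single operation, so one slow client cannot stall the others; the price is that an idle daemon still wakes up regularly to poll.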

If the service is based on the UDP protocol, the thread will usually sleep on recvfrom immediately after bind returns. This is what the sample server does. It avoids both select and a polling loop similar to the one described above by forking a new thread for each new connection being processed.

Generally speaking, sleeping on a system call that is invoked from kernel space is no different from sleeping in user space; the system call handles its own wait queue. The difference lies (as outlined last month) in the need to use set_fs and get_fs if the system call is expected to read or write a “user” buffer.

The kHTTPd daemon uses two functions to receive data. One, DecodeHeader, is used to collect all the HTTP headers; it uses the MSG_PEEK socket flag to avoid flushing the input queue until all headers are received. The other function is ReadRest; it uses the MSG_DONTWAIT socket flag to avoid blocking when no data is ready to be read.
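The effect of the two flags is easy to try from user space, where they behave the same way on recv. The helpers below are a sketch with hypothetical names; peek_data adds MSG_DONTWAIT to MSG_PEEK only so that the demonstration never blocks, which is an assumption of this example and not a statement about DecodeHeader.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Look at pending data without removing it from the input queue. */
ssize_t peek_data(int fd, void *buf, size_t len)
{
    return recv(fd, buf, len, MSG_PEEK | MSG_DONTWAIT);
}

/* Consume pending data, returning at once if nothing is ready. */
ssize_t read_nowait(int fd, void *buf, size_t len)
{
    return recv(fd, buf, len, MSG_DONTWAIT);
}

/* Nonzero if "ret" means "no data yet" and not a real error. */
int would_block(ssize_t ret)
{
    return ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);
}
```

Data that has been peeked at is still there for the next read, so a caller can keep peeking until a complete header has arrived and only then consume it.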

The procedure used by ktftpd to receive a data packet is somewhat simpler and uses no socket flags. It will simply sleep, waiting for a packet, until one is received or a signal is caught. This is illustrated in Listing Five. This procedure is not as generic as recvfrom as it requires the use of IP addresses.

Listing Five: Receiving Data Packets

/*
 * This procedure is used as a replacement for recvfrom(). Actually it is
 * based on the one in kHTTPd which in turn is based on sys_recvfrom.
 */
static inline int ktftpd_recvfrom(struct socket *sock,
                                  struct sockaddr_in *addr,
                                  unsigned char *buf)
{
    struct msghdr msg;
    struct iovec iov;
    mm_segment_t oldfs;
    int len;

    if (sock->sk == NULL) return 0;

    msg.msg_flags = 0;
    msg.msg_name = addr;
    msg.msg_namelen = sizeof(struct sockaddr_in);
    msg.msg_control = NULL;
    msg.msg_controllen = 0;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_iov->iov_base = buf;
    msg.msg_iov->iov_len = PKTSIZE;

    /* buf is a kernel buffer: lift the address limit around the call */
    oldfs = get_fs(); set_fs(KERNEL_DS);
    len = sock_recvmsg(sock, &msg, 1024, 0);
    set_fs(oldfs);
    return len;
}

Sending Packets

The code to transmit packets is very similar to the receive code. It exploits sock_sendmsg, which mimics sock_recvmsg.

The main difference is in how blocking is managed. The kernel thread that pushes data to a TCP socket should avoid a situation where data is only partially written, as that situation would require extra data management.

In the implementation of kHTTPd (in the file called datasending.c), the program calls sock_wspace to determine how much free space is left in the send buffer associated with the socket and tries to push only that many bytes. This procedure is represented in Listing Six.

Listing Six: Transmitting Data Packets

int ReadSize, Space;
int retval;

Space = sock_wspace(sock->sk);
ReadSize = min(4*4096, FileLength - BytesSent);
ReadSize = min(ReadSize, Space);

if (ReadSize > 0) {
    oldfs = get_fs(); set_fs(KERNEL_DS);
    retval = filp->f_op->read(filp, buf, ReadSize, &filp->f_pos);
    set_fs(oldfs);
    if (retval > 0) {
        retval = SendBuffer(sock, buf, (size_t)retval);
        if (retval > 0) {
            BytesSent += retval;
        }
    }
}
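The size computation at the top of Listing Six can be isolated as a small pure function, which makes the three limits explicit. This is a sketch with hypothetical names (min_int, next_chunk, MAX_CHUNK); the arithmetic is the one used in the listing.

```c
#define MAX_CHUNK (4 * 4096)    /* same upper bound as in Listing Six */

/* Smaller of two ints. */
int min_int(int a, int b)
{
    return a < b ? a : b;
}

/*
 * How many bytes to read from the file and push to the socket:
 * never more than MAX_CHUNK, than what is left of the file,
 * or than the free space reported for the socket.
 */
int next_chunk(int file_length, int bytes_sent, int space)
{
    int size = min_int(MAX_CHUNK, file_length - bytes_sent);

    return min_int(size, space);
}
```

For a 100000-byte file with nothing sent yet, the result is 16384 when the socket has plenty of room and 1000 when only 1000 bytes are free; once the whole file has been sent, it drops to 0 and the read is skipped.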

With UDP, each packet is sent as an individual item, so there is no need to check sock_wspace. The ktftpd sample daemon reads 512 bytes from the filesystem at a time and builds a packet according to the amount that is read (512 or less). It then waits for the acknowledgment packet using the same ktftpd_recvfrom function shown above. A TCP server doesn’t deal with acknowledgments, since reliability of the connection is built into the TCP protocol stack.
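The packet layout involved is simple enough to sketch in a few lines. The code below is an illustration of the TFTP wire format (a two-byte opcode followed by a two-byte block number, both in network byte order), not an excerpt from ktftpd; the function and macro names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define TFTP_OP_DATA   3    /* opcodes as defined by the TFTP RFC */
#define TFTP_OP_ACK    4
#define TFTP_BLOCK   512    /* payload bytes in a full data packet */

/*
 * Build a DATA packet in "pkt" (room for 4 + TFTP_BLOCK bytes).
 * Returns the packet length, or 0 if "len" is too large.
 */
size_t tftp_build_data(unsigned char *pkt, unsigned short block,
                       const unsigned char *data, size_t len)
{
    if (len > TFTP_BLOCK)
        return 0;
    pkt[0] = 0;
    pkt[1] = TFTP_OP_DATA;
    pkt[2] = block >> 8;        /* block number, high byte first */
    pkt[3] = block & 0xff;
    memcpy(pkt + 4, data, len);
    return 4 + len;
}

/* Nonzero if "pkt" is an ACK for block number "block". */
int tftp_is_ack(const unsigned char *pkt, size_t len, unsigned short block)
{
    return len >= 4 && pkt[0] == 0 && pkt[1] == TFTP_OP_ACK &&
           pkt[2] == (block >> 8) && pkt[3] == (block & 0xff);
}
```

A data packet carrying fewer than 512 bytes of payload tells the client that the transfer is complete, which is why the packet is built "according to the amount that is read".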

Handle With Care

Again, it’s not a great idea to just throw all sorts of different server daemons into kernel space. However, it’s nice to know that you can if you need to. It’s the kind of technique that gives you a competitive edge under certain special circumstances.

Features of the Sample Code

Code excerpts included in this column are part of a ktftpd module, available from my download directory: ftp://ftp.linux.it/pub/People/rubini/ktftpd.tar.gz. This kernel-space daemon loosely mimics what kHTTPd implements but has simplicity as a primary objective. The choice to use UDP instead of TCP was made to simplify things and to avoid replicating much of kHTTPd. Serving a file through TCP can be performed using do_generic_file_read (the engine that drives the sendfile system call), but this is yet another can of worms.

The tftp daemon implemented can serve world-readable files from the /tftp file tree (and you can set up /tftp as a symbolic link to another directory if you prefer). Even though the tftp protocol supports data transfers in both directions, the sample daemon refuses to write to its own filesystem. Also, it doesn’t implement any packet retransmission (contrary to what RFC 783 requires). The daemon also doesn’t keep real logs. It merely prints a little information using conventional printk calls.

The sample module has been designed so that it is small enough to be thoroughly understood in a reasonable time, but this article can’t describe every aspect of its implementation. Reading this column, ktftpd, kHTTPd, and the Tux patch (in that order) makes for a good knowledge base.

Alessandro Rubini is an independent consultant who lives and works in Italy. He learns programming by reading free software and is usually late with his deadlines. Alessandro can be reached at rubini@gnu.org.
