Multiplexed I/O

Linux programs often need to access more than one file at the same time. This can require a bit of coordination -- or multiplexing -- of the I/O. Under normal circumstances, the order in which different files get accessed by a program is determined by a simple algorithm. For example, the cat command employs the following algorithm to achieve this:

1) read some information from standard in

2) write the same information to standard out

3) if standard in still has information, go back to step 1

While there are two files being used here, cat can use normal, blocking I/O and get the job done. If there isn’t any data available in step 1, cat wants to wait around for data to appear anyway — it doesn’t have anything else to do. If a program wanted to do some other work while it was waiting for data to appear on standard in, it could use nonblocking I/O (which I discussed in last month’s issue) and simply check for data at some convenient time. Similarly, if cat can’t write the data out right away (if standard out refers to a pipe that is currently full, for example), then just blocking until the data can be written is perfectly appropriate.
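
To make that concrete, here's a minimal sketch of the blocking loop cat uses, written directly against read() and write() (error handling omitted to keep it short):

#include <unistd.h>

/* A stripped-down cat: copy standard in to standard out using
   ordinary blocking I/O. Each read() simply sleeps until data
   (or end of file) arrives. */
int main(void) {
    char buf[1024];
    ssize_t n;

    while ((n = read(STDIN_FILENO, buf, sizeof(buf))) > 0)
        write(STDOUT_FILENO, buf, n);   /* may block if output is a full pipe */

    return 0;
}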

Let’s look at a case where the normal read() and write() paradigm doesn’t work very well. The telnet client, which lets users log into remote machines, has two file descriptors it needs to read from. The first is standard in for the telnet client. This is used for the input from the user which it then sends to the remote system. The other file descriptor it’s concerned with is the socket which connects the telnet client to the remote machine (for the purposes of this discussion, think of the socket as a pipe which just happens to have the other end on another computer). When the remote machine needs to send information to the user (like a shell prompt), it uses the socket, and when the telnet client reads that information from the socket it displays whatever it reads to the user.

How could we write such a client? Here’s one approach:

1) read data from standard in

2) send that data to the remote system

3) read data from the socket

4) display data to the user

While that seems to be the gist of what we’re trying to do, it has a big problem. What if the user isn’t currently typing anything, but the system wants to display a notification? If the algorithm is currently “reading data from user” at step 1, the client won’t notice that the remote system is trying to say something until the user enters something — we’ve blocked reading from standard in, obviously not quite the behavior we want.

Nonblocking I/O is the most obvious way of fixing this problem. Let’s mark both standard in and the socket we’re reading from as nonblocking, and change our algorithm to look like this:

1) try to read data from standard in

2) if any bytes were read, send that data to the remote system

3) try to read data from the socket

4) if any bytes were read, display data to the user

This algorithm does, in fact, work properly. It won’t block waiting for data, and it solves the problem our first, naive version had. This isn’t a terribly friendly way to write a program, however.
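
To see what this looks like in practice, here's a rough sketch that puts standard in into nonblocking mode with fcntl() and then runs the polling loop described above; a real telnet-style client would mark its socket the same way. On a nonblocking descriptor, read() returns -1 with errno set to EAGAIN when no data is waiting.

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Switch a descriptor into nonblocking mode by adding O_NONBLOCK
   to its current file status flags. */
static int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

int main(void) {
    char buf[1024];
    ssize_t n;

    set_nonblocking(STDIN_FILENO);

    while (1) {
        n = read(STDIN_FILENO, buf, sizeof(buf));
        if (n > 0)
            write(STDOUT_FILENO, buf, n);   /* got some input; echo it */
        else if (n == 0)
            return 0;                       /* end of file */
        else if (errno != EAGAIN)
            return 1;                       /* a real error */
        /* otherwise there was nothing to read -- loop and try again,
           burning CPU the whole time */
    }
}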

Think about what happens when the user isn't typing anything, and the remote system isn't sending any data. Our algorithm bounces from step 1 to 3, to 1, to 3, to 1, to 3, to 1…anyway, you get the idea. We designed it to never block, so it never does.

Whenever the kernel runs our program, this algorithm makes it constantly scan for any input it should handle from either standard in or the socket, even when there is none. This will happen repeatedly as the program is dispatched by the kernel, which thinks the program is busy doing real work and not just polling for work that isn’t there. So when there is no work to do, our program will waste as much system time as it can! Clearly, we really do want our program to block, to avoid this constant and useless polling behavior. We just don’t want it to block on the socket (as the user may have data for us first) or on standard in (as the socket may have data first). What we need is a way to block on both, and have our program resume when either file descriptor has data ready to read. Here’s the algorithm we’d like:

1) give up control until standard in or the socket has data ready to read

2) if standard in has data, read it and send that data to the remote system

3) if the socket has data, read it and display that data to the user

4) go to step 1

This blocks properly (which our second example did not), unblocks properly (which our first example did not), and works exactly as desired — perfect! So, how do we get that blocking behavior we so obviously need?

As this is a common problem, Linux provides a system call that allows us to do exactly what we want to — two system calls in fact! Before the 2.2 kernel, only the select() system call was supported. With 2.2, poll() allows the same behavior (note that many versions of the glibc libraries provide a poll() call which actually uses select(), which makes it easier to write applications that work on a wide variety of Linux systems).
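
poll() isn't covered further in this article, but for comparison, here's a minimal sketch of waiting for input on standard in with it. Instead of fd_sets, poll() takes an array of struct pollfd entries and a timeout expressed in milliseconds:

#include <poll.h>
#include <stdio.h>
#include <unistd.h>

/* Wait up to five seconds for standard in to become readable. */
int main(void) {
    struct pollfd pfd;
    int rc;

    pfd.fd = STDIN_FILENO;
    pfd.events = POLLIN;                /* we only care about readability */

    rc = poll(&pfd, 1, 5000);           /* timeout is in milliseconds */

    if (rc < 0)
        perror("poll");
    else if (rc == 0)
        printf("nothing arrived within five seconds\n");
    else if (pfd.revents & POLLIN)
        printf("standard in has data ready to read\n");

    return 0;
}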

Before you can make much use of select(), you'll need to get used to the idea of an fd_set data type, which is nothing more than a set of file descriptors. It's an opaque type (which means you don't access it directly) that you manipulate with the following set of macros provided in the <sys/select.h> header:

FD_ZERO(fd_set *set);
FD_SET(int fd, fd_set *set);
FD_CLR(int fd, fd_set *set);
FD_ISSET(int fd, fd_set *set);

The first of these, FD_ZERO, clears the set; no file descriptors are included in the set after using it. FD_SET lets you add a particular file descriptor to a set, and FD_CLR lets you remove one. FD_ISSET lets you test whether a file descriptor is a member of a set.

Let’s look at a code snippet to make fd_set a bit more concrete. Here’s how we would construct an fd_set that contains standard in and standard error (no, I don’t know why you would actually want to do this):

fd_set myset;

FD_ZERO(&myset);
FD_SET(STDIN_FILENO, &myset);
FD_SET(STDERR_FILENO, &myset);

Once myset is constructed as shown, FD_ISSET(STDIN_FILENO, &myset) is true but FD_ISSET(STDOUT_FILENO, &myset) is false (as the set doesn’t contain standard out). The select() system call uses fd_set parameters to know which file descriptors it should monitor and to tell the caller which file descriptors are ready for action. The prototype for select() can be seen in Figure 1.

Figure 1: select() Prototype

int select(int numfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *tv);

When select() is called, it blocks until a given set of file descriptors are ready for reading or writing, or a timeout expires (whichever comes first). The last parameter, tv, defines the timeout value and is a struct timeval, which is defined in <sys/time.h>.

struct timeval {
    long tv_sec;
    long tv_usec;
};

The tv_sec field contains a time in seconds, and the tv_usec contains a time in microseconds (one-millionth of a second). The select() call will wait for activity on the indicated file descriptors for the time specified by the struct timeval. If the last parameter is NULL, select() will block indefinitely.
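
To make the timeout concrete, here's a small sketch that gives standard in two and a half seconds to become readable before giving up (the 2.5-second figure is arbitrary, chosen just for illustration):

#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Give standard in 2.5 seconds to become readable, then give up. */
int main(void) {
    fd_set readfds;
    struct timeval tv;
    int rc;

    FD_ZERO(&readfds);
    FD_SET(STDIN_FILENO, &readfds);

    tv.tv_sec = 2;                      /* two seconds ... */
    tv.tv_usec = 500000;                /* ... plus 500,000 microseconds */

    rc = select(STDIN_FILENO + 1, &readfds, NULL, NULL, &tv);

    if (rc < 0)
        perror("select");
    else if (rc == 0)
        printf("timed out with nothing to read\n");
    else
        printf("standard in is ready to read\n");

    return 0;
}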

The readfds, writefds, and exceptfds parameters tell select() which file descriptors to watch; they are all pointers to fd_set structures. select() returns when any of the file descriptors in readfds are ready for reading or any of the file descriptors in writefds are ready for writing. The file descriptors in exceptfds are watched for “exceptional conditions,” which under Linux only occur on sockets that have received out-of-band data. Any of readfds, writefds, and exceptfds may be NULL, in which case select() won’t look for that type of action. For the rest of this article, we’ll assume that exceptfds is NULL. Finally, pipes or sockets that have been closed on the other end appear to be ready for reading, but the read() will return 0 bytes.

One of the problems with select() is that fd_sets are not terribly efficient for the kernel to deal with. The only way to check which file descriptors are set in an fd_set is to test each descriptor for membership. Since processes can have quite a few file descriptors open (at least 1024 with most Linux distributions), this makes for quite a bit of checking. To make things faster, select() makes the application tell it how many file descriptors to check (starting from file descriptor 0) for membership in the fd_sets; this count is passed as the first parameter to select(), numfds. Think of numfds as the maximum file descriptor in any of the fd_sets, plus 1. Admittedly, this is all a bit of a hack, but it does help performance.

select() will return 0 if the timeout was reached, -1 if an error occurs, and a value greater than 0 if one or more of the file descriptors is in the state select() is looking for (ready to read from or ready to write to). If a value greater than 0 is returned, the fd_sets will be updated to include only those file descriptors ready for reading or writing. If a file descriptor was being watched for both reading and writing, it could be included in both readfds and writefds on return.

select() was poorly defined for a long time, so the exact return value is a bit hard to pin down in a portable manner. Linux follows the modern definition, which is to return the number of file descriptors included in readfds plus the number in writefds. However, it is unwise to rely on this behavior in programs designed for portability. The timeout value may also be changed on return depending on the version of the Linux kernel you are using, so you shouldn’t rely on what the struct timeval looks like after calling select().

Listing One: The select() System Call at Work

#include <fcntl.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* For simplicity, all error checking has been left out */

int main(int argc, char ** argv) {
    int fd;
    fd_set fds;
    char buf[1024];
    int i;

    fd = open(argv[1], O_RDONLY);

    while (1) {
        /* rebuild the set each time through the loop, as
           select() modifies it on return */
        FD_ZERO(&fds);
        FD_SET(STDIN_FILENO, &fds);
        FD_SET(fd, &fds);

        select(fd + 1, &fds, NULL, NULL, NULL);

        if (FD_ISSET(STDIN_FILENO, &fds)) {
            i = read(STDIN_FILENO, buf, 1024);
            if (!i) {
                printf("stdin closed\n");
                return 0;
            }
            write(1, buf, i);
        }

        if (FD_ISSET(fd, &fds)) {
            i = read(fd, buf, 1024);
            if (!i) {
                printf("file closed\n");
                return 0;
            }
            write(1, buf, i);
        }
    }
}
Obviously, select() is a pretty complicated system call. To understand things a little better, have a look at the sample program in Listing One. It expects a single filename, and prints out whatever information appears either in the file or on standard in.

The easiest way to test this program is by using a named pipe for the file specified on the command line. Here’s how (assume you have the source code in the file testselect.c):

# make testselect
# mknod mypipe p
# ./testselect mypipe

Open up a second terminal window (with another xterm, or switch to another virtual console), and run the following:

# cat > mypipe

Whatever you type in either terminal will be echoed by the program. Once you finish, press ^D in either terminal and the program will terminate.

Hopefully this article has made the idea of multiplexing I/O under Linux a bit clearer. This is a complicated topic, and the only way to really understand what’s going on is to try it, so I encourage you to spend some time experimenting with select().

Erik Troan is a developer for Red Hat Software and co-author of Linux Application Development. He can be reached at ewt@redhat.com.
