Porting Linux 2.0 Drivers to Linux 2.2

The desire for more speed and better multi-processor support has caused inevitable changes in Linux, resulting in the development of the current kernel -- Linux 2.2. As a driver author, you may initially avoid taking advantage of the latest kernel changes. Ultimately, however, you'll probably end up re-writing your driver to stay current with kernel design, to improve your driver's performance, and to take advantage of the ever-increasing opportunities that appear on the horizon for Linux users in 1999.

This is a comprehensive background on the changes that you, as a driver author, will need to
make in order to port a driver from Linux 2.0 or 2.1 to the new 2.2 kernel. Even if you’re writing
2.0 based add-on drivers and not necessarily familiar with the Linux kernel to begin with, you’ll
have a fighting chance of making your driver work with 2.2. And hopefully you’ll learn a little
something along the way.

None of the changes in 2.2 are gratuitous. Where possible, compatibility modes exist for old
methods. These modes provide warnings that the driver ought to be updated to the new methods.
Several drivers in 2.2.0 still produce these warnings, so you shouldn’t feel bad if your driver
produces them, especially if all you want is to make it work.

Access To User Space

When comparing older kernels with 2.2, the most obvious change you will see is that the pair
verify_area() and memcpy_to/from_user have mutated. The change was driven by the good
old need for speed, and it also creates a convenient place to clean up some complicated SMP race
conditions. Contrary to rumor, it doesn’t exist just to annoy device driver writers.

With Linux 2.0 the kernel walked the list of memory owned by a process to determine if an
access to user space was legal. This was done to ensure that the read or write didn’t succeed when
it shouldn’t, because when an illegal read or write does succeed, the resulting “Oops”
message isn’t pretty.

Linux 2.2 changes the rules of the game. Since the memory management hardware can do most of the
checking on a 486 or higher, it would be silly to do it with the software as well, especially since
most accesses are legal.

Instead the kernel builds tables that contain information about what addresses may fault, and
where to jump if they do.

When a user passes an address, a basic sanity check is performed to ensure that it is
not a kernel address. Once that is verified, the kernel can use the values given, knowing it can still
recover if the access faults. The actual mechanics are not trivial, involving some interesting abuses of the ELF binary
format and some clever inline assembler tricks. Fortunately they’re all wrapped up nicely for
you.
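The table-driven recovery described above can be modeled in ordinary user-space C. This is a minimal sketch and not the kernel's implementation: the struct layout, the name find_fixup, and the sample addresses used to exercise it are assumptions made purely for illustration. It shows only the core idea of mapping the address of a faulting instruction to a recovery address by searching a sorted table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one table entry: the address of an instruction
 * that is allowed to fault, and where to resume if it does. */
struct fixup_entry {
    unsigned long insn;   /* address of the instruction that may fault */
    unsigned long fixup;  /* address to jump to after a fault */
};

/* Binary search over a table sorted by insn.
 * Returns the fixup address, or 0 if the faulting address is not listed
 * (a real kernel would then treat the fault as a genuine bug). */
unsigned long find_fixup(const struct fixup_entry *table, size_t count,
                         unsigned long fault_addr)
{
    size_t lo = 0, hi = count;

    assert(table != NULL || count == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (table[mid].insn == fault_addr)
            return table[mid].fixup;
        if (table[mid].insn < fault_addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}
```

Because the lookup only happens when a fault actually occurs, the common case of a legal access pays nothing for it, which is the point of the 2.2 design.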

Figure 1A contains an example of a driver written for 2.0. Under 2.2, the same driver will be
written as in Figure 1B.




Figure 1A


struct thing my_thing;

if (verify_area(VERIFY_READ, userptr, sizeof(my_thing)))
    return -EFAULT;
memcpy_fromuser(&my_thing, userptr, sizeof(my_thing));




Figure 1B


#include <asm/uaccess.h>

struct thing my_thing;

if (copy_from_user(&my_thing, userptr, sizeof(my_thing)))
    return -EFAULT;

The copy_from_user function returns zero on a successful copy; if it faults, it
returns the number of bytes it was unable to copy. It is both cleaner and faster, with all the
magical fault catching concealed from the driver author. copy_to_user works the same way.
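The return convention is easy to get wrong, so here is a small user-space model of it. Everything in this sketch is an assumption for illustration: fake_user_buf stands in for user memory of which only a prefix is accessible, and model_copy_from_user is a stand-in that mimics the contract described above (zero on success, otherwise the number of bytes not copied). It is not the kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for user memory: only the first `valid` bytes are accessible. */
struct fake_user_buf {
    const unsigned char *base;
    unsigned long valid;
};

/* Models the copy_from_user() contract: returns 0 on success,
 * otherwise the number of bytes that could NOT be copied. */
unsigned long model_copy_from_user(void *to, const struct fake_user_buf *from,
                                   unsigned long n)
{
    unsigned long ok;

    assert(to != NULL && from != NULL);
    ok = n < from->valid ? n : from->valid;
    if (ok > 0)
        memcpy(to, from->base, ok);
    return n - ok;
}

struct thing {
    int a;
    int b;
};

/* Driver-style caller: any shortfall is reported as -EFAULT. */
int fetch_thing(struct thing *out, const struct fake_user_buf *userptr)
{
    if (model_copy_from_user(out, userptr, sizeof(*out)))
        return -EFAULT;
    return 0;
}
```

Note that the caller does not pass the byte count back to user space; it only tests the result for non-zero and converts that to -EFAULT, as in Figure 1B.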

Linux 2.0 also had a set of functions, get_user() and put_user(), that
did the same things for native C types as the memcpy functions. These still exist, but
their behavior has changed (which may be why you now have hundreds of warnings in your partially
ported driver!).

Previously, get_user() returned the value of the object. So it would read something
like Figure 2A.




Figure 2A


if (verify_area(VERIFY_READ, pointer, sizeof(*pointer)))
    return -EFAULT;
c = get_user(pointer);
switch (c)
{
..




Figure 2B


if (get_user(c, pointer))
    return -EFAULT;
switch (c)
{
..

In 2.2 the get_user function handles the fault checking, so it needs to return
two different pieces of information. The arguments have changed, and it now returns zero on a
successful read and -EFAULT otherwise. Figure 2A is replaced by Figure 2B.

The 2.0 put_user function has been given the same treatment. The fact that it returns
-EFAULT or zero can be very useful since many routines can now simply use:


return put_user(value, pointer);

to get the desired error/success return to userspace.
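As a rough illustration of how the new return values compose, the following user-space sketch reads a value, changes it, and writes it back. The model_get_user and model_put_user functions are hypothetical stand-ins, not the kernel macros: they treat a NULL pointer as the "bad address" case, and model_get_user takes a pointer to the destination where the real get_user takes the variable itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for put_user(): a NULL pointer plays the faulting address. */
int model_put_user(int value, int *uptr)
{
    if (uptr == NULL)
        return -EFAULT;
    *uptr = value;
    return 0;
}

/* Stand-in for get_user(): the value comes back through `out`,
 * the return value carries only success (0) or -EFAULT. */
int model_get_user(int *out, const int *uptr)
{
    assert(out != NULL);
    if (uptr == NULL)
        return -EFAULT;
    *out = *uptr;
    return 0;
}

/* Hypothetical ioctl-style helper: double a user-supplied integer in place. */
int double_in_place(int *user_ptr)
{
    int c;

    if (model_get_user(&c, user_ptr))
        return -EFAULT;
    /* The result of the write is exactly what the caller should see. */
    return model_put_user(c * 2, user_ptr);
}
```

The last line of double_in_place is the idiom the text describes: the status of the final write is handed straight back as the routine's own result.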

File Operations Changes

Almost every device driver, except for the network drivers, interacts with the file system. The
file system layers have changed somewhat, although the impact on a device driver that doesn’t wish
to get involved is minimal.

First, many drivers need to obtain the inode of a passed file handle. In 2.0 this was done
with:


struct file *filp;
struct inode *inode;

inode = filp->f_inode;

In 2.2 these are handled via the directory cache (dcache), a namespace cache of active and
recently accessed files. This makes things like the find command much faster. Fortunately, the
change from a driver point of view is nice and simple:


inode = filp->f_dentry->d_inode;

For file systems the changes are major, and a review of the changes involved in porting file
systems deserves an article unto itself.

The read and write operations have changed only a little. They now pass the file offset pointer
as an argument instead of relying on the one in the file handle. It may well be that the pointer
refers to the offset in the file handle, but you don’t need to worry about that. The reason for the
change is that the POSIX standard defines pread/pwrite operations that atomically seek and
transfer data at a given position. In the conventional UNIX API, the seek (selecting the offset) and
the read of the data were separate events. A threaded program had to take care that, when two
threads accessed the same file, one thread did not move the file position between the other
thread’s seek and its read. Using pread/pwrite avoids this problem.

The drivers that care about file position (which is not all of them — a file position is not
meaningful to a tty, for example) should be using the passed offset pointer instead of changing
filp->f_pos.
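A minimal user-space sketch of a read method that honors the passed offset might look like the following. The names model_read and device_data are assumptions, the real method also receives the struct file pointer and uses loff_t for the offset, and a real driver would call copy_to_user() where this model uses memcpy().

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical fixed contents of the modeled device. */
static const char device_data[] = "hello from the device\n";

/* Model of a 2.2-style read: the position comes in through ppos and is
 * advanced through ppos; no file-handle position is touched. */
ssize_t model_read(char *buf, size_t count, long long *ppos)
{
    size_t size = sizeof(device_data) - 1;
    size_t avail;

    assert(buf != NULL && ppos != NULL);
    if (*ppos < 0)
        return -EINVAL;
    if ((unsigned long long)*ppos >= size)
        return 0;                           /* end of data */
    avail = size - (size_t)*ppos;
    if (count > avail)
        count = avail;                      /* short read at the end */
    memcpy(buf, device_data + *ppos, count);
    *ppos += (long long)count;              /* advance the caller's offset */
    return (ssize_t)count;
}
```

Because each call works only on the offset it was handed, two callers with their own offsets cannot disturb each other, which is what makes the pread/pwrite semantics possible.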

The release (close) operation is called, as before, on the last close of a file. A small change
here is that it is entitled to return a failure code, which can be returned via close().
The handle must still be closed, but it allows you to report that the close stumbled across a
problem.

There is also a flush operation which is invoked when any given process closes its copy of the
file handle. At the moment this is only used for NFS writes where the close of the file may be the
only point at which you discover that a write fails because the remote disk is full.

In most cases this functionality shouldn’t be needed.

Finally, the disappearance of the select method will be very visible to device drivers. This
method, and indeed the whole of select in the kernel, has been replaced by the more scalable, but
arguably less elegant, System V based poll. The change is not visible to end users because
the kernel emulates the old select call with poll to preserve compatibility.

The changes made for poll in most cases can be applied fairly mechanically to any device. The
fundamental API change is mostly invisible to the device driver author.

2.0 based driver code was called with a select_table as the final argument. This has
become a poll_table, although the functionality is basically the same. It is used to keep a
list of events that may cause the status of the poll() return to change. The wait
queue which indicates something may have occurred is added to the poll table using:


struct file *filp;
poll_table *wait;
struct wait_queue *queue;

poll_wait(filp, &queue, wait);

The poll handler should then check what events are presently true. The main events are listed in
Figure 3.




Figure 3


POLLERR – an error is pending
POLLHUP – a hangup occurred

POLLIN – input data exists
POLLRDNORM – normal readable data exists

POLLPRI – a “priority” message is waiting (used for urgent data on sockets)

POLLOUT – output is possible (there is room)
POLLWRNORM – there is space to output normal data

Finally, the poll handler returns a mask of these events. The poll function will be called
whenever a process is polling a file and the kernel code thinks the status may have changed.

When the required events are true, the poll system call will clean up the tables without driver
assistance and then return to the user.
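For a device that can be both read and written, building the mask is mostly a matter of testing each condition in turn. The sketch below is a user-space model only: struct model_dev and its fields are hypothetical, and the event constants come from the user-space poll.h header rather than the kernel's. A real handler would also call poll_wait() on its wait queue before computing the mask.

```c
#define _XOPEN_SOURCE 700   /* ask for POLLRDNORM/POLLWRNORM in <poll.h> */
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical device state, invented for this example. */
struct model_dev {
    size_t rx_count;   /* bytes waiting to be read */
    size_t tx_free;    /* free space in the output buffer */
    int hung_up;       /* non-zero once the far end has gone away */
    int error;         /* non-zero if an error is pending */
};

/* Report every event that is true right now. */
unsigned int model_poll_mask(const struct model_dev *dev)
{
    unsigned int mask = 0;

    assert(dev != NULL);
    if (dev->error)
        mask |= POLLERR;
    if (dev->hung_up)
        mask |= POLLHUP;
    if (dev->rx_count > 0)
        mask |= POLLIN | POLLRDNORM;
    if (dev->tx_free > 0)
        mask |= POLLOUT | POLLWRNORM;
    return mask;
}
```

Unlike the old select method, which was asked about one event type per call, the handler reports everything at once and leaves the filtering to the caller.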

Figure 4 contains a simple example for a read only device (the bus mouse driver) from both
kernels 2.0 and 2.2.




Figure 4


Linux 2.0 – select

/*
 * select for mouse input
 */
static int mouse_select(struct inode *inode, struct file *file,
                        int sel_type, select_table *wait)
{
    if (sel_type == SEL_IN) {
        if (mouse.ready)
            return 1;
        select_wait(&mouse.wait, wait);
    }
    return 0;
}

Linux 2.2 – poll

/*
 * poll for mouse input
 */
static unsigned int mouse_poll(struct file *file, poll_table *wait)
{
    poll_wait(file, &mouse.wait, wait);
    if (mouse.ready)
        return POLLIN | POLLRDNORM;
    return 0;
}

Init Functions

A lot of drivers contain code executed only at start-up time. In Linux 2.2 based drivers, you
can mark these functions and code with __init and __initdata. The kernel build
uses more ELF and compiler tricks to collect these functions at link time and throws them away after
booting to make more memory available for applications. Some platforms, however, don’t support
__init and __initdata. For those platforms, they are ignored.

Including <asm/init.h> and marking initialization data and code with these markers can
often save you 5 to 10 percent of the total size of a device driver. A typical 2.2 kernel build
throws some 40K of initialization code away at boot time.

Interrupt Handlers

With older systems you could assume (although you probably shouldn’t have) that a PC would have
16 interrupts. You cannot assume this with Linux 2.2. Because Linux 2.2 uses the APIC interrupt
controller on multiprocessor machines, you might have 64 interrupt lines or more. In other words, do
not assume anything about the number of interrupts.

In the new 2.2 kernel, the notion of fast interrupts is gone. If you set the
SA_INTERRUPT flag to indicate your interrupt is fast, then interrupts will be disabled on
that processor while your interrupt is handled, but the remaining semantics of a “fast” interrupt
are not emulated. And normally, this shouldn’t matter.

A lot of 2.0 based code looks something like Figure 5A, which finds its device by indexing a
table with the interrupt number. The dev_id field in the interrupt structure is specifically
intended to pass this kind of information, avoiding the need for device<->interrupt tables. Such
tables do not work for PCI, where an interrupt is likely to be shared by two instances of the same
device. Instead, use the call shown in Figure 5B.




Figure 5A


void my_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    struct my_device *dev = my_devices[irq];




Figure 5B


request_irq(irq, my_interrupt, SA_SHIRQ, "mythingy", dev);

This will call the interrupt handler with dev_id holding the value of dev that is passed
to the request function.
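To see why dev_id makes the lookup table unnecessary, consider this user-space model of a shared interrupt line. It is only a sketch: model_request_irq, model_raise_irq, and struct model_irq_line are invented for illustration and are far simpler than the kernel's own bookkeeping. Two instances of the same device register the same handler on one line, and each invocation receives the pointer that was registered with it.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_ACTIONS 4

typedef void (*model_handler_t)(int irq, void *dev_id);

/* Hypothetical model of one interrupt line shared by several handlers. */
struct model_irq_line {
    model_handler_t handler[MODEL_MAX_ACTIONS];
    void *dev_id[MODEL_MAX_ACTIONS];
    int count;
};

/* Register a handler together with the cookie it should be called with. */
int model_request_irq(struct model_irq_line *line, model_handler_t handler,
                      void *dev_id)
{
    assert(line != NULL && handler != NULL);
    if (line->count >= MODEL_MAX_ACTIONS)
        return -1;
    line->handler[line->count] = handler;
    line->dev_id[line->count] = dev_id;
    line->count++;
    return 0;
}

/* Simulate the interrupt firing: every registered handler is called
 * with its own dev_id. */
void model_raise_irq(struct model_irq_line *line, int irq)
{
    int i;

    for (i = 0; i < line->count; i++)
        line->handler[i](irq, line->dev_id[i]);
}

struct my_device {
    int interrupts_seen;
};

/* The handler recovers its device from dev_id, not from a table. */
void my_interrupt(int irq, void *dev_id)
{
    struct my_device *dev = dev_id;

    (void)irq;
    dev->interrupts_seen++;
}
```

A table indexed by irq could hold only one of the two devices here, which is exactly the PCI sharing problem described above.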

There is one last thing to worry about with interrupts in 2.2 based drivers, and if you are
using multi-processor machines it may require some thought. Under Linux 2.2 an interrupt can be
executing in parallel with other kernel code. This is different from 2.0, which used global locking
to make the SMP transition simple.

You are still guaranteed that cli() and sti() will protect a section of code
and prevent the kernel from running an interrupt handler during the protected block, but you are no
longer guaranteed that an interrupt handler itself will prevent other kernel code from running. To
handle this you will need to use spinlocks.

Spinlocks and SMP

For the sake of expediency, we’ll only review the basic spinlocks as a recipe for handling
interlocking between an interrupt handler and the kernel code. On a single processor machine these
functions are turned into the conventional cli/sti functions and have no overhead. However, you
should probably test them with an SMP build (even on a single CPU machine) to be sure they work
correctly.

A spinlock is a type: spinlock_t. It is initialized with the function:


spinlock_t lock;
spin_lock_init(&lock);

This sets the lock up and indicates that it is not being held.

When you want to use a spinlock you must grab it. The function sits in a tight loop until it
grabs the lock. In the event that you’re using the lock from both interrupt and non-interrupt
contexts, you’ll need to disable that interrupt or all local interrupts when grabbing the lock.
This is common enough that a number of functions cover it.

To grab a lock:


spin_lock(&lock);

To release a lock:


spin_unlock(&lock);


To grab a lock, save the irq mask and disable local interrupts:


unsigned long flags;
spin_lock_irqsave(&lock,flags);

and to restore it:


spin_unlock_irqrestore(&lock,flags);

The normal use of such code can be seen in Figure 6.




Figure 6


void my_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    struct my_device *dev = dev_id;

    spin_lock(&dev->lock);

    /* Do the same things as we always did in 2.0
       knowing user code grabbing the lock will be held up until … */

    spin_unlock(&dev->lock);
}

Figure 7 contains a non-interrupt context where you need to protect small sections of code from
the interrupt handler running in parallel. Note the use of the irq disabling version of the lock. This
is very important — without it we may take the lock in the user code, then start an interrupt. The
interrupt routine will spin forever, trying to get a lock that is not going to be released (because
we’re stuck in the interrupt so we can’t be running the user code). When this happens, you have to
reboot, and you won’t be happy. Use the right version of the spinlocks for the sake of general user
happiness.




Figure 7


 struct my_device *dev;
unsigned long flags;

spin_lock_irqsave(&dev->lock, flags);

/* The interrupt cannot interfere here */
/* Do the things we did in 2.0 */

spin_unlock_irqrestore(&dev->lock, flags);
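The "tight loop" behavior of a spinlock described earlier in this section can be modeled in user space with C11 atomics. This is a sketch under stated assumptions, not the kernel's implementation: the model_ names are invented, and the model knows nothing about interrupts, so it corresponds to plain spin_lock/spin_unlock rather than the irq-saving variants.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for spinlock_t. */
typedef struct {
    atomic_flag held;
} model_spinlock_t;

/* Mark the lock as not held. */
void model_spin_lock_init(model_spinlock_t *lock)
{
    assert(lock != NULL);
    atomic_flag_clear(&lock->held);
}

/* Try once; returns true if we now own the lock. */
bool model_spin_trylock(model_spinlock_t *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->held,
                                              memory_order_acquire);
}

/* Sit in a tight loop until the lock is ours. */
void model_spin_lock(model_spinlock_t *lock)
{
    while (!model_spin_trylock(lock))
        ;   /* spin */
}

/* Release the lock so another holder can proceed. */
void model_spin_unlock(model_spinlock_t *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

The model also makes the deadlock above easy to picture: if the code that holds the lock can never run again, the loop in model_spin_lock never exits.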


The spinlocks guarantee one additional thing. They are marked with the required magic to tell
gcc that they are memory barriers. Even if you are not using volatile types, gcc will write any
values from registers to their final destination before unlocking. It will also read values directly
from memory, not from saved copies in registers made before the lock is taken. This means that you
don’t need to worry about any misery-producing optimization surprises the compiler might otherwise
invent. In Linux 2.0 the cli(), sti(), and restore_flags() functions have this
property, and in Linux 2.2 this continues to be true.

The io_request_lock

The io_request_lock is a spin lock that is taken by the kernel when queuing a request
to a block device (a hard disk, a floppy disk, or similar devices which can contain a file
system). If the driver is unaware of the lock, it will perform the way it does with 2.0: the I/O
operation will remain single-threaded. If the driver is aware of and uses the lock, then it can get
the advantages of parallel I/O operations across multiple processors.

The lock protects the request queue, so a driver can safely drop it once it has copied or
processed the request queue entry. In some cases this is done by the device driver itself. In
others, SCSI for example, it is done by the supporting code.

There are two reasons to drop the lock. First, it results in better performance from your device
driver. Secondly, it keeps interrupts enabled during your device operation. You may need to do this
because the device is very slow or because you have to use busy loops with timeouts.

The lock is dropped with:


spin_unlock_irq(&io_request_lock);

and taken with:


spin_lock_irq(&io_request_lock);

These are variants on the functions covered in the Spinlock section. spin_lock_irq
always disables the interrupts on that processor; spin_unlock_irq always re-enables
them, whatever their earlier state.

Several SCSI drivers make use of this because of things like timeout handling. The NCR5380, for
example, drops the lock during the various delay loops required to control the relatively primitive
controller it uses.




Figure 8


spin_unlock_irq(&io_request_lock);

while (!(NCR5380_read(INITIATOR_COMMAND_REG) & ICR_ARBITRATION_PROGRESS)
       && time_before(jiffies, timeout))
    ;

spin_lock_irq(&io_request_lock);

An example can be seen in Figure 8. This allows the timer to continue running, and other
processes can continue while the ancient 5380 hardware whirs into action. Because it drops the I/O
request lock in its own handlers, it also claims it again in its interrupt function. The interrupt
function in this case also manipulates the request queue and so must protect itself from another
processor which may also be queuing blocks for the device.
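The busy loop in Figure 8 depends on comparing the tick counter against a deadline in a way that survives the counter wrapping around. The following user-space sketch shows one common way to do that comparison. It is a model only: model_time_before, model_poll_until, and model_countdown_ready are hypothetical names, the tick counter is advanced by hand rather than by a timer, and the cast to long assumes the usual two's complement behavior.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Non-zero if tick count a is earlier than b, even across wraparound.
 * Assumes the two values are less than half the counter range apart. */
int model_time_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/* Poll ready() until it succeeds or the deadline passes.
 * *ticks is bumped once per iteration to simulate the timer running. */
int model_poll_until(int (*ready)(void *ctx), void *ctx,
                     unsigned long *ticks, unsigned long timeout)
{
    assert(ready != NULL && ticks != NULL);
    while (model_time_before(*ticks, timeout)) {
        if (ready(ctx))
            return 1;       /* hardware responded in time */
        (*ticks)++;
    }
    return 0;               /* timed out */
}

/* Hypothetical device that becomes ready after a number of polls. */
int model_countdown_ready(void *ctx)
{
    int *remaining = ctx;

    return (*remaining)-- <= 0;
}
```

A plain comparison such as ticks < timeout would end the loop immediately whenever the deadline has wrapped past zero, which is why the subtraction form is used.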

And that’s it: a general overview of some of the changes you’ll need to know about in order to
take advantage of the latest advances in the Linux 2.2 Kernel.





Alan Cox is a well-known Linux kernel hacker currently working on writing drivers, security
auditing, Linux/SGI porting, and modular sound. He can be reached at
alan@lxorguk.ukuu.org.uk.
