Kernel Debuggers

Debuggers make kernel internals more transparent. On Linux, they come in different flavors: the Kernel debugger, kdb; the Kernel GNU debugger, kgdb; the GNU debugger, gdb; and JTAG-based debuggers. Learn how to use the first three in this month’s column.
Investing time in logic design and software engineering before code development starts and then staring hard at your code after development can minimize and even eliminate bugs. But since that’s easier said than done, and since all developers make mistakes, debugging tools are there to come to the rescue. This month, let’s explore some kernel debugging techniques.

As Valuable as an X-Ray

Kernel debuggers make kernel internals more transparent. You can single step through instructions, disassemble instructions, display and modify kernel variables, and look at stack traces.
The Linux kernel has no built-in debugger support. (Whether to include a debugger as part of the stock kernel is an oft-debated point on kernel mailing lists.) As of today, whether you want to use the instruction-level Kernel debugger, kdb, or the source-level Kernel GNU debugger, kgdb (the two main Linux kernel debuggers), you must first download the relevant patches and apply them to your kernel sources.
If you want to avoid the hassle of patching your sources with a kernel debugger, you can glean information about kernel panics and peek at kernel variables via the plain GNU debugger, gdb.
(JTAG-based debuggers, named after the Joint Test Action Group, which developed the standard, use hardware-assisted debugging. They are powerful, but expensive. See the sidebar “JTAG Debuggers” for more information.)
You can enter a kernel debugger in multiple ways. One way is to pass command-line parameters that force the kernel to enter the debugger during boot up. Another way is via software or hardware breakpoints. A breakpoint is an address where execution should stop and transfer control to the debugger.
A software breakpoint replaces the instruction at that address with something else that causes an exception. You can set software breakpoints either using debugger commands or by inserting them into your code.
For Intel-based systems, you can set a software breakpoint with the code asm("int $3");. You can also use the BREAKPOINT macro, which translates to the appropriate architecture-dependent instruction.
Hardware breakpoints can be used instead of software breakpoints, if the instruction where you need to stop is on flash memory and the corresponding instruction cannot be replaced by the debugger.
A hardware breakpoint needs processor support. The corresponding address has to be added to debug registers, and you can have only as many hardware breakpoints as the number of debug registers supported by the processor.
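The limit on hardware breakpoints can be pictured as a small fixed table of slots. The sketch below is a user-space model, not kernel code: it assumes an x86-style processor with four address debug registers (DR0 to DR3), and the names hw_bp_table, hw_bp_set(), and hw_bp_clear() are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define HW_BP_SLOTS 4   /* assumption: four address registers, as on x86 */

/* One entry per debug register. */
struct hw_bp_table {
    uintptr_t addr[HW_BP_SLOTS];
    int       used[HW_BP_SLOTS];
};

/* Claim a free slot for addr. Returns the slot index, or -1 if all are taken. */
int hw_bp_set(struct hw_bp_table *t, uintptr_t addr)
{
    for (int i = 0; i < HW_BP_SLOTS; i++) {
        if (!t->used[i]) {
            t->used[i] = 1;
            t->addr[i] = addr;
            return i;
        }
    }
    return -1;
}

/* Release a slot. Returns 0 on success, -1 if the slot is invalid or unused. */
int hw_bp_clear(struct hw_bp_table *t, int slot)
{
    if (slot < 0 || slot >= HW_BP_SLOTS || !t->used[slot])
        return -1;
    t->used[slot] = 0;
    return 0;
}
```

With this model, a fifth hw_bp_set() call fails until an earlier breakpoint is cleared, which mirrors what a debugger has to report when the processor runs out of debug registers.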
You can also ask the debugger to set watchpoints on variables. A watchpoint stops execution whenever an instruction modifies the data at the watchpoint address.
Yet another way to enter the debugger is by hitting an attention key, but there are instances when this won’t work. If the code is sitting in a tight loop after disabling interrupts, the kernel won’t get a chance to process the attention key and enter the debugger.
For example, you can’t enter the debugger via an attention key if your code does something like this:
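save_flags(flags);   /* Save the current interrupt state */
cli();               /* Disable interrupts */
while (1);           /* Tight loop: the attention key is never serviced */
sti();               /* Never reached, so interrupts stay disabled */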
Once in the debugger, however control was transferred, you use the debugger’s commands to track down the errors.

The Kernel Debugger, kdb

kdb is an instruction-level debugger that is widely used for debugging kernel code and device drivers. Before you can use kdb, you need to patch your kernel sources with kdb’s patches and recompile the kernel (see the section “Further Reading” below for information on downloading the kdb patches).
The main advantage of kdb is that it’s easy to set up, since you don’t need an additional machine to do the debugging (unlike kgdb). kdb’s main disadvantage is that you need to manually and mentally correlate your source code with disassembled assembly code (unlike kgdb).
Let’s try kdb with the help of an example. Assume that you are modifying the kernel serial driver to run on your hardware. The driver isn’t working and you’d like kdb to help.
Start by putting a breakpoint at the serial driver’s open() entry point. Remember that since kdb is not a source-level debugger, you need to open your sources and try to match the assembly language instructions with your C code. For convenience, Listing One shows a source snippet from the serial driver’s open() entry point defined in drivers/char/serial.c.
LISTING ONE: A code snippet from the serial driver

static int rs_open(struct tty_struct *tty, struct file *filp)
{
    struct async_struct *info;

    /* … */
    retval = get_async_struct(line, &info);
    if (retval)
        return retval;
    tty->driver_data = info;

    /* Point A */
    /* … */
}

Press the PAUSE key to enter kdb and use the command id rs_open to look at the disassembled rs_open() function. Figure One shows the result.
FIGURE ONE: Disassembling a function in kdb
Entering kdb (current=0xc03f6000, pid 0) on processor 0 due to
Keyboard Entry

kdb> id rs_open
0xc01cce00 rs_open: sub $0x1c, %esp
0xc01cce03 rs_open+0x03: mov $ffffffed, %ecx

0xc01cce4b rs_open+0x4b: call 0xc01ccca0, get_async_struct

0xc01cce56 rs_open+0x56: mov 0xc(%esp,1), %eax
0xc01cce5a rs_open+0x5a: mov %eax, 0x9a4(%ebx)

Point A in the source code is a good place to put a breakpoint, because you can peek at both the tty structure and the info structure to see what’s going on.
Looking side by side at the source and the disassembly, the address rs_open+0x5a corresponds to Point A. (Correlation is easier if the kernel is compiled without optimization flags.) Put a breakpoint at rs_open+0x5a (which is address 0xc01cce5a) using the command bp, and then continue execution by exiting the debugger with the command go.
kdb> bp rs_open+0x5a
kdb> go
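The symbol-plus-offset arithmetic that kdb performs for bp rs_open+0x5a is simple enough to sketch. The following user-space code uses only the two addresses visible in Figure One; the table and the helper names sym_to_addr() and addr_to_sym() are illustrative assumptions, not kdb internals.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct ksym {
    const char *name;
    uint32_t    addr;
};

/* Symbol addresses as they appear in Figure One. */
static const struct ksym ksyms[] = {
    { "get_async_struct", 0xc01ccca0u },
    { "rs_open",          0xc01cce00u },
};

#define NKSYMS (sizeof(ksyms) / sizeof(ksyms[0]))

/* Turn "symbol+offset" into an absolute address; 0 if the symbol is unknown. */
uint32_t sym_to_addr(const char *name, uint32_t offset)
{
    for (size_t i = 0; i < NKSYMS; i++)
        if (strcmp(ksyms[i].name, name) == 0)
            return ksyms[i].addr + offset;
    return 0;
}

/*
 * Find the closest symbol at or below addr and store the distance from
 * it in *offset. Returns NULL if addr lies below every known symbol.
 */
const char *addr_to_sym(uint32_t addr, uint32_t *offset)
{
    const struct ksym *best = NULL;

    for (size_t i = 0; i < NKSYMS; i++)
        if (ksyms[i].addr <= addr && (!best || ksyms[i].addr > best->addr))
            best = &ksyms[i];
    if (!best)
        return NULL;
    *offset = addr - best->addr;
    return best->name;
}
```

Here sym_to_addr("rs_open", 0x5a) yields 0xc01cce5a, the address at which the breakpoint is reported once it is hit.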
Next, you need to get the kernel to execute rs_open() to hit the breakpoint. To trigger this, execute the appropriate user space program. In this case, just echo some characters to the corresponding serial port (ttySX).
bash> echo "linux magazine" > /dev/ttySX
The breakpoint gets hit and kdb gets control:
Entering kdb on processor 0 due to Breakpoint @ 0xc01cce5a
kdb>
Take a look at the contents of the info structure. If you look again at the disassembly one instruction before the breakpoint (rs_open+0x56), you’ll see that the register eax contains the address of the info structure. So, look at the register contents with the r command.
kdb> r
eax = 0xcf1ae680 ebx = 0xce03b000 ecx = 0x00000000

0xcf1ae680 is the address of the info structure. You can dump its contents with the command md:
kdb> md 0xcf1ae680
0xcf1ae680 00005301 00000ABC 00000000 10000400

To make sense of this dump, look at the corresponding structure definition. info is defined as struct async_struct in include/linux/serialP.h as follows:
struct async_struct {
    int magic;
    unsigned long port;
    int hub6;
    /* … */
};
If you match the dump with the definition, 0x5301 is the magic number and 0xABC is the I/O port. Well, that’s interesting! 0xABC doesn’t look like a valid port! If you’ve done enough serial port debugging, you know that the I/O port base addresses and IRQs are configured in the rs_table structure in include/asm/serial.h, so that is the place to recheck. Change the port definition to the correct value, recompile the kernel, and continue your testing!
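Matching a dump against a structure by eye is the kind of step that a few lines of code can make concrete. The sketch below overlays the first words of the md output on a mirror of the first three fields of struct async_struct. The mirror type async_struct_head and the function decode_info() are hypothetical, and fixed 32-bit types are used on the assumption that int and unsigned long are both four bytes wide on the 32-bit Intel target.

```c
#include <stdint.h>
#include <string.h>

/*
 * Mirror of the first three fields of struct async_struct, using
 * fixed-width types so the layout matches a 32-bit target even when
 * this code is compiled on a 64-bit host.
 */
struct async_struct_head {
    int32_t  magic;
    uint32_t port;
    int32_t  hub6;
};

/*
 * Overlay the start of an md dump on the structure. The caller passes
 * at least three 32-bit words, in the order the debugger printed them.
 */
struct async_struct_head decode_info(const uint32_t *words)
{
    struct async_struct_head h;

    memcpy(&h, words, sizeof(h));
    return h;
}
```

Feeding it the four words from the dump gives a magic of 0x5301 and a port of 0xABC, the same reading as the manual match above.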

The Kernel GNU Debugger, kgdb

kgdb is a source-level debugger. It’s easier to use than kdb, since you don’t have to spend time correlating assembly code with your source code. However, it’s more difficult to set up since an additional machine is needed to do debugging.
The GNU Debugger (gdb) is used in tandem with kgdb to step through kernel code. gdb runs on a host machine and the kgdb-patched kernel (see below for information on downloading the kgdb patches) runs on the target hardware. The host and the target are connected via a serial cable as shown in Figure Two.
FIGURE TWO: Connecting two machines to debug the kernel using kgdb



The kernel has to be informed about the identity and baud rate of the serial port via command-line parameters. Depending on the bootloader used, add kernel parameters…
kgdbwait kgdb8250=X,115200
… to either syslinux.cfg, lilo.conf, or the GRUB configuration file. kgdbwait asks the kernel to wait until a connection is established with the host-side gdb, X is the serial port connected to the host, and 115200 is the baud rate used for communication.
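To make the shape of the kgdb8250 parameter concrete, here is a small user-space parser for strings of the form kgdb8250=port,baud. It is not the parsing code from the kgdb patch. The structure kgdb_serial_opts and the function parse_kgdb8250() are invented for this sketch, and it assumes the port is given as a plain number.

```c
#include <stdio.h>

struct kgdb_serial_opts {
    int port;   /* serial port number (the X in the text) */
    int baud;   /* line speed, for example 115200 */
};

/*
 * Parse "kgdb8250=<port>,<baud>". Returns 0 on success and fills *out,
 * or -1 if the string is malformed or has trailing characters.
 */
int parse_kgdb8250(const char *arg, struct kgdb_serial_opts *out)
{
    int port, baud;
    char trailing;

    /* Exactly two conversions must succeed; a third means extra text. */
    if (sscanf(arg, "kgdb8250=%d,%d%c", &port, &baud, &trailing) != 2)
        return -1;
    if (port < 0 || baud <= 0)
        return -1;

    out->port = port;
    out->baud = baud;
    return 0;
}
```

For example, the hypothetical argument kgdb8250=1,115200 parses to port 1 at 115200 baud, while the literal placeholder kgdb8250=X,115200 is rejected.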
Now, configure the same baud rate on the host:
bash> stty speed 115200 < /dev/ttySX
If your host is a laptop that lacks a serial port, you can use a USB-to-serial converter for the debug session. In that case, instead of /dev/ttySX, use the /dev/ttyUSBX device created by the usbserial.o driver.
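If you would rather set the host-side speed from a program than from the shell, the POSIX termios interface does the same job. The function below is a minimal sketch under that assumption; set_speed_115200() is a hypothetical name, and the caller is expected to have opened the serial device already.

```c
#include <termios.h>

/*
 * Set both the input and output speed of an open serial descriptor to
 * 115200 baud. Returns 0 on success, -1 if fd is not a terminal device
 * or the settings cannot be applied.
 */
int set_speed_115200(int fd)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) != 0)        /* read the current settings */
        return -1;
    if (cfsetispeed(&tio, B115200) != 0 ||
        cfsetospeed(&tio, B115200) != 0)
        return -1;
    if (tcsetattr(fd, TCSANOW, &tio) != 0)   /* apply immediately */
        return -1;
    return 0;
}
```

A caller would typically obtain fd with open() on /dev/ttySX (or /dev/ttyUSBX) using the O_RDWR and O_NOCTTY flags before calling this function.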

Stepping Through kgdb

Let’s learn some kgdb basics using the example of a buggy kernel module function. Modules make debugging faster, since the entire kernel need not be recompiled while making code changes. Remember to compile your module with the -g option to generate symbolic information. And since modules are dynamically loaded, the debugger needs to be informed of symbol information contained in the module.
Listing Two is the buggy function, my_function(), defined in drivers/char/my_module.c.
LISTING TWO: A “buggy” kernel module function

char buffer;

int my_function(void)
{
    int *my_variable = (int *)0xAB, i;

    /* … */
    /* Point A */
    i = *my_variable; /* Kernel panic: my_variable points to bad memory */

    return i;
}

Insert my_module on the target using the -m option to decipher section addresses, as shown in Figure Three.
FIGURE THREE: Inserting a module
bash> insmod my_module -m
Using /lib/modules/2.x.y/kernel/drivers/char/my_module.o
Sections: Size Address Align
.this 00000060 e091a000 2**2
.text 00001ec0 e091a060 2**4

.rodata 0000004c e091d1fc 2**2
.data 00000048 e091d260 2**5
.bss 000000e4 e091d2c0 2**5

Next, invoke gdb on the host side machine as follows:
bash> gdb vmlinux  
(gdb) target remote /dev/ttySX
Since you passed kgdbwait as a kernel command line parameter, gdb gets control when the kernel boots on the target. Now, inform gdb about the above section addresses using the add-symbol-file command, shown in Figure Four.
FIGURE FOUR: Adding symbols to aid in debugging
(gdb) add-symbol-file drivers/char/my_module.o 0xe091a060 -s .rodata 0xe091d1fc -s .data 0xe091d260 -s .bss 0xe091d2c0

add symbol table from file "drivers/char/my_module.o" at
.text_addr = 0xe091a060
.rodata_addr = 0xe091d1fc
.data_addr = 0xe091d260
.bss_addr = 0xe091d2c0
(y or n) y
Reading symbols from drivers/char/my_module.o ...done.
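Typing section addresses by hand is error-prone, so it can be handy to generate the add-symbol-file line from the numbers that insmod printed. The helper below is a sketch of that idea; module_sections and format_add_symbol_file() are hypothetical names, and the addresses are assumed to fit in 32 bits as in Figure Three.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Load addresses of the sections gdb needs to know about. */
struct module_sections {
    uint32_t text;
    uint32_t rodata;
    uint32_t data;
    uint32_t bss;
};

/*
 * Write a gdb add-symbol-file command into buf. Returns the length of
 * the command, or -1 if it does not fit in len bytes.
 */
int format_add_symbol_file(char *buf, size_t len, const char *object,
                           const struct module_sections *s)
{
    int n = snprintf(buf, len,
                     "add-symbol-file %s 0x%08" PRIx32
                     " -s .rodata 0x%08" PRIx32
                     " -s .data 0x%08" PRIx32
                     " -s .bss 0x%08" PRIx32,
                     object, s->text, s->rodata, s->data, s->bss);

    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

Filled in with the addresses from Figure Three, the helper reproduces the command shown in Figure Four, ready to paste at the gdb prompt.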
To debug the kernel panic, set a breakpoint at my_function() and then continue the kernel with the c command.
(gdb) b my_function 
(gdb) c
When kgdb hits the breakpoint, look at the stack trace, single step until Point A, and display the value of my_variable, as shown in Figure Five.
FIGURE FIVE: Walking through the code at the breakpoint
(gdb) bt    /* Back (stack) trace */
#0 my_function () at my_module.c:124
#1 0xe091a108 in my_parent_function (my_var1=438, my_var2=0xe091d288)


(gdb) step /* Single step till Point A */
(gdb) p my_variable
$0 = (int *) 0xab
There’s an obvious bug in the code: my_variable points to an invalid address (0xab), since no memory was allocated to it. Using kgdb, you can point it at valid memory, circumvent the kernel crash, and continue testing. (See Figure Six.)
FIGURE SIX: Allocating some space to the variable in gdb
(gdb) p &buffer  
$1 = 0xe091a100 ""
(gdb) set my_variable=0xe091a100 /* my_variable = &buffer */
(gdb) c

The GNU Debugger, gdb

As mentioned at the outset, gdb can be used stand-alone to gather some debug information from the kernel. However, you can’t use it to step through kernel code, set breakpoints, or modify kernel variables.
Let’s use gdb to debug the kernel panic caused by the buggy function in Listing Two. Assume now that my_function() is compiled as part of the kernel and not as a module, since you can’t easily peek into modules using plain gdb. Figure Seven is part of the oops message generated when my_function() executes.
FIGURE SEVEN: An “oops” occurs when my_function() executes
Unable to handle kernel NULL pointer dereference at
virtual address 000000ab

eax: f7571de0 ebx: ffffe000 ecx: f6c78000 edx: f98df870

Stack: c019d731 00000000

bffffbe8 c0108fab
Call Trace: [<c019d731>] [<c013b8ac>] [<c0108fab>]

Copy this cryptic oops message to oops.txt and use the ksymoops utility to obtain more verbose output. You might need to copy the message by hand if the system is hung.
bash> ksymoops oops.txt
Code; c019d710 <my_function+0/10>
00000000 <_EIP>:
Code; c019d710 <my_function+0/10>
0: a1 ab 00 00 00 mov 0xab,%eax
Code; c019d715 <my_function+5/10>
5: c3 ret
The ksymoops output shows that the oops occurred inside my_function(). You can use gdb to get more information. In the invocation in Figure Eight, vmlinux is the uncompressed kernel image and /proc/kcore is the kernel address space. The first command, p xtime, just tests the waters.
FIGURE EIGHT: Debugging the kernel with gdb
bash> gdb vmlinux /proc/kcore
(gdb) p xtime
$0 = 1113173755

(gdb) x/2i my_function
0xc019d710 <my_function>: mov 0xab, %eax
0xc019d715 <my_function+5>: ret
my_function() looks very laconic when seen in assembly due to compiler optimizations. my_function() is effectively copying the contents of address 0xab to the eax register, since eax holds the return value from functions on Intel-based systems. But 0xab doesn’t look like a valid kernel address! Fix the bug by allocating valid memory space to my_variable, recompile, and continue your testing.
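The bad address can also be read straight from the code bytes in the ksymoops output. On 32-bit x86, opcode a1 is a move from an absolute address into eax, and the four bytes that follow hold that address in little-endian order. The sketch below decodes just that one instruction form; decode_mov_abs_eax() is a hypothetical helper, not part of ksymoops or gdb.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * If code[] begins with the 32-bit x86 instruction "mov <addr>,%eax"
 * (opcode 0xa1 followed by a 32-bit little-endian address), store the
 * address in *addr and return 0. Returns -1 for anything else.
 */
int decode_mov_abs_eax(const uint8_t *code, size_t len, uint32_t *addr)
{
    if (len < 5 || code[0] != 0xa1)
        return -1;

    /* Assemble the operand, least significant byte first. */
    *addr = (uint32_t)code[1]
          | ((uint32_t)code[2] << 8)
          | ((uint32_t)code[3] << 16)
          | ((uint32_t)code[4] << 24);
    return 0;
}
```

Given the bytes a1 ab 00 00 00, the function reports address 0xab, which matches both the disassembly and the faulting virtual address in the oops message.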

Further Reading

You can download kdb patches from http://oss.sgi.com/projects/kdb. Each supported kernel version has two patches: a common patch and an architecture dependent patch.
http://kgdb.sourceforge.net is the homepage for the kgdb debugger. The web site also has documentation on configuring and using kgdb.
If your Linux distribution does not include gdb, you can obtain it from http://www.gnu.org/software/gdb/gdb.html.
Next month’s column will continue with more debugging techniques including trace tools that can be used when the kernel is running at a customer site where it may not be possible to use a debugger.

Sreekrishnan Venkateswaran has been working for IBM India since 1996. His recent Linux projects include putting Linux onto a wristwatch, an MP3 player, and a pacemaker programmer. You can reach Krishnan at krishhna@gmail.com.
