Manipulating “Seq” Files

The process filesystem is a window into the mystical innards of the kernel. The seq file interface simplifies the job of process file creators.
The process filesystem, procfs, is a virtual filesystem that provides windows into the innards of the kernel. The data you see when you peek through procfs is generated by the kernel on the fly. Sometimes, especially when the amount of data is large, the corresponding procfs read() functions become complex. The seq file interface is a kernel helper mechanism designed to simplify such implementations, making procfs operations cleaner and easier.
Before delving into seq files, let’s take a brief look at procfs, then gradually introduce complexities to a procfs read() routine and see how the seq file interface simplifies the implementation. Finally, we’ll update one of the few remaining 2.6 drivers that does not yet take advantage of seq files.

procfs Basics

Files in procfs are used to configure kernel parameters, look at kernel structures, glean statistics from device drivers, and retrieve general system information.
Procfs is a pseudo-filesystem, meaning that files resident in procfs aren’t associated with a physical storage device such as a hard disk. Instead, data in procfs files is generated on demand by corresponding entry points in the kernel. Because of this, file sizes in procfs are shown as zero. Procfs is usually mounted under the /proc directory during kernel boot-up; you can verify this by invoking mount.
To get a feel of the capabilities of procfs, run cat on files /proc/cpuinfo, /proc/meminfo, /proc/interrupts, /proc/tty/driver/serial, /proc/bus/usb/devices, and /proc/stat. Certain kernel parameters can be changed at run time by writing to files under /proc/sys/. For example, you can change kernel printk() log levels by echoing a new set of values to /proc/sys/kernel/printk. Many utilities, including ps and system performance tools such as sysstat, internally derive information from files residing in /proc/.

Reading from procfs

Let’s start with a basic procfs read function, gradually increase its complexity, and then use the seq file interface to transform the labored routine into a graceful one.
As is common with many device drivers, assume that you have a linked list of data structures and that each node in the list contains a string field (called info). The example code in Listing One uses a procfs file named /proc/readme to export these strings to user space. When a user reads this file, the procfs read() entry point, readme_proc(), gets invoked. This routine traverses the linked list and appends the info field of each node to the filesystem buffer passed to it.
Listing One: Reading data via procfs

/* Private data structure */
struct _mydrv_struct {
  /* ... */
  struct list_head list; /* Link */
  char info[20];         /* Info to pass via procfs/seq files;
                            large enough for "Node No: 499\n" */
  /* ... */
};

static LIST_HEAD(mydrv_list); /* List head */

/* Init */
static int __init
mydrv_init(void)
{
  int i;
  struct proc_dir_entry *entry;
  struct _mydrv_struct *mydrv_new;

  /* ... */
  /* Create /proc/readme, readable by all */
  entry = create_proc_entry("readme", S_IRUGO, NULL);

  /* Attach it to readme_proc() */
  if (entry) {
    entry->read_proc = readme_proc;
  }

  /* Manually populate mydrv_list for testing.
   * In the real world, device driver logic will
   * maintain the list and populate the info field */
  for (i = 0; i < 100; i++) {
    mydrv_new = kmalloc(sizeof(struct _mydrv_struct), GFP_ATOMIC);
    if (!mydrv_new)
      break;
    sprintf(mydrv_new->info, "Node No: %d\n", i);
    list_add_tail(&mydrv_new->list, &mydrv_list);
  }

  return 0;
}

/* The procfs read entry point */
static int
readme_proc(char *page, char **start, off_t offset,
            int count, int *eof, void *data)
{
  off_t thischunk_len = 0;
  struct _mydrv_struct *p;

  /* Traverse the list and copy info into the supplied buffer */
  list_for_each_entry(p, &mydrv_list, list) {
    thischunk_len += sprintf(page + thischunk_len, "%s", p->info);
  }

  *eof = 1; /* Indicate completion */

  return thischunk_len;
}

Boot the kernel with these changes and peek into /proc/readme:
bash> cat /proc/readme
Node No: 0
Node No: 1
...
Node No: 99
When a procfs read entry point is called, it’s supplied one page of memory through which it passes information to user space. As you can see in Listing One, the first argument passed to readme_proc() is a pointer to this page-sized buffer. The second argument, start, is used to aid the implementation of /proc files larger than a page. (The use of this parameter will become clear when you look at the example in Listing Two.)
The next two arguments specify the offset from where the read operation is requested and the number of bytes to be read, respectively. The eof argument tells the caller whether there is more data to be read. If *eof is not set before returning, the procfs read entry point is called again for more data. In Listing One, if you comment out the line that sets *eof, readme_proc() is called again with the offset argument set to 1190 (which is the number of ASCII bytes contained in the strings, Node No: 0 to Node No: 99).
Finally, the read() entry point has to return the actual number of bytes that have been copied into the supplied buffer.
The amount of data generated by the procfs read routine in Listing One falls within the one-page limit. However, if you increase the number of nodes in the linked list from 100 to 500 (in mydrv_init()), the data generated by reading /proc/readme crosses one page and triggers the following output:
bash> cat /proc/readme
Node No: 0
Node No: 1

Node No: 322
proc_file_read: Apparent buffer overflow!
An overflow occurs after one page (4,096 bytes, in this case) worth of ASCII characters has been produced.
To handle such large /proc files, you need to re-fashion the code in Listing One to use the start parameter mentioned earlier. This makes the function somewhat complicated — as is shown in Listing Two.
Listing Two: Implementing large procfs reads

static int
readme_proc(char *page, char **start, off_t offset,
            int count, int *eof, void *data)
{
  off_t thischunk_start = 0;
  off_t thischunk_len = 0;
  struct _mydrv_struct *p;

  /* Loop through the list, grabbing device info */
  list_for_each_entry(p, &mydrv_list, list) {
    thischunk_len += sprintf(page + thischunk_len, "%s", p->info);

    /* Advance thischunk_start only to the extent that the next
     * read will not result in total bytes more than (offset+count)
     */
    if (thischunk_start + thischunk_len < offset) {
      thischunk_start += thischunk_len;
      thischunk_len = 0;
    } else if (thischunk_start + thischunk_len > offset + count) {
      break;
    }
  }

  /* Actual start */
  *start = page + (offset - thischunk_start);

  /* Calculate the number of bytes written */
  thischunk_len -= (offset - thischunk_start);
  if (thischunk_len > count) {
    thischunk_len = count;
  } else {
    *eof = 1;
  }

  return thischunk_len;
}

The modified implementation works as follows:
* Place the data belonging to the requested offset at an actual offset within the page equal to the distance between the requested offset and the start of the current data chunk. Even when the requested offset is much larger than a page, this actual offset stays well within the page. Record this actual start address in *start.
* Return the number of bytes stored at *start.
* If you don’t signal eof, the routine is called again with the offset advanced by the number of bytes you last returned.
With this hack, your /proc file can supply large amounts of data to user space without size limitations:
bash> cat /proc/readme
Node No: 0
Node No: 1

Node No: 499

Introducing Seq Files

The seq file interface comes to the rescue when you are faced with the prospect of awkwardly implementing large /proc files as in Listing Two. As its name implies, the seq file interface views the contents of a /proc file as a sequence of objects. Programming interfaces are provided to iterate through this object sequence, and all your code has to supply is the iterator methods that the seq file interface expects.
Specifically, you must supply the following iterator methods:
1. start() is called first by the seq file interface. This method initializes the position within the iterator sequence and returns the first iterator object of interest.
2. next() increments the iterator position and returns a pointer to the next iterator. This function is agnostic to the internal structure of the iterator and treats it as opaque.
3. show() interprets the iterator passed to it and generates the output strings displayed when a user reads the corresponding /proc file. It uses helpers such as seq_printf(), seq_putc(), and seq_puts() to format the output.
4. stop() is called at the end to perform any cleanup.
The seq file interface automatically invokes these iterator methods to produce output in response to user operations on related /proc files. Using the seq file interface, you do not have to worry about page-sized buffers and signaling end of data.
Let’s rewrite Listing Two to use seq files. The result is shown in Listing Three. The code views the linked list as a sequence of nodes; the basic iterator object is a node, and each invocation of the next() method returns the next node in the list.
Listing Three: Using seq files to simplify Listing Two

#include <linux/seq_file.h>

/* start() */
static void *
mydrv_seq_start(struct seq_file *seq, loff_t *pos)
{
  struct _mydrv_struct *p;
  loff_t off = 0;

  /* Return the iterator at the requested offset */
  list_for_each_entry(p, &mydrv_list, list) {
    if (*pos == off++) return p;
  }
  return NULL;
}

/* next() */
static void *
mydrv_seq_next(struct seq_file *seq, void *v, loff_t *pos)
{
  /* 'v' is the iterator returned by start() or by
   * the previous invocation of next() */
  struct list_head *n = ((struct _mydrv_struct *)v)->list.next;

  ++*pos; /* Advance position */

  /* Return the next iterator, which is the next node in the list */
  return (n != &mydrv_list) ?
    list_entry(n, struct _mydrv_struct, list) : NULL;
}

/* show() */
static int
mydrv_seq_show(struct seq_file *seq, void *v)
{
  const struct _mydrv_struct *p = v;

  /* Interpret the iterator, 'v' */
  seq_printf(seq, "%s", p->info);
  return 0;
}

/* stop() */
static void
mydrv_seq_stop(struct seq_file *seq, void *v)
{
  /* No cleanup needed in this example */
}

/* Define iterator operations */
static struct seq_operations mydrv_seq_ops = {
  .start = mydrv_seq_start,
  .next  = mydrv_seq_next,
  .stop  = mydrv_seq_stop,
  .show  = mydrv_seq_show,
};

static int
mydrv_seq_open(struct inode *inode, struct file *file)
{
  /* Register the iterator operations */
  return seq_open(file, &mydrv_seq_ops);
}

static struct file_operations mydrv_proc_fops = {
  .owner   = THIS_MODULE,
  .open    = mydrv_seq_open, /* User supplied */
  .read    = seq_read,       /* Interface supplied */
  .llseek  = seq_lseek,      /* Interface supplied */
  .release = seq_release,    /* Interface supplied */
};

static int __init
mydrv_init(void)
{
  /* ... */

  /* Replace the assignment to entry->read_proc in Listing One
   * with a more fundamental assignment to entry->proc_fops */
  - entry->read_proc = readme_proc;      /* Instead of this, */
  + entry->proc_fops = &mydrv_proc_fops; /* do this */

  /* ... */
}

Updating the NVRAM Driver

The seq file interface has been around since the latter versions of the 2.4 kernel, but its use has become widespread only with 2.6.
Let’s update the nvram driver, one of the few remaining drivers that haven’t yet switched over to seq files. For this, you can use the extra-simple flavor of seq files that requires only the show() iterator method; the single_open() routine registers this method directly.
Listing Four shows the changes needed to update the nvram driver. Because the seq file interface does not sleep between calls to iterator methods, you can hold locks inside the methods.
Listing Four: Update drivers/char/nvram.c using Seq Files

+static struct file_operations nvram_proc_fops = {
+ .owner = THIS_MODULE,
+ .open = nvram_seq_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};

-static struct file_operations nvram_fops = {
- .owner = THIS_MODULE,
- .llseek = nvram_llseek,
- .read = nvram_read,
- .write = nvram_write,
- .ioctl = nvram_ioctl,
- .open = nvram_open,
- .release = nvram_release,
-};

+static int nvram_seq_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, nvram_show, NULL);
+}

+static int nvram_show(struct seq_file *seq, void *v)
+{
+ unsigned char contents[NVRAM_BYTES];
+ int i;
+
+ spin_lock_irq(&rtc_lock);
+ for (i = 0; i < NVRAM_BYTES; ++i)
+ contents[i] = __nvram_read_byte(i);
+ spin_unlock_irq(&rtc_lock);
+
+ mach_proc_infos(seq, contents);
+ return 0;
+}

static int __init
nvram_init(void)
{

+ ent = create_proc_entry("driver/nvram", 0, NULL);
+ if (!ent) {
+ printk(KERN_ERR "nvram: can't create /proc/driver/nvram\n");
+ ret = -ENOMEM;
+ goto outmisc;
+ }
+ ent->proc_fops = &nvram_proc_fops;

- if (!create_proc_read_entry("driver/nvram", 0, NULL,
- nvram_read_proc, NULL)) {
- printk(KERN_ERR "nvram: can't create /proc/driver/nvram\n");
- ret = -ENOMEM;
- goto outmisc;
- }
/* … */
}

-#define PRINT_PROC(fmt, args...) \
- /* ... */

-static int
-nvram_read_proc(char *buffer, char **start, off_t offset,
- int size, int *eof, void *data)
-{
- /* ... */
-}

Lines that begin with - are the original code in drivers/char/nvram.c that must be removed, while lines prefixed with + are newly added code. In addition to applying these differences, change all references to PRINT_PROC() in the original driver to seq_printf(). The original and new drivers produce the same output when you read /proc/driver/nvram.

Looking at the Sources

Perusing kernel source can be a bit overwhelming, but a couple of useful tools can help you wade through large code trees in search of symbols and other code elements.
cscope builds a symbol database from all the files in the source tree so that you can quickly locate declarations, definitions, regular expressions, and more. From the root of your kernel tree, issue the cscope -kqRv command to build the cross-reference database. The -q option generates more indexing information, so searches become noticeably faster at the expense of extra initial startup time. The detailed invocation syntax can be obtained from the cscope man page.
The ctags utility generates cross-reference tags for many languages, so you can locate symbol and function definitions in a source tree from within an editor like vi. Do make tags from the root of your kernel tree to “ctag” all source files. etags generates similar indexing information that is understood by emacs.
Look at Documentation/filesystems/proc.txt for more information on procfs. The fs/proc/ subdirectory contains the code that implements the procfs core. The seq file interface lives in fs/seq_file.c. Users of procfs and seq files are sprinkled all over the kernel sources. Each file that you see under /proc on your system has corresponding entry points implemented in the source tree. Locate them using cscope or etags.

Sreekrishnan Venkateswaran has been working for IBM India for ten years. His recent projects include porting Linux to a pacemaker programmer and writing firmware for a lung biopsy device. You can reach Krishnan at krishhna@gmail.com.
