Jeff Layton talks to Valerie Aurora, file system developer and open source evangelist, about a wide range of subjects, including her background in file systems, ChunkFS, union file systems, and how the developer ecosystem can chip in.
There are many people who are influential in Linux. Many of them you are already aware of, but there are some very influential people you might not be aware of. Valerie Aurora is one such thought leader in the Linux and FOSS community. She is responsible for the ChunkFS concept and for much of the current union file system work within Linux.
For many years, file system development on Linux was stalled. The community at large was happy with ext2 and ext3, and that’s where things stayed. There were some efforts with ReiserFS to develop something new, but, overall, the community had stopped file system development. Valerie Aurora pointed this out, started cajoling the community, and organized the first ever Linux File Systems Workshop to help jump-start development. In addition, she was on the program committee for FAST ’09.
Valerie has helped raise awareness of the need for file system development and progress. Not to embarrass Valerie, but take a look at her consulting site and you will see the depth of her experience. (Author’s note: The links in the interview are from the author and are intended as pointers for starting your own searches.)
Jeff Layton Please introduce yourself and tell us a bit about your background and what you are doing these days.
Valerie Aurora I grew up in New Mexico (the state between Texas and Arizona), raising goats and training horses. My education was irregular, including a stint as the least religious wacky home-schooled kid in the neighborhood, but good enough to get me into the not-very-exclusive New Mexico Tech state university. I graduated with a degree in computer science and mathematics and got my first programming job by reading the “Help Wanted” ads in the Albuquerque Journal.
Today I live in a managed apartment complex in downtown San Francisco with two cats and a Roomba. Being a special unique snowflake, I work part-time for Red Hat as a file systems developer and part-time as a science writer and Linux consultant. I love having more than one job; boredom is my greatest enemy and switching gears every week keeps me interested and entertained.
JL You mention your work in file systems; could you go into some detail about your background in file systems? In particular, what file systems have you worked on? Which ones do you prefer, and why? (It’s always good to find out what professionals like to use.)
VA Perhaps a touch long, but here ya go.
I had no particular interest in file systems until I got a job working on one – ZFS, in 2002. Since I didn’t know much about file systems, I read and reread file systems research papers and went to the USENIX file systems conferences (JL: see FAST ’09 for the latest one). The few good operating systems books talked a little about file systems, but usually only described the Sun-style VFS architecture and FFS-style on-disk format.
After Sun, I went to work at IBM with Theodore Y. Ts’o’s group. We were basically level infinity customer support for Linux, which could be quite fun, especially since I didn’t get to talk to customers at all when I was at Sun. In our spare time, Ted and I came up with crazy ideas for adding file systems features to ext2 and ext3 that didn’t require 20-30 staff years to implement. At the time, only Namesys was making any significant investment in Linux file systems development, and their business model didn’t seem sustainable.
I moved to Intel and had time to implement some of our crazy ideas and come up with new ones. One was the ext2 dirty bit, which let you skip fsck after a crash if the file system was idle at the time of the crash. Another was relative atime, which, 3 years later, is now the default atime setting. While discussing why the dirty bit idea didn’t work at the block group level, Arjan van de Ven and I came up with the idea for ChunkFS, which I began work on.
All of this seemed pointless, though – how can Linux stay competitive in file systems without Linux developers being paid to work on file systems full-time? So, with Zach Brown and Arjan, I cobbled together the first Linux File Systems Workshop out of shoestring and duct tape, hoping that if the existing Linux file systems developers could just talk to each other, we could come up with a way to improve funding for Linux file systems development. I don’t know that it helped, but if it did, I consider this to be my most important contribution to Linux file systems.
I started a consulting business and did more chunkfs and ext2/3/4 work (parallelizing fsck), and now work for Red Hat. Currently I’m working on union mounts with Jan Blunck, a VFS-based approach to unioning file systems.
On my own systems, I always run ext3 with noatime, or relative atime if it’s available. I also disable the paranoia file system checks, with "tune2fs -i 0" and "tune2fs -c 0". And I turn off SELinux, which makes heavy use of extended attributes and is generally a pain. For my recent work on 64-bit ext4 file systems, I found it easiest to create sparse 16TB+ files on an XFS partition and mount them loopback to fake up a 16TB+ device. (You can do this on ext4 using md, but it’s an enormous pain.) In general, I use ext3 by default and XFS if ext3 can’t handle what I’m doing. When I move to a new file system, it will be btrfs.
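(JL: That workflow can be sketched at small scale. This is my illustration, not Valerie’s exact commands: the /tmp path and the 1GB size are stand-ins – she uses sparse 16TB+ files on XFS – and it assumes the standard e2fsprogs tools.)

```shell
# Fake up a block device with a sparse file; truncate allocates the
# apparent size but no data blocks, so even a huge file is "free"
truncate -s 1G /tmp/fakedev.img

# mke2fs will format a regular file just as happily as a device
# (-F forces it past the "not a block special device" warning)
mkfs.ext4 -q -F /tmp/fakedev.img

# Disable the interval- and mount-count-based "paranoia" fsck checks
tune2fs -i 0 -c 0 /tmp/fakedev.img

# With root, the image can then be mounted loopback and used like
# any other file system:
#   mount -o loop /tmp/fakedev.img /mnt/test
```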
After 2.6.30, I started adding data=ordered too, since the new default for ext3 is data=writeback, which can result in nasty file data corruption after a crash.
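(JL: A hedged sketch of how you might pin ext3 back to ordered mode – the fstab line, device, and image path here are illustrative, and the tune2fs superblock-default approach is my addition, not something Valerie describes.)

```shell
# Option 1: per-mount, via an fstab entry (device/mountpoint illustrative):
#   /dev/sda2  /home  ext3  noatime,data=ordered  0  2

# Option 2: persistently, by setting the default mount options stored in
# the superblock; this works on an image file as well as a real device
truncate -s 64M /tmp/ext3demo.img
mkfs.ext3 -q -F /tmp/ext3demo.img
tune2fs -o journal_data_ordered /tmp/ext3demo.img
```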
JL You mentioned switching to btrfs. People are always asking for a comparison between btrfs and ZFS, and the discussion usually gets a little heated, but can you tell us your opinion(s) of the two file systems? Do you see any file system on the horizon that is potentially “better” than btrfs or ZFS (choose any definition of “better” that you want)?
VA For some reason, file systems bring out the zealot in all of us. In my old age, I have come to find zealotry tiring, so I’ll try to stick to useful facts.
ZFS and btrfs are similar in structure and goals in a lot of ways. Features they share are copy-on-write everything, checksums, writable snapshots, storage pooling, simple administration, ad nauseam. More interesting are the major differences in architecture. At the highest level, ZFS uses plain ol’ trees of pointers to blocks, FFS-style, and variable block sizes, inspired by the SLAB kernel memory allocator. Btrfs uses a specialized, COW-friendly form of b-trees (as presented by Ohad Rodeh at LSF ’07) and extents. Btrfs is actually slightly more exciting than this: every single piece of data or metadata in btrfs is an item in a b-tree, and items are packed together indiscriminately, without regard to their types. ZFS reduced all file system metadata and data to objects and related operations; btrfs reduced them all to items in a b-tree. Now all the interesting decisions are about how to assign keys to items and order them inside the b-tree.
Now for personal opinion/zealotry time. Initially, I thought the ZFS approach was the simpler and cleaner of the two – at the time, the code to manage b-trees and extents was complicated enough, and adding copy-on-write and checksums to that made your brain want to explode. It took Ohad Rodeh’s simplified, robust b-tree algorithms and Chris Mason’s everything-is-an-item insight to change my mind. I’m a convert.