The Colonel of the Kernel: Q&A with Andrew Morton

2.6 is coming, and Andrew Morton, a modest, humble, approachable, and very capable system software developer, is leading the charge. Hand-picked by Linus Torvalds for the task, Morton talks about the next production kernel, the kernel development process, and what would happen if SCO won its case against IBM.

Just a short time ago on July 10, Linus Torvalds posted the following message to the kernel developers’ mailing list: “OK. This is it. We (Andrew and me) are going to start a ‘pre-2.6’ series, where getting patches in [to the kernel] is going to be a lot harder. This is the last 2.5.x kernel, so take note.” In other words, the 2.6 kernel is coming.

The “Andrew” Linus referred to is none other than Andrew Morton, a longtime system software developer, and a significant contributor to the Linux kernel since 2000. Most recently, Andrew has been enhancing the file system and virtual file system, maintaining Linux’s memory management system, and addressing general issues with the I/O subsystem.

Perhaps more importantly, Morton has also been the conduit for a vast number of contributed features and changes for the 2.5 kernel, which are first integrated and tested in Morton’s tree before being sent ahead to Torvalds. Of course, now that Morton has been tasked with the release of 2.6, those changes will remain with him and others, while Torvalds starts on the next development version of Linux, 2.7.

And thanks to the Open Source Development Lab (OSDL), the leading advocate of Linux in enterprise computing, Morton, like Torvalds, will be able to focus full-time on his new task. In early July, two weeks after Torvalds joined OSDL as a fellow, OSDL announced that it would also sponsor Morton’s full-time work on the Linux kernel.

Linux Magazine Editor-in-Chief Martin Streicher met with Morton on a sunny morning in Palo Alto, CA, to talk about 2.6, the kernel development process, and the basis of the SCO dispute. In between playing with his children and meeting Torvalds for lunch, Morton talked at length about his career, about what to expect from 2.6, and about being tasked as the “Colonel of the Kernel.”

Linux Magazine: Andrew, how did you get started with the Linux kernel development team?

Andrew Morton: In March 2000, after being on the kernel developers’ mailing list and building kernels for some time, I pulled down the source for 2.3.49 and discovered that the driver for my Ethernet card had disappeared. Evidently, Alan Cox had marked it as obsolete. Obviously, it wasn’t, since I was still using it! [Smiles.]

I hadn’t done coding for about four years, and I was a bit rusty, but I fixed the driver, and sent a 2,500-line patch to Linus. Surprisingly, he took it. After that I fixed a bug or two more in that driver, and was then asked to assume the maintenance of the 3Com Ethernet driver — a rather important driver. I muddled through that for a few months, making it work a lot better.

Starting with the release of 2.4.0, I spent the next year-and-a-half fixing bugs all over the kernel. I didn’t really own any part of the kernel — people would find bugs, and if the bugs looked interesting to me, I’d fix them. Fixing bugs was an incredible way to become familiar with large amounts of code. I’d take on anything.

More recently, I helped bring Ext3 back from the grave, and I spent the better part of the last year working full-time on the kernel. Back in January, Linus asked me to take over the 2.6 kernel.

LM: What will your new role entail?

Morton: I’m responsible for delivering and maintaining the 2.6 kernel. Linus will go off and start the next development kernel, 2.7.

In practice, my role won’t change all that much. For about the past year, I’ve maintained my own kernel tree, and have taken in the work of a lot of other contributors. Of the 1,800 changes I’ve checked in, I’d say 500-600 of them are mine, and the rest are from other developers. When I think stuff is good and stable and ready to go, I’ll shoot the changes to Linus in batches of 20-30 patches. The exchange between Linus and me happens almost continuously.

The small change in my role is that changes destined for 2.6 now remain with me.

LM: What do you think are the most interesting features in 2.6?

Morton: Nothing in 2.6 is earth-shattering. After all, the Linux kernel is a 12-year-old piece of software. There’s not a lot of room for self-expression in a UNIX-like operating system. Oftentimes, you have to run uname to even know what kernel you’re running.

But the feedback I get from those migrating from 2.4 is that the desktop is a lot smoother; the CPU scheduler is more responsive; and large stores and large applications have been solved in the virtual memory (VM) and virtual file system (VFS) layers. Generally, lowering latency and improving responsiveness are two things that have received a lot of attention.

The other major change is that the kernel runs much, much better on multiprocessor systems. The core kernel and the CPU scheduler have changed; the kernel has new, per-CPU run queues, and lock contention has been largely resolved. How far it scales, that I don’t really know. The 2.4 kernel is limited to four-processor systems; the 2.5 kernel comfortably scales to sixteen-processor systems, and perhaps even 32-way machines.

LM: Is 2.6 better suited for enterprise applications?

Morton: The support for multiprocessor systems certainly helps. And Linux has proven itself to be very stable.

There is, however, a fundamental difference between Linux and, say, Solaris: the two systems were designed with completely different aims in mind. And design decisions do affect the end result.

Linux now runs on something like twenty architectures — few, if any other operating systems were designed with such broad portability in mind. So, there are compromises made in the Linux kernel to support that design goal.

Also, Linux scales down. I can run it on a cell phone, or on a personal computer, and all of those systems can be derived from a single code base fairly easily. But then again, scalability, like portability, has also been a longtime design objective of Linux.

Regarding Solaris, if your operating system is supporting at most two kinds of CPUs, and you’re deriving most of your revenue from enterprise customers, there’s a strong case to put in stupid hacks to make the operating system go faster. You can afford to add things like special system calls just for Oracle.

Now, 2.5 has a few new functions specific to Oracle, but in general, the kernel developers resist those kinds of features. In general, Linux will accept a performance hit in the interest of providing longevity.

LM: Switching to the kernel development process, you mentioned earlier that when the code in your tree is stable, it gets sent ahead to Linus for integration into his tree. Given the complexity of the kernel, how do you determine that something’s stable?

Morton: I depend largely on external testers. One will do one’s own testing, but I can’t always test — and I may not be inclined to test — code I get from other people. So, I put my kernel out twice a week or more, and lots of people pick up the builds.

Effectively, you have a million monkeys — the applications and systems that depend on the kernel — pounding on each new build. If a new build breaks, we hear about it. The converse is also true: no news is good news. After a week or two with no complaints, I assume everything is stable. It’s actually a very powerful system.

You also have to trust people, and you can always read the code.

LM: Are vendors treated as equals?

Morton: Yes. Vendors are equals; there are no favorites.

Now, when code comes from someone you’re familiar with, whose work you’ve already accepted before, it tends to make things easier. On the other hand, if you receive a massive amount of code from someone you’ve never heard of, there will be some resistance, because that tends to add to the amount of work required to confidently integrate the code.

Also, if the code addresses something that no one really cares about, or implements some strange file system, there’s little motivation to fold the code in, judge it, and make sure it has a testing base. In some cases, it makes good sense to omit the code from the kernel.

LM: How about conflicts, say, between competing implementations of the same feature, or between competing, conflicting sets of requirements?

Morton: When all’s said and done, the kernel is a mature code base. Even with a lot of changes going into the kernel, conceptually, its bits and pieces don’t change a lot.

The only conflict that comes to mind occurred about six months ago when the kernel team received two competing implementations of the Linux Logical Volume Manager (LVM). The IBM team had EVMS, and the authors of the 2.4 LVM had created their own, second-generation device manager. Although each system had slightly different features, the two code bases offered much the same features. Now, I wasn’t involved in that decision, but I know Linus said, “Look, I don’t know what to do here, because I don’t use either of these things, and I don’t have any expertise in this area.” So, Linus asked some of the [kernel] old-timers to help make a decision, and the preference and the final decision was to go with the “Device Mapper” code from the team that developed the 2.4 LVM.

Now, normally what would happen in the free software world is a fork: the team that lost would go into a huddle and maintain their own code forever, maintaining that the other guy’s code is a load of crap. But EVMS was from IBM, so there weren’t any childish reactions. The IBM team dropped their code and honored Linus’ decision.

My understanding is that IBM had a large body of code, but they’re now reworking it on top of the adopted Linux code. IBM had the end result in mind, even if the features of the new Device Mapper code aren’t as rich.

LM: How much code comes from corporate developers versus volunteers?

Morton: A minority comes from corporations, although it is a large minority. Of course, a lot of the code that comes in from corporations is architecture-specific. By proportion, I’d say about 30 percent of new code comes from corporate developers.

Now, I imagine that many of the volunteers are effectively paid to be volunteers, as I am now. But lots of developers do work in the evening in their spare time.

LM: What are the hurdles to doing development on the kernel?

Morton: The kernel’s simply become bigger and more complex. In some ways things are getting a bit better — I’m fascist about code comments and documenting locking rules — but there is an assured lack of big-picture, “how it all hangs together” materials. Now, I really don’t see a way to reduce the curve. It’s just the way it is.

LM: Is there a dearth of kernel developers, then?

Morton: Maybe. We’re thin at the top. The loss of Linus, Viro, Axboe, Miller, or me would be a big dent. But people do pick the kernel code up.

A number of the IBM developers who were new to the kernel at the outset of 2.5 are now strong contributors. A year’s full-time effort for an experienced systems programmer seems to be the cost of getting fully up to speed.

There are various bits of the kernel that really do need someone to pick them up: devfs, the block loop driver, the TTY layer, and lots of device drivers. But we muddle through.

LM: Do you have any insights into the SCO v. IBM case?

Morton: No, not really. Some of us have sat around, scratching our heads, wondering, “What on Earth [is SCO] talking about?” Honestly, we couldn’t work it out, and we remain in wait-and-see mode.

If we had to pull out read-copy-update (RCU), that’d be a bit of a hassle. It would probably take a week or two to get the kernel tree back into shape, as RCU is used in a number of places. JFS has been mentioned as potentially offending code, but JFS code could simply be deleted with one command.

My expectation is that if SCO’s claim proves to be valid, the offending code will vanish from the tree within four hours, and then someone will reimplement the code from scratch in one or two days. The total value of the code? Perhaps $500.


Martin Streicher is the Editor-in-Chief of Linux Magazine.
