Linux 2.4 has been out for a year now, and the 2.5 release is just around the corner.
Every January, Linux Magazine surveys the state of the Linux kernel, bringing our readers up to date on what new kernel features and improvements to expect in the year ahead. For 2002, the crystal ball is cloudier than usual because, at press time, work on Linux 2.5 had not yet begun. Nevertheless, some definite and tentative plans had come to light. We spoke with several key kernel developers to learn more about their plans and hopes for Linux 2.5.
The Linux 2.4 Kernel
The current version of the Linux kernel is 2.4, which was first released about one year ago. Most Linux distributions, including Red Hat 7.2, are now based on Linux 2.4. At press time, the fourteenth release of the Linux 2.4 kernel (2.4.14) had just been announced. The Linux 2.4 kernel introduced a variety of features. Among the most significant were:
- Improved support for multiprocessor systems and systems with large amounts of RAM, facilitating enterprise use of Linux
- Improved support for NFS 3, making Linux NFS more compatible with other NFS implementations
- Support for network packet filtering, making it possible to build sophisticated Linux firewalls that protect systems and networks against unauthorized use
- Support for framebuffer video, a simple and flexible interface between Linux and graphics adapters
- Support for ACPI power management, which facilitates the use of Linux on portable PCs
- Support for USB, making it possible to use a variety of USB devices with Linux
- A kernel-based HTTP daemon, making it possible for Linux to serve static Web pages very quickly
During the year or so since Linux 2.4 was released, kernel developers have continued to extend and improve it. For example, the kernel’s virtual memory system has been redesigned to better utilize the relatively large amounts of RAM often installed today. And, support for high-performance filesystems has improved. Both ReiserFS and EXT3 filesystems are now widely used. Red Hat Linux, for example, now creates EXT3 rather than EXT2 partitions by default. These filesystems are more robust and efficient than the previously dominant EXT2 filesystem.
The virtual memory changes proved to be controversial. The 2.4 kernel originally shipped with virtual memory (VM) code written by Conectiva developer Rik van Riel. But van Riel’s code continued to face performance problems even after months of widespread use. So in last September’s 2.4.10 release, Linus Torvalds abruptly dropped van Riel’s code in favor of a new hack, this one written by SuSE developer Andrea Arcangeli.
Torvalds’s move upset a number of kernel developers who felt that something as major as a virtual memory rewrite should have been saved for the developmental 2.5 version of the kernel, rather than being dumped midstream in the supposedly stable 2.4 branch. (For an explanation of why he did this, see the Q&A with Linus Torvalds sidebar, pg. 16). Alan Cox, however, did not initially adopt Arcangeli’s VM code into his “-ac” version of the kernel, which caused some confusion about which VM was actually part of Linux and lent credence to the inevitable Cassandra prophecies that Linux was forking.
By November, Cox had stepped down as maintainer of the stable 2.4 kernel release (he says, “the VM had nothing to do with the decision”) and the 2.5 kernel had still not appeared. An 18-year-old Conectiva developer, Marcelo Tosatti, was subsequently selected to replace Cox, and it remains to be seen whether or not he can inspire the confidence that Cox had earned.
The worst part of the whole VM debacle is probably the fact that it has delayed the release of the developmental 2.5 kernel. Developers are champing at the bit for a forum to release their newest, coolest kernel hacks, and until 2.5 is released, there is simply no central place for them to integrate their work into what will eventually become standard Linux. Noted kernel hacker Paul “Rusty” Russell says the politics over what VM code was eventually chosen are not an issue. “We’re all impatient for 2.5,” he says. “The VM work has probably slowed that down, but other than that, it’s not a core concern for most of us.”
2.5 and Beyond
Because the developmental 2.5 kernel had not been released at press time, it is impossible to get a definite picture of everything that is being worked on for it. Interesting projects tend to jump out of the woodwork when the odd-numbered kernels are released. After pestering a number of kernel developers and observers, Linux Magazine sniffed out a number of very promising development projects that are currently being targeted for the 2.5 kernel.
Increasingly, the Linux kernel team is being reinforced by developers from system vendors like IBM, Compaq, and SGI, who are looking to add the kind of enterprise features that will help Linux compete with the proprietary Unices. With the 2.4 kernel, this led to improved SMP and large memory support. With 2.5, the results will be improvements like Asynchronous I/O, clustering, and improved NUMA (Non-Uniform Memory Architecture) support.
In the opinion of kernel developer Alan Cox, one of the most interesting areas of development will involve clustering technologies. “Fault tolerant clustering and single system image stuff is one of the areas that Compaq is currently working on, and it offers a lot of promise for the enterprise,” he says. In June, Compaq began releasing code from its proprietary NonStop clusters for Unixware into the Linux community. Since then, it released its Single System Image clustering technology, which gives administrators a single view of multiple clustered servers. At the kernel level, the challenge for Compaq — and for all system vendors who would like to see Linux become a viable high-availability and clustering platform — will be to create a generic framework that would enable a variety of clustering technologies to work with Linux. It remains to be seen whether or not this can be achieved in the 2.5 timeframe.
Asynchronous I/O, called “the Holy Grail of performance” by kernel maintainer Marcelo Tosatti, is one of the most eagerly anticipated developments for the new kernel. It allows kernel developers to write code that initiates an input/output operation but continues to execute while the operation is performed.
A special system call lets a process check whether or not a previously initiated input/output operation has completed. Async I/O lets kernel developers write code that executes faster, improving the performance of the Linux kernel. Red Hat Kernel Engineering Manager Michael K. Johnson says that his company is looking forward to async I/O for a number of reasons.
“The poll() system call is pretty efficient, but it still introduces synchronization points and multiplexing overhead that for some applications can be radically reduced by asynchronous I/O,” he says. Johnson adds that applications that need asynchronous I/O will no longer need to use full-blown threads to achieve this, which will make them easier to debug and more scalable.
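The initiate, continue, then check pattern described above can be sketched in a few lines of Python. This is only an illustration of the programming model, not the Linux 2.5 interface: a thread pool stands in for the kernel's asynchronous machinery, and the names read_file, overlapped_read, and demo are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_file(path):
    """Blocking read; in this sketch it runs on a worker thread."""
    with open(path, "rb") as f:
        return f.read()


def overlapped_read(path):
    """Start a read, keep working, then collect the result.

    Returns (data, units_of_other_work_done).
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_file, path)  # initiate the operation
        other_work = 0
        while not pending.done():               # "has it completed yet?"
            other_work += 1                     # stand-in for useful computation
        return pending.result(), other_work


def demo():
    # Hypothetical scratch file, created only for this example.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"kernel" * 1000)
        path = tmp.name
    try:
        data, _ = overlapped_read(path)
        return len(data)
    finally:
        os.unlink(path)
```

Note that this stand-in relies on a thread, which is exactly the overhead Johnson expects true asynchronous I/O to remove; the sketch shows the shape of the calling code, not the performance benefit.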
A kernel release number, such as 2.4.12, tells you quite a bit about the release. That release number has three parts:
- the major release number (2)
- the minor release number (4)
- the patch level (12)
Major releases connote fundamental change in the kernel architecture. So far, there have been only two major releases of the Linux kernel. Minor releases connote less radical architectural change, though even minor releases of a Linux kernel version can bring significant improvements in performance and features. So far, version 2 of the Linux kernel has had four minor releases. The patch level of a kernel denotes an official release that groups a set of patches (fixes and modifications) made to the kernel.
Generally, work on the Linux kernel proceeds along two parallel paths. A development kernel emphasizes cutting-edge features and performance. Consequently, development kernels resemble beta-test software; they sometimes contain troublesome bugs along with their cutting-edge features and performance. Development kernels are assigned an odd minor release number (the next development kernel will be Linux 2.5). Eventually, they form the basis for the next stable kernel release. A stable kernel emphasizes continuing refinement and eradication of defects. Stable kernels, which are assigned even minor release numbers, do not generally contain serious bugs.
The stable code from the Linux 2.5 kernel will eventually become either the Linux 2.6 or 3.0 kernel, depending on whether or not Linus Torvalds sees the upgrade as being truly major.
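As a small illustration of the numbering scheme described above, the following sketch splits a release string into its three parts and applies the odd/even rule for the minor number. The function names parse_release and series are hypothetical, chosen only for this example.

```python
def parse_release(release):
    """Split a release string such as '2.4.12' into (major, minor, patch)."""
    major, minor, patch = (int(part) for part in release.split("."))
    return major, minor, patch


def series(release):
    """Classify a release by the odd/even minor-number convention."""
    _, minor, _ = parse_release(release)
    return "development" if minor % 2 else "stable"
```

Under this convention, series("2.4.12") reports a stable kernel and series("2.5.0") reports a development kernel.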
Other I/O improvements are planned for the 2.5 kernel, including larger raw I/O transfers (up from the current 64 KB units) and a revised block I/O system. This work should improve the kernel’s ability to handle enterprise-scale applications involving many devices and very high-speed data transfers.
Linux’s support for non-uniform memory architecture (NUMA) is also expected to improve with 2.5. Though Linux does run on a number of NUMA machines today, it is still important for developers to work on a solid NUMA-aware scheduler. A general system for discovering the layout of NUMA systems (i.e., which CPUs belong to which nodes) and relaying this information to the Linux kernel is something else that will be needed.
IBM Software Engineer Patricia Gaughen says that, right now, “most NUMA work is focused on getting better performance on machines with non-uniform memory access times.” She says that the interesting thing about Linux’s NUMA community is that it has managed to bring together contributors from a variety of Unix backgrounds and that this diversity “is helping make the NUMA solution for Linux much more robust than on any one proprietary Unix.”
Linux to the MACs
On the security front, the National Security Agency is sponsoring the development of SELinux, or Security-Enhanced Linux, a project the NSA says is designed to be a reference implementation to “influence not only the Linux developer community, but also other operating system maintainers.”
SELinux involves some rather invasive changes to the Linux kernel, particularly the addition of mandatory access controls (MAC). The SELinux team argues that MAC should reduce the scope of the damage that a malicious process could actually do on a Linux system. MAC is important, for instance, to systems that store classified information.
A MAC implementation defines a hierarchy of data sensitivity — for example, public, confidential, classified, and top secret. Under MAC, the system — not the user — decides who specifically is allowed to access data. This prevents users from inappropriately publishing sensitive data.
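To make the idea of a sensitivity hierarchy concrete, here is a minimal sketch using the four levels named above. It assumes the classic multilevel rules (no reading above your level, no writing below it); that rule set is an assumption made for illustration and is not a description of how SELinux itself evaluates policy. The names Level, may_read, and may_write are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    PUBLIC = 0
    CONFIDENTIAL = 1
    CLASSIFIED = 2
    TOP_SECRET = 3


def may_read(clearance, label):
    """A subject may read data at or below its own clearance."""
    return clearance >= label


def may_write(clearance, label):
    """A subject may write only at or above its own level,
    so sensitive data cannot be copied into a less sensitive object."""
    return clearance <= label
```

The second rule is what keeps a user from publishing sensitive data: a process cleared for top secret material cannot write into a public file, whatever the user would prefer.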
Another kernel-level security mechanism that is featured in SELinux is the access control list (ACL). Many people are already familiar with ACLs (they’re used in Windows NT/2000, too), which let a user protect an object — a file, say — by specifying exactly what kind of operations a specific user may perform on it. Essentially, access control lists extend the standard Unix security model by allowing the specification of many distinct operations rather than merely the Unix read-write-execute triad. See our “Security Enhanced (SE) Linux” article (http://www.linux-mag.com/2001-09/se_linux_01.html) in the September 2001 issue for more on SELinux.
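An access control list can be pictured as a per-object table that maps each user to the operations that user may perform. The sketch below is a toy model under that assumption, not the SELinux or Windows data structure; the class name, the user names, and operations such as "append" are hypothetical.

```python
class AccessControlList:
    """Per-object table mapping each user to the operations allowed."""

    def __init__(self):
        self._entries = {}

    def allow(self, user, *operations):
        self._entries.setdefault(user, set()).update(operations)

    def permits(self, user, operation):
        # Users with no entry get no access at all.
        return operation in self._entries.get(user, set())


# One ACL protecting a single object, say a report file.
report_acl = AccessControlList()
report_acl.allow("alice", "read", "append")
report_acl.allow("bob", "read")
```

Here alice may append to the report while bob may only read it, a distinction the plain read-write-execute triad cannot express.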
Want to know when 2.6 may be released? Don’t hold your breath. Linus says there may not even be a Linux 2.6.
Linux Magazine: It seems like a lot of the work that will be happening on 2.5 is for really big machines and enterprise users. What do you see as the next great kernel enhancement or enhancements that will affect so-called normal users?
Linus Torvalds: Well, most of the work there will certainly be in user space. Normal users simply don’t need very much kernel support any more. The exceptions are things like device drivers and other hardware support. Those are continually maintained and updated as new stuff comes along.
LM: Are there any problems that you are starting to think of as 3.0-type things? Do you have a sense of what the major changes for the 3.0 kernel will be and of how far off it is?
LT: There may never be a 2.6. It may be that the next stable release is called 3.0. The 3.0 requirements would be stuff like good NUMA (Non-Uniform Memory Architecture) support, and that obviously depends a bit on hardware availability. We’ll have more and more NUMA code during 2.5.x, but whether or not it is “good” will depend on if there are enough machines out there to make real testing feasible.
LM: In retrospect, was it a mistake to include Rik van Riel’s VM code in the 2.4 release? Some have suggested that when you dropped van Riel’s code in favor of Andrea Arcangeli’s, you should have made that the starting point for the experimental 2.5 kernel. Why didn’t you?
LT: I don’t think there were any real mistakes in the VM side of things. It was just unfortunate that many of the fundamental issues ended up being uncovered during an existing 2.4.x instead of long before. C’est la vie.
2.2.x actually saw many of the same kind of issues — if not to quite the same degree. With 2.2, people may not have been aware of just how much the VM changed, because there wasn’t the same kind of “excitement” with competing VMs as with the 2.4 kernel.
As to making a 2.5.x earlier and doing the VM work there — we could have done it, but would then have had to back-port it to 2.4.x anyway, and it would have had its own set of problems. It would have been the more structured approach, but it would probably also have caused a lot less people to work on it and test it.
It was just incredibly useful to have a number of people who were doing serious benchmarking under their own loads and doing the comparative analysis — things that might well not have happened as much if we had done it in a development series.
It was certainly disruptive, but the new VM is a much nicer base to work off.
LM: What does Alan Cox’s abdication of his 2.4 maintainer role really mean to the development of the kernel? Why did he step down?
LT: “Why” is the much easier question; he wasn’t thrilled about doing maintenance for another few years in the first place — he’s been doing it for a long time, after all — and the fact that 2.4.x dragged on gave him time to decide that he would rather avoid it.
We had already agreed on Marcelo as the VM maintainer a few months ago, which probably made it a bit easier for Alan to step down. As mentioned, the VM was the biggest part of 2.2.x maintenance: not in lines-of-code — drivers always overwhelm everything else in that respect — but in terms of really serious issues.
As to what it means for the development of the kernel — it might actually mean that Alan has more time to do significant kernel development instead of just continuing to work on maintenance. So for the development side, I don’t see any real problems.
So I’m not worried. It would have been very convenient to have Alan continue doing maintenance — and not really so much for technical reasons, but simply because people are used to him. I don’t have any technical worries about the change of maintainership, but I do realize that it will take people some time to adjust to it and to get used to how Marcelo works.
Alan will still be around for the conversion process, so I doubt it will be all that painful.
In an effort to improve Linux’s network performance, developers Jamal Hadi Salim, Robert Olsson, and Alexey Kuznetsov have released a new networking patch (called New API or NAPI) that is designed to speed up the kernel, particularly under heavy network loads. Analysis of experimental implementations employing the team’s code showed a tenfold improvement in the number of packets that a Linux system can handle. Based on their work, it seems rather likely that Linux 2.5 will provide much more efficient network input/output than the Linux 2.4 kernel, whose performance is already formidable when compared to that of the majority of other operating systems.
NAPI is not the only networking feature that could wind up in Linux 2.5. La Monte Yarroll is working on a Linux-based reference implementation of the Stream Control Transmission Protocol (SCTP), which is a high-level protocol designed to support streaming media. One of SCTP’s innovative features is automatic failover when network conditions result in missing or delayed packets. La Monte’s work, along with that of the other developers who are working on SCTP, is important in ensuring that the Linux community does not fall behind in supporting this emerging IETF standard, which appears to be gaining widespread acceptance.
Kernel developers are also working to remove unnecessary global locks. One challenge to operating system developers is supporting large-scale SMP (symmetric multiprocessing) without simultaneously diminishing performance on smaller systems. Most Linux kernel developers are especially concerned with this important tradeoff.
If this effort is successful, you can expect to see improved Linux performance on high-end systems without any decrease in performance on the more mundane PCs that are used by the majority of Linux users. Red Hat’s Johnson says, “This is generally going to be done by changing data structures in order to eliminate the need for locks or reduce the amount of code under locks.”
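One way to picture the kind of data-structure change Johnson describes is a counter that is split into shards, each with its own small lock, in place of a single counter behind one global lock. The sketch below is an illustration in user-space Python, not kernel code, and the names ShardedCounter and hammer are hypothetical. It shows the structure only; Python threads will not demonstrate the speedup.

```python
import threading


class ShardedCounter:
    """Counter split into shards, each guarded by its own small lock."""

    def __init__(self, shards=4):
        self._locks = [threading.Lock() for _ in range(shards)]
        self._values = [0] * shards

    def add(self, amount):
        # Pick a shard from the calling thread's id; only that shard is locked.
        i = threading.get_ident() % len(self._locks)
        with self._locks[i]:
            self._values[i] += amount

    def total(self):
        # Readers visit the shards one at a time instead of taking a global lock.
        result = 0
        for i, lock in enumerate(self._locks):
            with lock:
                result += self._values[i]
        return result


def hammer(counter, threads=4, increments=1000):
    """Run several threads that each add 1 to the counter `increments` times."""
    def worker():
        for _ in range(increments):
            counter.add(1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.total()
```

Threads that land on different shards never wait for one another, and each lock is held only for a single addition, which is the "less code under locks" half of the idea.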
Will all of these projects make it into the next stable kernel? Probably not. But in the ever-changing world of the Linux kernel, a brief snapshot is the best one can ever really hope for.
Other noteworthy 2.5 hacks:
- User Mode Linux Port: Lets the Linux kernel run within Linux as a user-mode process. For more on User Mode Linux, see our April 2001 issue (http://www.linux-mag.com/2001-04/user_mode_01.html).
- kbuild 2.5 and CML2: A new build and kernel configuration system.
- Distributed File System Support: InterMezzo, OpenGFS, and NFSv4.
- Hotplugging for a variety of device types, including USB, SCSI, PCI, and FireWire.
Bob McMillan is editor at large with Linux Magazine, and Bill McCarty is an associate professor at Azusa Pacific University. Bob can be reached at [email protected], and Bill can be reached at [email protected].
Linux Magazine / January 2002 / FEATURES / The State of The Kernel