The Linux kernel has several different IO schedulers. This article provides an introduction to the concept of schedulers and what options exist for Linux.
It almost goes without saying that the Linux kernel is a very complex piece of software. It is used in embedded devices that need real-time performance, hand-held devices, laptops, desktops, general servers, database servers, video servers, DNS systems, very large supercomputers, and on and on. All of these uses for the kernel have very different requirements. Some require that the system be responsive to user input so that streaming music, video, or other interactive work is not interrupted. At the same time there are requirements for good IO performance (throughput, IOPS, etc.), and for some workloads these requirements are very high. To make sure that there is balance within the system for all users and processes, there is the concept of schedulers within the kernel.
The schedulers do exactly what the name implies: they schedule activities within the kernel. Since this column is all about IO, the scheduler of interest is, aptly enough, the IO scheduler. This article discusses IO scheduler concepts and the various options that are available.
Introduction – IO Scheduler Concepts
Virtually all applications running on Linux do some sort of IO. Even surfing the web produces a great number of small files that are written to disk. Without an IO scheduler, every time there is an IO request, there is an interrupt to the kernel and the IO operation is performed. Moreover, you can get a great mix of IO operations that move the disk head around the disk to satisfy read and write operations to different blocks on the drives. Perhaps more importantly, over time the disparity between the performance of disk drives and the rest of the system has grown very rapidly, meaning that IO has become more important to overall system performance. As you can imagine, while the kernel is servicing these interrupts, any kind of processing or interactive work is paused. Consequently the system may appear unresponsive, or it may appear that the system has slowed down.
How do you schedule the IO requests to preserve interactivity while also ensuring good IO performance? The answer, as with most things, depends upon the workload. In some cases it would be nice to be able to do IO while doing other things. In other cases, it is desirable to do IO as fast as possible. To balance these two very different workloads, or to ensure that one workload is not emphasized over the others (unless you intend it that way), the concept of the IO scheduler was born (actually it’s a pretty old concept).
Scheduling IO events has many pieces that must be addressed. For example, the scheduler may need to store the events in some sort of queue for future execution. How it stores the events, whether it reorders them, how long it holds them, and whether it executes all stored events when some condition is reached or at some regular interval are all crucial aspects of the scheduler. Exactly how these various aspects are implemented can have a huge impact on the overall IO performance of the system and the perception people have when interacting with the system.
Defining the function or role of the system is probably the best place to start when considering scheduler design or tuning existing schedulers. For example, you should know if the target system is an embedded device, a hand-held device, a laptop, desktop, server, supercomputer, database server, video server, and on and on. Knowing this allows you to define what your goals are for the scheduler.
For example, suppose the target system is a desktop that is doing some web surfing as well as perhaps watching a video, listening to music, and maybe even playing a game. Seems simple, but this has enormous implications. If you are watching a video, listening to music, or playing a game, you don’t want it to be interrupted and you don’t want any frames to be dropped. Nothing like a video that pauses, plays, pauses, plays, to make you sea-sick in a hurry. Or you might be ready to blow the head off a mutant zombie and the system pauses while you are firing, and when the system comes back the zombie has removed your character’s head. And while “stuttery” music may be a genre to some, in general it’s quite annoying. So, if your target system is a desktop and you want as little interruption of interactive work as possible, then this has a great influence on the design of the scheduler.
One important advantage that IO scheduling gives the system is that it allows you to store events and even reorder them for faster IO. Since disk IO is much slower than the rest of the system, requests tend to accumulate, and reordering them can produce contiguous IO requests, which can improve performance. Newer file systems are even incorporating some of these concepts so that they can reorder operations to make things easier and faster for the storage devices. You can even extend these concepts to make the system better adapt to the unusual properties of SSDs.
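As a small aside, newer kernels (roughly 2.6.29 and later) expose a hint in sysfs about whether a device is rotational, which is a reasonable starting point when deciding how much reordering effort a scheduler should spend on a drive. A minimal check, assuming the device is sda (substitute your own device name):

$ cat /sys/block/sda/queue/rotational
0

A value of 1 means a rotating disk; 0 means the kernel believes the device is non-rotational, such as an SSD.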
There are some typical techniques that can be used to help IO schedulers. These techniques are:
- Request Merging: In this concept, adjacent requests are merged together to reduce disk seeking and to increase the size of the IO requests sent to the device (usually resulting in higher performance).
- Elevator: The requests are ordered based on their physical location on the disk so that the seeks are in one direction as much as possible.
- Prioritization: This allows the requests to be put into some sort of priority order. The details of the ordering are up to the IO scheduler.
In addition, almost all IO schedulers take resource starvation into account so that all requests are eventually serviced.
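Many of these behaviors are exposed as tunables in sysfs, so you can inspect (and adjust) how the active scheduler merges and orders requests. A quick sketch, assuming the device is sda and a 2.6-era kernel (the exact files depend on which scheduler is currently active):

$ ls /sys/block/sda/queue/iosched/
$ cat /sys/block/sda/queue/nr_requests
128

The iosched directory holds the active scheduler’s own knobs (deadline, for example, exposes front_merges, read_expire, write_expire, and fifo_batch), while nr_requests controls how many requests the block layer will queue per device.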
Linux IO Schedulers
There are currently four IO schedulers in the mainline Linux kernel.
- NOOP
- Anticipatory
- Deadline
- Completely Fair Queuing (CFQ)
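You can see which of these a block device is using, and switch between them on a running system, through sysfs. A quick sketch, assuming the device is sda; the scheduler shown in brackets is the active one:

$ cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]
# echo deadline > /sys/block/sda/queue/scheduler
$ cat /sys/block/sda/queue/scheduler
noop anticipatory [deadline] cfq

The echo has to be run as root (hence the # prompt), and the change does not survive a reboot. To make a particular scheduler the default for every device, add elevator=deadline (or noop, cfq, etc.) to the kernel boot line instead.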
Comments on "I Have a Schedule to Keep – IO Schedulers"
Very informative!
One question: when I cat /sys/block/sda/queue/scheduler it shows additional entries:
$ cat /sys/block/sda/queue/scheduler
noop fifo anticipatory [deadline] cfq vr
@speed145a
Glad you liked the article. I’m hoping to expand a little more in future articles to talk about how to tune various schedulers and how to measure the impact on workloads (probably just some benchmarks).
I’m not sure where the other two schedulers come from. A quick google didn’t turn up too much. Can you tell us about your distro and perhaps what the system is doing?
Thanks!
Jeff
I also tried a google search without any luck :-)
I’m running Arch with the 2.6.31-zen1 kernel. I’m sure it’s something in the ZEN kernel… or perhaps something to do with the BFS in the kernel.
@speed145a,
I did some searching around Xen but I wasn’t sure. In particular, I did find references to vr-sched.c, but I didn’t look at the code or find details of the scheduler. I do think it’s related to Xen somehow, but I don’t know the purpose of the additional schedulers. Might be worth posting to an archlinux group.
Jeff
Do we really need to run update-grub after modifying menu.lst? It’s not lilo, is it?
BTW, very nice articles by Jeff on Linux Magazine. Actually I have become a fan of yours! Thanks Jeff!
A little too late, but it might be useful anyway :)
It’s not Xen but Zen: a kernel specifically patched to behave better on desktops. It includes several different patches, and that’s where the extra schedulers come from.
Here are some URLs:
http://www.zen-kernel.org/
http://www.zen-kernel.org/about
http://www.zen-kernel.org/included-code
@suresh: it’s a Debian (and Ubuntu, etc.) thing; you don’t modify the boot parameters directly but rather a token (kopt), and then update-grub updates all your kernel entries.
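For anyone curious, the relevant bit looks roughly like this (the root= value is only a placeholder, and elevator=deadline is just an example of a parameter you might want on every kernel entry):

$ grep '^# kopt=' /boot/grub/menu.lst
# kopt=root=/dev/sda1 ro elevator=deadline
$ sudo update-grub

Edit the commented kopt line, run update-grub, and it rewrites the kernel stanzas below it with those options.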
@richarson: ah, ok. Thanks! I wasn’t aware of this.
Great introductory article.
Deadline here on Ubuntu 8.04. I’d be interested to know the advantages/benefits of using specific schedulers, and how the Linux ones compare with other OS kernel schedulers. Is Linux an innovator here?
Re: Using NOOP on an SSD.
I would worry that the lack of write ordering in NOOP would mean *more* writes to the SSD, since only adjacent requests would be merged. The other schedulers would re-order all the outstanding writes and merge all the adjacent requests.