I Have a Schedule to Keep – IO Schedulers

The Linux kernel has several different IO schedulers. This article provides an introduction to the concept of schedulers and what options exist for Linux.

NOOP IO Scheduler
The NOOP IO scheduler is a fairly simple scheduler. All incoming IO requests are put into a simple First-In, First-Out (FIFO) queue and then executed. This happens for all processes running on the system, regardless of the type of IO request (read or write). It also performs request merging: adjacent requests are combined into a single request, which reduces seek time and improves throughput.
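
To make the FIFO-plus-merging behavior concrete, here is a minimal Python sketch. It is a toy model and not kernel code: the Request and NoopQueue names and their fields are invented for illustration, and it only tries to merge a new request into the tail of the queue, while a real implementation may consider more than the tail.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    """A block IO request; all field names here are illustrative."""
    op: str       # "read" or "write"
    sector: int   # first sector of the request
    count: int    # number of sectors

    @property
    def end(self) -> int:
        # First sector after this request.
        return self.sector + self.count


class NoopQueue:
    """FIFO queue that merges a new request into the tail when adjacent."""

    def __init__(self) -> None:
        self.fifo = deque()

    def submit(self, req: Request) -> None:
        if self.fifo:
            last = self.fifo[-1]
            # Merge when the new request starts exactly where the last
            # queued request of the same type ends.
            if last.op == req.op and last.end == req.sector:
                last.count += req.count
                return
        self.fifo.append(req)

    def dispatch(self) -> Optional[Request]:
        # Requests leave in arrival order; nothing is sorted.
        return self.fifo.popleft() if self.fifo else None
```

Submitting a read for sectors 100-107 followed by a read for sectors 108-115 leaves a single 16-sector request in the queue, so the device sees one larger request instead of two.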

According to this article, the NOOP scheduler “… uses the minimal amount of CPU/instructions per I/O to accomplish the basic merging and sorting functionality to complete the I/O.” The NOOP scheduler assumes that some other device will optimize IO performance. For example, an external RAID controller or a SAN controller could perform this optimization.

Potentially, the NOOP scheduler could work well with storage devices that have no mechanical component for reading data (i.e. the drive head). The reason is that the NOOP scheduler makes no attempt to reduce seek time beyond simple request merging, which also helps throughput. So storage devices such as flash drives, SSDs, and USB sticks, which have very little seek time, could benefit from the NOOP IO scheduler.

Anticipatory IO Scheduler
The Anticipatory IO Scheduler, as the name implies, anticipates subsequent block requests. It implements request merging, a one-way elevator, and read and write request batching. After the scheduler services an IO request, it pauses for a short time in anticipation that the next request will be for the subsequent block. If that request arrives, the disk head is already in the correct location and the request is serviced very quickly. This approach adds a little latency because of the pause, but the cost can be outweighed by the faster servicing of neighboring requests.
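
The trade-off between the pause and the saved seek can be illustrated with a small Python sketch. This is a toy simulation under loud assumptions: it knows future arrival times in advance (a real scheduler does not), the Read class and next_request function are invented names, and the 5 ms window is an arbitrary illustrative value.

```python
from dataclasses import dataclass


@dataclass
class Read:
    arrival_ms: float  # when the request reaches the scheduler
    sector: int        # sector the request wants


def next_request(head_sector, now_ms, pending, wait_ms=5.0):
    """Return (request, start_time_ms) for the next read to service.

    head_sector is the sector right after the read that just finished.
    """
    # A sequential read that shows up within the anticipation window wins.
    sequential = [r for r in pending
                  if r.sector == head_sector and r.arrival_ms <= now_ms + wait_ms]
    if sequential:
        choice = min(sequential, key=lambda r: r.arrival_ms)
        return choice, max(now_ms, choice.arrival_ms)

    # Otherwise the pause was wasted: after waiting out the window,
    # fall back to the closest request that has already arrived.
    arrived = [r for r in pending if r.arrival_ms <= now_ms]
    if arrived:
        choice = min(arrived, key=lambda r: abs(r.sector - head_sector))
        return choice, now_ms + wait_ms
    return None, now_ms
```

In the first case the head does not move at all; in the second case the other request starts wait_ms later than it would have without anticipation, which is the extra latency described above.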

With a storage expert hat on, one can see that the anticipatory scheduler works really well for certain workloads. For example, it has been observed that the Apache web server may achieve up to 71% more throughput with the anticipatory IO scheduler. On the other hand, the anticipatory scheduler has been observed to cause up to a 15% slowdown on a database run.

Deadline IO Scheduler
The Deadline IO Scheduler was written by Jens Axboe, a well-known kernel developer. The fundamental principle of the scheduler is to guarantee a start time for servicing an IO request. It combines request merging and a one-way elevator, and it imposes a deadline on all requests (hence the name). It maintains two deadline queues in addition to two sorted queues, with one of each for reads and one of each for writes. The deadline queues are ordered by deadline (time to expiration), with shorter times moving to the head of the queue. The sorted queues are ordered by sector number (the elevator approach).

The deadline scheduler really helps for the cases of a remote read (remote meaning fairly far out on the disk or with a large sector number). Reads can sometimes block applications because they have to be actually read from the disk while the application waits. On the other hand writes can be quickly returned to the application because they are in the cache (unless you turn off the cache or use Direct IO). Even worse, remote reads get serviced very slowly because they constantly get moved to the back of the queue as requests for closer parts of the disk get serviced first. So, the deadline scheduler makes sure that all IO requests are serviced, even these distant read requests.

The general process of the scheduler is fairly straightforward. The scheduler decides on the next request by first deciding which queue to use. It gives higher priority to reads because, as mentioned, applications usually block on read requests. Next, it checks the first request in the deadline queue to see if it has expired. If so, that request is serviced immediately. Otherwise the scheduler serves a batch of requests from the sorted queue. In both cases, the scheduler also services a batch of requests following the chosen request in the sorted queue.
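
The decision process just described can be sketched in Python. This is a simplified model written for this article's description, not the kernel implementation: the class name, the expiry times, and the batch size are all illustrative assumptions, and it always prefers reads, so it does not model how a real scheduler keeps writes from starving.

```python
import bisect
from collections import deque

# Illustrative values chosen for this sketch, not claimed kernel defaults.
READ_EXPIRE_MS = 500
WRITE_EXPIRE_MS = 5000
BATCH = 2


class DeadlineSketch:
    def __init__(self):
        # Per direction: a list of (sector, deadline) sorted by sector,
        # and a FIFO of the same tuples in arrival (deadline) order.
        self.sorted = {"read": [], "write": []}
        self.fifo = {"read": deque(), "write": deque()}
        self.head = 0  # sector just past the last dispatched request

    def submit(self, op, sector, now_ms):
        expire = READ_EXPIRE_MS if op == "read" else WRITE_EXPIRE_MS
        item = (sector, now_ms + expire)
        bisect.insort(self.sorted[op], item)
        self.fifo[op].append(item)

    def dispatch(self, now_ms):
        """Return the next batch of (op, sector) pairs for the disk."""
        # 1. Pick a direction; reads win whenever any are pending.
        op = "read" if self.sorted["read"] else "write"
        queue = self.sorted[op]
        if not queue:
            return []
        # 2. Start at the oldest request if its deadline has passed,
        #    otherwise continue the one-way sweep from the head position.
        oldest = self.fifo[op][0]
        if oldest[1] <= now_ms:
            start = bisect.bisect_left(queue, oldest)
        else:
            start = bisect.bisect_left(queue, (self.head,))
            if start == len(queue):
                start = 0  # nothing ahead of the head: wrap around
        # 3. Dispatch a batch in sector order from the chosen request.
        batch = queue[start:start + BATCH]
        del queue[start:start + BATCH]
        for item in batch:
            self.fifo[op].remove(item)
        self.head = batch[-1][0] + 1
        return [(op, sector) for sector, _ in batch]
```

With a distant read at sector 9000 queued first and a steady supply of reads at low sectors, the low sectors are served in sector order until the distant read's deadline passes, at which point it jumps to the front of the line.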

The deadline scheduler is very useful for some applications. In particular, real-time systems use the deadline scheduler because, in most cases, it keeps latency low (all requests are serviced within a short time frame). It has also been suggested that it works well for database systems that have TCQ-aware disks.

CFQ IO Scheduler
The Completely Fair Queue (CFQ) IO scheduler is the current default scheduler in the Linux kernel. It uses both request merging and elevators. It puts synchronous requests from processes into a number of per-process queues. Then it allocates time slices for each of the queues to access the disk. The length of the time slice and the number of requests a queue is allowed to submit both depend on the IO priority of the given process. Asynchronous requests for all processes are batched together into a smaller number of queues, one per priority.
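
As a rough illustration of per-process queues with priority-dependent time slices, here is a small Python sketch. Everything in it is an assumption made for the example: the CfqSketch class, the slice formula, the fixed 25 ms cost per request, and the 0 (highest) to 7 (lowest) priority range. It models only synchronous requests and ignores merging and the elevator.

```python
from collections import deque

BASE_SLICE_MS = 100  # illustrative base time slice


def slice_ms(ioprio):
    """Time slice for an IO priority from 0 (highest) to 7 (lowest).

    The scaling formula is an assumption made for this sketch.
    """
    return BASE_SLICE_MS + BASE_SLICE_MS // 5 * (4 - ioprio)


class CfqSketch:
    def __init__(self):
        self.queues = {}  # pid -> deque of requested sectors
        self.prio = {}    # pid -> IO priority

    def submit(self, pid, sector, ioprio=4):
        self.queues.setdefault(pid, deque()).append(sector)
        self.prio[pid] = ioprio

    def run_round(self, cost_ms=25):
        """Give every process one time slice; return the dispatch order."""
        order = []
        for pid, queue in self.queues.items():
            budget = slice_ms(self.prio[pid])
            # A process may dispatch requests until its slice is used up.
            while queue and budget >= cost_ms:
                order.append((pid, queue.popleft()))
                budget -= cost_ms
        return order
```

In this model a process at priority 4 gets a 100 ms slice (four requests per round), while a process at priority 7 gets 40 ms (one request per round), so a busy low-priority process cannot crowd out the others.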

Jens Axboe is the original author of the CFQ IO scheduler, and it incorporates something that Jens called the “elevator linus”. It builds on the idea of an elevator by adding features that prevent starvation in worst-case situations, as could happen with distant reads. The previous link is also a good discussion of the design of the CFQ IO scheduler and the intricacies of scheduler design. It also discusses the design of the deadline scheduler and is well worth reading.

What CFQ does is give all users (processes) of a particular storage device about the same number of IO requests over a particular time interval. This can help multi-user systems since all users will see about the same level of responsiveness. Moreover, CFQ achieves some of the good throughput characteristics of the anticipatory scheduler. It allows a process queue to idle briefly at the end of a synchronous IO request, which creates some anticipatory time to wait for IO that might be close to the request that just finished.

Changing the Scheduler

The 2.6 kernel series allows you to change the IO scheduler in several ways. For example, you can change the default scheduler for the entire system with the “elevator=” option on the “kernel” line. This can be done manually at the boot prompt or in the grub configuration file.

If you change the default IO scheduler by editing grub, edit the /boot/grub/menu.lst file and add the option “elevator=” to the end of the line that begins with “kernel”. For example, you could change from cfq to deadline by adding “elevator=deadline” to that line. If you change it, be sure to run the “update-grub” command afterwards.

A second way is to change the IO scheduler on the fly for specific devices. You can determine which IO scheduler is being used by looking at the file “/sys/block/[device]/queue/scheduler”, where [device] is the name of the device. For example, on my laptop,

root@laytonjb-laptop:~# cat /sys/block/sdb/queue/scheduler
noop anticipatory deadline [cfq]

Notice that the current IO scheduler is cfq (the active scheduler is the one shown in square brackets). You can change the scheduler by just echoing the name of the desired scheduler to “/sys/block/[device]/queue/scheduler”. For example, I can change the IO scheduler on my laptop to deadline.

root@laytonjb-laptop:~# echo deadline > /sys/block/sdb/queue/scheduler
root@laytonjb-laptop:~# cat /sys/block/sdb/queue/scheduler
noop anticipatory [deadline] cfq

Notice how the IO scheduler has changed to deadline. When the scheduler is changed, the “old” scheduler completes all of its requests before control switches over to the new scheduler (ain’t Linux grand?).
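
If you want to script the same check and change, the steps above can be wrapped in a few lines of Python. This is a minimal sketch: the function names and the sysfs_root parameter are invented for illustration (the parameter exists only so the code can be tried against a fake directory tree), and writing to the real file requires root, just like the echo command.

```python
from pathlib import Path


def scheduler_path(device, sysfs_root="/sys"):
    """Path of the scheduler file for a block device such as 'sdb'."""
    return Path(sysfs_root) / "block" / device / "queue" / "scheduler"


def parse_schedulers(text):
    """Return (available, active) from 'noop anticipatory deadline [cfq]'."""
    available, active = [], None
    for name in text.split():
        # The active scheduler is the one wrapped in square brackets.
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return available, active


def set_scheduler(device, name, sysfs_root="/sys"):
    """Select a scheduler by writing its name, as the echo command does."""
    path = scheduler_path(device, sysfs_root)
    available, _ = parse_schedulers(path.read_text())
    if name not in available:
        raise ValueError(f"{name!r} is not offered for {device}: {available}")
    path.write_text(name + "\n")
```

For example, parse_schedulers applied to the first output above yields the four scheduler names with cfq as the active one, and set_scheduler("sdb", "deadline") performs the same write as the echo command.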

Summary

This article is just a quick introduction to IO schedulers in Linux. Today’s systems can have a large number of users, very IO-intensive workloads, requirements for high levels of interactivity, real-time requirements, plus a large number of disks and/or file systems. Given the enormous strains that current systems impose on IO subsystems, some way of controlling IO requests is mandatory. This is where the IO scheduler helps.

The IO scheduler is not a new concept, but it is a very important one. These schedulers can be designed to influence IO and system behavior in whatever manner you desire. Currently there are four IO schedulers in the Linux kernel: (1) NOOP, (2) Anticipatory, (3) Deadline, and (4) Completely Fair Queue (CFQ). Various aspects of the schedulers were discussed at a fairly high level in this article.

While not discussed in this article, you can tune the various schedulers for your workload. Take a look at the documentation that comes with the source for your current kernel. For example, on my laptop, the documentation is found in the directory /usr/src/linux-source-2.6.27/Documentation/block. In addition, there are a great number of articles around the web that discuss tuning.

One easy thing you can try is changing the IO scheduler associated with a particular device. The process just echoes the name of the IO scheduler to the appropriate file in the /sys file system, and it can give you some interesting results (hint, hint).

Jeff Layton is an Enterprise Technologist for HPC at Dell. He can be found lounging around at a nearby Frys enjoying the coffee and waiting for sales (but never during working hours).

Comments on "I Have a Schedule to Keep – IO Schedulers"

speed145a

Very informative!

One question: when I cat /sys/block/sda/queue/scheduler it shows additional entries:

$ cat /sys/block/sda/queue/scheduler
noop fifo anticipatory [deadline] cfq vr

laytonjb

@speed145a

Glad you liked the article. I’m hoping to expand a little more in future articles to talk about how to tune various schedulers and how to measure the impact on workloads (probably just some benchmarks).

I’m not sure where the other two schedulers come from. A quick google didn’t turn up too much. Can you tell us about your distro and perhaps what the system is doing?

Thanks!

Jeff

speed145a

I also tried a google search without any luck :-)

I’m running Arch with the 2.6.31-zen1 kernel. I’m sure it’s something in the ZEN kernel… or perhaps something to do with the BFS in the kernel.

laytonjb

@speed145a,

I did some things around Xen but I wasn’t sure. In particular I did find references to vr-sched.c but I didn’t look at the code or find details of the scheduler. But I do think it’s around Xen somehow but I don’t know the purpose of the additional schedulers. Might be worth posting to an archlinux group.

Jeff

suresh17

Do we really need to run update-grub after modifying menu.lst? It’s not lilo, is it?

suresh17

BTW, very nice articles by Jeff on Linux Magazine.. Actually I have become a fan of you! Thanks Jeff!

richarson

A little too late, but it might be useful anyway :)

It’s not Xen but Zen: a kernel specifically patched to behave better on desktops, it includes several different patches and that’s where the extra schedulers come from.

Here are some URLs:

http://www.zen-kernel.org/
http://www.zen-kernel.org/about
http://www.zen-kernel.org/included-code

richarson

@suresh: it’s a Debian (and Ubuntu, etc.) thing, you don’t modify the boot parameters directly but some sort of a token (kopt) and then update-grub updates all your kernel entries.

suresh17

@richarson: ah, ok. Thanks! I wasn’t aware of this.

grabur

Great introductory article.

Deadline here on Ubuntu 8.04. I’d be interested to know the advantages/benefits of using specific schedulers, and how the Linux ones compare with other OS kernel schedulers, is Linux an innovator here?

idallen

Re: Using NOOP on an SSD.
I would worry that the lack of write ordering in NOOP would mean *more* writes to the SSD, since only adjacent requests would be merged. The other schedulers would re-order all the outstanding writes and merge all the adjacent requests.
