The Linux kernel has several different IO schedulers. This article provides an introduction to the concept of schedulers and what options exist for Linux.
NOOP IO Scheduler
The NOOP IO scheduler is a fairly simple scheduler. With this scheduler, all incoming IO requests are put into a simple First-In, First-Out (FIFO) queue and then executed in that order. Note that this happens for all processes running on the system regardless of the type of IO request (read or write). It also does something called request merging, a feature that takes requests for adjacent sectors and merges them into a single request. This reduces seek time and improves throughput.
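The FIFO queue and request merging described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the kernel's implementation: the names Request and NoopQueue are made up for this example, and a new request is only compared against the most recently queued one.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    op: str      # "read" or "write"
    sector: int  # first sector of the request
    count: int   # number of sectors

    @property
    def end(self):
        # First sector after the end of this request.
        return self.sector + self.count


class NoopQueue:
    """Hypothetical FIFO queue that merges adjacent requests (illustration only)."""

    def __init__(self):
        self.fifo = deque()

    def add(self, req):
        # Assumption: only the most recently queued request is a merge candidate.
        if self.fifo:
            last = self.fifo[-1]
            if last.op == req.op:
                if last.end == req.sector:   # new request starts where the last one ends
                    last.count += req.count
                    return
                if req.end == last.sector:   # new request ends where the last one starts
                    last.sector = req.sector
                    last.count += req.count
                    return
        # No merge possible: queue a copy at the tail, keeping arrival order.
        self.fifo.append(Request(req.op, req.sector, req.count))

    def dispatch(self):
        # Requests leave strictly in arrival order; nothing is sorted.
        return self.fifo.popleft() if self.fifo else None


queue = NoopQueue()
for op, sector, count in [("read", 100, 8), ("read", 108, 8),
                          ("write", 500, 8), ("read", 0, 8)]:
    queue.add(Request(op, sector, count))

while (req := queue.dispatch()) is not None:
    print(req)
```

In this run the two adjacent reads leave the queue as a single 16-sector request, while the read at sector 0 is still dispatched last because nothing is reordered.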
According to this article, the NOOP scheduler “… uses the minimal amount of CPU/instructions per I/O to accomplish the basic merging and sorting functionality to complete the I/O.” The NOOP scheduler assumes that some other device will optimize the IO performance. For example, an external RAID controller or a SAN controller could perform this optimization.
Potentially, the NOOP scheduler could work well with storage devices that don’t have a mechanical component (such as a drive head) to read data. The reason is that the NOOP scheduler makes no attempt to reduce seek time beyond simple request merging, which also helps throughput. So storage devices with very little seek time, such as flash drives, SSDs, and USB sticks, could benefit from using the NOOP IO scheduler.
Anticipatory IO Scheduler
The Anticipatory IO Scheduler, as the name implies, anticipates subsequent block requests. It implements request merging, a one-way elevator (an elevator that sweeps across the disk in one direction only), and read and write request batching. After the scheduler services an IO request, it pauses for a small amount of time in anticipation that the next request will be for the subsequent block. If that request comes, the disk head is already in the correct location and the request is serviced very quickly. This approach does add a little latency to the system because of the pause. However, the cost can be outweighed by the increased performance for neighboring requests.
Putting on your storage expert hat, you can see that the anticipatory scheduler works really well for certain workloads. For example, it has been observed that the Apache web server may achieve up to 71% more throughput using the anticipatory IO scheduler. On the other hand, it has been observed that the anticipatory scheduler has caused up to a 15% slowdown on a database run.
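The anticipation step can be pictured as a small decision function that runs after each completed request. The following Python sketch is an assumption-laden illustration, not the kernel code: the function name choose_next, the Request tuple, and the 5 ms window are all invented for this example.

```python
from typing import NamedTuple


class Request(NamedTuple):
    sector: int  # first sector of the request
    count: int   # number of sectors


ANTIC_WINDOW = 0.005  # seconds to wait; an arbitrary value for illustration


def choose_next(pending, head, idle_for, window=ANTIC_WINDOW):
    """Pick the next request after the previous one left the head at `head`.

    Returns a request to dispatch, or None to keep waiting (or idle).
    """
    # 1. A request for the subsequent block has arrived: service it right away.
    for req in pending:
        if req.sector == head:
            return req
    # 2. Still inside the anticipation window (or nothing queued): keep waiting.
    if idle_for < window or not pending:
        return None
    # 3. Window expired: fall back to a one-way elevator. Take the nearest
    #    request ahead of the head, or wrap around to the lowest sector.
    ahead = [r for r in pending if r.sector >= head]
    return min(ahead or pending, key=lambda r: r.sector)
```

The trade-off from the text shows up directly in step 2: every call that returns None while requests are pending is latency added in the hope that step 1 pays off on a later call.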
Deadline IO Scheduler
The Deadline IO Scheduler was written by Jens Axboe, a well-known kernel developer. The fundamental principle of the scheduler is to guarantee a start time for servicing an IO request. It combines request merging with a one-way elevator and imposes a deadline on all requests (hence the name). It maintains two deadline queues in addition to two sorted queues, with one queue of each kind for reads and one for writes. The deadline queues are sorted by their deadline times (time to expiration), with shorter times moving to the head of the queue. The sorted queues are sorted based on their sector number (the elevator approach).
The deadline scheduler really helps in the case of a remote read (remote meaning fairly far out on the disk or with a large sector number). Reads can sometimes block applications because the data has to be actually read from the disk while the application waits. On the other hand, writes can be quickly returned to the application because they are in the cache (unless you turn off the cache or use Direct IO). Even worse, with a plain elevator remote reads get serviced very slowly because they constantly get moved to the back of the queue as requests for closer parts of the disk get serviced first. So, the deadline scheduler makes sure that all IO requests are serviced, even these distant read requests.
The general process of the scheduler is fairly straightforward. The scheduler decides on the next request by first deciding which queue to use. It gives higher priority to reads because, as mentioned, applications usually block on read requests. Next, it checks the first request in the deadline queue to see if it has expired. If so, that request is serviced immediately. Otherwise, the scheduler serves a batch of requests from the sorted queue. In both cases, the scheduler also services a batch of requests following the chosen request in the sorted queue.
The deadline scheduler is very useful for some applications. In particular, real-time systems use the deadline scheduler because, in most cases, it keeps the latency low (all requests are serviced within a short time frame). It has also been suggested that it works well for database systems that have TCQ-aware disks.
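The selection process above can be sketched as follows. This is a minimal illustration under stated assumptions, not the kernel's code: the class name DeadlineSketch, the expiry times, and the batch size are values picked for the example. Two simplifications are made: the sketch restarts from the lowest sector when nothing has expired, and it always prefers reads without protecting writes from starvation.

```python
import bisect
from collections import deque

READ_EXPIRE = 0.5   # seconds until a read expires (illustrative value)
WRITE_EXPIRE = 5.0  # seconds until a write expires (illustrative value)
BATCH = 2           # requests dispatched per batch (illustrative value)


class DeadlineSketch:
    def __init__(self):
        # Deadline queues: arrival order equals deadline order within each type.
        self.fifo = {"read": deque(), "write": deque()}
        # Sorted queues: ordered by sector number (the elevator).
        self.by_sector = {"read": [], "write": []}

    def add(self, op, sector, now):
        expire = READ_EXPIRE if op == "read" else WRITE_EXPIRE
        req = (sector, now + expire)
        self.fifo[op].append(req)
        bisect.insort(self.by_sector[op], req)

    def next_batch(self, now):
        # Step 1: pick the queue; reads win whenever any are pending.
        op = "read" if self.fifo["read"] else "write"
        if not self.fifo[op]:
            return []
        # Step 2: if the oldest request has expired, start the batch there;
        # otherwise start at the lowest sector (a simplification).
        oldest = self.fifo[op][0]
        start = self.by_sector[op].index(oldest) if oldest[1] <= now else 0
        # Step 3: take a batch of requests in sector order from that point.
        batch = self.by_sector[op][start:start + BATCH]
        for req in batch:
            self.by_sector[op].remove(req)
            self.fifo[op].remove(req)
        return [(op, sector) for sector, _ in batch]
```

A distant read queued at sector 900 keeps losing to lower sectors only until its deadline passes; after that, the next batch starts at sector 900 no matter what else is waiting.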
CFQ IO Scheduler
The Completely Fair Queuing (CFQ) IO scheduler is the current default scheduler in the Linux kernel. It uses both request merging and elevators. It puts synchronous requests from processes into a number of per-process queues. Then it allocates time slices for each of the queues to access the disk. The length of the time slice and the number of requests a queue is allowed to submit both depend on the IO priority of the given process. Asynchronous requests for all processes are batched together into fewer queues, with one queue per priority.
Jens Axboe is the original author of the CFQ IO scheduler, and it incorporates something that Jens called the “elevator linus”. It builds on the idea of an elevator by adding features that prevent starvation in worst-case situations, as could happen with distant reads. The previous link is also a good discussion of the design of the CFQ IO scheduler and the intricacies of scheduler design (it also discusses the design of the deadline scheduler, and it’s well worth reading).
What CFQ does is give all users (processes) of a particular device (storage) about the same number of IO requests over a particular time interval. This can help multi-user systems since all users will see about the same level of responsiveness. Moreover, CFQ achieves some of the good throughput characteristics of the anticipatory scheduler because it allows a process queue to have some idle time at the end of a synchronous IO request, creating some anticipatory time to wait for IO that might be close to the just-finished request.
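To make the per-process queues and priority-dependent slices more concrete, here is a rough Python sketch. It rests on several assumptions that are not taken from the kernel: the function name cfq_dispatch_order is invented, a slice is modeled as a request budget rather than a time interval, and the budget formula (8 minus the priority, for priorities 0 through 7) is chosen only for illustration.

```python
from collections import deque


def cfq_dispatch_order(queues, priorities):
    """Return the order in which synchronous requests would be dispatched.

    queues:     dict mapping a process name to its list of requested sectors
    priorities: dict mapping a process name to an IO priority, 0 (high) to 7 (low)
    """
    pending = {pid: deque(reqs) for pid, reqs in queues.items()}
    order = []
    # Visit the per-process queues round-robin until all of them are empty.
    while any(pending.values()):
        for pid, q in pending.items():
            # Assumed budget: a higher priority (lower number) gets a longer turn.
            budget = 8 - priorities.get(pid, 4)
            for _ in range(min(budget, len(q))):
                order.append((pid, q.popleft()))
    return order


order = cfq_dispatch_order(
    {"A": [1, 2, 3, 4, 5, 6], "B": [100, 101]},
    {"A": 4, "B": 6},
)
print(order)
```

With these inputs, process A gets four requests per turn and process B gets two, so B is serviced after A's first four requests instead of waiting behind all six.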
Changing the Scheduler
The 2.6 kernel series allows you to change the IO scheduler in several ways. For example, you can change the default scheduler for the entire system by adding the “elevator=” option to the “kernel” line, either manually at the boot prompt or in the grub configuration file.
If you change the default IO scheduler by editing grub, edit the /boot/grub/menu.lst file and add the option “elevator=” to the end of the line that begins with “kernel”. For example, you could change the default from cfq to deadline by adding “elevator=deadline” to that line. After you change the file, be sure to run the “update-grub” command.
A second way to change the IO scheduler is to change it on the fly for specific devices. For example, you can determine which IO scheduler is being used by looking at the file “/sys/block/[device]/queue/scheduler”, where [device] is the name of the device. For example, on my laptop,
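As a small illustration of that edit, the Python helper below rewrites a single “kernel” line so that it carries the desired “elevator=” option. The helper name set_elevator and the sample kernel line are hypothetical; a real menu.lst line will have a different kernel image and root device.

```python
def set_elevator(kernel_line, scheduler):
    """Return the kernel line with exactly one elevator= option at the end."""
    # Drop any existing elevator= option so it is not specified twice.
    words = [w for w in kernel_line.split() if not w.startswith("elevator=")]
    words.append(f"elevator={scheduler}")
    return " ".join(words)


# Hypothetical menu.lst entry, used only to show the transformation.
line = "kernel /boot/vmlinuz root=/dev/sda1 ro quiet"
print(set_elevator(line, "deadline"))
```

The helper only transforms a string. Writing the result back to the grub configuration file is left to the administrator.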
root@laytonjb-laptop:~# cat /sys/block/sdb/queue/scheduler
noop anticipatory deadline [cfq]
Notice that the current IO scheduler is cfq (the name shown in square brackets). You can change the scheduler by just echoing the name of the desired scheduler to “/sys/block/[device]/queue/scheduler”. For example, I can change the IO scheduler on my laptop to deadline.
root@laytonjb-laptop:~# echo deadline > /sys/block/sdb/queue/scheduler
root@laytonjb-laptop:~# cat /sys/block/sdb/queue/scheduler
noop anticipatory [deadline] cfq
Notice how the IO scheduler has changed to deadline. When the change in scheduler is performed, the “old” scheduler completes all of its requests before control switches over to the new scheduler (ain’t Linux grand?).
This article is just a quick introduction to IO schedulers in Linux. Today’s systems can have a large number of users, very IO-intensive workloads, requirements for high levels of interactivity, real-time requirements, plus a large number of disks and/or file systems. Given the enormous strains that current systems impose on IO subsystems, some way of controlling IO requests is mandatory. This is where the IO scheduler comes in.
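The same check and switch can be scripted. The Python sketch below parses the bracketed format shown in the output above and writes a new scheduler name to the same sysfs file. The function names are made up for this example, and writing to the file requires root privileges.

```python
from pathlib import Path


def parse_schedulers(line):
    """Split a line such as 'noop anticipatory deadline [cfq]'.

    Returns (active, available), where active is the bracketed name.
    """
    active = None
    available = []
    for name in line.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]  # strip the brackets marking the active scheduler
            active = name
        available.append(name)
    return active, available


def scheduler_path(device):
    # Same file as in the examples above, e.g. device = "sdb".
    return Path("/sys/block") / device / "queue" / "scheduler"


def get_scheduler(device):
    return parse_schedulers(scheduler_path(device).read_text())


def set_scheduler(device, name):
    # Refuse names the kernel does not list for this device.
    _, available = get_scheduler(device)
    if name not in available:
        raise ValueError(f"{name!r} is not available for {device}: {available}")
    scheduler_path(device).write_text(name)  # needs root, like the echo above
```

For example, calling set_scheduler("sdb", "deadline") as root is intended to have the same effect as the echo command shown earlier.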
The IO scheduler is not a new concept, but it is a very important one. These schedulers can be designed to influence IO and system behavior in whatever manner you desire. Currently there are four IO schedulers in the Linux kernel: (1) NOOP, (2) Anticipatory, (3) Deadline, and (4) Completely Fair Queuing (CFQ). Various aspects of the schedulers were discussed at a fairly high level in this article.
While not discussed in this article, you can tune the various schedulers for your workload. Take a look at the documentation that comes with the source for your current kernel. For example, on my laptop, the documentation is found in the directory /usr/src/linux-source-2.6.27/Documentation/block. In addition, there are a great number of articles around the web that discuss tuning.
One easy thing you can try is changing the IO scheduler associated with a particular device. All it takes is echoing the name of the IO scheduler to the appropriate file in the /sys file system, and it can give you some interesting results (hint, hint).
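As a starting point for exploring tuning, the sketch below reads every tunable file in a directory and returns the values as a dictionary. It assumes the active scheduler exposes its tunables as small text files under /sys/block/[device]/queue/iosched, with one value per file. The function name read_tunables is made up for this example, and which files you will see depends on the scheduler and the kernel version.

```python
from pathlib import Path


def read_tunables(iosched_dir):
    """Return {file name: contents} for each regular file in the directory."""
    tunables = {}
    for entry in sorted(Path(iosched_dir).iterdir()):
        if entry.is_file():
            # Each file is assumed to hold a single short value.
            tunables[entry.name] = entry.read_text().strip()
    return tunables
```

On a system with a device named sdb, you might call read_tunables("/sys/block/sdb/queue/iosched") and compare the result with the kernel documentation mentioned above.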