For reasons that I don't really understand, November seems to be disk month for me. A year ago in this column, we looked at the Linux Logical Volume Manager, which allows you to combine and subdivide sets of disks in arbitrary ways. This month, we will consider disk striping while focusing primarily on how this is provided by the Linux disk striping facility.
Disk striping is a technique for replicating or dividing I/O operations among multiple disks to meet performance or fault tolerance goals. The Linux disk striping facility accomplishes this by implementing RAID, which stands for Redundant Array of Inexpensive Disks.
The most common reason for using RAID is I/O fault tolerance. A RAID device consists of two or more physical disks that are combined into a single abstract unit as far as the rest of the system is concerned. A RAID device can be implemented in several ways:
- They can be housed in a separate unit containing not only the disks but also the controller and accessing hardware. This standalone device plugs into the computer in the same way as any other external disk. This offers the best performance at the worst (highest) price.
- They can be a series of internally installed disks that are all attached to a high-end RAID controller card, which performs the disk striping independent of the operating system.
- The disks may also be completely independent from a hardware viewpoint, in which case the data is striped by system software, which requires kernel support. This last option is what Linux software RAID provides.
RAID can be implemented in several different variations. These variations are known as RAID levels. The levels supported by the Linux operating system are:
Linear RAID: Multiple disks are combined into a single logical unit. This is conceptually similar to volume groups or logical volumes when using a Logical Volume Manager. Since the latter is a superior way to achieve the same result, we will not consider linear RAID further.
RAID 0: This is simple disk striping, where each I/O operation is split into the same number of chunks as there are disks in the RAID device. Each chunk is then written to the appropriate disk in parallel. Theoretically, this could speed up the I/O operation in proportion to the number of disks in the striped set. For example, I/O to a three-way striped disk could be three times as fast as I/O to a single disk. In practice, since RAID does involve some overhead, perfect speedups are not attained, but Linux software RAID can still achieve 75 to 80 percent or more of the theoretical maximum. Ideally, each component disk in a striped set will be placed on its own disk controller in order to maximize throughput.
Be aware that RAID 0 is designed for I/O performance and provides no fault tolerance against data loss due to disk failures. In fact, if just one component disk in the striped set fails, then all of the data will become inaccessible. You’re making backups, right?
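The description above simplifies how requests are divided. A common way to implement striping, and the one assumed in this minimal sketch, is to cut the device's address space into fixed-size chunks (the chunk-size parameter in /etc/raidtab, discussed below) and deal them out to the component disks round-robin, assuming equal-sized disks. The helper names raid0_map and split_request are hypothetical, not part of any Linux tool.

```python
def raid0_map(offset, chunk_size, n_disks):
    """Map a logical byte offset to (disk index, byte offset on that disk)."""
    chunk_no, within = divmod(offset, chunk_size)  # which chunk, and where inside it
    disk = chunk_no % n_disks                      # chunks are dealt out round-robin
    row = chunk_no // n_disks                      # complete rows of chunks before this one
    return disk, row * chunk_size + within


def split_request(offset, length, chunk_size, n_disks):
    """Split one logical request into per-disk pieces (disk, disk_offset, nbytes)."""
    pieces = []
    end = offset + length
    while offset < end:
        disk, disk_off = raid0_map(offset, chunk_size, n_disks)
        # Never cross a chunk boundary: the next chunk lives on a different disk.
        nbytes = min(end - offset, chunk_size - offset % chunk_size)
        pieces.append((disk, disk_off, nbytes))
        offset += nbytes
    return pieces


if __name__ == "__main__":
    K = 1024
    # A 192K write at offset 0 on a three-disk set with 64K chunks:
    # one 64K piece per disk, all of which can proceed in parallel.
    for piece in split_request(0, 192 * K, 64 * K, 3):
        print(piece)
```

Running it prints one 64K piece for each of disks 0, 1, and 2, each at offset 0 on its disk, which is exactly the three-way parallelism the theoretical speedup assumes.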
RAID 1: This performs disk mirroring. A RAID 1 configuration is typically composed of two disks. Each write operation is performed on both disks, which yields two identical copies of the data. If either disk should fail, the other disk will automatically take over with little or no effect on the system. Once the faulty disk is replaced, the mirror can be regenerated. (However, regeneration places a significant load on the system, so it is best performed during periods of low demand.) Ideally, the disks in a mirrored set should be placed on separate disk controllers to guard against controller as well as disk failure.
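To make the mirroring behavior concrete, here is a toy in-memory model of a two-disk RAID 1 set. It is only an illustration under simplifying assumptions (the class Mirror and its methods are hypothetical; real md mirroring works on block devices and tracks much more state): writes go to every working member, reads come from any survivor, and replacing a failed disk means copying the whole surviving member back, which is why regeneration is so expensive.

```python
class Mirror:
    """Toy in-memory RAID 1 set (hypothetical, for illustration only)."""

    def __init__(self, n_disks=2, size=16):
        self.disks = [bytearray(size) for _ in range(n_disks)]
        self.failed = set()

    def write(self, offset, data):
        # Every write goes to every working member, so all copies stay identical.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[offset:offset + len(data)] = data

    def read(self, offset, length):
        # Any working member can satisfy a read; the first one found is used here.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                return bytes(disk[offset:offset + length])
        raise OSError("all mirror members have failed")

    def fail(self, i):
        self.failed.add(i)

    def replace_and_resync(self, i):
        # Regeneration copies the entire surviving disk, which is why it is costly.
        source = next(d for j, d in enumerate(self.disks) if j not in self.failed and j != i)
        self.disks[i] = bytearray(source)
        self.failed.discard(i)


if __name__ == "__main__":
    m = Mirror()
    m.write(0, b"hello")
    m.fail(0)                  # the first disk dies...
    print(m.read(0, 5))        # ...but the second one still serves b'hello'
    m.replace_and_resync(0)    # new disk 0 receives a full copy of disk 1
    print(m.disks[0][:5])      # bytearray(b'hello')
```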
RAID 4: This is disk striping with a parity disk. The RAID set consists of three or more disks, one of which is used as a parity disk. Each time a write operation occurs, the data is split into N-1 chunks (where N is the number of disks in the set), which are then written to the separate disks in parallel. A parity block is also computed for the data and written to the parity disk. The presence of the parity block allows all data to be reconstructed if any single disk should fail. As a result, the RAID device can keep functioning after any single disk fails, although it runs more slowly while a failed data disk's contents must be reconstructed from parity.
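The parity used by RAID 4 (and RAID 5) is the byte-wise exclusive OR (XOR) of the data chunks in a stripe. Because XOR-ing a value with itself cancels it, XOR-ing all the surviving chunks of a stripe, data and parity alike, reproduces whichever single chunk is missing. The minimal sketch below, with hypothetical function names and tiny four-byte chunks, shows both steps.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)


def compute_parity(data_chunks):
    # The parity chunk is the XOR of all data chunks in the stripe.
    return xor_blocks(data_chunks)


def reconstruct(surviving_chunks):
    # XOR of every surviving chunk (data and parity) yields the missing one,
    # because x ^ x == 0 cancels everything that is still present.
    return xor_blocks(surviving_chunks)


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]     # N-1 = 3 data chunks on a four-disk set
    parity = compute_parity(data)
    # Disk 1 fails: rebuild its chunk from the other data chunks plus parity.
    print(reconstruct([data[0], data[2], parity]))   # b'BBBB'
```

If the parity disk is the one that fails, no data is lost at all; the parity is simply recomputed from the data chunks once a replacement disk is installed.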
Figure One: How data is written to a RAID 5 device.
RAID 5: This is disk striping with a rotating parity block. It's very similar to RAID 4: the parity block is computed in the same way, but its placement rotates among the component disks in the set (as illustrated in Figure One). This prevents a single parity disk from becoming a bottleneck during write operations. As with RAID 4, the disk space consumed for fault tolerance is only 1/Nth of the total space (where N is again the number of disks in the set).
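One way to see how the parity block rotates is to generate the layout. This sketch assumes a left-symmetric rotation, in which parity starts on the last disk and moves one disk to the left on each successive stripe, with data chunks following the parity chunk and wrapping around. The Linux RAID 5 driver supports several such layouts (selected with the parity-algorithm setting in /etc/raidtab), so the arrangement in Figure One may differ in detail; raid5_layout is a hypothetical helper.

```python
def raid5_layout(n_disks, n_stripes):
    """Rows are stripes, columns are disks; each cell is 'P' or a data chunk number."""
    rows = []
    chunk = 0
    for stripe in range(n_stripes):
        pd = n_disks - 1 - stripe % n_disks      # parity disk moves left each stripe
        row = [None] * n_disks
        row[pd] = "P"
        for k in range(n_disks - 1):             # place the N-1 data chunks after parity
            row[(pd + 1 + k) % n_disks] = str(chunk)
            chunk += 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    n = 4
    print("\t".join(f"disk{d}" for d in range(n)))
    for row in raid5_layout(n, n):
        print("\t".join(row))
```

Over four stripes each disk holds parity exactly once, so parity writes are spread evenly across the set instead of piling up on one disk as they do with RAID 4.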
RAID 0+1: This is a hybrid RAID level in which striped sets (RAID 0) are organized under a mirrored set (RAID 1). When data is written, the RAID 1 layer sees it first and mirrors it to each of the RAID 0 striped sets beneath it. This RAID variation provides high performance and fault tolerance equivalent to that of RAID 5.
RAID 1+0: This is the reverse of RAID 0+1: a striped set whose segments are mirrored sets. It provides equivalent performance and slightly better fault tolerance, in that it is easier to rebuild the RAID device after a single disk failure (since the data on only one disk is affected).
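The fault tolerance difference between the two nested levels shows up clearly when you count which two-disk failures each can survive. This is a simplified model, assuming four disks arranged as two groups of two and that a RAID 0 set with any failed member is unusable; the function names are hypothetical.

```python
from itertools import combinations


def raid10_survives(failed, mirror_pairs):
    # RAID 1+0 (stripe of mirrors): data survives while every mirror pair keeps a working disk.
    return all(not set(pair) <= failed for pair in mirror_pairs)


def raid01_survives(failed, stripe_sets):
    # RAID 0+1 (mirror of stripes): a stripe set with any failed member is lost,
    # so data survives only while at least one stripe set is completely intact.
    return any(not set(s) & failed for s in stripe_sets)


def count_survivable(disks, n_failures, survives):
    """Return (survivable combinations, total combinations) of n_failures failed disks."""
    combos = [set(c) for c in combinations(disks, n_failures)]
    return sum(1 for c in combos if survives(c)), len(combos)


if __name__ == "__main__":
    disks = [0, 1, 2, 3]
    groups = [(0, 1), (2, 3)]   # mirror pairs for 1+0, stripe sets for 0+1
    for name, fn in (("RAID 1+0", raid10_survives), ("RAID 0+1", raid01_survives)):
        ok, total = count_survivable(disks, 2, lambda f: fn(f, groups))
        print(f"{name}: survives {ok} of {total} two-disk failures")
```

With four disks, RAID 1+0 survives four of the six possible two-disk failures (any pair not in the same mirror), while RAID 0+1 survives only two (both failures within the same stripe set). Both survive any single-disk failure.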
Table One summarizes a few of the advantages and disadvantages of the RAID levels 0, 1, and 5, which are the most common configurations.
Table One: Advantages and Disadvantages of the Various RAID Levels
|RAID Level||Advantages||Disadvantages|
|RAID 0||Best large-transfer I/O bandwidth; no loss of storage capacity||No fault tolerance|
|RAID 1||Best data redundancy; good performance on small transfers||Least efficient disk usage|
|RAID 5||Best trade-off of fault tolerance; optimizes I/O operations per second||Significant kernel overhead and disk usage|
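The storage-capacity entries in Table One come down to simple arithmetic. The sketch below assumes equal-sized component disks and ignores the small amount of space used by RAID superblocks; the 36 GB disk size and the function name are hypothetical.

```python
def usable_capacity(level, n_disks, disk_size):
    """Usable space for n equal-sized disks (ignores superblocks and other metadata)."""
    if level == "0":
        return n_disks * disk_size             # every byte holds data
    if level == "1":
        return disk_size                       # each disk holds a full copy
    if level in ("4", "5"):
        if n_disks < 3:
            raise ValueError("RAID 4/5 needs at least three disks")
        return (n_disks - 1) * disk_size       # one disk's worth goes to parity
    if level in ("0+1", "1+0"):
        if n_disks < 4 or n_disks % 2:
            raise ValueError("nested levels need an even number of disks, at least four")
        return n_disks // 2 * disk_size        # half the disks hold copies
    raise ValueError(f"unsupported level {level!r}")


if __name__ == "__main__":
    for level, n in (("0", 5), ("1", 2), ("5", 5), ("1+0", 4)):
        print(f"RAID {level} with {n} x 36 GB disks: {usable_capacity(level, n, 36)} GB usable")
```

For five 36 GB disks, RAID 0 yields 180 GB and RAID 5 yields 144 GB (one disk's worth goes to parity), while a two-disk RAID 1 mirror yields just 36 GB.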
In order to use the Linux software RAID facility, you must install the component disks, enable RAID support in the kernel, and set up the RAID configuration. We’ll look at the last two items individually.
Figure Two: Enabling RAID support in the Linux kernel.
Figure Two illustrates the kernel parameters that must be enabled to provide RAID support. The dialog in the figure is a result of the make xconfig command, and the front window comes up when you choose the Block Devices category from the main menu. The Multiple devices driver support item must be enabled to access all of the other RAID-related items that follow. In this case, we have enabled all of the available RAID options and specified that support be built into the kernel. However, many of the RAID levels can also be supported via kernel modules.
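When you save your choices, make xconfig writes them to the .config file at the top of the kernel source tree, where built-in options appear as =y and modular ones as =m. A small script can confirm that RAID support made it in. This sketch assumes 2.4-era symbol names (CONFIG_MD, CONFIG_BLK_DEV_MD, CONFIG_MD_RAID0, CONFIG_MD_RAID1, CONFIG_MD_RAID5), which may differ in other kernel versions, and raid_support is a hypothetical helper; in practice you would pass it the contents of, for example, /usr/src/linux/.config.

```python
import re

# Assumed 2.4-era symbol names; later kernels renamed some of them
# (for example, RAID 4/5 support became CONFIG_MD_RAID456).
CORE_SYMBOLS = ["CONFIG_MD", "CONFIG_BLK_DEV_MD"]
LEVEL_SYMBOLS = {"0": "CONFIG_MD_RAID0", "1": "CONFIG_MD_RAID1", "4/5": "CONFIG_MD_RAID5"}

SETTING = re.compile(r"^(CONFIG_\w+)=([ym])$")


def raid_support(config_text):
    """Return (missing core symbols, {level: 'y' | 'm' | 'n'}) for a kernel .config."""
    settings = {}
    for line in config_text.splitlines():
        match = SETTING.match(line.strip())
        if match:                        # "# CONFIG_X is not set" lines are skipped
            settings[match.group(1)] = match.group(2)
    missing = [sym for sym in CORE_SYMBOLS if sym not in settings]
    levels = {level: settings.get(sym, "n") for level, sym in LEVEL_SYMBOLS.items()}
    return missing, levels


if __name__ == "__main__":
    sample = """\
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_RAID0=y
CONFIG_MD_RAID1=m
# CONFIG_MD_RAID5 is not set
"""
    print(raid_support(sample))
```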
RAID devices use special files of the form /dev/mdn (where n is an integer) and are defined in the /etc/raidtab configuration file. Once they are defined, you can create them using the mkraid command and start and stop them with the raidstart and raidstop commands. (These programs are from the raidtools package, which can be found at ftp://ftp.kernel.org/pub/linux/daemons/raid/alpha.) You can also define them with the persistent-superblock option, which enables automatic detection, mounting, and dismounting of RAID devices by the kernel.
The best way to understand the /etc/raidtab file is to examine some sample entries. Listing One shows an annotated entry for a striped disk made up of two component disks.
Listing One: /etc/raidtab (RAID 0 with Two Component Disks)
raiddev /dev/md0 # Defines RAID device 0
raid-level 0 # RAID level
nr-raid-disks 2 # Number of component disks
chunk-size 64 # I/O chunk size (in K)
persistent-superblock 1 # Enable the persistent superblock feature
device /dev/sdc1 # Specify the first component disk …
raid-disk 0 # and number it
device /dev/sdd1 # Same for all remaining component disks
raid-disk 1
If we wanted to define a two-way mirrored set using the same disks instead, we would omit the chunk-size parameter and change the raid-level parameter from 0 to 1 in the first section. The rest of the entry would remain the same.
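Because raidtab entries follow such a regular pattern, they are easy to generate. The hypothetical helper below builds an entry in the style of Listing One and, following the rule just described, leaves out chunk-size for a RAID 1 mirror. It is a sketch that covers only the keywords used in this column.

```python
def raidtab_entry(md_device, level, disks, chunk_kb=64, spares=()):
    """Build an /etc/raidtab entry like Listing One (hypothetical helper)."""
    lines = [
        f"raiddev {md_device}",
        f"    raid-level {level}",
        f"    nr-raid-disks {len(disks)}",
    ]
    if spares:
        lines.append(f"    nr-spare-disks {len(spares)}")
    if level != 1:                              # the text omits chunk-size for mirrors
        lines.append(f"    chunk-size {chunk_kb}")
    lines.append("    persistent-superblock 1")
    for i, dev in enumerate(disks):             # component disks are numbered from 0
        lines += [f"    device {dev}", f"    raid-disk {i}"]
    for i, dev in enumerate(spares):            # spares get their own numbering
        lines += [f"    device {dev}", f"    spare-disk {i}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(raidtab_entry("/dev/md0", 0, ["/dev/sdc1", "/dev/sdd1"]))   # like Listing One
    print(raidtab_entry("/dev/md0", 1, ["/dev/sdc1", "/dev/sdd1"]))   # the mirrored variant
```

The last two lines of the RAID 0 entry are device /dev/sdd1 and raid-disk 1, numbering the second component disk just as the first one is numbered in Listing One.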
The entry in Listing Two defines a RAID 5 configuration containing five component disks as well as a spare disk to be automatically used should any of the active disks fail.
Listing Two: /etc/raidtab (RAID 5 with Five Component Disks and a Spare)
raid-level 5 # Use RAID level 5
nr-raid-disks 5 # Number of disks in the device
nr-spare-disks 1 # Number of spare disks
device /dev/sdc1 # Specify the 5 component disks
device /dev/sdh1 # Specify the spare disk
spare-disk 0 # You can use multiple spares if you want.
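A common source of trouble in entries like Listing Two is a disk count that doesn't match the device lines that follow. The sketch below is a simplified consistency check, not a full raidtab parser: it assumes one keyword and value per line, treats text after # as a comment, and assumes that an omitted nr-spare-disks means zero spares. check_raidtab and the sample devices are hypothetical.

```python
def check_raidtab(text):
    """Report raiddev sections whose declared disk counts don't match their entries."""
    sections, problems = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()     # drop comments and surrounding whitespace
        if not line:
            continue
        fields = line.split(None, 1)
        key = fields[0]
        value = fields[1].strip() if len(fields) > 1 else ""
        if key == "raiddev":
            sections.append({"name": value, "nr-raid-disks": None, "nr-spare-disks": 0,
                             "raid-disk": 0, "spare-disk": 0})
        elif not sections:
            problems.append(f"{key!r} appears before any raiddev line")
        elif key in ("nr-raid-disks", "nr-spare-disks"):
            sections[-1][key] = int(value)
        elif key in ("raid-disk", "spare-disk"):
            sections[-1][key] += 1
    for s in sections:
        for declared, entries in (("nr-raid-disks", "raid-disk"), ("nr-spare-disks", "spare-disk")):
            if s[declared] != s[entries]:
                problems.append(f"{s['name']}: {declared} is {s[declared]} "
                                f"but {s[entries]} {entries} line(s) found")
    return problems


if __name__ == "__main__":
    sample = """\
raiddev /dev/md1
    raid-level 5
    nr-raid-disks 3
    device /dev/sdc1
    raid-disk 0
    device /dev/sdd1
    raid-disk 1
    device /dev/sde1
    raid-disk 2
    device /dev/sdh1   # spare
    spare-disk 0
"""
    print(check_raidtab(sample))
```

The sample is flagged because it lists a spare disk without declaring nr-spare-disks 1; adding that line makes the report come back empty.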
Finally, let’s consider creating a RAID 0+1 disk (mirrored striped disks) in Listing Three. This device is set up in the same way as any other RAID device. The only difference is that the component disks are themselves RAID devices. In this case, they will need to have been previously defined as RAID 0 devices.
Listing Three: /etc/raidtab (RAID 0+1)
device /dev/md0 # The component disks are RAID devices
Some Performance Considerations
We conclude this month’s brief look at Linux software RAID with a few general comments about performance/cost trade-offs.
First, be careful not to overload disk controllers when using software RAID, as this will significantly degrade performance for all RAID levels. Putting disks on separate controllers is almost always a smart move.
Second, while software RAID 0 is a completely viable (not to mention cost-effective) method for increasing I/O performance, be aware that the chunk size you select matters a great deal. The optimum value is highly dependent on the typical type of I/O operation and especially on the transfer size the striped disk will see in normal use. Unfortunately, there is no substitute for preliminary testing (i.e., trial and error) when it comes to selecting the best value.
This is especially important for applications that perform very large sequential I/O operations, for which the default value is quite poor. Also, software disk striping is really designed for two or three (or maybe four) disks. Beyond that, any additional performance gains are small and generally not worth the effort.
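The interaction between chunk size and transfer size can be estimated with a rough model of round-robin striping: count how many chunk-sized pieces a request is cut into and how many disks those pieces reach. The sketch below ignores caching, read-ahead, and the kernel's merging of adjacent requests, so it is a way to choose which chunk sizes to test, not a replacement for the testing recommended above; stripe_footprint is a hypothetical helper.

```python
def stripe_footprint(offset, length, chunk_size, n_disks):
    """Return (chunk pieces, distinct disks) for one request under round-robin striping."""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    pieces = last - first + 1               # separate pieces the request is cut into
    return pieces, min(pieces, n_disks)     # consecutive chunks land on consecutive disks


if __name__ == "__main__":
    K = 1024
    for chunk_kb in (4, 16, 64, 256):
        small = stripe_footprint(0, 8 * K, chunk_kb * K, 3)      # 8K random-style request
        large = stripe_footprint(0, 1024 * K, chunk_kb * K, 3)   # 1 MB sequential request
        print(f"chunk {chunk_kb:>3}K: 8K request {small}, 1M request {large}  (pieces, disks)")
```

With 4K chunks, a 1 MB sequential transfer on three disks is chopped into 256 pieces, while 256K chunks need only 4. Small requests, by contrast, stay on a single disk once the chunk size is at least the request size, leaving the other disks free to serve concurrent requests.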
The sad fact is that if you want both high performance and fault tolerance, software RAID (especially RAID 5) is likely to be a poor choice. The additional overhead that RAID 5 places on the operating system is considerable: about 23 percent more CPU usage than normal I/O operations require. The bottom line for RAID 5 is to spend the money on a hardware solution; use software RAID 5 only if you can’t afford anything better.
If you don’t need RAID 5, though, software RAID is still a good alternative for just doing disk striping or simple two-way disk mirroring. It’s cheap (or even free) and with all things considered, performs reasonably well.
Æleen Frisch is the author of Essential System Administration. She can be reached at firstname.lastname@example.org.