RAID and LVM: Part One

Preparing to implement RAID and LVM
This month’s column is the first of a three-part series on redundant array of independent disks (RAID) and logical volume management (LVM) technologies. This month’s “Guru Guidance” describes the basics of RAID and LVM — why you might use them and how you should prepare your system before deployment. Next month’s column will delve into RAID’s gory details, and the following column will cover LVM.
You should first understand what each technology does. RAID and LVM collectively solve three problems associated with disk storage: speed, reliability, and flexibility. RAID enhances speed and reliability by enabling you to combine partitions from two or more disks into one virtual partition. Depending on how the space is combined, disk access speed may be improved, reliability may be improved, or both may be improved. LVM enhances flexibility by enabling creation of partition-like units that can be more easily resized. Combining these two technologies can be a big help with your disk problems. Unfortunately, both RAID and LVM require configuration effort, and if you have an existing Linux installation, you’ll need to jump through some hoops to get everything working.

RAID Basics

On the surface, RAID is the more complex of the two technologies, because several different types of RAID arrays exist, many of which are supported by the Linux kernel:
* Linear (append) mode. This technique creates a single virtual partition from two or more input partitions. Data isn’t interleaved or duplicated, so linear mode provides none of RAID’s speed or reliability benefits; its only advantage is the ability to create partitions larger than your biggest physical hard disk.
* RAID 0 (striping). Level 0, the lowest level of RAID, interleaves data across the constituent partitions. When reading a large block of (apparently) contiguous data from a RAID 0 array split across two disks, an application reads a few blocks from disk 1, a few blocks from disk 2, a few blocks from disk 1, and so on. This access pattern results in better overall throughput than reading everything from one disk, because the interleaved access reduces the impact of disk speed bottlenecks. RAID 0 provides no redundancy or data integrity checking, though, so it doesn’t improve reliability. In fact, it degrades reliability, because the failure of either disk takes the whole array with it.
* RAID 1 (mirroring). This approach keeps an exact copy of one disk’s data on one or more additional disks to improve reliability. If one disk fails, the computer can read the data from another disk. Unfortunately, RAID 1 degrades write performance, at least when implemented in the Linux kernel, since the system must write all data at least twice.
* RAID 4/5. This type of RAID attempts to gain the benefits of both RAID 0 and RAID 1. RAID 4/5 stripes data, much as in RAID 0, but adds parity data (a form of checksum), which can be used to regenerate data in the event of a disk failure. RAID 4 stores the parity data on a single dedicated drive, whereas RAID 5 spreads it across the component drives. In either event, RAID 4/5 improves both disk access speed and reliability, but at the cost of the need for an extra drive: N+1 drives store the data that could be stored on N drives without RAID 4/5. Therefore, three drives are the minimum practical configuration for a RAID 4/5 array.
* RAID 6. RAID 4/5 is great, but what if two drives fail? You lose data. RAID 6 exists to provide still more redundancy, enough to survive the failure of two drives, but at the cost of the need to use yet another drive (N+2 drives are required to obtain the capacity of N drives).
* RAID 10. This level of RAID, like levels 4, 5, and 6, combines the benefits of RAID 0 and RAID 1, but in a different way. This support is experimental as of the 2.6.14 kernel, so you should avoid it on production systems.
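To see how RAID 4/5 can regenerate data after a disk failure, it helps to work through the arithmetic. The following is a minimal sketch, assuming byte-wise XOR parity, which is the usual scheme for these levels; the function names and the tiny "drive" contents are made up for illustration and have nothing to do with the kernel's actual code.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of several equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Regenerate the one missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Three data "drives" (hypothetical contents) and the parity computed from them.
data = [b"disk", b"one!", b"two?"]
parity = xor_blocks(data)

# Pretend the second drive has failed and regenerate its contents.
recovered = rebuild_missing([data[0], data[2]], parity)
```

Because XOR is its own inverse, combining the surviving blocks with the parity block yields exactly the block that was lost. Lose two blocks, though, and there isn't enough information left, which is the gap RAID 6 fills.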
In most cases, RAID arrays are constructed from identical or near-identical drives. Although this isn’t strictly required, using drives of disparate size or speed typically results in lost storage space or degraded performance compared to using a single fast drive. An exception might be if you want to leave significant space in a non-RAID configuration (say, partitions devoted to a non-Linux OS on a multi-boot computer); you might then reasonably use drives of different sizes.
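The space cost of each RAID level, and the space lost to mismatched drives, can be estimated with a little arithmetic. The sketch below is an approximation under a stated assumption: apart from linear mode, every member is treated as if it were only as large as the smallest member. Linux's RAID 0 code may be able to use more of a larger drive than this suggests, and the function name and level labels are hypothetical.

```python
# Number of drives' worth of space given up to redundancy data.
REDUNDANCY_DRIVES = {"raid4": 1, "raid5": 1, "raid6": 2}


def usable_capacity(level, sizes):
    """Estimate usable space for an array built from members of the given sizes.

    Assumption: except in linear mode, each member contributes only as much
    space as the smallest member.
    """
    count = len(sizes)
    smallest = min(sizes)
    if level == "linear":
        return sum(sizes)            # members are simply appended
    if level == "raid0":
        return count * smallest      # striping, no redundancy
    if level == "raid1":
        return smallest              # every member holds the same data
    if level in REDUNDANCY_DRIVES:
        spare = REDUNDANCY_DRIVES[level]
        if count <= spare:
            raise ValueError("too few drives for " + level)
        return (count - spare) * smallest
    raise ValueError("unknown RAID level: " + level)


# Two 80GB drives plus one 120GB drive (hypothetical sizes).
mixed = [80, 80, 120]
raid5_space = usable_capacity("raid5", mixed)
linear_space = usable_capacity("linear", mixed)
```

With these numbers, RAID 5 yields 160GB and the extra 40GB on the larger drive goes unused by the array, which is why that leftover space is a natural home for non-RAID partitions.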
RAID is also best implemented using SCSI disks, because SCSI handles transfers to and from multiple devices on the same chain well. When using ATA disks, RAID arrays should use disks on different physical cables — placing two disks used for RAID on a single cable means Linux won’t be able to transfer data from two disks simultaneously, resulting in a speed hit.
Some disk controllers advertise that they support RAID directly. In some cases this means that the controller has enough smarts to handle the disks and present a single virtual disk to the host OS. Such controllers can be worth using, particularly with RAID 1 or higher, because they can help offload the computational and disk access work of implementing RAID, thus improving performance. Frequently, though, so-called RAID controllers are nothing but ordinary disk controllers with a few hooks and software RAID drivers for Windows. Such controllers offer no benefits to Linux.

LVM Basics

LVM’s purpose is to improve partitioning flexibility. LVM begins with one or more partitions, each of which is known as a physical volume in LVM terminology. These physical volumes are combined together into a volume group, which can then be re-allocated as logical volumes. This process may seem convoluted and pointless, but its advantage is that you can easily re-allocate the space within logical volumes. For instance, consider the following Linux partitions:
hda1:   5MB  /boot
hda2:   5GB  /
hda5:  10GB  /usr
hda6:  40GB  /home
This arrangement might work well for an initial installation, but what happens if you discover that you need more space for /usr and less space for /home? Resizing the partitions is possible using tools such as GNU Parted (described in the April 2003 “Guru Guidance,” available online at http://www.linux-mag.com/2003-04/guru_01.html), but using such tools is tricky and usually requires rebooting the computer.
LVM is more flexible. It enables you to shrink the /home logical volume and increase the size of the /usr logical volume without moving the volumes. If the filesystem you use permits it, you can even resize your logical volumes without unmounting them! Presently ext2fs and ext3fs must be unmounted to be resized; ReiserFS can be resized when mounted or unmounted; and JFS and XFS must be mounted to be resized. XFS partitions may only be grown, not shrunk. Experimental tools to enable resizing ext2fs and ext3fs while mounted are under development. If you’ve ever run out of space on a partition and wanted to resize it quickly, you no doubt see the appeal of this feature.
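The bookkeeping behind this flexibility is easy to model. The following toy sketch is not the LVM2 software or its interface; the class, its methods, and the 50GB physical volume are all hypothetical. It only illustrates the idea that logical volumes draw from a shared pool, so space freed by one can be handed to another.

```python
class VolumeGroup:
    """Toy model of a volume group: a pool of space shared by logical volumes."""

    def __init__(self, physical_volumes_gb):
        # Physical volumes are pooled; their individual boundaries stop mattering.
        self.total_gb = sum(physical_volumes_gb)
        self.volumes = {}

    @property
    def free_gb(self):
        return self.total_gb - sum(self.volumes.values())

    def create(self, name, size_gb):
        if size_gb > self.free_gb:
            raise ValueError("not enough free space in the volume group")
        self.volumes[name] = size_gb

    def resize(self, name, new_size_gb):
        growth = new_size_gb - self.volumes[name]
        if growth > self.free_gb:
            raise ValueError("not enough free space in the volume group")
        self.volumes[name] = new_size_gb


vg = VolumeGroup([50])      # one 50GB physical volume (hypothetical size)
vg.create("usr", 10)
vg.create("home", 40)
vg.resize("home", 35)       # shrink /home first to free 5GB...
vg.resize("usr", 15)        # ...then give that space to /usr
```

The order matters: /usr can't grow until /home has given up the space. On a real system the filesystem inside each logical volume must be resized as well, subject to the per-filesystem limits just described.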
In addition, Linux’s LVM tools provide a feature that’s similar to RAID 0: You can tell the system to interleave accesses from two or more component partitions, thus improving access speed if those partitions are on separate physical drives.
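Interleaving of this sort, whether done by RAID 0 or by LVM, boils down to a simple mapping from a logical position to a device and an offset on that device. Here's a minimal sketch, assuming fixed-size chunks handed out round-robin; the function name is made up, and real implementations add details such as configurable chunk sizes.

```python
def locate_chunk(chunk_index, num_devices):
    """Map a logical chunk number to (device index, chunk offset on that device)."""
    return chunk_index % num_devices, chunk_index // num_devices


# Where the first six logical chunks land when striped across two devices.
layout = [locate_chunk(i, 2) for i in range(6)]
```

Consecutive chunks alternate between device 0 and device 1, so a long sequential read keeps both drives busy at once.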
For maximum flexibility and performance, you can combine RAID and LVM. The most common way to do this is to create a RAID array and then apply LVM to the RAID array. This approach gives you whichever RAID benefits apply to the RAID version you use, and gives you LVM’s resizing flexibility. In theory, it’s also possible to apply LVM to your disks and then create RAID arrays atop the logical volumes; however, this approach gives you (in theory) the ability to resize your constituent RAID volumes, not the Linux filesystems stored on them. In practice, RAID may not respond well to its volumes being resized.

Limitations and Precautions

RAID and LVM both rely on kernel support. (A partial exception is hardware RAID implemented by a RAID controller. You don’t need Linux’s kernel RAID support to use such hardware, but you probably do need a kernel driver for the RAID controller itself.) Because of the need for kernel support, you should be cautious about placing the most basic and fundamental partitions in a RAID array or LVM configuration. In particular, the root (/), /boot, /etc, /root, /bin, /sbin, /mnt, and /dev directories are best kept out of RAID and LVM. Although these directories can be placed in RAID or LVM configurations, doing so means that you won’t be able to easily access them from emergency tools that lack the appropriate support. Booting from a kernel stored in a RAID or LVM array can also be complicated.
If you want to configure a Linux system so that the entire disk is in a RAID array, one tip is to place the root (/) partition, including all the specified directories, in a RAID 1 partition. Emergency tools can then access the constituent partitions individually, and you can easily configure a boot loader to point to just one of the relevant kernels in the duplicated partitions. To truly maximize your RAID experience, configure only the /boot partition as RAID 1 and put everything else in a higher RAID level.

Preparing Your System for RAID and LVM

To use RAID or LVM, you must do four things:
1. Add RAID and/or LVM support to your kernel.
2. Install the RAID and/or LVM support packages.
3. Reconfigure your system’s partitions in preparation for using RAID and/or LVM.
4. Configure the RAID and/or LVM features.
The remainder of this month’s column is devoted to the first and second steps. (The third step can actually be performed before the first, but this is most common when implementing RAID or LVM when you install your system.)
The RAID and LVM kernel options are found under the Device Drivers > Multi-device Support (RAID and LVM) category in the kernel configuration menus, as shown in Figure One. Activate the “RAID Support” option for RAID and the “Device Mapper Support” option for LVM. (These options apply to the 2.6.9 and later kernels. Older kernels provided another LVM option, but this option was used for older Linux LVM packages. The “Device Mapper Support” option is used for the current LVM2 software.)
Figure One: Linux RAID and LVM Kernel Options

In addition to the main “RAID Support” and “Device Mapper Support” options, you should select appropriate sub-options. Specifically, you must activate support for the RAID level you want to use. Figure One shows RAID 1 being built into the kernel and RAID 4/5 being built as a module. No sub-options of the “Device Mapper Support” option are required for a basic LVM configuration, at least as of the 2.6.14 kernel, but you might want to read the descriptions of the sub-options in case one of the features appeals to you.
It’s recommended to compile the RAID and LVM kernel options into your main kernel file. This will simplify the boot procedure, particularly if you decide to place your root (/) filesystem or other critical directories on a RAID or LVM device. If you compile these options as kernel modules instead, you must be able to load the modules before you’ll be able to access RAID or LVM devices, which means the modules (typically stored in /lib/modules/) must be on a conventional partition or you must use a boot-time RAM disk.
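If you have a kernel configuration file handy, a short script can report whether each option is built in, built as a module, or absent. This is a sketch under assumptions: the symbol names below are the ones I'd expect in 2.6-era kernels (later kernels rename some of them), so verify them against your own source tree, and the sample text is invented for illustration.

```python
import re

# Assumed symbol names for 2.6-era kernels; check your own kernel tree.
OPTIONS = {
    "CONFIG_BLK_DEV_MD": "RAID support",
    "CONFIG_MD_RAID1": "RAID 1",
    "CONFIG_MD_RAID5": "RAID 4/5",
    "CONFIG_BLK_DEV_DM": "Device mapper support",
}

STATES = {"y": "built in", "m": "module"}


def check_config(config_text):
    """Report how each RAID/LVM option is set in kernel configuration text."""
    settings = dict(re.findall(r"^(CONFIG_\w+)=(\S+)$", config_text, re.MULTILINE))
    return {
        label: STATES.get(settings.get(symbol), "not enabled")
        for symbol, label in OPTIONS.items()
    }


# Invented sample: RAID 1 built in, RAID 4/5 as a module, no device mapper.
sample = """\
CONFIG_BLK_DEV_MD=y
CONFIG_MD_RAID1=y
CONFIG_MD_RAID5=m
# CONFIG_BLK_DEV_DM is not set
"""
report = check_config(sample)
```

Anything reported as "module" is a reminder that the module must be loadable before the RAID or LVM device is needed, which is the complication described above.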
Many distributions ship with RAID and LVM support in their stock kernels, or at least available as kernel modules. Thus, you may not need to recompile your kernel to add this support. Check your kernel’s configuration to be sure it’s present, though. For RAID, you can type cat /proc/mdstat. If the file exists, RAID support exists in your kernel and the file contains information on the RAID levels that are available to you.
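If you'd rather check from a script, the RAID levels appear on the "Personalities" line of /proc/mdstat. The sketch below parses text in that format; the function name is made up, and the sample text stands in for a real system's file.

```python
import re


def raid_personalities(mdstat_text):
    """Return the RAID levels named on the 'Personalities' line of /proc/mdstat."""
    for line in mdstat_text.splitlines():
        if line.startswith("Personalities"):
            # Each available level is listed in square brackets.
            return re.findall(r"\[([^\]]+)\]", line)
    return []


# Sample contents standing in for a real /proc/mdstat.
sample = "Personalities : [raid1] [raid5]\nunused devices: <none>\n"
levels = raid_personalities(sample)
```

On a live system, pass in the contents of the file itself, for instance open("/proc/mdstat").read(). Keep in mind that a RAID level built as a module may not be listed until that module is loaded.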
Once you’ve built your new kernel, you can install it, add it to your boot loader, and reboot your computer to test it. If you’re modifying a working kernel, you shouldn’t have any problems with the new kernel. If it won’t boot, reboot using your old kernel and try again.
After you reboot with the new kernel, you should install the RAID and LVM packages. In most cases, the RAID tools ship in a package called mdadm, while the LVM2 software ships in a package called lvm2. (Older RAID and LVM packages — raidtools and lvm-user — are also available, but are not described here.) If you can’t find these packages on your distribution media or if you prefer to go to the original sites, check http://www.cse.unsw.edu.au/~neilb/source/mdadm/ for mdadm or http://sources.redhat.com/lvm2/ for lvm2.
With the software installed, you can begin reconfiguring your partitions. This task can be tedious, particularly if you want to reconfigure a working system without adding new hardware. Be sure to back up your data before proceeding! Reconfiguring existing hard disks requires deleting one or more of their partitions, creating new partitions, and restoring data. This process won’t be complete until you’ve configured a working RAID and/or LVM system, as described in the next two months’ columns, so don’t jump into this process until you’ve read them. If you intend to move your system to new hard disk(s), you can begin preparing them now and complete the transition later.
The configuration described here is suitable for an LVM-on-RAID system: It uses low-level RAID partitions to store LVM logical volumes. As a safeguard and to permit minimal access to the system in the event of a problem with the RAID or LVM configuration, the root (/) filesystem is stored in a conventional partition, while the bulk of the data in /usr and /home is stored in RAID/LVM volumes. The basic partition layout looks like Figure Two.
Figure Two: A map of partitions
Partition       Size  ID  System
/dev/hda1      64228  16  Hidden FAT16
/dev/hda2    2048287  0B  W95 FAT32
/dev/hda4    7879882  05  Extended
/dev/hda5      48163  83  Linux /boot
/dev/hda6    5020281  FD  Linux RAID
/dev/hda7     104391  82  Linux swap

/dev/hdb1    6144831  A5  FreeBSD
/dev/hdb2   71360730  05  Extended
/dev/hdb5    2150420  83  Linux /
/dev/hdb6    5020281  FD  Linux RAID
The list of partitions in Figure Two omits some non-Linux partitions for simplicity’s sake, but it demonstrates the fact that a Linux RAID/LVM configuration can coexist with other operating systems — DOS, Windows, and FreeBSD partitions exist on these disks along with the Linux partitions. Also, both / and /boot exist as standard partitions, although both are small in size. They could be moved within the RAID/LVM configuration, but at the cost of greater complexity, particularly for /boot.
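Before building an array, it's worth confirming that the partitions you intend to combine carry the Linux RAID type code (FD) and are the same size. The following sketch runs that check against the Figure Two listing, copied in as text; the function name is hypothetical, and sizes are compared in whatever units the listing uses.

```python
FIGURE_TWO = """\
/dev/hda1 64228 16 Hidden FAT16
/dev/hda2 2048287 0B W95 FAT32
/dev/hda4 7879882 05 Extended
/dev/hda5 48163 83 Linux /boot
/dev/hda6 5020281 FD Linux RAID
/dev/hda7 104391 82 Linux swap

/dev/hdb1 6144831 A5 FreeBSD
/dev/hdb2 71360730 05 Extended
/dev/hdb5 2150420 83 Linux /
/dev/hdb6 5020281 FD Linux RAID
"""


def raid_partitions(table_text):
    """Return {device: size} for partitions whose type ID is FD (Linux RAID)."""
    found = {}
    for line in table_text.splitlines():
        fields = line.split()
        # Expect at least: device, size, type ID; blank lines are skipped.
        if len(fields) >= 3 and fields[2].upper() == "FD":
            found[fields[0]] = int(fields[1])
    return found


members = raid_partitions(FIGURE_TWO)
sizes_match = len(set(members.values())) == 1
```

Here the two RAID partitions, one on each disk, are identical in size, so no space is wasted when they're combined.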
One swap partition exists outside of the RAID/LVM configuration and (as described in subsequent columns) another exists within it, although this RAID/LVM swap space isn’t apparent in the partition list. This setup has no particular advantage, unless perhaps you want swap space for a small emergency Linux system without RAID/LVM support. Indeed, the example system is configured as such just to illustrate that swap space can exist in or out of a RAID/LVM system.

Next Month

Next month looks at the RAID side of the configuration in more detail. If you want to implement a RAID system only (with no LVM features), you should be able to do so after reading next month’s column. If you want a complete RAID/LVM configuration, though, you’ll have to wait for the next two months’ columns to finish the job.

Roderick W. Smith is the author or co-author of over a dozen books, including Linux in a Windows World and Linux Power Tools. He can be reached at rodsmith@rodsbooks.com.
