Software RAID on Linux with mdadm

Now that we've completed our initial examination of the basics of RAID levels (including nested RAID), it's time to turn our attention to software RAID functionality on Linux. In this article we will be discussing mdadm -- the software RAID administration tool for Linux. It comes with virtually every Linux distribution and has some unique features that many hardware RAID cards lack.

The mdadm command begins with the "--create" option, which tells mdadm that it will operate in "create" mode. I like to define the specific md device next, but be sure you aren't using an existing md-device name. Specifying the "--chunk" option (the stripe chunk size) is up to you. Then you define the RAID level you want with the "--level" option, using one of the RAID levels discussed above. You also tell mdadm how many block devices you are using with the "--raid-devices=Z" option (Z is the number of devices). Finally, you give mdadm the list of block devices you are using.

An example of using the create option with mdadm is,

% mdadm --create --verbose /dev/md0 --level=0 --raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1


which creates a RAID-0 configuration labeled /dev/md0 using three block devices: /dev/sda1, /dev/sdb1, and /dev/sdc1.

In the example, I used the first partition of each of the three drives as the block devices for mdadm. They could just as easily have been entire disks, such as /dev/sda or /dev/sdb. The point is that they need to be valid Linux block devices (they could even be network-based devices, but that's a different discussion).

Mdadm is smart enough to build the RAID-0 configuration using the smallest common size of each of the three devices. So it is recommended that you check on the size of each block device using fdisk as shown below.

root@laytonjb-laptop:~/# /sbin/fdisk /dev/sdb

The number of cylinders for this disk is set to 19457.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sdb: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000bca3e

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1       18704   150239848+  83  Linux
/dev/sdb2           18705       19457     6048472+   5  Extended
/dev/sdb5           18705       19457     6048441   82  Linux swap / Solaris


Be sure to look at the column labeled “Blocks” for the particular partition you are going to use. Do this for all the devices and make sure the number of blocks is the same or very close (otherwise you are wasting space).
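If you prefer, you can also read the sizes directly from the kernel rather than walking through fdisk. As a quick illustrative check (using the same device name as above):

% /sbin/blockdev --getsize64 /dev/sdb1
% cat /proc/partitions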

You might also notice that I used the "--verbose" option in the mdadm create command. I like to use this option to get more information about what mdadm is doing ("better informed than sorry" is a good motto). This is always a good habit to develop.

At this point, your RAID array should be created and running. An easy way to check this is to look at /proc/mdstat.

% cat /proc/mdstat


Fortunately, the output should be fairly easy to read at first glance. For much more detailed information you can read this article.
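For the three-disk RAID-0 array created above, the output looks something like the following (the device names, chunk size, and block counts here are purely illustrative):

Personalities : [raid0]
md0 : active raid0 sdc1[2] sdb1[1] sda1[0]
      450719232 blocks 64k chunks

unused devices: <none>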

Immediately after the RAID array is created, it may have to go through a synchronization process. This process performs the necessary RAID functions for the configuration you created. For example, for RAID-1, the blocks on the first drive are copied to the second drive even if those blocks don't contain any data yet.
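While the array is synchronizing, /proc/mdstat shows the progress and an estimated finish time. An easy way to keep an eye on it is:

% watch cat /proc/mdstat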

Once the array has finished synchronizing and is ready, you can move to the next step: using the resulting RAID device in LVM or creating a file system directly on the block device.
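For example, a minimal sequence to put an ext3 file system on the new array and mount it (the mount point is just an example) would be:

% mkfs.ext3 /dev/md0
% mkdir -p /mnt/raid0
% mount /dev/md0 /mnt/raid0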

There are many options that you can use in "create" mode. You can read the man page to get the full list, but below are some of the more important ones that haven't been presented yet in this article.


    -x, --spare-devices= This option allows you to specify spare devices in the initial array. These are devices (disks) that are used in the event that a disk in the RAID configuration fails. Mdadm then puts a spare drive into service immediately and starts rebuilding the array back to the desired configuration. Mdadm also allows spare drives to be added and removed later (they don't have to be added when the array is created). If you use spare devices, note that "--raid-devices" counts only the active devices; the total number of devices listed on the command line must equal "--raid-devices" plus "--spare-devices" (see the example after this list).


    -p, --layout=, --parity= Mdadm gives you remarkable control over your RAID configuration. This option lets you control the fine details of the data layout for RAID-5, RAID-6, and RAID-10 arrays, and also controls the failure modes when the "faulty" personality is used. Please read the man page for more detail.


    -z, --size= This option specifies the amount of space to be used from each drive in a RAID-1, RAID-4, RAID-5, or RAID-6 configuration. The size is given in kibibytes and must be a multiple of the chunk size. In addition, you must leave about 128 KiB of space at the end of each drive for the RAID superblock. After the array is created you can use the "grow" mode of mdadm ("--grow") to increase the size of the RAID configuration.
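As an example of using spares, the following hypothetical command creates a RAID-5 array with three active devices and one spare; note that four block devices are listed, matching "--raid-devices=3" plus "--spare-devices=1":

% mdadm --create --verbose /dev/md1 --level=5 --raid-devices=3 --spare-devices=1 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1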

Assembling an mdadm RAID array

One of the other “modes” in mdadm is “assemble”. After your RAID array has been created using mdadm, you can stop the array using the following command:

% mdadm --stop /dev/md0


which stops the RAID array /dev/md0 (be sure to unmount the file system that uses the RAID array first). However, restarting the array isn't automatic; you have to use mdadm to reassemble it. For example,

% mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1


This command assembles the parts of a previously created array into an active array and "restarts" the array (i.e., makes it functional). To automate this, you could put this command as part of the system startup (for example, in /etc/rc.d/rc.local), and you could create a simple script for stopping and starting the array.
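As a minimal sketch, such a start/stop script for the array created earlier might look like the following (the device names and the /mnt/raid0 mount point are only illustrative):

#!/bin/sh
# Minimal start/stop script for the /dev/md0 array (illustrative only)
case "$1" in
  start)
    mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1
    mount /dev/md0 /mnt/raid0
    ;;
  stop)
    umount /mnt/raid0
    mdadm --stop /dev/md0
    ;;
  *)
    echo "Usage: $0 {start|stop}"
    ;;
esac

But mdadm can do some of the leg work for you with the following command: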

% mdadm --assemble --scan


These options allow mdadm to scan the drives and reassemble the RAID array (it looks for the RAID superblocks on the drives). Typically this is done during the init phase of system startup. For example, on my CentOS 5.5 system, there is a line in /etc/rc.d/rc.sysinit that looks like the following.

/sbin/mdadm -A -s


which will scan (“-s”) the drives and assemble (“-A”) the mdadm arrays.
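Most distributions also keep a configuration file describing the arrays (on my CentOS system it is /etc/mdadm.conf; Debian-based systems use /etc/mdadm/mdadm.conf), which the scan/assemble step consults. One common way to generate the ARRAY lines is to let mdadm read the superblocks itself; for example:

% mdadm --examine --scan >> /etc/mdadm.conf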

However, you can get into trouble with scanning and assembling mdadm RAID arrays when you have more than one array. The recommendation is that, if you want to restart an array by hand, you specify the UUID of the array (a unique "name" for the RAID array). For example,

% mdadm --scan --assemble --uuid=7121b438:7d36f9f6:8aa9c8b3:b5b0d211


Since the UUIDs are unique to each array, this ensures that mdadm can reassemble the array properly. That said, mdadm's scanning and assembling capabilities are quite good; I routinely run two mdadm RAID configurations on my desktop and I've never had any confusion about which disks belong to which array (thanks to the superblocks on the devices).
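If you don't know the UUID of an array, you can read it from the running array or from any member device (the device names below are from the earlier example):

% mdadm --detail /dev/md0 | grep UUID
% mdadm --examine /dev/sda1 | grep UUID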

Monitoring/Following an mdadm RAID array

The third mode of operation of mdadm is monitoring or following an mdadm RAID array. This mode monitors one or more arrays and allows action to be taken if the state of an array changes. According to the manpage for mdadm, this mode of operation is really only useful for RAID-1, 4, 5, 6, and 10, or multipath arrays, since they have interesting states. Monitoring RAID-0 or linear arrays, on the other hand, is not useful, because a missing or failed drive simply causes those arrays to fail (i.e., become non-operational).

The basic form for following or monitoring mdadm-controlled arrays is the following:

mdadm --monitor options... devices...


(Note: you can use "-F" or "--follow" in place of "--monitor".) There are several options that can be used for monitoring and following mdadm arrays, as listed below (an example combining a few of them follows the list):


    -m, --mail This option allows you to define an email address where mdadm alerts are sent.


    -p, --program, --alert This option allows mdadm to run a "program" whenever an event is detected (it is recommended to use the full path to the program).


    -y, --syslog This option causes all events to be reported through "syslog".


    -d, --delay This option sets the delay (in seconds) between successive polls of the arrays (i.e., the polling interval).


    -f, --daemonize This option tells mdadm to run as a background daemon if it is monitoring arrays. This causes mdadm to fork, run in the child process, and disconnect from the terminal.


    -i, --pid-file This option tells mdadm to write the pid of the daemon process to a specified file when mdadm is run as a daemon (see the previous option).


    -1, --oneshot This option checks the arrays only once and generates "NewArray" events as well as "DegradedArray" and "SparesMissing" events (these will show up in the logs). According to the manpages, if you run the command "mdadm --monitor --scan -1" from a cron job, it will ensure regular notification of any degraded arrays (always a good thing).


    -t, --test This option generates a "TestMessage" alert for every array found at startup. This alert gets mailed and passed to the alert program (if you have defined one). This is very useful for testing that alert messages get through successfully (i.e., that they work).
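Putting a few of these options together, a minimal, purely illustrative way to run mdadm as a monitoring daemon that mails alerts and polls every five minutes is:

% mdadm --monitor --scan --mail=admin@example.com --delay=300 -f

The email address is just a placeholder; "-f" puts mdadm into the background as described above.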

Comments on "Software RAID on Linux with mdadm"

smino

Great write up. Is there such a thing as software MAID, in that it powers down the drives not in use, and powers them up when needed?
What about doing RAID but without the striping, only data parity, so that if two or three drives fail, you only lose the information on those two or three drives? I ask because I currently use unRAID, which does this, and it powers off the drives you do not use individually, since it is not striped, but I am always looking for a better free alternative.

tindallh

You said:
Mdadm is smart enough to build the RAID-0 configuration using
the smallest common size of each of the three devices.

Not really sure why that’s relevant to RAID-0… RAID-1, yes, as the mirrored data can only be as large as the smallest member, but in level 0 (which I feel is a bad idea to start off with unless used with some other level) it doesn’t matter…

    przemek

    Re. RAID0 vs RAID1, I like this quip: ‘the RAID number is equal to the probability that you’ll get your data back after a disk failure’.

    I don’t mean to say that RAID0 is useless, but it does decrease the time to failure by a factor equal to the number of participating disks, so it essentially is only good for data that can be easily regenerated.

ggmathew

Software RAID on Linux with mdadm is a great way to achieve a RAID configuration, but it has its own pros and cons compared to hardware RAID.

