Introduction to RAID

RAID is one of the technologies that have truly revolutionized storage. In this article, we’ll review the seven most common single RAID levels and describe how each works and what issues surround them.

Introduction

One of the most common techniques for improving data reliability, data performance, or both is called RAID (Redundant Array of Inexpensive Disks). The concept was developed in 1987 by David Patterson, Garth Gibson, and Randy Katz as a way to combine several inexpensive disks into what the OS sees as a single disk, while also achieving better reliability, better performance, or both.

Before anyone erupts and says that RAID does not stand for “Redundant Array of Inexpensive Disks”, let me state that this was the original definition. Over time, the acronym has more commonly come to be expanded as “Redundant Array of Independent Disks”, perhaps so that the word “inexpensive” isn’t associated with RAID controllers or disks. Personally, I use the original definition, but either definition means that the disks are independent of one another. Feel free to use either one, since it won’t change the content of this article. Now, back to our discussion of RAID.

When the original paper was published, five different RAID levels (configurations) were defined. Since then, other RAID configurations have been developed, including what are referred to as “hybrid” RAID configurations.

The RAID Advisory Board (RAB) was created to advise the IT community on the defined RAID configurations and to help with the creation of new RAID configuration definitions. While it does not create legally binding standards or labels, it does help clarify what the RAID levels mean and what is commonly accepted in the community. There was a time when companies were creating very strange RAID configurations with strange labels, causing great confusion. The RAB has helped reduce the proliferation of “weird” RAID configurations and labels and has standardized the meaning of the various RAID levels.

In this article I want to review the seven most common standard RAID configurations, and I will also very briefly touch on some of the hybrid RAID configurations. For each RAID level, I will describe how it works as well as the configuration’s particular pros and cons. Before starting, however, I want to clarify one thing: RAID is not meant as a replacement for backups. RAID can help improve data reliability, which really means data availability (improving uptime for data), and/or data performance (I/O performance). It is not a substitute for backups or for keeping multiple independent copies of your data.

RAID Configurations

As mentioned above, five RAID levels (configurations) were defined in the original paper, but others have been developed since then. In RAID terminology, each distinct RAID configuration is given a number, which is also called a RAID “level”. The core RAID configurations are RAID-0, RAID-1, RAID-2, RAID-3, RAID-4, RAID-5, and RAID-6.

RAID-0
This RAID configuration is focused on performance, since the blocks are striped across multiple disks. Figure 1 from Wikipedia (image by Cburnett) illustrates how the data is written to two disks.

Figure 1: RAID-0 layout (from Cburnett at Wikipedia under the GFDL license)

In this illustration, the first block of data, A0, is written to the first disk, the second block, A1, is written to the second disk, the third block, A2, is written to the first disk, and so on. If the I/O is happening fast enough, data blocks can be written almost simultaneously (i.e., A0 and A1 are written at just about the same time). Because the data is broken up into block-sized units spread across the disks, it is commonly said that the data is striped across the disks. Striping lets several disks service a large write in parallel, so the overall write performance of the disk set is very fast, usually much faster than that of a single disk.
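The round-robin placement described above can be expressed as a mapping from a logical block number to a disk and an offset on that disk. This is a minimal sketch that assumes a stripe unit of exactly one block and equal-sized disks; `raid0_location` is a hypothetical helper for illustration, not part of any real RAID implementation (real controllers and software RAID use a configurable chunk size).

```python
from __future__ import annotations


def raid0_location(block: int, n_disks: int) -> tuple[int, int]:
    """Return (disk index, offset on that disk) for a logical block.

    Assumes round-robin striping with a stripe unit of exactly one block.
    """
    if n_disks < 1:
        raise ValueError("a RAID-0 array needs at least one disk")
    if block < 0:
        raise ValueError("logical block numbers start at 0")
    # Consecutive blocks rotate across the disks; after every n_disks blocks,
    # placement moves one block further down on each disk.
    offset, disk = divmod(block, n_disks)
    return disk, offset


if __name__ == "__main__":
    # Two-disk layout: A0, A2, A4 land on disk 0; A1, A3, A5 land on disk 1.
    for b in range(6):
        disk, offset = raid0_location(b, 2)
        print(f"A{b} -> disk {disk}, offset {offset}")
```

Because A0 and A1 map to different disks at the same offset, a controller can issue both I/Os at once, which is where the read and write speedup comes from.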

Reading from a RAID-0 group is also very fast. When a read request comes in, the RAID controller, which controls the placement of data, knows that it can read A0 and A1 at the same time because they are on separate disks, roughly doubling the potential read performance relative to a single disk in this two-disk example.

You can have as many disks as you want in a RAID-0 array (a group of disks in a RAID-0 configuration). However, one of the downsides of RAID-0 is that it provides no additional data redundancy (it is all focused on performance). No parity is computed or stored, so if you lose a disk in a RAID-0 array, you lose access to all of the data in the array. If you can bring the lost disk back into the array without losing any data on it, then you can recover the RAID-0 array, but this is a fairly rare occurrence.
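The all-or-nothing failure behavior follows directly from striping. The sketch below assumes the same one-block round-robin layout and uses hypothetical `stripe` and `unstripe` helpers to split a byte string across two simulated disks. It shows that the surviving disk holds only every other block, so without parity or a mirror the original data cannot be reassembled.

```python
from __future__ import annotations


def stripe(data: bytes, n_disks: int, block_size: int) -> list[list[bytes]]:
    """Split data into fixed-size blocks and deal them round-robin onto disks."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    disks: list[list[bytes]] = [[] for _ in range(n_disks)]
    for i, blk in enumerate(blocks):
        disks[i % n_disks].append(blk)
    return disks


def unstripe(disks: list[list[bytes] | None]) -> bytes:
    """Reassemble the data; a missing member (None) makes this impossible."""
    if any(d is None for d in disks):
        # No parity or mirror exists, so the lost blocks cannot be recomputed.
        raise OSError("RAID-0 member missing: data cannot be reassembled")
    out = []
    for row in range(max(len(d) for d in disks)):
        for d in disks:  # take one block from each disk per stripe row
            if row < len(d):
                out.append(d[row])
    return b"".join(out)


if __name__ == "__main__":
    disks = stripe(b"ABCDEFGHIJ", n_disks=2, block_size=2)
    print(disks)               # [[b'AB', b'EF', b'IJ'], [b'CD', b'GH']]
    print(unstripe(disks))     # b'ABCDEFGHIJ'
    print(b"".join(disks[0]))  # b'ABEFIJ': the survivor holds only every other block
    try:
        unstripe([disks[0], None])  # simulate losing disk 1
    except OSError as err:
        print("failed:", err)
```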

Consequently, RAID-0 is focused solely on performance, with no data redundancy beyond whatever a single disk provides. This affects how RAID-0 is used: it suits situations where performance is paramount and either you have a copy of your data elsewhere or the data is not important. A classic use case is scratch space, where data is written while an application is running but is not needed once the application finishes and its final output has been copied to a more resilient storage device. If a scratch-space disk is lost while the application is running, you can rebuild the RAID-0 array with one fewer drive and rerun the application.

The capacity and failure rate of a RAID-0 array are fairly simple to compute. The capacity is computed as:

Capacity = n * min(disk sizes)

where n is the number of disks in the array and min(disk sizes) is the capacity of the smallest drive. This means you can use drives of different sizes, but only the smallest drive’s capacity is used on each member. The equation also shows that RAID-0 is very capacity efficient, since it doesn’t spend any space on parity or any other error correction; all of the space holds data.
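The capacity formula translates directly into code. This is a minimal sketch; the function names and the example disk sizes (in GB) are illustrative assumptions. The mixed-size example shows how a single smaller drive caps what each larger drive contributes.

```python
from __future__ import annotations


def raid0_capacity(disk_sizes: list[float]) -> float:
    """Usable capacity = n * min(disk sizes); extra space on larger disks is unused."""
    if not disk_sizes:
        raise ValueError("need at least one disk")
    return len(disk_sizes) * min(disk_sizes)


def raid0_efficiency(disk_sizes: list[float]) -> float:
    """Fraction of the raw capacity that is usable."""
    return raid0_capacity(disk_sizes) / sum(disk_sizes)


if __name__ == "__main__":
    same = [1000, 1000, 1000, 1000]  # four equal disks (sizes in GB)
    mixed = [2000, 2000, 1000]       # one smaller disk limits the others
    print(raid0_capacity(same), raid0_efficiency(same))    # 4000 1.0
    print(raid0_capacity(mixed), raid0_efficiency(mixed))  # 3000 0.6
```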

The failure rate is a little more involved but can also be estimated.

MTTFgroup = MTTFdisk / n

where MTTF is the Mean Time To Failure, “group” refers to the RAID-0 array, and “disk” refers to a single disk. As you add disks, you sharply reduce the MTTF of the RAID-0 array: two disks halve the MTTF, three disks reduce it by a factor of three, and so on. This is why people are reluctant to use RAID-0 for file systems where data availability and reliability are important. But RAID-0 is the fastest RAID configuration and has the best capacity utilization of any RAID configuration discussed in this article.
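The estimate MTTFgroup = MTTFdisk / n holds under the common simplifying assumption that disks fail independently at a constant rate, so the array fails as soon as its first member does. The sketch below uses that exponential model; the 1,000,000-hour per-disk MTTF is purely an illustrative number, not a vendor specification, and `survival_probability` is a hypothetical helper added to show how quickly the odds of getting through a year drop as disks are added.

```python
from __future__ import annotations

import math


def raid0_mttf(mttf_disk: float, n_disks: int) -> float:
    """MTTF_group = MTTF_disk / n.

    Assumes independent failures at a constant rate (exponential lifetimes):
    the array's failure rate is the sum of the n disk failure rates, and the
    first failure of any member takes the whole array down.
    """
    if n_disks < 1:
        raise ValueError("need at least one disk")
    return mttf_disk / n_disks


def survival_probability(mttf: float, hours: float) -> float:
    """Probability of no failure within `hours` under the same exponential model."""
    return math.exp(-hours / mttf)


if __name__ == "__main__":
    MTTF_DISK = 1_000_000.0  # illustrative per-disk MTTF in hours (assumed)
    for n in (1, 2, 3, 4, 8):
        group = raid0_mttf(MTTF_DISK, n)
        one_year = survival_probability(group, 24 * 365)
        print(f"{n} disk(s): MTTF {group:,.0f} h, P(no failure in 1 year) {one_year:.4f}")
```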

Table 1 below is a quick summary of RAID-0 with a few highlights.

Table 1 – RAID-0 Highlights

RAID Level: RAID-0
Pros:
  • Performance (great read and write performance)
  • Great capacity utilization (the best of any standard RAID configuration)
Cons:
  • No data redundancy
  • Poor MTTF
Storage Efficiency: 100% (assuming the drives are the same size)
Minimum Number of Disks: 2

RAID-1
RAID-1 is almost the exact opposite of RAID-0 because it uses multiple drives that are mirrors of one another. Typically two drives are used in RAID-1, but three-drive RAID-1 configurations are becoming more common. RAID-1 writes an incoming block of data to one drive and writes a mirror image (copy) of it to a second drive. RAID-1 doesn’t compute any parity for the block; it simply copies the entire block to the second drive. Figure 2 from Wikipedia (image by Cburnett) illustrates how the data is written to two disks in RAID-1.

Figure 2: RAID-1 layout (from Cburnett at Wikipedia under the GFDL license)
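A toy in-memory model makes the mirroring behavior concrete. The `Raid1` class below is hypothetical and ignores everything a real implementation (such as Linux md or a hardware controller) must handle, like resynchronization after a failure and write ordering. It only shows that each write is copied in full to every member, with no parity, and that a read can be served by any surviving member.

```python
from __future__ import annotations


class Raid1:
    """Toy RAID-1 mirror (hypothetical, for illustration only).

    Each member 'disk' is modelled as a dict of block number -> bytes;
    None marks a failed member.
    """

    def __init__(self, n_mirrors: int = 2) -> None:
        if n_mirrors < 2:
            raise ValueError("a mirror needs at least two members")
        self.disks: list[dict[int, bytes] | None] = [{} for _ in range(n_mirrors)]

    def write(self, block: int, data: bytes) -> None:
        # No parity: the whole block is copied to every surviving member.
        for disk in self.disks:
            if disk is not None:
                disk[block] = data

    def fail(self, index: int) -> None:
        self.disks[index] = None

    def read(self, block: int) -> bytes:
        # Any surviving member holds a complete copy, so use the first one found.
        for disk in self.disks:
            if disk is not None:
                return disk[block]
        raise OSError("all mirror members have failed")


if __name__ == "__main__":
    array = Raid1(n_mirrors=2)
    array.write(0, b"A1")
    array.write(1, b"A2")
    array.fail(0)         # lose the first drive
    print(array.read(1))  # b'A2' is still served by the second drive
```

Reading from the first surviving member keeps the sketch simple; real implementations often spread reads across members to improve read performance.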

Comments on "Introduction to RAID"

aotto

You have typos. Consider:

$article_text =~ s/RIAD/RAID/g;

Also, you did not explain how parity works, which is something that confuses RAID newcomers. They need to fathom the concept that the parity is combined with the surrounding data to compute what the original data was so that it can be recreated.

markhahn

it’s worth noting that MD provides non-nested raid10: it’s a single raid level that merely provides replicas of blocks (on multiple disks, of course.) with 2 disks, it’s the same as raid1, but can still provide raid1-level redundancy with 3 or more disks. more disks give you a raid0-like increase in bandwidth and/or throughput. it also lets you choose replication of more than 2x.

but in general, I think people are gradually realizing that block-level raid is eventually going to become obsolete. there are a lot of advantages to letting a smart filesystem manage redundancy, since that permits file/access-aware choices, and can mitigate some of the issues of block-level raid rebuilds.

aslamnet

Thanks for such a structured article on RAID..

roustabout

RAID3 and RAID4 are considered to be “two of the most common RAID levels” by whom, exactly?

And not a peep about how any of these relate to linux in an article in something calling itself linuxmag?

Not even a mention of mdraid?

seenutn

Nice article on RAID, but it would be nice if it covers about where will the RAID controller exist (in BIOS or Kernel or separate controller). And also, where will the RAID controller store the meta data?

laytonjb

Thanks everyone for the comments. Just to clarify a bit:

@roustabout: RAID-3 and RAID-4 were part of the original RAID definition. I didn’t see where I called them “two of the most common RAID levels”. If I did, the intent was to point out that they are part of the original RAID definitions, but not commonly _used_.

For everyone who is concerned that I haven’t talked about mdadm or software RAID, hardware RAID, or “fakeRAID” – that article is coming (as are articles about Nested RAID). This is a whole series of introductory articles on RAID. Talking about specific implementations, particularly for Linux, is coming. You just have to be patient. So @roustabout – you will just have to be patient :)

@markhahn – great comment and I totally agree with you but I also disagree to some extent. Putting RAID functionality into the file system _should_ allow the file system to do really useful things such as only recover the needed blocks during a disk failure. This avoids having to read all of the blocks for recovery and perhaps coming close to the dreaded URE limit.

But this means that we (the community) need to rewrite all file systems to do this. With each file system being unique, this means we are going to have different sets of code that do pretty much the same thing. I don’t think existing file systems will do this (too much work and too disruptive), so future file systems should incorporate this (such as btrfs). However, it takes a very long time for a file system to mature, so we may be waiting for several years. In the meantime, I think block-based RAID is here to stay.

On the other hand, I think the development of object based file systems that don’t use block-based RAID, should be the wave of the future. PanFS from Panasas is an example of this. I think local file systems should adopt this approach (and we’re seeing some of this with ExoFS) because we don’t need to read all the blocks to recover from a disk failure – just the objects that are “missing” or need to be duplicated.

Thanks for bringing up the topic – always good to think about what we need to do next few years.

Jeff

buggsy2

A great beginning article on RAID and I look forward to more on this topic. You alluded to current disk drives doing their own parity checking/correction; I’d like to see that explored more. Just how much onboard data checking do they do? I’ve heard that modern high-density drives generate huge numbers of errors from the raw disk which must be corrected in the onboard electronics of the drive, but I’ve never seen anything definitive about this topic.

casiquey2k

I just wanted to make an observation, you start the third page by saying “In this layout, data is written in block stripes to the first three disks (disks 0, 1, and 2) while the third drive (disk 3)” and I think what you meant to say is “…while the fourth drive (disk 3)” since your array starts at disk 0.

karimbardee

Nice article for someone who does not know anything about RAID (like me ) and want to know the basic definition or the general idea.
thanks.

rajiravi

Good and detailed article

linux.haresh

Nice article to clear idea of Raid to the newcomers.


“In the real-world, RAID-4 is rarely used because RAID-5 (see next sub-section) has replaced it.”
Note that NetApp storage, which is the most popular today, uses only RAID 4 and RAID-DP. RAID-DP is essentially RAID 4 plus one more parity disk.

Introduction to RAID | Linux Magazine