A fairly common Linux storage question: Which is better for data striping, RAID-0 or LVM? Let's take a look at these two tools and see how they perform data striping tasks.
The previous section contrasted RAID-0 and LVM from a conceptual perspective, but the question of which one is faster still remains (even if the question isn’t a good one). This section will present a performance comparison of RAID-0 using mdadm and LVM. However, in the interest of time it doesn’t follow our good benchmarking guidelines (a full set of benchmarks would take over 160 hours). In this case IOzone is used as the benchmark.
IOzone is run in two ways: (1) throughput and (2) IOPS. Also, only the write, read, random write, and random read tests are run, but a range of record sizes is tested. Unlike tests from previous articles, each test was run only once, using ext4. The test system used a stock CentOS 5.3 distribution but with a 2.6.30 kernel (from kernel.org), and e2fsprogs was upgraded to the latest version as of the writing of this article, 1.41.9. The tests were run on the following system:
- GigaByte MAA78GM-US2H motherboard
- An AMD Phenom II X4 920 CPU
- 8GB of memory
- Linux 2.6.30 kernel
- The OS and boot drive are on an IBM DTLA-307020 (20GB drive at Ultra ATA/100)
- /home is on a Seagate ST1360827AS
- Two Seagate ST3500641AS-RK drives with 16MB cache each, used for testing. These are /dev/sdb and /dev/sdc.
Both drives, /dev/sdb and /dev/sdc, were used for all of the tests.
To help improve run times, three threads were used on the quad-core system; the fourth core was kept free for the software RAID or LVM processing. So in the IOzone command lines the "-t 3" option means that three threads were used. In addition, each thread had a size of 3GB, resulting in a total data size of 9GB. The important point is that the total amount of data is larger than memory (9GB > 8GB).
For the throughput tests, the following IOzone command line was used.
./iozone -Rb spreadsheet_ext4_write_and_read_1K_1.wks -i 0 -i 1 -i 2 -e -+n -r 1k -s 3G -t 3 > output_ext4_write_and_read_1K_1.txt
The command line is shown with a 1KB record size.
The IOPS tests used the following IOzone command line.
./iozone -Rb spreadsheet_ext4_write_and_read_1K_1.wks -i 0 -i 1 -i 2 -e -O -+n -r 1k -s 3G -t 3 > output_ext4_write_and_read_1K_1.txt
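The command lines above show a single run at a 1KB record size; each additional record size needs its own run. One way such a sweep could be scripted is with a simple shell loop like the sketch below (the record-size list matches the tables later in this article, but the loop and file names are illustrative, not the exact script used for these tests). Adding the -O flag to the same command produces the IOPS sweep.
for rec in 1k 8k 32k 64k 128k 512k 1m 4m 8m 16m
do
./iozone -Rb spreadsheet_ext4_${rec}_1.wks -i 0 -i 1 -i 2 -e -+n -r $rec -s 3G -t 3 > output_ext4_${rec}_1.txt
done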
The RAID-0 array was constructed relying on defaults as shown in a previous article. The command used to construct the array was the following.
[root@test64 laytonjb]# mdadm --create --verbose /dev/md0 --level raid0 --raid-devices=2 /dev/sdb1 /dev/sdc1
The "chunk size" (the amount of data written to one drive before moving to the next) defaults to 64KB.
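The default was accepted here, but for reference the chunk size can also be set explicitly at creation time and checked afterwards. A quick sketch of how that might look (not part of the procedure actually used for these tests):
[root@test64 laytonjb]# mdadm --create --verbose /dev/md0 --level=raid0 --raid-devices=2 --chunk=64 /dev/sdb1 /dev/sdc1
[root@test64 laytonjb]# mdadm --detail /dev/md0 | grep -i chunk
[root@test64 laytonjb]# cat /proc/mdstat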
To contrast RAID-0 and LVM they need to be constructed as similarly as possible. This is a bit more difficult in LVM since it works differently from RAID. The basics of LVM were discussed in a previous article. After the physical volumes (PVs) were created, they were grouped into a single volume group.
[root@test64 laytonjb]# /usr/sbin/vgcreate primary_vg /dev/sdb1 /dev/sdc1
Volume group "primary_vg" successfully created
[root@test64 laytonjb]# /usr/sbin/vgdisplay
--- Volume group ---
VG Name primary_vg
System ID
Format lvm2
Metadata Areas 2
Metadata Sequence No 1
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 0
Open LV 0
Max PV 0
Cur PV 2
Act PV 2
VG Size 931.52 GB
PE Size 4.00 MB
Total PE 238468
Alloc PE / Size 0 / 0
Free PE / Size 238468 / 931.52 GB
VG UUID yjkNSQ-416l-f5Bt-RZLt-38NH-8LT6-QfrjeJ
The key to stripe mapping in LVM is how the logical volume is created. For this article the number of stripes ("-i" option) was arbitrarily chosen to be 2, and the stripe size ("-I" option) was chosen to be 64KB to match the RAID-0 chunk size. The total size of the LV was arbitrarily chosen to be 465GB. The command line for creating the LV was the following.
[root@test64 laytonjb]# /usr/sbin/lvcreate -i2 -I64 --size 465G -n test_stripe_volume primary_vg /dev/sdb1 /dev/sdc1
Logical volume "test_stripe_volume" created
[root@test64 laytonjb]# /usr/sbin/lvdisplay
--- Logical volume ---
LV Name /dev/primary_vg/test_stripe_volume
VG Name primary_vg
LV UUID igTRtk-wcqn-YVzR-HNQh-Ki2b-HznC-HcW589
LV Write Access read/write
LV Status available
# open 0
LV Size 465.00 GB
Current LE 119040
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 512
Block device 253:0
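To confirm that the logical volume really is striped across both physical volumes with the requested stripe size, the segment mapping can be inspected. A hedged sketch (not shown in the output above; the exact reporting field names can vary between LVM2 versions):
[root@test64 laytonjb]# /usr/sbin/lvdisplay -m /dev/primary_vg/test_stripe_volume
[root@test64 laytonjb]# /usr/sbin/lvs --segments -o lv_name,stripes,stripesize,devices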
Then the file system is created using the logical volume test_stripe_volume.
[root@test64 laytonjb]# /sbin/mkfs -t ext4 /dev/primary_vg/test_stripe_volume
mke2fs 1.41.9 (22-Aug-2009)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
30474240 inodes, 121896960 blocks
6094848 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
3720 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 21 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
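Note that mkfs was run with its defaults. For a striped device, ext4 can also be told about the stripe geometry so it can align its allocations: with a 64KB chunk, 4KB blocks, and two data disks, that works out to a stride of 16 blocks and a stripe-width of 32 blocks. A sketch of what that command could look like (this was not done for the tests in this article):
[root@test64 laytonjb]# /sbin/mkfs -t ext4 -E stride=16,stripe-width=32 /dev/primary_vg/test_stripe_volume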
RAID-0 and LVM Test Results
The two tables below present the throughput and IOPS results for both RAID-0 and LVM. Table 1 contains the throughput results.
Table 1 – Throughput Tests (all values in KB/s)

| Record Size | RAID-0 Write | RAID-0 Read | RAID-0 Random Read | RAID-0 Random Write | LVM Write | LVM Read | LVM Random Read | LVM Random Write |
|---|---|---|---|---|---|---|---|---|
| 1KB | 161,898 | 145,404 | 1,378 | 3,108 | 159,412 | 109,844 | 981 | 1,751 |
| 8KB | 186,725 | 151,225 | 5,150 | 7,976 | 183,352 | 155,871 | 4,429 | 7,233 |
| 32KB | 183,341 | 156,619 | 16,910 | 24,748 | 185,189 | 155,294 | 20,247 | 24,106 |
| 64KB | 182,698 | 173,024 | 33,319 | 44,386 | 188,842 | 150,967 | 30,142 | 40,277 |
| 128KB | 182,957 | 157,612 | 59,890 | 44,386 | 189,571 | 123,128 | 34,578 | 41,502 |
| 512KB | 189,282 | 157,612 | 134,966 | 98,189 | 184,341 | 147,475 | 87,534 | 92,821 |
| 1MB | 191,890 | 169,098 | 187,623 | 119,255 | 187,019 | 143,780 | 115,939 | 111,758 |
| 4MB | 186,289 | 157,202 | 194,943 | 137,252 | 182,579 | 137,214 | 143,452 | 136,715 |
| 8MB | 184,611 | 148,623 | 203,796 | 141,693 | 187,268 | 146,750 | 238,120 | 139,860 |
| 16MB | 186,541 | 149,814 | 223,136 | 144,753 | 187,935 | 121,341 | 199,823 | 139,860 |
Table 2 below contains the IOPS results for both RAID-0 and LVM.
Table 2 – IOPS Tests (all values in Ops/s)

| Record Size | RAID-0 Write | RAID-0 Read | RAID-0 Random Read | RAID-0 Random Write | LVM Write | LVM Read | LVM Random Read | LVM Random Write |
|---|---|---|---|---|---|---|---|---|
| 1KB | 181,457 | 161,357 | 1,545 | 2,156 | 176,556 | 106,719 | 836 | 894 |
| 8KB | 23,591 | 19,087 | 622 | 1,034 | 23,753 | 13,135 | 450 | 1,086 |
| 32KB | 5,763 | 6,291 | 617 | 796 | 5,836 | 3,709 | 529 | 748 |
| 64KB | 2,943 | 2,756 | 611 | 673 | 2,748 | 2,873 | 510 | 524 |
| 128KB | 1,483 | 1,228 | 323 | 331 | 1,492 | 989 | 261 | 282 |
| 512KB | 363 | 388 | 206 | 185 | 359 | 235 | 166 | 180 |
| 1MB | 178 | 161 | 141 | 112 | 189 | 143 | 108 | 109 |
| 4MB | 46 | 42 | 45 | 34 | 45 | 31 | 33 | 33 |
| 8MB | 24 | 20 | 26 | 17 | 22 | 15 | 21 | 17 |
| 16MB | 11 | 12 | 15 | 9 | 11 | 8 | 10 | 9 |
Even though the results were gathered without following our good benchmarking habits, which really limits our ability to draw conclusions, it is interesting to do a quick comparison.
- For both RAID-0 and LVM, as the record size increases, write throughput increases slightly and read throughput remains about the same. Both random read and random write throughput increase fairly dramatically as the record size increases.
- For both RAID-0 and LVM, as the record size increases, write IOPS and read IOPS decrease dramatically (this is logical: larger records mean fewer operations are needed to move the same amount of data). The same is true for random read IOPS and random write IOPS.
- Finally, while it is almost impossible to justify comparing RAID-0 and LVM performance from single runs, human nature pushes us to do a comparison anyway. It appears as though RAID-0 offers somewhat better throughput than LVM, particularly at the very small record sizes. The same is true for IOPS.
Summary
A fairly common question people ask is whether it is better to use data striping with RAID-0 (mdadm) or LVM. But in reality the two are different concepts. RAID is all about performance and/or data reliability while LVM is about storage and file system management. Ideally you can combine the two concepts but that’s the subject of another article or two.
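For readers who want a feel for how the two can be combined, a minimal sketch is to build the mdadm array first and then hand it to LVM as a physical volume. The volume group and logical volume names below are made up for illustration, and this layering was not part of the tests in this article:
[root@test64 laytonjb]# mdadm --create --verbose /dev/md0 --level=raid0 --raid-devices=2 /dev/sdb1 /dev/sdc1
[root@test64 laytonjb]# /usr/sbin/pvcreate /dev/md0
[root@test64 laytonjb]# /usr/sbin/vgcreate striped_vg /dev/md0
[root@test64 laytonjb]# /usr/sbin/lvcreate --size 465G -n data_lv striped_vg
[root@test64 laytonjb]# /sbin/mkfs -t ext4 /dev/striped_vg/data_lv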
In the interest of trying to answer the original question of which one is better, a quick test was run with IOzone. In the interest of time we did not follow our good benchmarking practices, but the test results give some feel for the performance of both approaches. The performance was actually fairly close except for small record sizes (1KB – 8KB) where RAID-0 was much better.
Jeff Layton is an Enterprise Technologist for HPC at Dell. He can be found lounging around at a nearby Fry's enjoying the coffee and waiting for sales (but never during working hours).
Comments on "Pick Your Pleasure: RAID-0 mdadm Striping or LVM Striping?"
As I understand it, this article is talking about software RAID as well as LVM. Does anyone actually use software RAID outside of home use or temporary systems? And does anyone actually use RAID-0, software or hardware? If you do, you've doubled (or more, depending on how many drives you use) your chances of losing all your data from a drive failure.
My usual setup is to use hardware RAID (usually 0+1 for small installations and 5 for larger datasets), generally with an HP SmartArray controller (PERC if Dell, but I don't use Dell often), and then use LVM on top of that to provide maximum flexibility.
People talk about hardware raid controllers as though they were a class in their own right. In practice there are two flavours, those with battery backup (BBU) and those without…
With battery backup, i.e. such as your PERC/HP Smart Array, the performance goes through the roof (perhaps 10x the IO/s in some cases) simply because the card "lies" and says that the data has hit the disk when it's really still in disk cache. However, the card can aggregate the data, batch the writes, reorder the writes to minimise disk seeks and generally vastly improve write speeds (it should have little effect on read speeds).
Without battery backup the hardware cards should make little difference either way (I'm sure any given test will show one better than the other, mind). There is simply no reason for hardware to win over software in general purpose use (let's assume fewer than 16 hard drives and ignore super large arrays, etc.). Fundamentally the read speed is set in stone, with perhaps a small amount of re-ordering possible to improve things; the write performance can be vastly improved by re-ordering and batching things, but re-ordering writes is simply not possible without a BBU (or at least not safely).
So in general the cheaper hardware cards are of little value other than ease of setup (and the really cheap cards are usually actually software RAID via a custom driver…). And the expensive hardware cards will set your world on fire, but by definition they are expensive.
I used the HP Smart Array cards many years ago, and for our database application they gave us at least a 10x speedup in IO/s simply by flicking the writeback cache on/off… Stonking units.
Wow you must be really misinformed.
Hardware raid is just that, and performance is independent of whether it has battery backup or not.
Obviously, when write cache is enabled for the controller (as it should to have a real performance benefit), you suffer the risk of data corruption when power is lost to the device.
You are talking about PERC and SmartArray as if they have battery backup by default. This is obviously not the case for either of these, or any other hardware RAID card for that matter. It is sold separately, and available for any major brand.
I can assure you, most users use the write cache even without the battery backup. Datacenters are quite reliable these days you know…
I have always wondered how much difference you would see between hardware raid and software raid. Are there some decent benchmarks covering this?
Back to the article: I think it was biased towards LVM.
"The performance was actually fairly close except for small record sizes (1KB – 8KB) where RAID-0 was much better."
Like that doesn't count? For the 1KB Read (KB/s) it was about a 32% increase. I know that most files are not that small. I think that if the OS were RAIDed it would further magnify this difference. The size and dependability of today's drives minimizes the need for LVM.
The command you used to create the mdadm RAID-0 array actually created a RAID-1 array.
The write throughput of both RAID-0 and LVM are very similar. It is the read throughput that really varies between the two. Anyone have an idea why?
Write throughput is pretty flat starting at 8KB blocks. Random reads and writes show a steady improvement with larger blocks. Is this because the random access negates the benefit of Linux's disk cache?
Even though it would have been outside the article's topic, I would like to see the results of a non-striped test on the same hardware to show how much benefit there is from striping, whether RAID-0 or LVM.
ewildgoose: I thought "BBU" stood for "Battlin' Business Units".
stevenjacobs: I think ewildgoose is not misinformed. I used to have an HP server with a SmartArray card and no BBWC (Battery-Backed Write Cache), only a 64MB read cache (stock SmartArray 6i), so I couldn't enable the write cache on the controller. The performance was horrible (only a few MB/s when doing heavy IO, like during backups). After adding BBWC the performance was more reasonable (30-40MB/s).
apiset: I haven't had this restriction on any controller so far. We use 3Ware, Areca, and PERC, and they all allow write cache without a BBU.
It's definitely best practice to install one, but saying that it is required for better performance is just not right.
Hmm… I posted this yesterday but it didn't show up…
I fixed the raid-1 goof. Thanks for the catch, typhoidmary. I checked the system and it was RAID-0. The raid-1 in the command line was a typo.
Jeff
@duncan,
You fell into the trap :) The benchmarks were only single runs, so there is no measure of the variability in the runs. 30% might be a huge difference, but the variability could be larger than that. The problem is that we just don't know, which makes it difficult to compare (actually I think it's impossible).
But I included the results for fun.
Jeff
Hardware RAID 0 should make a difference over software:
a) CPU usage will be smaller (mainly with ATA/SATA disks)
b) bandwidth – the HW controller is, in principle, capable of communicating independently with the disks and providing full speed (limited by the bus) to the chipset
Thanks for the article, it was interesting. However, there is one point that needs to be corrected, because it certainly confused me.
According to the lvcreate manpage, the "-i" option indicates across how many physical volumes (within the volume group) to scatter the logical volume. Hence, in your case, the number "2" that you gave to "-i" is not arbitrary at all. In fact, this is the only number that makes sense, since you had two physical volumes in your volume group.
Here is the relevant part of the manpage:
-i, --stripes Stripes
Gives the number of stripes. This is equal to the number of physical volumes to scatter the logical volume.
Thanks,
Iordan
Interesting read. Of course, mdadm and LVM were created for different reasons: mdadm for stability/speed, LVM for maximum flexibility. But I still wondered how close LVM's striping capability comes to mdadm in terms of performance. btrfs will include both concepts at the file system level; theoretically it should outperform a classic mdadm/LVM setup?
I think the author may have made a mistake about the capacity of RAID-0, which he claimed is computed from the smallest disk size among the disks in the group, multiplied by the number of drives in the group.
From my experiment under Debian Lenny, using virtual disks made from /dev/zero, the result should be the sum of every disk in the group, which is exactly the same as lvm2. I made two virtual disks of 3MB and 6MB, and the resulting RAID-0 capacity was 9MB rather than 6MB.
BTW, the mdadm version is v2.6.7.2, Linux kernel 2.6.26.
Yes, that's right! It is 9MB. Wonderful, thanks!