Linux Don’t Need No Stinkin’ ZFS: BTRFS Intro & Benchmarks

ZFS may be locked into the Solaris operating system, but “Butter FS” is on the horizon, and it’s boasting more features and better performance.

Benchmarking btrfs

While there are a huge number of benchmarks for file systems, this article uses only iozone. Iozone is a popular benchmark that allows some control over the range of file sizes and block sizes, and it usually produces good results for “local” file systems (as opposed to distributed file systems). For this article, an older system with a 2.0 GHz AMD Opteron64 CPU running a base CentOS 5.3 installation was used, with btrfs v0.18 on a 2.6.30-rc1 kernel.

A number of options for btrfs were tested with a single disk and with two identical disks. These options range from the defaults for making and mounting the file system (noted as “standard”) to different mount options, illustrated in the sketch after the list. The mount options used are:


  • nodatacow (no data copy-on-write)
  • nodatasum (no data checksums)
  • compress (turn on compression)
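
To make these concrete, here is a sketch of how such file systems are created and mounted; the exact commands weren’t given in the article, and the device names below are placeholders:

```sh
# Single disk, default ("standard") mkfs and mount options:
mkfs.btrfs /dev/sdb
mount /dev/sdb /mnt/btrfs

# Disable data copy-on-write and data checksums at mount time:
mount -o nodatacow,nodatasum /dev/sdb /mnt/btrfs

# Turn on compression at mount time:
mount -o compress /dev/sdb /mnt/btrfs

# Non-duplicated metadata, as in the "-m single" rows of the table:
mkfs.btrfs -m single /dev/sdb

# Two disks with striped data (raid1, raid10, and single are analogous):
mkfs.btrfs -d raid0 /dev/sdb /dev/sdc
```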

The results have four columns: write, rewrite, read, and reread. Using iozone, a 2GB file was written with a record size of 16,384, keeping the results out of the range of cache effects. For comparison, the results from the previous article on ext4 are included: Ext3 (default), Ext4 (default), Ext3 (“optimal”), and Ext4 (“optimal”). The table below lists the options used for creating the file system (mkfs.btrfs) and for mounting it.
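
The iozone invocation would have looked roughly like the following; this is a sketch (the exact flags weren’t given in the article), and it assumes the 16,384 record size is in KB, iozone’s default unit:

```sh
# -i 0: write/rewrite tests; -i 1: read/reread tests
# -s 2g: 2 GB test file; -r 16384: 16,384 KB (16 MB) record size
# -f: path to the test file on the mounted file system
iozone -i 0 -i 1 -s 2g -r 16384 -f /mnt/btrfs/iozone.tmp
```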

| File system | mkfs.btrfs options | Mount options | Write (MB/s) | Rewrite (MB/s) | Read (MB/s) | Reread (MB/s) |
|---|---|---|---|---|---|---|
| Ext3 | — | — | 28.307 | 28.001 | 55.791 | 55.765 |
| Ext4 | — | — | 30.228 | 29.626 | 108.701 | 108.884 |
| Ext3 “optimal” | — | — | 28.047 | 26.432 | 105.565 | 105.156 |
| Ext4 “optimal” | — | — | 30.127 | 29.365 | 109.889 | 109.600 |
| Btrfs | standard | standard | 31.586 | 31.917 | 104.104 | 104.106 |
| Btrfs | standard | nodatacow, nodatasum | 31.933 | 31.839 | 107.157 | 106.565 |
| Btrfs | -m single | standard | 31.513 | 31.504 | 105.578 | 105.828 |
| Btrfs | -m single | nodatacow, nodatasum | 31.874 | 31.896 | 105.109 | 106.565 |
| Btrfs | standard | compress | 71.359 | 71.129 | 126.660 | 132.314 |
| Btrfs | two disks, raid0 | standard | 49.891 | 50.318 | 129.907 | 132.024 |
| Btrfs | two disks, raid0 | nodatacow, nodatasum | 45.054 | 45.867 | 130.655 | 131.879 |
| Btrfs | two disks, single | standard | 50.144 | 50.264 | 126.984 | 131.130 |
| Btrfs | two disks, single | nodatacow, nodatasum | 43.834 | 47.603 | 131.612 | 131.470 |
| Btrfs | two disks, raid1 | standard | 48.984 | 50.388 | 122.818 | 109.867 |
| Btrfs | two disks, raid1 | nodatacow, nodatasum | 48.196 | 48.462 | 131.832 | 131.807 |
| Btrfs | two disks, raid10 | standard | 49.859 | 50.101 | 130.655 | 132.179 |
| Btrfs | two disks, raid10 | nodatacow, nodatasum | 50.072 | 50.366 | 129.424 | 122.828 |
| Btrfs | -m single | compress | 70.241 | 69.299 | 138.849 | 127.958 |
| Btrfs | -m single | compress, nodatacow, nodatasum | 31.976 | 31.922 | 107.000 | 106.728 |
| Btrfs | two disks, raid0 | compress | 70.234 | 69.048 | 130.852 | 129.928 |
| Btrfs | two disks, raid0 | compress, nodatacow, nodatasum | 48.762 | 48.831 | 130.812 | 130.202 |
| Btrfs | two disks, raid1 | compress | 70.467 | 68.286 | 130.990 | 130.051 |
| Btrfs | two disks, raid10 | compress | 70.900 | 69.926 | 130.812 | 130.202 |


Comments on "Linux Don’t Need No Stinkin’ ZFS: BTRFS Intro & Benchmarks"

waddles

Jeff – you didn’t mention what type of disks you used for benchmarking, but your results sure look like you’re hitting the practical bandwidth limits of SATA-150. I sure would like to see a benchmark on InfiniBand 12x.

Also, what happened to comparing Ext3/Ext4 on RAID?

samrawlins

Jeff, great article! Very informative summaries of the new features. I hope you write a follow-up in a few months (years?) when BTRFS is declared stable. What type of disk(s) are you using?

I do take issue with your claims regarding compression:

When compression was turned on the performance increased by a large amount. The write and rewrite performance almost doubles and the read and reread performance increases by about 30%.

Write performance gains vary greatly, and the write performance increase from compression should be split into three sets of data: when using “nodatacow” and “nodatasum”, the difference is minimal; when using any RAID layout, performance increases by about 40% (50 MB/s to 70 MB/s in most tests); otherwise (“standard” or “-m single” options), performance more than doubles (30 MB/s to 70 MB/s).

Read performance enhancements do not vary greatly, but are also certainly not 30% when using compression. There is a

jimmershere

Excellent article! I’m glad to see this kind of technology in the OpenSource world!

I do need to point out some incorrect entries in the BTRFS vs. ZFS comparison chart, however. Specifically:

ZFS can resize online.
ZFS can repair/check/defrag while online (scrub; see the sketch after this list).
ZFS can take a snapshot of a snapshot.
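
For example, an online check/repair in ZFS is a one-liner (the pool name “tank” here is made up):

```sh
zpool scrub tank        # check and repair the pool while it stays online
zpool status -v tank    # watch progress and see any errors found
```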

That being said, BTRFS still sounds like an exciting addition to Linux and I can’t wait to test it out.

craig73

It would have been useful to have CPU usage statistics for all these tests to identify the cost of compression. I would have expected the compression level/algorithm to be selected to increase IO bandwidth at a slight cost in CPU overhead, but the lack of CPU numbers makes it difficult to evaluate how much of a cost that 30% came at.

[I'm also curious how tunable this is, based on larger CPUs and larger caches, or specific workloads]

craig73

(BTW – I recognize the compression of zeros makes the compression numbers meaningless)

laytonjb

Yep – I’m hitting the bandwidth limit pretty quickly. It’s a pretty old machine. I’m trying to get the family funding agency to fund my proposal for a new machine. :) No word yet but I think the funding agency is waiting for their stimulus package to arrive.

Are you looking for a benchmark of a system with a number of drives and btrfs exporting it over NFS/RDMA over IB?

laytonjb

I’m not sure I follow you? Are you saying the read performance can vary by a great deal so 30% may be in the noise?

I didn’t run the tests a number of times and compute the average or examine the standard deviation in a systematic manner. But I did run them several times to get a feel for the results. The numbers I reported are right around the average but that is by eyeball :)

BTW – I’m not sure when they will declare Btrfs stable. I’m guessing early 2010. I talked to Chris but didn’t ask that question. They are still doing some fairly heavy lifting in the code base. I think the on-disk format is fixed but they could change it one more time (it’s still marked as “experimental” :) ).

Did I capture your questions correctly or am I missing something?

Thanks!

laytonjb

Thanks for the comments. Can you point to an on-line source for the ZFS features? I’ve been looking for some time for a comprehensive list of ZFS features without luck.

Thanks!

laytonjb

You’re absolutely correct but I didn’t track the CPU usage during the run :( Bonnie++ might be an easier way to get this information.

I have an interview with Chris Mason that will be appearing soon. He pointed out that the crc32 computations used for checksumming can now be done in hardware on Nehalem. I hope to have a Nehalem system soon and I will definitely test this out.

If you want me to repost the testing and to add more tests, let the editor know :)

Thanks!

sysadmn

One pretty good intro to ZFS is here: http://opensolaris.org/os/community/zfs/whatis/

It’s a glorified bullet list with a brief description of what each feature means.

For example, one could quibble on the “snapshot of snapshots” feature:

ZFS backup and restore are powered by snapshots. Any snapshot can generate a full backup, and any pair of snapshots can generate an incremental backup. Incremental backups are so efficient that they can be used for remote replication — e.g. to transmit an incremental update every 10 seconds.

It also appears you’ve left some of the ZFS advantages out of the table – I’d encourage your readers to see the original blog posting. I don’t know if this is in BTRFS, but it’s a lifesaver:

ZFS provides unlimited constant-time snapshots and clones. A snapshot is a read-only point-in-time copy of a filesystem, while a clone is a writable copy of a snapshot. Clones provide an extremely space-efficient way to store many copies of mostly-shared data such as workspaces, software installations, and diskless clients.
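
For the curious, that workflow looks roughly like this on the command line (the dataset and snapshot names here are invented):

```sh
zfs snapshot tank/ws@monday             # read-only point-in-time copy
zfs clone tank/ws@monday tank/ws-copy   # writable copy of the snapshot
zfs snapshot tank/ws@tuesday
# An incremental backup between any two snapshots:
zfs send -i tank/ws@monday tank/ws@tuesday > /backup/ws-mon-tue.zfs
```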

mark_w

I am very glad to see this comparison, but I would like to make a few comments.

- the title/summary is poor. The comment “Linux don’t need no stinkin’ ZFS… ‘Butter FS’ is on the horizon and it’s boasting more features and better performance” is misleading.

According to your summary table, you have one area in which BTRFS can be considered more featured than ZFS (snapshots of snapshots). Considering that you haven’t even discussed clones (which may do the same thing or more), and that ZFS is not only a filesystem (it and its utilities integrate the LVM features, it has a wide selection of RAID modes that actually function right now, and it does ‘automagic’ adaptation to volumes of differing performance), this seems to be a claim you get nowhere near justifying. Maybe, eventually, it will be true, but the current version doesn’t have the features, and there is little evidence that the first stable release will either.

(Sorry, I’m not trying to say that BTRFS is ‘bad’, just that you are accidentally overclaiming; my guess would be that you haven’t read much on ZFS and all of its features. I think if you had, you would have realised that BTRFS on its own or in its first stable release isn’t going to do it. BTRFS plus on the fly compression, plus a revised MD, plus LVM may well, though, but I’m expecting to wait a bit for all of those to work ideally together.)

- the title leads you to believe that BTRFS will be shown to have better performance than ZFS. The actual comparison is between BTRFS and ext filesystems in various configurations.

- one of the clever features of ZFS is protection against silent data corruption. This may not be vital to me now, but as filesystem sizes increase… A colleague deals with a situation in which a research project grabs nearly 1G of data every day, with data going back over 5 years (and they intend to keep capturing data for another 50+ years). As he says, “how do I ensure that the data that we read is the data that we wrote?”, and it’s a good question. ZFS is good in that, admittedly obscure, situation, but I’m not sure whether BTRFS will be.

- You make (indirectly) the very good point that with the more advanced filesystems, benchmarking performance can be rather sensitive to set-up options. I am very glad that you did that testing and hope that you can do more thorough testing as we get closer to a version 1.0 release. I am not sure that I can help you with your funding submission, though…

It is also the case that the results you get are heavily dependent on what you measure, so the more different tests you do, the better.

- another clever feature of ZFS (sorry about sounding like a stuck record, but I am impressed by ZFS and would love to be as impressed by BTRFS) is that you can make an array incorporating SSDs and hard disks and have the system automagically sort out the usage of the various storage types to give an aggregate high-performance, low-cost-per-TB array. In a commercial setting this is probably one of the biggest advantages that ZFS has over earlier technology. I’m sure that I don’t have to use the ‘O’ word here, but this is a vital advantage that Sun has, and I’m expecting to see some interesting work on optimising this system, in particular for database accesses, soon.

(BTW, this is so effective that a hybrid array of SSDs and slow hard disks, as used in Amber Road, can be faster, cheaper and much more power efficient than one using the usual enterprise-class SAS disks, so this can be an interesting solution in enterprise apps.)

If you wish to know more about ZFS, you could do worse than read Sun’s whitepapers on the subject; search on “sun zfs discovery day” to get more info; the presentation is very readable. (I’m not, btw, suggesting that this is an unbiased comparison of ZFS to anything else; it’s a description of the features and how to use them.)

smino

I was going to say your transfer speeds are slow, as I get 70 MB/s transferring files over gigabit Ethernet from XP to an Unraid cache drive (cache meaning a single unprotected drive, no RAID), which uses ReiserFS. The drives I use are two 500GB Seagate 7200.12s and a Seagate 1.5TB 7200.11. I am sure the performance would be better transferring from drive to drive and not over Ethernet. I am pretty close to hitting the practical Ethernet limit. Of course, this is with my antivirus and ZoneAlarm Pro running, and while streaming a movie from the same box.

If anyone gets a chance to test btrfs performance on a fast system with those Seagate drives above, please let me know.
I chose those drives because they are as fast as the VelociRaptor 10K drives, only with more storage and cheaper!
Sly.

softweyr

One (of the many) misunderstood features of ZFS is how the file metadata works. In most existing filesystems that support arbitrary metadata, the metadata space is limited and essentially allows key/data pairs to be associated with the file. In ZFS, the metadata for each filesystem object is another entire filesystem rooted at the containing file. ZFS is, in essence, a 3D filesystem. Consider Apple’s application bundles, currently implemented as a directory that the Finder application recognizes and displays specially. Using ZFS, the application file could be the icon file for the application, and all of the component files, including executables, libraries, localization packs, and other app resources, would be stored in the file metadata. To the outside world, the bundle truly appears as a single file.

ZFS has its warts, for instance that storage of small files is quite inefficient, but actually besting it is going to be a long, hard slog. I suspect the big commercial Linux distributions will iron out the licensing issues somehow and include ZFS in their distributions in the next year. Linux is too big a business to be driven by the religious zealots anymore.

bezya01

Will BTRFS have the option to sync two file system copies over the network (WAN as well)?
I believe ZFS has such an option, and Linux DRBD allows this as well at the device level.
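
For reference, the ZFS way to do this is to pipe an incremental send stream to the remote host; a minimal sketch, with made-up pool and host names:

```sh
# Replicate the changes between two snapshots to another machine over SSH:
zfs send -i tank/data@t1 tank/data@t2 | ssh remotehost zfs receive pool/data
```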

Yours,

Jack.

http://itprofessional-mastermind.com/blog

rogerdpack

no speed comparisons with zfs?

stoatwblr

2-and-a-bit years on and ZFS is still chugging along nicely as a working, robust filesystem (and yes, it _does_ have snapshots of snapshots) which I’ve found impossible to break.

Meantime, Btrfs has again trashed itself on my systems given scenarios as simple as power failure.

That’s without even going into the joy of ZFS having working SSD read and write caching out front, which substantially boosts performance while keeping the robustness (even 64GB is more than adequate for the 10TB installation I use for testing).
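
(For anyone unfamiliar with this, attaching the SSDs looks like the following; the device and pool names are hypothetical:)

```sh
zpool add tank cache /dev/sdc   # SSD as read cache (L2ARC)
zpool add tank log /dev/sdd     # SSD as dedicated intent log (ZIL)
```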

If there’s any way of reconciling CDDL and GPL then the best way forward would be to combine forces. BTRFS has a lot of nice ideas but ZFS has them too and it has the advantage of a decade’s worth of actual real world deployment experience to draw on.

Less infighting among OSS devs, please.
