Linux Don’t Need No Stinkin’ ZFS: BTRFS Intro & Benchmarks

ZFS may be locked into the Solaris operating system but "Butter FS" is on the horizon and it's boasting more features and better performance.

Linux is a fun project to watch because you get to see this living and breathing piece of software change and evolve.

Recently, Linux has been undergoing something of a revolution with respect to file systems. A number of projects, including ext4 (see ext4 File System: Introduction and Benchmarks), squashfs, and nilfs, have made it into the kernel, while tux3 hasn’t made it into the kernel yet but is under heavy development.

And perhaps most importantly, btrfs. Btrfs holds the promise of giving Linux many enterprise class file system features similar to ZFS (See Solving Common Administration Problems with ZFS and ZFS on FUSE) but with even more features and better performance. In fact, many Linux experts think that btrfs is one of the keys to the future of Linux. While btrfs is not quite ready to be your only file system, it is in the kernel ready for testing and is still undergoing very heavy development.

In this article we will introduce the key features of btrfs and find out how it compares to existing file systems. We will also test it using iozone, with the understanding that it’s still in development so performance will likely change over time.

Introduction to btrfs

Btrfs, sometimes called “Butter FS”, is a new file system for Linux that has a wealth of features users have been wanting for some time. Chris Mason, who works at Oracle, started the project, and Oracle has released it under the GPL so anyone can work on it. Btrfs has gained a very large following within the Linux community and has whipped up a good deal of buzz. Some of the features of btrfs that have led to this buzz are:


  • Copy on Write
  • Extents (see article about ext4)
  • Space efficient packing of small files (file systems have needed this for a long time)
  • Space efficient indexed directories
  • Dynamic inode allocation
  • Writable snapshots
  • Snapshots of snapshots
  • Subvolumes (separate internal filesystem roots)
  • Object level mirroring and striping
  • Checksums on data and metadata
  • Compression
  • Integrated multiple device support, with several RAID algorithms
  • Online filesystem check and defragmentation
  • Very fast offline filesystem check
  • Efficient incremental backup and File System mirroring

You can tell from the list of features that btrfs is a very ambitious project with a load of features (got to love that).
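To make the copy-on-write and snapshot items in the list a bit more concrete, here is a minimal toy sketch in Python. It is not how btrfs is implemented (btrfs uses copy-on-write B-trees on disk); the `Volume` class and its method names are hypothetical, invented purely for illustration. The idea it shows is that a snapshot copies only references to blocks, so it is cheap, writable, and can itself be snapshotted, and that a write replaces a reference rather than modifying shared data in place.

```python
"""Toy copy-on-write (CoW) model: snapshots share blocks until one side writes.

Illustration only. The Volume class is hypothetical and is not btrfs's design.
"""


class Volume:
    """A hypothetical in-memory 'subvolume': a map from block number to data."""

    def __init__(self, blocks=None):
        # Block contents are immutable bytes objects, so sharing them is safe.
        self._blocks = dict(blocks or {})

    def write(self, block_no, data):
        # CoW: never modify stored data in place. Point this volume's entry
        # at new data; other volumes keep referencing the old bytes.
        self._blocks[block_no] = bytes(data)

    def read(self, block_no):
        return self._blocks[block_no]

    def snapshot(self):
        # A snapshot copies only the block map (references), not the data.
        # The result is itself writable and can be snapshotted again.
        return Volume(self._blocks)

    def shared_blocks(self, other):
        # Count blocks whose data object is physically shared with `other`.
        return sum(
            1 for n, d in self._blocks.items() if other._blocks.get(n) is d
        )


root = Volume({0: b"alpha", 1: b"beta"})
snap = root.snapshot()      # snapshot of the volume
snap2 = snap.snapshot()     # snapshot of a snapshot
snap.write(0, b"ALPHA")     # the writable snapshot diverges from its origin

print(root.read(0), snap.read(0), snap2.read(0))
print("blocks snap still shares with root:", snap.shared_blocks(root))
```

In this toy model the original volume and the second snapshot are unaffected by the write, and the modified snapshot still shares its untouched block with the original, which is roughly why snapshots can be space efficient.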

One of the most immediate questions one would have about btrfs is, “how does it compare to ZFS?” That’s a great question. There is a pretty good blog posting that attempts to impartially compare btrfs and ZFS. While the comparison is quite long, a quick summary of the comparison is below:

Feature | ZFS | BTRFS
Copy on Write | yes | yes
Snapshots | yes | yes
Snapshots of snapshots | ? | yes
Performance degradation at near 98-100% disk usage | yes | Most likely yes
Block level compression | yes | Currently a mount option
Disk encryption | Being developed | Planned but not currently in kernel. eCryptfs could be an option
Online resizing | no (being worked on) | yes
Online defragmentation | no (being worked on as part of other projects) | yes
Write checksums | yes | yes
Built-in RAID | yes (0, 1, 10, 5 with fixes for write hole (raidz), and some variant of 6 called raidz2) | yes (0, 1, 10)
ACL | yes | yes
Direct IO | yes | Writes yes. Reads No – Planned
Quotas | yes | yes


Comments on "Linux Don’t Need No Stinkin’ ZFS: BTRFS Intro & Benchmarks"

waddles

Jeff – you didn’t mention what type of disks you used for benchmarking, but your results sure look like you’re hitting the practical bandwidth limits of SATA-150. I sure would like to see a benchmark on InfiniBand 12x.

Also, what happened to comparing Ext3/Ext4 on RAID?

samrawlins

Jeff, great article! Very informative summaries of the new features. I hope you write a follow-up in a few months (years?) when BTRFS is declared stable. What type of disk(s) are you using?

I do take issue with your claims regarding compression:

When compression was turned on the performance increased by a large amount. The write and rewrite performance almost doubles and the read and reread performance increases by about 30%.

Write performance enhancements vary greatly. The Write performance increase via compression should be split into three sets of data: When using “nodatacow” and “nodatasum”, the difference is minimal. Otherwise, when using any RAID, performance increases by about 40% (50 MB/s to 70 MB/s in most tests). Otherwise (“Standard” or “-m single” options), performance more than doubles (30 MB/s to 70 MB/s).

Read performance enhancements do not vary greatly, but are also certainly not 30% when using compression. There is a

jimmershere

Excellent article! I’m glad to see this kind of technology in the OpenSource world!

I do need to point out some incorrect entries in the BTRFS vs. ZFS comparison chart, however. Specifically:

ZFS can online resize.
ZFS can repair/check/defrag while online. (scrub)
ZFS can take a snapshot of a snapshot

That being said, BTRFS still sounds like an exciting addition to Linux and I can’t wait to test it out.

craig73

It would have been useful to have CPU usage statistics for all these tests to identify the cost of compression. I would have expected that the compression level/algorithm be selected to increase IO bandwidth at a slight cost to CPU overhead, but the lack of CPU numbers makes it difficult to evaluate how much of a cost that 30% came at.

[I'm also curious how tunable this is, based on larger CPUs and larger caches, or specific workloads]

craig73

(BTW – I recognize the compression of zeros makes the compression numbers meaningless)

laytonjb

Yep – I’m hitting the bandwidth limit pretty quickly. It’s a pretty old machine. I’m trying to get the family funding agency to fund my proposal for a new machine. :) No word yet but I think the funding agency is waiting for their stimulus package to arrive.

Are you looking for a benchmark of a system with a number of drives and btrfs exporting it over NFS/RDMA over IB?

laytonjb

I’m not sure I follow you? Are you saying the read performance can vary by a great deal so 30% may be in the noise?

I didn’t run the tests a number of times and compute the average or examine the standard deviation in a systematic manner. But I did run them several times to get a feel for the results. The numbers I reported are right around the average but that is by eyeball :)

BTW – I’m not sure when they will declare Btrfs stable. I’m guessing early 2010. I talked to Chris but didn’t ask that question. They are still doing some fairly heavy lifting in the code base. I think the on-disk format is fixed but they could change it one more time (it’s still marked as “experimental” :) ).

Did I capture your questions correctly or am I missing something?

Thanks!

laytonjb

Thanks for the comments. Can you point to an on-line source for the ZFS features? I’ve been looking for some time for a comprehensive list of ZFS features without luck.

Thanks!

laytonjb

You’re absolutely correct but I didn’t track the CPU usage during the run :( Bonnie++ might be an easier way to get this information.

I have an interview with Chris Mason that will be appearing soon. He pointed out that the crc32 computations used in the compression can now be done in hardware with Nehalem. I hope to have a Nehalem system soon and I will definitely test this out.

If you want me to repost the testing and to add more tests, let the editor know :)

Thanks!

sysadmn

One pretty good intro to ZFS is here: http://opensolaris.org/os/community/zfs/whatis/

It’s a glorified bullet list with a brief description of what each feature means.

For example, one could quibble on the “snapshot of snapshots” feature:

ZFS backup and restore are powered by snapshots. Any snapshot can generate a full backup, and any pair of snapshots can generate an incremental backup. Incremental backups are so efficient that they can be used for remote replication — e.g. to transmit an incremental update every 10 seconds.

It also appears you’ve left some of the ZFS advantages out of the table – I’d encourage your readers to see the original blog posting. I don’t know if this is in BTRFS, but it’s a lifesaver:

ZFS provides unlimited constant-time snapshots and clones. A snapshot is a read-only point-in-time copy of a filesystem, while a clone is a writable copy of a snapshot. Clones provide an extremely space-efficient way to store many copies of mostly-shared data such as workspaces, software installations, and diskless clients.

mark_w

I am very glad to see this comparison, but I would like to make a few comments.

- the title/summary is poor. The comment “Linux don’t need no stinkin’ ZFS…“Butter FS” is on the horizon and it’s boasting more features and better performance” is misleading.

According to your summary table, you have one area in which BTRFS can be considered more featured than ZFS (snapshots of snapshots). Considering that you haven’t even discussed clones (which maybe do the same thing or more) and that ZFS is not only a filesystem but it, and its utilities, integrate the LVM features and it has, right now, a wide selection of RAID modes that are actually functioning and it does ‘automagic’ adaptation to volumes of differing performance values, this seems to be a claim that you get nowhere near justifying. Maybe, eventually, it will be true, but the current version doesn’t have the features and there is little evidence that ‘first stable release’ will either.

(Sorry, I’m not trying to say that BTRFS is ‘bad’, just that you are accidentally overclaiming; my guess would be that you haven’t read much on ZFS and all of its features. I think if you had, you would have realised that BTRFS on its own or in its first stable release isn’t going to do it. BTRFS plus on the fly compression, plus a revised MD, plus LVM may well, though, but I’m expecting to wait a bit for all of those to work ideally together.)

- the title leads you to believe that BTRFS will be shown to have a better performance than ZFS. The actual comparison is between BTRFS and ext filesystems with various configurations.

- one of the clever features of ZFS is protection against silent data corruption. This may not be vital to me now, but as filesystem sizes increase…a colleague deals with the situation in which a research project grabs nearly 1G of data every day and they have data going back over 5 years (and they intend to keep capturing data for another 50+ years). As he says “how do I ensure that the data that we read is the data that we wrote?”, and it’s a good question. ZFS is good in that, admittedly obscure, situation, but I’m not sure whether BTRFS will be.

- You make (indirectly) the very good point that with the more advanced filesystems, benchmarking performance can be rather sensitive to set-up options. I am very glad that you did that testing and hope that you can do more thorough testing as we get closer to a version 1.0 release. I am not sure that I can help you with your funding submission, though…

It is also the case that the results that you get are heavily dependent on what you measure, so the more different tests you do, the better.

- another clever feature of ZFS (sorry about sounding like a stuck record, but I am impressed by ZFS and would love to be as impressed by BTRFS) is that you can make an array incorporating SSDs and hard disks and have the system automagically sort out the usage of the various storage types to give an aggregate high performance, low-cost-per-TB array. In a commercial setting this is probably one of the biggest advantages that ZFS has over earlier technology. I’m sure that I don’t have to use the ‘O’ word here, but this is a vital advantage that Sun has, and I’m expecting to see some interesting work on optimising this system, in particular for database accesses, soon.

(BTW, this is so effective that a hybrid array of SSDs and slow hard disks, as used in Amber Road, can be faster, cheaper and much more power efficient than one using the usual enterprise class SAS disks, so this can be an interesting solution in enterprise apps.)

If you wish to know more about ZFS, you can do worse than read Sun’s whitepapers on the subject; search on “sun zfs discovery day” to get more info; the presentation is very readable. (I’m not, btw, suggesting that this is an unbiased comparison of ZFS to anything else; it’s a description of the features and how-to-use.)

smino

I was going to say your transfer speeds are slow, as I get 70MB/s transferring files over GigE Ethernet from XP to an Unraid cache drive (cache meaning a single unprotected drive, no RAID), which uses ReiserFS. The drives I use are two 500GB Seagate 7200.12 and a Seagate 1.5TB 7200.11. I am sure the performance would be better transferring from drive to drive and not over Ethernet. I am pretty close to hitting the Ethernet practical limit. Of course this is with my antivirus running and ZoneAlarm Pro, and streaming a movie from the same box.

If anyone gets a chance to test the btrfs performance on a fast system with those Seagate drives above, please let me know.
I chose those drives because they are as fast as the VelociRaptor 10K drives, only more storage and cheaper!
Sly.

softweyr

One (of the many) misunderstood features of ZFS is how the file metadata works. In most existing filesystems that support arbitrary metadata, the metadata space is limited and essentially allows key/data pairs to be associated with the file. In ZFS, the metadata for each filesystem object is another entire filesystem rooted at the containing file. ZFS is, in essence, a 3D filesystem. Consider Apple’s application bundles, currently implemented as a directory that the Finder application recognizes and displays specially. Using ZFS, the application file could be the icon file for the application, and all of the component files, including executables, libraries, localization packs, and other app resources, would be stored in the file metadata. To the outside world, the bundle truly appears as a single file.

ZFS has its warts, for instance that storage of small files is quite inefficient, but actually besting it is going to be a long, hard slog. I suspect the big commercial Linux distributions will iron out the licensing issues somehow and include ZFS in their distributions in the next year. Linux is too big a business to be driven by the religious zealots anymore.

bezya01

Will BTRFS have the option to sync to file system copies over the network (WAN as well…)? I believe ZFS has such an option and Linux DRBD allows this as well on the device level.

Yours,

Jack.

http://itprofessional-mastermind.com/blog

rogerdpack

no speed comparisons with zfs?

stoatwblr

2-and-a-bit years on and ZFS is still chugging along nicely as a working, robust filesystem (and yes, it _does_ have snapshots of snapshots) which I’ve found impossible to break.

Meantime, Btrfs has again trashed itself on my systems given scenarios as simple as power failure.

That’s without even going into the joy of ZFS having working SSD read and write caching out front, which substantially boosts performance while keeping the robustness (even 64GB is more than adequate for a 10TB installation I use for testing).

If there’s any way of reconciling CDDL and GPL then the best way forward would be to combine forces. BTRFS has a lot of nice ideas but ZFS has them too and it has the advantage of a decade’s worth of actual real world deployment experience to draw on.

Less infighting among OSS devs, please.


I like the helpful info you supply on your articles. I will bookmark your weblog and test again here regularly. I am rather sure I will learn a lot of new stuff right here! Best of luck for the next!


thank you for share!
