Linux Don’t Need No Stinkin’ ZFS: BTRFS Intro & Benchmarks

ZFS may be locked into the Solaris operating system, but "Butter FS" is on the horizon, boasting more features and better performance.

Feature Breakdown

Btrfs is a very ambitious project with tons of new features. While the more important of these features have already been listed, it is worthwhile to go into more depth about some of the key features.

Dynamic inode allocation:
This feature may sound boring but it is actually quite important when creating a file system (or extending one). Dynamic inode allocation means that when the file system is created, only a few inodes are created; the file system then creates additional inodes on the fly as they are needed. The practical benefit is that creating or extending a file system is extremely fast. Contrast this with ext3, which can take minutes to create a file system.

Snapshots:
Btrfs allows you to create snapshots of your file system or of sections of your file system. You can then use these snapshots to create backups or as a fast emergency copy of existing data. You can also use a snapshot to grab a section of the file system to dump to an archive (after archiving you go back and erase the snapshot and the original data). Usually no modifications are made to a snapshot, since it’s used for critical functions such as backups, repairs, or archiving. But in some cases you may want to write to the snapshot, and btrfs allows you to do this. For example, you might take a snapshot before working with a directory, then update the snapshot from the active directory so that you keep a current copy for various purposes. Btrfs even allows you to take a snapshot of a snapshot. There is a great deal of flexibility in btrfs’ snapshot functionality.
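As a quick sketch of the snapshot workflow (using the modern btrfs-progs syntax; the v0.18-era tools used the separate btrfsctl utility, and the paths here are made up):

% btrfs subvolume snapshot /mnt/data_btrfs /mnt/data_btrfs/snap1
% btrfs subvolume snapshot /mnt/data_btrfs/snap1 /mnt/data_btrfs/snap2

The first command creates a writable snapshot of the mounted file system; the second takes a snapshot of that snapshot.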

Copy On Write:
Copy on write is a technique in which modified data is written to a new location instead of overwriting the original, so that for a time two copies exist. This has all kinds of uses in btrfs. For example, btrfs uses it in conjunction with snapshots (even snapshots of snapshots), allowing them to be updated cheaply. Btrfs also uses copy on write in conjunction with logging. This makes the file system even more resilient because a copy of the data or metadata to be written is kept in case of a power failure (basically, a copy of the journal is kept until the file system is absolutely sure that the journal has been committed and the data or metadata is on the disk and correct).
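One user-visible payoff of copy on write is cheap file cloning. With later btrfs tooling and coreutils (an illustration, not something exercised in this article’s testing), you can ask for a copy that shares all of its blocks with the original:

% cp --reflink=always bigfile bigfile.copy

The copy completes almost instantly regardless of file size, because no data blocks are duplicated until one of the two files is modified.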

Subvolumes:
Btrfs has the capability of taking part of a file system and mounting it as the root of an internal file system. This is terribly useful if you want to limit user access to a certain portion of a directory structure. For example, if there is a subdirectory that users need to access without being allowed access to other parts of the main directory tree, then that subdirectory can be mounted as a subvolume, and to the user it appears as the root file system for that data.
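A sketch of what this looks like in practice (modern btrfs-progs syntax; the paths are assumptions):

% btrfs subvolume create /mnt/data_btrfs/users
% mount -t btrfs -o subvol=users /dev/sda1 /home/users

From /home/users, the subvolume appears to be the root of its own file system, and the rest of /mnt/data_btrfs is out of reach.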

Multiple Devices:
With current Linux file systems, if we want to create a RAID-0, RAID-1, or any other RAID level, we typically have to use lvm to create the volumes and then use a hardware RAID card or software RAID (md) to combine the volumes into a device that can then be formatted with a file system. This can make things difficult in terms of management. But btrfs can do multiple devices (RAID) as part of the file system. At this time it can do RAID-0, RAID-1, and RAID-10, with other RAID levels to be added. Btrfs also allows you to add devices (disks) to the file system after it has been formatted (dynamic inode allocation is a big key here as well) and to remove devices from the file system, all while it’s mounted.
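A sketch of growing and shrinking a mounted file system (modern btrfs-progs syntax; the v0.18 era used a separate btrfs-vol utility for this):

% btrfs device add /dev/sdc1 /mnt/data_btrfs
% btrfs device delete /dev/sdb1 /mnt/data_btrfs

The file system stays mounted and usable throughout; on a delete, btrfs migrates the data off the device before removing it.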

Fsck and Defragmentation Enhancements:
Fsck can be the bane of an administrator’s existence because you usually have to take the file system offline, run fsck to repair it (which can take a great deal of time), and then remount the file system (assuming everything was fine). Btrfs allows you to perform an fsck on a mounted file system that is actually in use. While the performance of a file system undergoing an fsck is not spectacular, you can still use the file system. In addition, despite the best efforts of file system developers, fragmentation happens and can severely impact performance. To “defrag” a traditional file system you likewise have to take it offline, perform the defragmentation, and then remount it. Btrfs allows you to defrag the file system while it’s mounted and in use.
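As a sketch using the modern btrfs-progs syntax (the utility names have changed since v0.18), an online defragmentation is a single command run against the mounted file system:

% btrfs filesystem defragment /mnt/data_btrfs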

Encryption:
Encryption of file systems is becoming an increasingly popular topic, especially for corporate systems such as laptops that are stolen from time to time (you see the occasional article about a lost or stolen laptop with sensitive information on the hard drive). There are encryption add-ons that can be used with existing file systems to provide encryption. Btrfs plans to build encryption into the file system itself and to add further encryption techniques in the future (remember that it’s a work in progress).

Compression:
In addition to encryption, btrfs can also provide compression to save space and improve performance. Currently it uses the zlib capabilities built into the kernel.
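Compression is enabled with a mount option rather than at file system creation time. For example (a sketch, reusing the device and mount point from the examples below):

% mount -t btrfs -o compress /dev/sda1 /mnt/data_btrfs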

Btrfs – Coming Soon?

When a new feature for Linux is discussed, many people take that to mean that the feature is fully baked and ready to use. In truth, the feature usually is not yet ready and still requires a great deal of testing, debugging, and development. These new features are almost always initially developed outside the kernel. Once a feature reaches a certain point it is sometimes added to the kernel and marked as “experimental”. This is done so that the feature gets more exposure and can be developed as part of the kernel, rather than trying to hoist a potentially large external code base into the kernel later.

This is exactly the process being followed in the development of btrfs. Initially the code base was developed outside the kernel. Recently, in the 2.6.29 kernel, btrfs was added with the “experimental” label. The goal is to get much wider testing and exposure while making the development process easier because btrfs is in the kernel. So btrfs is not really ready for prime-time use, but it is ready for testing. If you want to help Linux development and don’t have the skills for kernel development (like me), you can make a definite contribution by testing something like btrfs and reporting any problems to the btrfs mailing list.

Creating btrfs File Systems and Benchmarking

As with the previous article on ext4, it is always good to start with a quick introduction to commands for a new file system. Btrfs is no exception.


The btrfs wiki currently contains a set of instructions on getting started with btrfs. If you follow those directions, diving into creating file systems is fairly easy.

A simple way to start is by using,

% mkfs.btrfs /dev/sda1

This happens so fast that I ran the “date” command before and after creating the file system to see how long the command took.
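The test.sh script itself wasn’t published; a minimal version that would produce this kind of output is simply (device name assumed):

#!/bin/bash
# bracket mkfs.btrfs with date to see how long it takes
date
mkfs.btrfs /dev/sda1
date

Here is the output from the run: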

[root@test64 ~]# ./test.sh
Sat Apr 18 15:47:50 EDT 2009

WARNING! - Btrfs Btrfs v0.18 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

fs created label (null) on /dev/sda1
        nodesize 4096 leafsize 4096 sectorsize 4096 size 465.76GB
Btrfs Btrfs v0.18
Sat Apr 18 15:47:51 EDT 2009

As you can see, it took about 1 second to create a file system on a 465GB partition. This is a direct result of the dynamic inode allocation.

Once the file system is created it is easy to mount.

% mount -t btrfs /dev/sda1 /mnt/data_btrfs

It is really simple to create a file system from multiple devices. At this time, by default, the metadata is mirrored across all of the devices and the data is striped across all available devices. In addition, btrfs allows you to define the metadata behavior (an example of selecting it follows the list). You can have metadata laid out in the following manner:


  • RAID-0 (metadata is striped across all devices present)
  • RAID-1 (metadata is mirrored across all devices present)
  • RAID-10 (metadata is striped and mirrored across all devices present)
  • single (metadata is kept on a single device, with no mirroring)
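The metadata behavior is selected when the file system is created, via the -m option to mkfs.btrfs (a sketch; “-m single” is also the form mentioned in the comments below):

% mkfs.btrfs -m raid1 /dev/sda1 /dev/sdb1
% mkfs.btrfs -m single /dev/sda1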

An example of building btrfs with multiple devices is,

% mkfs.btrfs /dev/sda1 /dev/sdb1

Then you can mount the file system as before using either /dev/sda1 or /dev/sdb1. Note that using multiple devices doesn’t slow down the creation of the file system compared to a single device.

Next: Benchmarking btrfs

Comments on "Linux Don’t Need No Stinkin’ ZFS: BTRFS Intro & Benchmarks"

waddles

Jeff – you didn’t mention what type of disks you used for benchmarking, but your results sure look like you’re hitting the practical bandwidth limits of SATA-150. I sure would like to see a benchmark on InfiniBand 12x.

Also, what happened to comparing Ext3/Ext4 on RAID?

samrawlins

Jeff, great article! Very informative summaries of the new features. I hope you write a follow-up in a few months (years?) when BTRFS is declared stable. What type of disk(s) are you using?

I do take issue with your claims regarding compression:

When compression was turned on the performance increased by a large amount. The write and rewrite performance almost doubles and the read and reread performance increases by about 30%.

Write performance enhancements vary greatly. The Write performance increase via compression should be split into three sets of data: When using “nodatacow” and “nodatasum”, the difference is minimal. Otherwise, when using any RAID, performance increases by about 40% (50 MB/s to 70 MB/s in most tests). Otherwise (“Standard” or “-m single” options), performance more than doubles (30 MB/s to 70 MB/s).

Read performance enhancements do not vary greatly, but are also certainly not 30% when using compression. There is a

jimmershere

Excellent article! I’m glad to see this kind of technology in the OpenSource world!

I do need to point out some incorrect entries in the BTRFS vs. ZFS comparison chart, however. Specifically:

ZFS can online resize.
ZFS can repair/check/defrag while online. (scrub)
ZFS can take a snapshot of a snapshot

That being said, BTRFS still sounds like an exciting addition to Linux and I can’t wait to test it out.

craig73

It would have been useful to have CPU usage statistics for all these tests to identify the cost of compression. I would have expected that the compression level/algorithm be selected to increase IO bandwidth at a slight cost to CPU overhead, but the lack of CPU numbers makes it difficult to evaluate how much of a cost that 30% came at.

[I'm also curious how tunable this is, based on larger CPUs and larger caches, or specific workloads]

craig73

(BTW – I recognize the compression of zeros makes the compression numbers meaningless)

laytonjb

Yep – I’m hitting the bandwidth limit pretty quickly. It’s a pretty old machine. I’m trying to get the family funding agency to fund my proposal for a new machine. :) No word yet but I think the funding agency is waiting for their stimulus package to arrive.

Are you looking for a benchmark of a system with a number of drives and btrfs exporting it over NFS/RDMA over IB?

laytonjb

I’m not sure I follow you? Are you saying the read performance can vary by a great deal so 30% may be in the noise?

I didn’t run the tests a number of times and compute the average or examine the standard deviation in a systematic manner. But I did run them several times to get a feel for the results. The numbers I reported are right around the average but that is by eyeball :)

BTW – I’m not sure when they will declare Btrfs stable. I’m guessing early 2010. I talked to Chris but didn’t ask that question. They are still doing some fairly heavy lifting in the code base. I think the on-disk format is fixed but they could change it one more time (it’s still marked as “experimental” :) ).

Did I capture your questions correctly or am I missing something?

Thanks!

laytonjb

Thanks for the comments. Can you point to an on-line source for the ZFS features? I’ve been looking for some time for a comprehensive list of ZFS features without luck.

Thanks!

laytonjb

You’re absolutely correct but I didn’t track the CPU usage during the run :( Bonnie++ might be an easier way to get this information.

I have an interview with Chris Mason that will be appearing soon. He pointed out that the crc32 computations used in the compression can now be done in hardware with Nehalem. I hope to have a Nehalem system soon and I will definitely test this out.

If you want me to repost the testing and to add more tests, let the editor know :)

Thanks!

sysadmn

One pretty good intro to ZFS is here: http://opensolaris.org/os/community/zfs/whatis/

It’s a glorified bullet list with a brief description of what each feature means.

For example, one could quibble on the “snapshot of snapshots” feature:

ZFS backup and restore are powered by snapshots. Any snapshot can generate a full backup, and any pair of snapshots can generate an incremental backup. Incremental backups are so efficient that they can be used for remote replication — e.g. to transmit an incremental update every 10 seconds.

It also appears you’ve left some of the ZFS advantages out of the table – I’d encourage your readers to see the original blog posting. I don’t know if this is in BTRFS, but it’s a lifesaver:

ZFS provides unlimited constant-time snapshots and clones. A snapshot is a read-only point-in-time copy of a filesystem, while a clone is a writable copy of a snapshot. Clones provide an extremely space-efficient way to store many copies of mostly-shared data such as workspaces, software installations, and diskless clients.

mark_w

I am very glad to see this comparison, but I would like to make a few comments.

- the title/summary is poor. The comment “Linux don’t need no stinkin’ ZFS…“Butter FS” is on the horizon and it’s boasting more features and better performance” is misleading.

According to your summary table, you have one area in which BTRFS can be considered more featured than ZFS (snapshots of snapshots). Considering that you haven’t even discussed clones (which maybe do the same thing or more) and that ZFS is not only a filesystem but it, and its utilities, integrate the LVM features and it has, right now, a wide selection of RAID modes that are actually functioning and it does ‘automagic’ adaptation to volumes of differing performance values, this seems to be a claim that you get nowhere near justifying. Maybe, eventually, it will be true, but the current version doesn’t have the features and there is little evidence that ‘first stable release’ will either.

(Sorry, I’m not trying to say that BTRFS is ‘bad’, just that you are accidentally overclaiming; my guess would be that you haven’t read much on ZFS and all of its features. I think if you had, you would have realised that BTRFS on its own or in its first stable release isn’t going to do it. BTRFS plus on the fly compression, plus a revised MD, plus LVM may well, though, but I’m expecting to wait a bit for all of those to work ideally together.)

- the title leads you to believe that BTRFS will be shown to have a better performance than ZFS. The actual comparison is between BTRFS and ext filesystems with various configurations.

- one of the clever features of ZFS is protection against silent data corruption. This may not be vital to me now, but as filesystem sizes increase…a colleague deals with the situation in which a research project grabs nearly 1G of data every day and they have data going back over 5 years (and they intend to keep capturing data for another 50+ years). As he says, “how do I ensure that the data that we read is the data that we wrote?”, and it’s a good question. ZFS is good in that, admittedly obscure, situation, but I’m not sure whether BTRFS will be.

- You make (indirectly) the very good point that with the more advanced filesystems, benchmarking performance can be rather sensitive to set-up options. I am very glad that you did that testing and hope that you can do more thorough testing as we get closer to a version 1.0 release. I am not sure that I can help you with your funding submission, though…

It is also the case that the results that you get are heavily dependent on what you measure, so the more different tests you do, the better.

- another clever feature of ZFS (sorry about sounding like a stuck record, but I am impressed by ZFS and would love to be as impressed by BTRFS) is that you can make an array incorporating SSDs and hard disks and have the system automagically sort out the usage of the various storage types to give an aggregate high performance, low-cost-per-TB, array. In a commercial setting this is probably one of the biggest advantages that ZFS has over earlier technology. I’m sure that I don’t have to use the ‘O’ word here, but this is a vital advantage that Sun has, and I’m expecting to see some interesting work on optimising this system, in particular for database accesses, soon.

(BTW, a hybrid array of SSDs and slow hard disks, as used in Amber Road, can be so effective that it is faster, cheaper and much more power efficient than an array using the usual enterprise class SAS disks, so this can be an interesting solution in enterprise apps)

If you wish to know more about ZFS, you can do worse than read Sun’s whitepapers on the subject; search on “sun zfs discovery day” to get more info; the presentation is very readable. (I’m not, btw, suggesting that this is an unbiased comparison of ZFS to anything else; its a description of the features and how-to-use.)

smino

I was going to say your transfer speeds are slow, as I get 70MB/s transferring files over gigabit ethernet from XP to an Unraid cache drive (cache meaning a single unprotected drive, no RAID), which uses ReiserFS. The drives I use are two 500GB Seagate 7200.12s and a Seagate 1.5TB 7200.11. I am sure the performance would be better transferring from drive to drive and not over ethernet. I am pretty close to hitting the practical ethernet limit. Of course this is with my antivirus and Zone Alarm Pro running, and streaming a movie from the same box.

If anyone gets a chance to test btrfs performance on a fast system with those Seagate drives above, please let me know.
I chose those drives because they are as fast as the VelociRaptor 10K drives, only with more storage and cheaper!
Sly.

softweyr

One (of the many) misunderstood features of ZFS is how the file metadata works. In most existing filesystems that support arbitrary metadata, the metadata space is limited and essentially allows key/data pairs to be associated with the file. In ZFS, the metadata for each filesystem object is another entire filesystem rooted at the containing file. ZFS is, in essence, a 3D filesystem. Consider Apple’s application bundles, currently implemented as a directory that the Finder application recognizes and displays specially. Using ZFS, the application file could be the icon file for the application; all of the component files, including executables, libraries, localization packs, and other app resources, would be stored in the file metadata. To the outside world, the bundle truly appears as a single file.

ZFS has its warts, for instance that storage of small files is quite inefficient, but actually besting it is going to be a long, hard slog. I suspect the big commercial Linux distributions will iron out the licensing issues somehow and include ZFS in their distributions in the next year. Linux is too big a business to be driven by the religious zealots anymore.

Reply
bezya01

Will BTRFS have the option to sync two file system copies over
the network (WAN as well…)?
I believe ZFS has such an option and Linux DRBD allows this as
well on the device level.

Yours,

Jack.

http://itprofessional-mastermind.com/blog

rogerdpack

no speed comparisons with zfs?

stoatwblr

2-and-a-bit years on and ZFS is still chugging along nicely as a working, robust filesystem (and yes, it _does_ have snapshots of snapshots) which I’ve found impossible to break.

Meantime, Btrfs has again trashed itself on my systems given scenarios as simple as power failure.

That’s without even going into the joy of ZFS having working ssd read and write caching out front which substantially boosts performance while keeping the robustness (even 64Gb is more than adequate for a 10TB installation I use for testing)

If there’s any way of reconciling CDDL and GPL then the best way forward would be to combine forces. BTRFS has a lot of nice ideas but ZFS has them too and it has the advantage of a decade’s worth of actual real world deployment experience to draw on.

Less infighting among OSS devs, please.
