ZFS on FUSE

Although its features and terminology may seem strange if you're used to more traditional Linux filesystems, ZFS offers a great deal of flexibility.

In recent years, filesystems have undergone dramatic changes. In the Linux world, ext2fs has long been the standard filesystem, but its journaling successor, ext3fs, has become prevalent, along with other journaling filesystems. Most recently, a new upstart, the Zettabyte File System (ZFS), has been gaining attention.

As described shortly, ZFS boasts many revolutionary features, although it’s also got some serious problems from a Linux perspective. Despite these problems, ZFS is worth investigating — if nothing else, as a hint of future filesystem developments generally.

ZFS was released under Sun’s Common Development and Distribution License (CDDL), which is incompatible with the GNU General Public License (GPL) used by the Linux kernel. For this reason, ZFS is only available for Linux in the form of a Filesystem in Userspace (FUSE) module. This means that, despite its promise, ZFS is not a good choice to use as the only filesystem on a Linux system, although you can use it for storing user files and other non-system files. ZFS for Linux is still fairly immature, so I recommend you treat it as experimental.

If its features are extremely important to you, you might want to consider looking at OpenSolaris, which has more mature ZFS support. If you’d like to get an advance look at ZFS and try it out on some non-mission-critical systems, though, using ZFS on FUSE is a perfectly reasonable alternative.

Why Use ZFS?

So, just what are these advanced features of ZFS? Its developers re-examined filesystem design from the ground up. The result, according to the ZFS Web page, is that its developers have “blown away 20 years of obsolete assumptions, eliminated complexity at the source, and created a storage system that’s actually a pleasure to use.” Examples of its advanced features include:

  • Pooled storage — The filesystem enables storage space from multiple disks to be “pooled” and allocated to specific mount points on an as-needed basis. This feature effectively combines volume-resizing and Logical Volume Management (LVM) features, but at the filesystem level. With ZFS, you don’t need to worry about how to size your partitions, since the filesystem adjusts itself automatically!
  • RAID-Z — This feature is similar to Redundant Array of Independent Disks (RAID) level 5 support, but with less overhead.
  • Always-consistent disk space — The disk space is kept consistent at all times through copy-on-write operations. Thus, there’s no need to perform disk checks after a system crash or power outage.
  • Disk scrubbing — This feature is similar to the Error Correcting Code (ECC) feature of certain computer memory modules; it permits the computer to detect and correct on-disk errors.
  • Snapshots and clones — A snapshot is a read-only copy of a filesystem, and a clone is a read/write copy of a filesystem. You can use snapshots and clones to preserve a fixed version of a filesystem or to make backups of a filesystem.
  • Built-in compression — ZFS supports compression at the filesystem level, which is particularly handy if you’re low on disk space or if you store highly compressible data.

These features make ZFS an appealing filesystem for many purposes. The pooled storage model alone should be enough to pique the interest of anybody who’s managed Linux systems using old-style partitions!

The main drawbacks to ZFS on Linux include its early status (as I write, it’s at version 0.4.0 beta 1) and the fact that it must be implemented via a FUSE module. These factors mean that there are certain things that ZFS on Linux can’t do. Most notably, ZFS for FUSE doesn’t yet support Access Control Lists (ACLs), exporting via the Network File System (NFS), or storing swap space on a ZFS volume.

Check the STATUS file in the ZFS distribution to see a complete list of unsupported features. ZFS for FUSE has extensive memory requirements; it chews up about 128MB of RAM, and its developers recommend that it be used only on systems with 1GB or more of RAM. Because ZFS for Linux is only available via FUSE, it’s not a good choice for your root filesystem. You should perform stability and performance tests on your own hardware before entrusting critical data to ZFS.

FUSE Basics

Because ZFS for Linux is a FUSE module, you must first get FUSE up and running before you can use ZFS. For details on using FUSE, you can check out my earlier article (Lighting the FUSE) or the main FUSE Web page. A quick introduction should be enough to get you going, though.

The main FUSE package, available from its Web site, includes user-mode tools and Linux kernel modules. Chances are your distribution includes a ready-made FUSE package (Fedora and Gentoo both provide packages called fuse, and Ubuntu provides one called fuse-utils, for instance). Use synaptic, yum, emerge, or whatever package-management tools your distribution provides to locate and install this package. If you can’t locate a distribution-specific package, you can download the source code from the main FUSE Web site.

FUSE looks for kernel support while it’s building, and if it doesn’t find that support, FUSE will build it. If you prefer, though, you can build the kernel modules as part of a standard Linux kernel compilation — FUSE is available as a regular filesystem option. Once installed, the main FUSE package provides support libraries, documentation, and user utilities. The most important of these user tools is fusermount, which is used to mount FUSE-based filesystems.

Obtaining and Installing ZFS for FUSE

Few distributions provide ready-made ZFS for FUSE packages (Gentoo is an exception). If your distribution provides such a package, it will likely be the easiest way to add ZFS support to your system; however, you should check the version to be sure it’s not too far behind the latest version available on the ZFS for FUSE Web site.

If you must install ZFS from source code, the steps are fairly typical, with a notable exception:

  1. Download the tarball (zfs-fuse-0.4.0_beta1.tar.bz2 is the latest as I write).
  2. In a suitable directory, uncompress the tarball by typing tar xvfj zfs-fuse-0.4.0_beta1.tar.bz2. The result is a subdirectory containing the source code, documentation, and support files.
  3. Read the README, INSTALL, and STATUS files to get up to speed on the status of the software and so you’ll be aware of any important deviations from what I describe here.
  4. Change into the src subdirectory and type scons. The scons program is similar to the more commonly-used make program; it directs the compilation process to compile the ZFS code.
  5. Type scons install as root to install the ZFS files.

As you might guess from steps four and five, ZFS for FUSE relies on scons. This program is likely to be available as a ready-made package for your system, but if not, you can obtain its source code from scons.org. ZFS for FUSE also requires Linux kernel version 2.6.x (2.6.15 or later being recommended), FUSE version 2.5.x or later (although Gentoo’s 2.6.0rc1 release doesn’t work), and glibc 2.3.3 or later. Version 0.4.0 beta 1 works only with x86 and AMD64 CPUs.
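You can check the kernel and glibc requirements before you start building. The following is only a minimal sketch and is not part of the ZFS for FUSE distribution: version_ge is a helper name I've made up for illustration, and the script assumes a sort that understands the -V (version sort) option and a getconf that reports GNU_LIBC_VERSION. It doesn't check the FUSE version.

```shell
#!/bin/sh
# Pre-build sanity check (illustrative sketch, not shipped with ZFS for FUSE).

# version_ge A B: succeed when version A is the same as or newer than B.
# Assumes sort supports -V, as GNU sort does.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Kernel: 2.6.15 or later is recommended.
kernel="$(uname -r)"
kernel="${kernel%%-*}"    # drop a distribution suffix such as -92.el5
if version_ge "$kernel" 2.6.15; then
    echo "kernel $kernel: OK"
else
    echo "kernel $kernel: older than the recommended 2.6.15"
fi

# glibc: 2.3.3 or later is required.
glibc="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')"
if [ -z "$glibc" ]; then
    echo "glibc version: could not be determined"
elif version_ge "$glibc" 2.3.3; then
    echo "glibc $glibc: OK"
else
    echo "glibc $glibc: older than the required 2.3.3"
fi

# scons drives the build, so it must be on the PATH.
command -v scons >/dev/null 2>&1 || echo "scons not found in PATH"
```

If everything checks out, continue with the scons and scons install steps described above.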

Using ZFS

In order to function, ZFS requires the zfs-fuse program to be running (launched as root). You can launch this program by using the run.sh script in the ZFS for FUSE src/zfs-fuse directory. Gentoo provides a suitable startup script (/etc/init.d/zfs-fuse) as part of its SysV init script inventory. If you don’t have such a script, and if you intend to use ZFS on a regular basis, you’ll probably want to modify your local startup scripts to launch run.sh, or cut-and-paste its contents into an existing startup script. Note that zfs-fuse isn’t a true daemon, so you’ll want to launch it in the background by appending an ampersand (&) to its command line in your local startup script.
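If your distribution has no init script for zfs-fuse, a fragment along these lines could go into a local startup script. Treat it as a sketch under assumptions: the ZFS_FUSE_DIR path is hypothetical (point it at wherever you unpacked the source), start_zfs_fuse is a name I've invented, and I'm assuming run.sh is meant to be started from its own directory.

```shell
#!/bin/sh
# Hypothetical location of the unpacked source tree; adjust for your system.
ZFS_FUSE_DIR="${ZFS_FUSE_DIR:-/usr/local/src/zfs-fuse/src/zfs-fuse}"

# start_zfs_fuse: launch zfs-fuse in the background unless it is running.
start_zfs_fuse() {
    if pgrep -x zfs-fuse >/dev/null 2>&1; then
        echo "zfs-fuse is already running"
        return 0
    fi
    if [ ! -x "$ZFS_FUSE_DIR/run.sh" ]; then
        echo "run.sh not found in $ZFS_FUSE_DIR" >&2
        return 1
    fi
    # zfs-fuse isn't a true daemon, so push it into the background with &.
    (cd "$ZFS_FUSE_DIR" && ./run.sh) &
    echo "zfs-fuse started in the background"
}
```

Call start_zfs_fuse as root from your local startup script. Because the program is launched in the background, any ZFS commands that follow immediately may need a short pause before the daemon is ready to answer them.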

Before proceeding further, you should understand the distinction between pools and data sets. A pool is a named collection of one or more partitions or disks. Each pool may contain one or more data sets, each of which can be mounted at a different mount point. When you create a ZFS pool, it may be immediately accessed as a data set, and you can create and delete additional data sets within the pool as you see fit. You’ll use two commands, zpool and zfs, to manipulate pools and data sets.

With the zfs-fuse program running, your first step is to prepare one or more partitions to be part of a ZFS pool. You can do this with the zpool command:

zpool create mypool /dev/sda1

This example creates a ZFS pool, and associated initial data set, on the /dev/sda1 device and mounts the data set at /mypool. You should of course change the pool name and source device for your system. If you want to change the mount point, you can use the zfs utility and its set option:

zfs set mountpoint=/usr/local mypool

This command changes the mount point from /mypool to /usr/local. In the process, the data set may be unmounted. You can remount it using the zfs utility’s mount option, in one of two ways:

zfs mount -a
zfs mount mypool

The -a option mounts all the defined data sets, whereas the second mounts only the specified data set. You can include one of these commands in a local startup script if you want your ZFS data sets to be accessible whenever you boot the system. If you want to unmount a data set, you do so with the zfs unmount command:

zfs unmount -a
zfs unmount mypool

Note that the subcommand name is unmount, with two ns, unlike the Linux umount command, which has just one n. As you might guess, these two commands unmount all ZFS data sets or just the specified data set, respectively. Note that you should generally use the zfs tool to manipulate ZFS data sets rather than the standard Linux mount and umount commands. ZFS remembers the mount points, so you don’t need to specify them on the command line or in /etc/fstab.

If you want to use a pool to store data under two directories (say, /opt and /usr/local), you can use zfs to create a new data set within the pool:

zfs create mypool/opt
zfs set mountpoint=/opt mypool/opt
zfs mount -a

At this point, the /dev/sda1 device you dedicated to the pool acts much like two partitions, one mounted at /usr/local (the data set associated with the original pool) and the other at /opt (the new data set); however, disk space will be allocated dynamically, so that if you end up storing more data in /opt and less in /usr/local than you’d anticipated, you won’t have any problems.
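The commands shown so far can be collected into one script. This is a sketch rather than a recommended procedure: the run helper and setup_pool function are names I've made up, and the script defaults to a dry run that only prints each command. The pool name and device are the ones from the examples above; creating a pool wipes whatever is on the device, so check DEVICE carefully before setting DRY_RUN=0.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 (as root, with zfs-fuse running) to execute them.
DRY_RUN="${DRY_RUN:-1}"
POOL="${POOL:-mypool}"            # pool name from the examples above
DEVICE="${DEVICE:-/dev/sda1}"     # existing data on this device will be lost

# run CMD...: print the command, then execute it unless in dry-run mode.
run() { echo "+ $*"; [ "$DRY_RUN" = "1" ] || "$@"; }

setup_pool() {
    run zpool create "$POOL" "$DEVICE" &&          # pool, mounted at /mypool
    run zfs set mountpoint=/usr/local "$POOL" &&   # move the initial data set
    run zfs create "$POOL/opt" &&                  # second data set in the pool
    run zfs set mountpoint=/opt "$POOL/opt" &&
    run zfs mount -a                               # remount everything
}

setup_pool
```

Chaining the steps with && stops the script at the first command that fails, so a failed zpool create won't be followed by zfs commands against a pool that doesn't exist.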

If you run out of disk space in your initial pool, you can expand it using zpool’s add command (provided you’ve got a free disk or partition):

zpool add mypool /dev/sdb1

This example adds /dev/sdb1 to the existing mypool pool. Be aware, however, that removing devices from a pool isn’t as easy as adding them; thus, you shouldn’t add devices to a pool just to test this feature.

Before shutting down the system, you should issue the zpool export command, followed by the name of the pool (as in zpool export mypool). You should also issue this same command before moving the volume to a non-Linux system (say, if you use ZFS on a removable disk). This command cleans up the system and makes the exported pool inaccessible until you use the zpool import mypool command, where mypool is the name of the pool you want to use. The ZFS utilities then make this pool available and mount it. If you don’t remember a pool name, typing zpool import causes the ZFS utilities to scan the disks for available pools.
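If you export and import pools regularly, a small wrapper can pair the two commands. This is a hypothetical helper (pool_ctl and run are my names, not ZFS commands), and like any sketch it defaults to a dry run that only prints what it would do.

```shell
#!/bin/sh
# Dry run by default; set DRY_RUN=0 as root to execute the commands.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command, then execute it unless in dry-run mode.
run() { echo "+ $*"; [ "$DRY_RUN" = "1" ] || "$@"; }

# pool_ctl ACTION POOL: export a pool before shutdown, import it after boot.
pool_ctl() {
    case "$1" in
        stop)  run zpool export "$2" ;;
        start) run zpool import "$2" ;;
        *)     echo "usage: pool_ctl start|stop POOL" >&2; return 2 ;;
    esac
}

pool_ctl stop mypool     # before shutdown or before unplugging the disk
pool_ctl start mypool    # after boot or after reattaching the disk
```

In practice you'd call the stop action from a shutdown script and the start action from a startup script, once zfs-fuse is running.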

Advanced ZFS Tricks

Unfortunately, man pages relevant to ZFS have not yet been ported to Linux, as of ZFS for FUSE 0.4.0 beta 1. You can, however, learn something of the relevant commands by typing zpool or zfs with no options; the result is a list of subcommands and the options they take. Documentation on Sun’s main ZFS site can also help you on your way. Some highlights of advanced features of ZFS include:

  • Compression — You can enable compression by typing zfs set compression=on mypool (substituting your data set name, of course).
  • Quotas — You can give quotas to particular data sets, as in zfs set quota=10G mypool/opt to give the mypool/opt data set a 10GB quota. Disk usage won’t be permitted to exceed the specified value for this data set.
  • Reservations — You can reserve at least a given amount of space for a specific data set, as in zfs set reservation=5G mypool/opt. This command guarantees that the mypool/opt data set will be able to store at least 5GB of data.
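  • Disk checks — Type zpool scrub mypool (substituting your pool name) to check the data integrity on the pool. This plays a role similar to that of fsck on traditional filesystems, although it verifies stored data rather than repairing filesystem structures after a crash.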
  • Replacing disks — If you discover that a disk is flaky or you otherwise want to replace it, you can use the replace subcommand to zpool, as in zpool replace mypool /dev/sda1 /dev/sdc1. This example replaces /dev/sda1 with /dev/sdc1. This command can take a while to execute, since a lot of data may need to be copied.
  • Creating snapshots — To create a snapshot (a read-only copy) of a data set, use the zfs snapshot command, as in zfs snapshot mypool@backup. This example creates a snapshot of mypool called backup. Note that the snapshot is stored in the same pool as the original.
  • Rolling back a snapshot — You can use the rollback option to zfs to restore a snapshot, as in zfs rollback mypool@backup to restore the filesystem to the state it was in when the backup snapshot was taken.
  • Destroying data sets and pools — The destroy option to zfs destroys data sets, while the same option to zpool destroys an entire pool, as in zfs destroy mypool/opt or zpool destroy mypool.
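The snapshot commands lend themselves to scripting, for instance to take a dated snapshot from a nightly cron job. The following is a minimal sketch under assumptions: snapshot_name, take_snapshot, and run are names I've invented, the date-based naming scheme is only one possibility, and the script prints commands rather than running them unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Dry run by default; set DRY_RUN=0 as root to execute the commands.
DRY_RUN="${DRY_RUN:-1}"
DATASET="${DATASET:-mypool}"    # data set to snapshot

# run CMD...: print the command, then execute it unless in dry-run mode.
run() { echo "+ $*"; [ "$DRY_RUN" = "1" ] || "$@"; }

# snapshot_name DATASET [DATE]: build a name such as mypool@2008-06-01.
# DATE defaults to today's date.
snapshot_name() {
    echo "$1@${2:-$(date +%Y-%m-%d)}"
}

# take_snapshot: snapshot $DATASET under today's date.
take_snapshot() {
    run zfs snapshot "$(snapshot_name "$DATASET")"
}

take_snapshot
run zfs list -t snapshot    # show the snapshots that now exist
```

To return to one of these snapshots later, pass its full name to zfs rollback, as in the rollback example above.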

These examples don’t exhaust all you can do with ZFS. Although its features and terminology may seem strange at first if you’re used to more traditional Linux filesystems, ZFS offers a great deal of flexibility. You can learn more by experimenting with this filesystem (on a non-critical system, of course!) or by reading the documentation on the Web sites I’ve already referenced.

Comments on "ZFS on FUSE"

aligature

I’ve been using this on a CentOS 5.2 system for about six months. I’m running kernel supported ZFS on a Mac Pro serving my gigs of personal data. The ZFS on FUSE filesystem runs on a big usb drive on the CentOS system in a different state. Nightly rsyncs and I have a geographically separate backup complete with nightly ZFS snapshots. I find the ZFS snapshot feature invaluable, making sure that I don’t accidentally delete all of my important data.

One place where I disagree with this article though is how to install ZFS on FUSE. If you look at the developer’s blog (http://zfs-on-fuse.blogspot.com/), you’ll see that he recommends running the latest code out of the mercurial trunk. That’s what I’ve been doing and I haven’t had any issues.

billtodd

Whether many of ZFS’s features qualify as ‘revolutionary’ (or even ‘advanced’) is subject to debate:

1. Pooled storage while a good idea is hardly a new one, nor is ZFS’s implementation nearly as automated as it might be (e.g., RAID groups still have to be defined manually, disk by disk – just as with a conventional LVM).

2. RAID-Z might more reasonably be described as ‘brain-damaged’, given that it has dramatically *more* overhead than conventional RAID-5: every small write operation hits every disk in the stripe (rather than writes to just two of the disks after reading them – with at least one of the reads often satisfied in cache), and (even worse) every small *read* operation hits all but one of the disks in the stripe.

3. ‘Always consistent disk space’ has been available in many file systems – e.g., VxFS, XFS, JFS, NTFS, WAFL – since the early ’90s, the last of which is a copy-on-write implementation (the others use a transaction log to protect internal integrity, which has advantages in minimizing fragmentation in files which are updated at fine grain but may be read in bulk – especially since last I knew ZFS had no defragmenter).

4. Disk scrubbing has been available in Linux for years – and it’s analogous to RAM scrubbing, not to ECC (it only detects errors, allowing some other mechanism to correct them if sufficient redundancy exists to do so). ZFS’s additional internal checksums can (like similar checksums in NetApp’s WAFL) catch some errors that conventional scrubbing can’t – perhaps improving the 99.99% of otherwise undetected errors that conventional scrubbing would catch to 99.999% (yes, some people actually need this last decimal place of reassurance, but probably not very many).

5. Snapshots are relatively old hat by now (thanks to NetApp’s leading the way 15 years ago). Clones are more interesting, but arguably far less important.

6. Built-in compression has been part of NTFS since one of the early NT releases, and standard (though layered) compression on commodity platforms dates back to Microsoft DOS in the ’80s. Given storage prices these days, far fewer people bother using it than used to (not that it isn’t nice to have for special needs).

I really applaud Sun’s initiative in what has otherwise been a sadly-neglected area of corporate system development, but am less happy about the degree of hype (“ZFS – The Last Word In File Systems” being particularly notable) in which ZFS has been wrapped for public consumption. So I tend to comment upon the latter when it happens to cross my path.

kebabbert

For those of you who say ZFS is nothing new, please read this article discussing the future of file systems:

http://www.acmqueue.org/modules.php?name=Content&pa=showpage&pid=504

They say things like, a modern hard drive always devotes 20% of its capacity to error correcting codes. Some of the errors can’t even be detected or corrected. And the larger the drives, the more cases of silent data corruption will occur. Unless you use ZFS of course, which fixes all these problems.

The link above is copied from this Linux user who tries out ZFS (from slashdot) for his reliable home file server:

http://breden.org.uk/2008/03/02/home-fileserver-i%e2%80%99ll-use-zfs/

sfsetse

In the step where you created your pool, you used the partition /dev/sda1 rather than the full drive /dev/sda. If ZFS for Linux ever escapes the FUSE layer, this usage is sub-optimal as ZFS understands when it has full use of the drive, and will adjust its caching and write behaviors accordingly. Basically, it knows that when the full write path to the drive is under its control, it can intelligently reorder the write packets to make optimal use of the bandwidth when sending packets to the drive so that head movement is minimized and such. This is not possible when a partition is used.

Change:
$ zpool create mypool /dev/sda1

To:
$ zpool create mypool /dev/sda

- kate

billtodd

Oh, dear – I see that we’ve got one of those victims of the unjustified hype that I was referring to right here – but he doesn’t seem to have understood my own post (though he seems to have been responding to it).

In particular, not only did I note ZFS’s marginally superior detection of otherwise undetectable errors (which are orders of magnitude less common than detectable but uncorrectable errors – in the absence of scrubbing the primary cause of data loss in high-density arrays when a rebuild is required), but I also mentioned that WAFL had comparable integrity-checking mechanisms (and, incidentally, it had them before ZFS did). Leaving aside the fact that IBM’s i-series (previously AS/400, nee System 38) systems have included supplementary data-integrity checksums for decades (as have high-end block-storage devices like EMC’s Symmetrix): these are also comparably capable of catching uncorrected and even undetected bit-rot on a disk, though less capable of detecting wild or lost writes.

Thus the ZFS developers reveal only their own ignorance when stating that no external storage systems provide comparable data integrity – and in general the acm article cited reads more like Sun marketing literature than like a serious journal submission (not to mention having been conducted by a Sun moderator…).

- bill

mattwillsher

Bill, you seem to be missing a key point of ZFS – the ‘I’ in RAID. ZFS is inexpensive. All the systems you mention, sure they’ve done it before and often better, but they cost $$$.

ZFS comes with the OS at no additional charge. It offers some of the benefits of technology that belonged in the high end and enterprise markets to low and mid range users. Sure, some of the technologies are available in other file systems for the lower end market but ZFS brings them together and gives a pretty clean interface for managing what is underneath a relatively complex system.

It’s ideal where money is tight, knowledge is limited and performance isn’t a driving force.

tbuskey

The comments above are correct about ZFS not being that new. You could say the same thing about the iPod and the iPhone when they came out. Nothing new, others are doing it better technically.

ZFS brings it all together at an affordable price.

I used a NetApp in 1996 that was a 486 with EISA bus and 14 disks that were not removable w/o a shutdown and lots of screws. Snapshots were fantastic. Speed was better than the Sparc 10s. Everything was 10T though we might’ve had FDDI.

And it was $$$$$. The SCSI controller was $2000 or so. Adaptec 17xx from NetApp vs $500 elsewhere. Disks were similar.

Now, I can get generic SATA drives, a $20 4 port SATA controller and build a fileserver with everything that NetApp had and maybe a little more. Before ZFS came out, I either built a Linux box w/o snapshots, a more complicated grow/shrink, or used a prebuilt appliance that cost $$ with less performance than my Linux box.

I’m looking forward to btrfs offering the same capabilities and economies with better device support.

FWIW – at least I can get ZFS at home. I can’t get WAFL….
