NILFS: A File System to Make SSDs Scream

The 2.6.30 kernel is chock full of next-gen file systems. One such example is NILFS, a new log-structured file system that dramatically improves write performance.

It’s difficult to write storage articles right now and not focus on the upcoming 2.6.30 kernel. Why? This kernel is loaded with a number of new file systems, some of which we’ve already covered, like ext4 and btrfs. Another hot new file system in 2.6.30 is NILFS. This file system is definitely one you should be testing.

NILFS2 (New Implementation of a Log-structured File System, version 2) is a very promising new log-structured file system that provides continuous snapshots and versioning of the entire file system. This means you can recover files that were deleted or unintentionally modified, and you can perform backups at any time from a snapshot, without the performance penalty normally associated with creating snapshots. In addition, there is evidence that NILFS performs extremely well on SSDs.

Log-Structured File System?

Log-structured file systems are a bit different from other file systems, with both good points and bad points. Rather than writing to a tree structure such as a b-tree or an h-tree, either with or without a journal, a log-structured file system writes all data and metadata sequentially in a continuous stream called a log (actually, a circular log).

The concept was developed by John Ousterhout (of Tcl fame) and Fred Douglis. The motivation behind log-structured file systems is that typical file systems lay out data based on spatial locality for rotating media (hard drives), but rotating media tends to have slow seek times, which limits write performance. In addition, it was presumed that most I/O would become write dominated (an observation supported by a study that was summarized in a recent article). So a log-structured file system takes a new approach: it treats the file system as a circular log and writes sequentially to the head of the log (the beginning), never overwriting the existing log. Seeks are kept to a minimum because everything is sequential, improving write performance.
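
To make the idea concrete, here is a toy sketch of what “everything goes to the head of a circular log” means. This is purely illustrative Python, not NILFS code; the class and field names are made up for this example:

```python
# Toy model of a log-structured write path (illustration only, not NILFS code).
# Every write, data or metadata, is appended at the head of a circular log.

class CircularLog:
    def __init__(self, size):
        self.size = size          # total log capacity in "blocks"
        self.head = 0             # next write position
        self.entries = []         # (offset, key, value) records, in write order

    def append(self, key, value):
        """All updates go to the head; nothing is overwritten in place."""
        offset = self.head
        self.entries.append((offset, key, value))
        self.head = (self.head + 1) % self.size   # wrap around: circular log
        return offset

log = CircularLog(size=8)
a = log.append("fileA", "v1")
b = log.append("fileB", "v1")
c = log.append("fileA", "v2")   # an update is a new record; the old one stays in the log
```

Note that updating “fileA” does not touch the original record: the old version remains behind in the log, which is exactly what makes the checkpoints described below possible.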

A log-structured file system, because of its design, makes it very easy to create snapshots (in NILFS they are called checkpoints) of both the data and the metadata. NILFS can then mount these checkpoints (or snapshots) alongside the primary NILFS file system. From these checkpoints, you can recover erased files (if the checkpoint is dated prior to when the file was erased), or you can use them for backups or even disaster-recovery images.

Another benefit of log-structured file systems is that recovering from a crash is easier than in the more typical tree-based file systems (e.g., ext2, ext3). After a crash, when a log-structured file system is remounted it can reconstruct its state from the last consistent point in the log: it starts at the head of the circular log and walks backward until the file system is consistent. This point should be very close to the head, so little, if any, data or metadata will be lost. The process is extremely fast regardless of the size of the file system.
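
The backward scan can be sketched in a few lines. This is a simplified illustration of the idea, not NILFS’s actual on-disk recovery logic (which works on segment summaries rather than individual records):

```python
# Sketch of log-based crash recovery: scan backward from the head until a
# record whose checksum verifies, and resume from there. Illustration only.

import zlib

def make_record(payload: bytes) -> dict:
    # Each log record carries a checksum of its payload.
    return {"payload": payload, "crc": zlib.crc32(payload)}

def last_consistent(log):
    """Walk back from the head of the log; the first record whose checksum
    verifies is the recovery point. The cost depends only on how far the head
    is from a good record, not on the total file system size."""
    for i in range(len(log) - 1, -1, -1):
        if zlib.crc32(log[i]["payload"]) == log[i]["crc"]:
            return i
    return None

log = [make_record(b"checkpoint-1"),
       make_record(b"checkpoint-2"),
       make_record(b"data being written during the crash")]
log[-1]["crc"] ^= 0xDEADBEEF   # simulate a torn/partial write at the head
recovery_point = last_consistent(log)
```

Here the torn write at the head is skipped, and recovery resumes at the second record; only whatever was in flight at the moment of the crash is lost.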

This bears repeating: a log-structured file system recovers from a crash extremely fast, and the recovery time is independent of the size of the file system. In contrast, other file systems have to replay their journal and possibly even walk their data structures to make sure the file system is consistent (i.e., run “fsck”). Everyone who has run fsck on a very large file system knows how much time that can take.

One problematic aspect of log-structured file systems is that they need a fairly sophisticated “garbage collection” capability to reclaim free space. Free space has to be reclaimed from the tail of the log, primarily from old checkpoints, so that the file system doesn’t become full when the head of the log wraps around to the tail. There are many techniques for reclaiming space; one is covered in the Wikipedia article on log-structured file systems. Without the garbage collector reclaiming space from old checkpoints (snapshots), the file system would fill far too quickly.
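
A toy cleaner might look like the following. This is illustrative only; NILFS’s real userspace cleaner (nilfs_cleanerd) is considerably more sophisticated about which segments to clean and when:

```python
# Toy garbage collector for a segmented log (illustration only): live blocks
# in the oldest segment are copied forward to the head of the log, and then
# the whole old segment is reclaimed as free space.

def clean_oldest_segment(segments, live):
    """segments: list of lists of block ids, oldest segment first.
    live: set of block ids still referenced by the current file system."""
    oldest = segments.pop(0)                  # take the segment at the tail
    survivors = [b for b in oldest if b in live]
    if survivors:
        segments.append(survivors)            # copy live data forward to the head
    return len(oldest) - len(survivors)       # number of blocks reclaimed

segments = [[1, 2, 3], [4, 5], [6]]           # block 1 and 3 are stale versions
live = {2, 4, 5, 6}
freed = clean_oldest_segment(segments, live)
```

The essential trade-off is visible even in this sketch: reclaiming the tail requires rewriting any still-live data it contains, which is background work the file system must schedule around normal traffic.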

A Log-Structured File System for Linux: NILFS

Nippon Telegraph and Telephone (NTT) CyberSpace Laboratories has been developing NILFS (also referred to as NILFS2, since it is version 2 of the file system) for Linux. It is released under the GPLv2 license and is included in the 2.6.30 kernel. It spent a great deal of time in the -mm kernels and underwent much testing since its initial announcement.

One of the most noticeable features of NILFS is that it can “continuously and automatically save instantaneous states of the file system without interrupting service”. NILFS refers to these as checkpoints. In contrast, in other file systems, such as ZFS, a snapshot is an explicit operation that has to be requested. NILFS doesn’t have to do this: the snapshots (checkpoints) are part of the file system design itself.

One of the really cool features of NILFS is that these checkpoints can actually be mounted alongside the primary file system. This has many, many uses, one of which is mounting a checkpoint to recover files that were unintentionally erased.

Beyond the ability to recover recently erased files and the extremely fast crash recovery, NILFS has a number of other attractive features:


  • File sizes and inode numbers are stored as 64-bit fields

  • File sizes of up to 8 EiB (exbibytes; approximately 9.2 exabytes)

  • Block sizes smaller than the page size (i.e., 1KB-2KB), which can potentially make NILFS much faster than other file systems for small files

  • File and inode blocks use a B-tree (the use of B-trees in a log-structured file system stems from the implementation, which uses structures called segments)

  • 32-bit checksums (CRC32) on data and metadata for integrity assurance

  • Correctly ordered data and metadata writes

  • Redundant superblock

  • Read-ahead for metadata files as well as data files (helps read performance)

  • Continuous checkpointing, which can be used for snapshots; these can be used for backups or even for recovering files
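
The continuous-checkpoint idea behind that last item can be modeled in a few lines. This is a conceptual sketch only, nothing like NILFS’s on-disk format; the class and file names are invented for the example:

```python
# Conceptual model of continuous checkpoints: every mutation produces a new
# checkpoint, and any past checkpoint can be "mounted" as a read-only view
# to recover files that were deleted later.

class CheckpointedFS:
    def __init__(self):
        self.checkpoints = [{}]          # checkpoint 0: empty file system

    def write(self, name, data):
        state = dict(self.checkpoints[-1])
        state[name] = data
        self.checkpoints.append(state)   # every change creates a checkpoint

    def delete(self, name):
        state = dict(self.checkpoints[-1])
        state.pop(name, None)
        self.checkpoints.append(state)

    def mount(self, cno):
        """Read-only view of the file system as of checkpoint number cno."""
        return dict(self.checkpoints[cno])

fs = CheckpointedFS()
fs.write("report.txt", "draft 1")        # checkpoint 1
fs.delete("report.txt")                  # checkpoint 2: file gone at the head
recovered = fs.mount(1)["report.txt"]    # but still visible at checkpoint 1
```

The deleted file is simply read back out of the older checkpoint, which is exactly the recovery scenario described above, with no special backup step having been taken beforehand.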


Comments on "NILFS: A File System to Make SSDs Scream"

jrichter

Performance is great, but in my experience, as large an issue can be the difficulty of expanding the file systems and underlying disk structure. Easing that task is a characteristic of, for example, ZFS. Does NILFS have any features in this area?

fbacchella

“In contrast, other file systems such as ZFS, can provide snapshots but they have to suspend operation to perform the snapshot operation.”
“Creating these checkpoints or snapshots do not result in decreased performance as they do for file systems such as ZFS.”

That’s plain FUD. Snapshots are free operations (except for the disk space, of course) in ZFS and have no impact on performance.

rejoc

On OpenVMS, 10+ years ago, there was such a file system, called Spiralog!

Write performance was tremendous!

But it is also the only file system I’ve seen where, when the disk was full, you could not delete any file, because deleting a file added a new record to the log and there was no more room on the disk to extend the log :)

mrbig4545

I registered just to say: you don’t want RAID in the file system; it breaks the whole layered design. On top of that, software RAID in Linux is one of the finest in existence, and there’s no harm in using it underneath NILFS (which looks pretty awesome, btw) to achieve the same thing in a more modular fashion.

momokuri

NILFS does not seem to have a volume-expansion feature yet, but one of the developers said he has experimental code in his test repository:
https://www.nilfs.org/pipermail/users-ja/2008-November/000059.html
(in Japanese)

I think it will appear after it is merged into the mainline kernel, maybe in 2.6.32 or later.

ddebroy

I get why a log-structured file system, which is meant to optimize for a rotating disk head, is beneficial for rotating disks. I don’t get how that translates to “make your SSDs scream”. Since SSDs don’t have spinning disks, how does a log-structured file system boost an SSD? In fact, doesn’t being log-structured make a file system completely SSD-unaware and therefore inferior to a file system that is not log-structured (say, one that allocates data in some random fashion all over the volume)?

The main design principle for this file system appears to be optimizing for a premise (i.e. a rotating head) which is completely absent in the case of a SSD.

golding

Wonder what Jörn Engel (dev of LogFS) thinks of this.
Does this include his efforts?

laytonjb

In general, I agree with you. But a very large number of people are asking for RAID to be included in the file system, à la ZFS. I’m not enough of a file system designer to explain the details of either approach. But when looking at btrfs, I found that I liked the built-in RAID because it made building the file system easier. It may be that I was being lazy :) or it may have been all the mounting and unmounting of the file system, but I did like it better.

laytonjb

I would recommend reading the NILFS mailing list archives and perhaps some of the other articles around.

From my understanding, one of the things a log-structured file system gives you is that things are just appended to the head of the log, and garbage collection cleans up later. SSDs have notoriously bad rewrite performance because the cells have to be erased before they can be written again, which basically means two operations per rewrite. Of course, SSD manufacturers are figuring out ways around this, or at least ways to hide it. With a log-structured file system, reclaiming space can happen when there is no pressure to reuse that particular part of the SSD.

Plus, the “blocks” (if you will) of the SSD that get erased are fairly large, so it’s definitely possible for a classic file system to erase a section of the drive several times over. Log-structured file systems can be “tuned” to reclaim space in units that match the size that needs to be erased, so you only do it one time. BTW, classic file systems are gaining this behavior as well (I think btrfs does this, and I would be willing to bet that ZFS does too, but I don’t know for sure).
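
A rough back-of-the-envelope version of that erase-block argument, with made-up numbers (the erase-block size and page addresses here are hypothetical, and real devices and firmware vary a lot):

```python
# Count how many erase-block erasures a rewrite workload triggers when writes
# are scattered across the device versus batched sequentially at a log head.
# All numbers are hypothetical, for illustration only.

ERASE_BLOCK = 128  # pages per erase block (hypothetical)

def erases_needed(written_pages):
    """Each distinct erase block touched by a rewrite must be erased once."""
    return len({p // ERASE_BLOCK for p in written_pages})

# In-place file system: 64 dirty pages scattered across the device.
scattered = [i * 97 for i in range(64)]
# Log-structured: the same 64 pages appended contiguously at the log head.
sequential = list(range(1000, 1064))

worst = erases_needed(scattered)    # touches many erase blocks
best = erases_needed(sequential)    # touches only a couple
```

Same amount of data written, but the scattered pattern forces dozens of large erasures while the sequential pattern needs only a couple; that difference is the heart of the “log-structured file systems are good for SSDs” argument.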

Jeff

laytonjb

Thanks for the comment about this; I had not seen that yet.

If you think about it, expanding or even shrinking a log-based file system should be fairly easy. If you add more space, you just have to make the log aware of that space. If you shrink the file system, I think you can do the same thing in reverse.

Of course, the devil is in the details :) And I don’t know the details.

Jeff

laytonjb

I did find an article that might help you in your quest to understand why log-structured file systems make SSDs scream:

http://www.ibm.com/developerworks/linux/library/l-flash-filesystems/

I hope this helps (but I haven’t read the whole thing).

Jeff

