It's the end of the year and that means it's time to either make predictions for the coming year or review the highlights from the past year. This article takes a look at the cool things that happened around storage in the past year and perhaps hints at some things in the coming year.
The tradition is that at the end of the year one looks back to summarize the year's events, in our case storage events. One can also look forward and make predictions about events in 2010. I'm not very good at predictions (I expected to have my home wired with 10GigE about four years ago), so I will shy away from them. However, I will tackle reviewing storage events from a Linux point of view. It's definitely not comprehensive, since that would be too long and someone would always feel slighted, but I have chosen what I think are some fairly significant events of 2009.
Let's start with probably the most important single development in Linux storage in 2009: the explosion of file systems for Linux.
Continuing Development of Linux File Systems
The biggest highlight around Linux storage (this is Linux Magazine, after all) is, in my opinion, the continuing and perhaps accelerating development of a variety of file systems. If you re-read the interview with Valerie Aurora (one of the thought leaders behind open-source file systems), you'll see she points out that a few years ago Linux file system development was all but dead, except for ReiserFS. Valerie raised this with the community and then organized the first-ever Linux File Systems Workshop to help jump-start Linux file system development.
Since then, and particularly in 2009, there has been a veritable explosion of file systems for Linux. Starting with version 2.6.28 of the kernel, new and updated file systems have arrived at a furious pace, and that pace continues through the current kernel (2.6.32 is the current release as of this writing). The following subsections cover specific file systems, but the sheer number of them shows that Linux file system development is alive and well.
Btrfs – Linux Don’t Need No Stinkin’ ZFS
Yes, the title is controversial, but the intent is to illustrate that Linux has a ZFS-quality file system in very rapid development. All good Linux storage people know that this file system is btrfs.
Btrfs was added to the kernel in version 2.6.29, which was released on March 23, 2009. It was, and still is, marked as "experimental", but if you follow kernel development you realize that merging something into the kernel is an indication that further development is best done in the kernel. In general this means that development proceeds faster, since more people have access to the code.
As previously discussed, btrfs has a number of wonderful features in its current version and is adding more all the time. A more detailed article discusses btrfs, contrasts it with ZFS, and illustrates that btrfs is on par with ZFS in its goals.
SquashFS
SquashFS is a compressed read-only file system that has been developed outside the kernel for a number of years. Version 4.0 of SquashFS was included in version 2.6.29 of the kernel on March 23, 2009.
While one would think that a compressed read-only file system is only useful for embedded systems, it actually has a number of uses on non-embedded systems. For example, it can be used to archive data within users' accounts to save space while still leaving the data online. You can also use SquashFS in combination with UnionFS to save space while still allowing data to appear to be changed or added.
SquashFS uses gzip to compress images, achieving a fairly good compression ratio depending on the original source of the data. A separate project, SquashFS-LZMA, uses the LZMA compression algorithm to achieve even better compression. The problem with SquashFS-LZMA was that LZMA was not in the kernel, so you had to patch the kernel with both LZMA and SquashFS-LZMA. However, LZMA support was added to the kernel in version 2.6.31, released on September 9, 2009. So one should expect the SquashFS-LZMA project to be merged into SquashFS in 2010 (sorry, I had to make at least one prediction).
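To get a feel for the gzip-versus-LZMA trade-off, the sketch below compresses the same buffer with Python's zlib module (deflate, the algorithm gzip uses) and its lzma module (in the legacy .lzma format). This is only an illustration under stated assumptions: it uses user-space Python libraries rather than the kernel's implementations, it compresses one whole buffer rather than SquashFS's individual blocks, and the sample data is made up. Actual ratios depend heavily on the data, so no expected numbers are given.

```python
import lzma
import zlib


def compressed_sizes(data: bytes) -> dict:
    """Return the sizes of `data` after deflate (as in gzip) and LZMA compression."""
    deflated = zlib.compress(data, 9)  # deflate at maximum level
    lzma_alone = lzma.compress(data, format=lzma.FORMAT_ALONE, preset=9)  # classic LZMA
    # Round-trip both to confirm the compression is lossless.
    assert zlib.decompress(deflated) == data
    assert lzma.decompress(lzma_alone) == data
    return {"original": len(data), "deflate": len(deflated), "lzma": len(lzma_alone)}


if __name__ == "__main__":
    # Hypothetical sample: repetitive, text-like data, roughly like a file listing.
    sample = b"".join(b"file_%05d.txt size=%d\n" % (i, i * 37 % 4096) for i in range(20000))
    sizes = compressed_sizes(sample)
    for name, size in sizes.items():
        print(f"{name:>8}: {size:>9} bytes ({size / sizes['original']:.1%} of original)")
```

Running it on your own data (for example, a tarball of a home directory) gives a rough idea of how much extra space LZMA might save, at the cost of slower compression.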
Ext4 Ready for Production
The venerable ext3 file system has been the default file system for a very large number of Linux distributions for many years. The ext family of file systems has been in the kernel since 1992, when the original version of ext was merged in version 0.96c. Ext3 was added in the 2.4.15 kernel in November 2001. But an ext3 file system is limited to 16 TiB in size, and individual files are limited to 2 TiB. Given that today's drives can be 2 TB in size, it's easy to see that ext3 is limiting, at least from a capacity perspective.
Ext4 is an updated version of ext3 that adds a number of features, including extents, delayed allocation, and much larger file system and file size limits.
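The ext3 limits quoted above fall out of simple arithmetic. The sketch below assumes the common 4 KiB block size and the usual explanation of where each limit comes from: 32-bit block numbers cap the file system, and a 32-bit per-file count of 512-byte sectors caps individual files. The ext4 figures, included for comparison, assume its 48-bit physical block numbers and 32-bit logical block numbers within a file. Smaller block sizes give smaller limits.

```python
KiB = 2**10
TiB = 2**40
EiB = 2**60

BLOCK_SIZE = 4 * KiB  # assumption: the common 4 KiB block size

# ext3: 32-bit block numbers allow at most 2**32 blocks per file system.
ext3_fs_max = 2**32 * BLOCK_SIZE
# ext3: the per-inode block count is a 32-bit count of 512-byte sectors.
ext3_file_max = 2**32 * 512
# ext4: 48-bit physical block numbers.
ext4_fs_max = 2**48 * BLOCK_SIZE
# ext4: extents map 32-bit logical block numbers within a file.
ext4_file_max = 2**32 * BLOCK_SIZE

# How many 2 TB (decimal, as drives are sold) drives it takes to reach ext3's limit.
drives_to_fill_ext3 = ext3_fs_max / (2 * 10**12)

print(f"ext3: file system {ext3_fs_max // TiB} TiB, file {ext3_file_max // TiB} TiB")
print(f"ext4: file system {ext4_fs_max // EiB} EiB, file {ext4_file_max // TiB} TiB")
print(f"2 TB drives to reach ext3's limit: {drives_to_fill_ext3:.1f}")
# ext3: file system 16 TiB, file 2 TiB
# ext4: file system 1 EiB, file 16 TiB
# 2 TB drives to reach ext3's limit: 8.8
```

In other words, a RAID array of nine 2 TB drives is already too big for a single ext3 file system.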
Ext4 has been in the kernel for a while, but it was marked as "experimental". In version 2.6.28 of the kernel, the experimental flag was removed. Moreover, the performance of ext4 is quite good. In a study of metadata performance, ext4 had some of the best results of the major file systems, and in a throughput performance study ext4 again compared well to the commonly cited performance leaders such as xfs, btrfs, jfs, and reiser4.
NILFS2
NILFS2 is an implementation of what is called a log-structured file system. These file systems differ from standard file systems because they store everything, data and metadata, in a sequential log. So one piece of data is put into the log, perhaps followed by a piece of metadata, perhaps followed by more metadata, then data, and so on. The data and metadata are appended to the log sequentially.
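The append-only idea can be sketched in a few lines. The toy below is not NILFS2's on-disk format (real log-structured file systems use segments, indexes, and garbage collection); the class names `Record` and `ToyLog` are hypothetical. It shows the key property: a write appends data blocks followed by a metadata record pointing at them, and a rewrite never overwrites anything; it simply appends newer records that supersede the old ones.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Record:
    kind: str        # "data" or "inode" (metadata)
    inode: int       # the file this record belongs to
    payload: object  # bytes for data records, list of log offsets for inode records


@dataclass
class ToyLog:
    """A hypothetical append-only log: every write, data or metadata, goes at the end."""
    records: list = field(default_factory=list)

    def _append(self, rec: Record) -> int:
        self.records.append(rec)
        return len(self.records) - 1  # the record's offset in the log

    def write_file(self, inode: int, blocks: list[bytes]) -> None:
        # Data blocks are appended first...
        offsets = [self._append(Record("data", inode, b)) for b in blocks]
        # ...followed by a metadata record pointing at them. Nothing is overwritten.
        self._append(Record("inode", inode, offsets))

    def read_file(self, inode: int) -> bytes:
        # The newest inode record for this file wins; older ones are stale.
        for rec in reversed(self.records):
            if rec.kind == "inode" and rec.inode == inode:
                return b"".join(self.records[o].payload for o in rec.payload)
        raise FileNotFoundError(inode)
```

For example, writing `[b"hello ", b"world"]` to inode 1 and then rewriting it with `[b"hello again"]` leaves five records in the log (data, data, inode, data, inode), and `read_file(1)` returns `b"hello again"`. The stale records are what a garbage collector would later reclaim.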
NILFS2 has been written with a number of advanced features:
- File systems up to 8 exbibytes (EiB) in size
- Block sizes smaller than a page (e.g., 1 KB or 2 KB), which can potentially make NILFS2 much faster for small files than other file systems
- 32-bit checksums (CRC32) on data and metadata for integrity assurance
- Correctly ordered data and metadata writes
- Redundant superblock
- Read-ahead for metadata files as well as data files (helps read performance)
- Continuous checkpointing, which can be used for snapshots. Snapshots can be used for backups or even for recovering files.
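The checksum feature in the list above works on a simple principle: compute a CRC32 when a block is written, store it alongside the block, and recompute it on read. The sketch below uses Python's standard `zlib.crc32`; it illustrates the principle only and assumes nothing about NILFS2's actual seed values or where it stores checksums. The helper names `seal` and `verify` are hypothetical.

```python
from __future__ import annotations

import zlib


def seal(block: bytes) -> tuple[bytes, int]:
    """Pair a block with its CRC32, as it would be stored at write time."""
    return block, zlib.crc32(block)


def verify(block: bytes, stored_crc: int) -> bool:
    """Recompute the CRC32 at read time and compare it with the stored value."""
    return zlib.crc32(block) == stored_crc
```

Flipping even a single bit in a sealed block makes `verify` return `False`, since CRC32 detects every single-bit error. A CRC guards against accidental corruption, not deliberate tampering.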
This last feature, continuous checkpointing, has great utility in production file systems, since snapshots can be created without interrupting performance one little bit. In addition, the fundamental log-structured design of NILFS2 allows it to potentially run very fast on SSDs.
NILFS2 was added to the kernel in version 2.6.30 and is still marked as under development. The mailing list is very active, and development is proceeding to stabilize the code, finalize the on-disk format, and improve garbage collection (GC), which should improve performance.
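Why are checkpoints so cheap in a log-structured design? Because old data is never overwritten, a checkpoint only needs to remember a position in the log. The toy below (the class `CheckpointedLog` is hypothetical and far simpler than NILFS2) records the log length as the checkpoint and rebuilds the file tree "as of" that checkpoint by replaying only the log prefix. Taking the checkpoint copies no data at all.

```python
from __future__ import annotations


class CheckpointedLog:
    """Hypothetical append-only log where a checkpoint is just a log position."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, str]] = []  # (path, contents) in write order
        self.checkpoints: list[int] = []          # log lengths at checkpoint time

    def write(self, path: str, contents: str) -> None:
        self.entries.append((path, contents))     # never overwrite in place

    def checkpoint(self) -> int:
        # Copies no data: it only records the current length of the log.
        self.checkpoints.append(len(self.entries))
        return len(self.checkpoints) - 1

    def view(self, cp: int | None = None) -> dict[str, str]:
        # Replay the log up to the checkpoint (or to the end) to rebuild the tree.
        end = len(self.entries) if cp is None else self.checkpoints[cp]
        state: dict[str, str] = {}
        for path, contents in self.entries[:end]:
            state[path] = contents
        return state
```

Writing "draft" to `/report.txt`, taking a checkpoint, and then writing "final" leaves `view(cp)` returning the draft while `view()` returns the final version, which is the file-recovery use case mentioned above.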