SquashFS: Not Just for Embedded Systems

Who knew that compression could be so useful in file systems? SquashFS, typically used for embedded systems, can be a great fit for laptops, desktops and, yes, even servers.

Once the file system image has been created, which may take some time, it can be mounted in the user’s home directory. The first step is to move the original directory aside (just in case) and create an empty directory to serve as the mount point.

$ mv Documents Documents.original
$ mkdir /home/laytonjb/Documents

The next step is to mount the image on that directory.

$ sudo mount -t squashfs /squashfs/storage/Documents.sqsh /home/laytonjb/Documents -o loop

To get an idea of how much space is used, or saved, by SquashFS, one can look at the image file:

$ ls -lsah /squashfs/storage/Documents.sqsh
192M -rwx------ 1 root root 192M 2009-06-06 17:32 /squashfs/storage/Documents.sqsh

So the compression ratio is 314/192, or approximately 1.64:1.
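The arithmetic above can be scripted. The following is a minimal sketch: compression_ratio is a hypothetical helper name (it is not part of the SquashFS tools), and it assumes both sizes are given in the same unit.

```shell
# Hypothetical helper: print the ratio of original size to compressed size.
# Both arguments must use the same unit (MB in this example).
compression_ratio() {
    awk -v orig="$1" -v comp="$2" 'BEGIN { printf "%.2f:1\n", orig / comp }'
}

# Sizes from the example: 314 for the original data, 192 for the image.
compression_ratio 314 192
```

In practice the two numbers could come from du run against the original directory and the image file, as long as both are reported in the same unit.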

The exact amount of compression one can achieve with SquashFS really depends upon the data. If the data is primarily binary data, the compression is not likely to be very large. But if the data is primarily text based, then a fairly large compression ratio can be obtained. There is an article that illustrates the compression that can be achieved in SquashFS by comparing it to other file systems. The table is reproduced here:

File System            Size
ext3 uncompressed      1.4 GB
ISO9660 uncompressed   1.3 GB
Zisofs                 589.81 MB
Cloop                  471.89 MB
SquashFS 2.0           448.58 MB
SquashFS 2.1           448.58 MB

The best of these ratios are about 3:1 (1.4 GB down to roughly 449 MB), illustrating how much storage can be saved using a compressed file system.

There are some aspects of compressed file systems to keep in mind. The first and most obvious is that the file system is read-only. The second, while not a big problem in some cases, is that it can take a fair amount of CPU time to create a SquashFS image. However, the image only has to be created once, and additional data can be appended later without rebuilding the entire image. The third is that compressed file systems are slower to read than typical file systems because the data has to be decompressed, which costs both time and CPU cycles.

Adding Capability to SquashFS

Earlier in the article it was mentioned that you could combine SquashFS with a write-capable file system through a union file system. The basic premise is that you mount SquashFS alongside a typical file system such as ext3, and “combine” them with UnionFS so that any writes to the combined file system go to the writable part. To the user it appears as though everything is a single file system.

There is a very good example in the HOWTO that presents a simple approach to combining SquashFS with another file system through a union mount point. The steps are fairly straightforward.

  • Step 1 – Create a SquashFS file system image of the targeted directory. In the HOWTO this is a user’s home directory.
  • Step 2 – Mount the SquashFS image (for example, on /mnt/squashfs1).
  • Step 3 – Create a directory that is read/write (this is the directory where file changes will end up). For example, this could be /home/user1.
  • Step 4 – Combine the two directories using unionfs:
    # mount -t unionfs -o dirs=/home/user1=rw:/mnt/squashfs1=ro unionfs /home/laytonjb2

These four basic steps combine a read/write directory (/home/user1) with a SquashFS image (/mnt/squashfs1). With the union mounted, all write operations actually go to /home/user1. Be sure to read the HOWTO and walk through the example about “Making it writable.”
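The four steps can be sketched as a short script. This is only an illustration under stated assumptions: the image location /squashfs/user1.sqsh and the source directory /home/user1.original are made up for the example, and run is a hypothetical helper that prints each command instead of executing it unless DRY_RUN is set to 0. The word unionfs before the mount point is the placeholder device name that mount expects for this kind of file system.

```shell
# Dry-run by default: commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command, or run it when DRY_RUN=0.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

union_steps() {
    # Step 1: build the SquashFS image (source and image paths are assumptions).
    run mksquashfs /home/user1.original /squashfs/user1.sqsh
    # Step 2: mount the image through a loop device.
    run mkdir -p /mnt/squashfs1
    run mount -t squashfs -o loop /squashfs/user1.sqsh /mnt/squashfs1
    # Step 3: create the read/write directory that will receive changes.
    run mkdir -p /home/user1
    # Step 4: combine both branches; writes land in the rw branch.
    run mkdir -p /home/laytonjb2
    run mount -t unionfs -o dirs=/home/user1=rw:/mnt/squashfs1=ro unionfs /home/laytonjb2
}

union_steps
```

Running the real commands requires root privileges and a kernel with unionfs support, so it is worth reviewing the printed commands before setting DRY_RUN=0.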

The HOWTO also makes brief mention that you can take any files that have been written to the writable portion of the union and add them to the SquashFS image. This is very useful if you want to periodically update the image without having to recreate the entire image from scratch every time.
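As a rough sketch of such an update, the function below prints (rather than runs) one possible sequence. It relies on mksquashfs appending to an existing image by default, unless -noappend is given. The image path /squashfs/user1.sqsh is an assumption for illustration, and the other paths follow the four-step example above. How name clashes and files deleted through the union are handled is not covered here, so check the HOWTO and the mksquashfs documentation before trying this on real data.

```shell
# Illustrative values; the image path is an assumption, not from the HOWTO.
IMAGE=${IMAGE:-/squashfs/user1.sqsh}
RW_DIR=${RW_DIR:-/home/user1}

# Print one possible update sequence instead of executing it:
#   1. unmount the union and the image so the image is not in use,
#   2. append the contents of the read/write directory to the image,
#   3. mount the updated image again.
# Remounting the union and clearing the read/write directory are left out.
print_update_plan() {
    cat <<EOF
umount /home/laytonjb2
umount /mnt/squashfs1
mksquashfs $RW_DIR $IMAGE
mount -t squashfs -o loop $IMAGE /mnt/squashfs1
EOF
}

print_update_plan
```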

Using SquashFS on Non-Embedded Systems

As previously mentioned, the most common use for compressed file systems is in embedded systems. But that doesn’t mean they have to be limited to embedded systems. They can be used in a number of ways for everyday work on non-embedded systems. Here are three possible scenarios:

  • Scenario 1: Old User Data
    This is a scenario administrators encounter on a fairly regular basis: what to do with data from users after they leave the company. Company directives usually dictate that the account be disabled and the data in the user’s account be locked. However, other users may depend upon that data. At the same time, you should not leave the data in a read/write state and give permissions to other users, because the original data could be compromised. One option that works well is to build a SquashFS image from the account of the user who has left the company. That image can then be mounted on the system as the user’s directory, but now it is read-only, so the data cannot be compromised. The administrator can give groups or individuals permission to access the image, or even mount it in other users’ home directories. Regardless of who has permission to access the data, no one can modify it (rather important).

  • Scenario 2: User Archives
    As mentioned in a previous article, studies have shown that users may insist that their data be on-line all of the time even though a great deal of it is accessed very infrequently. A simple way to reduce the storage requirements of this type of data is to move it to a subdirectory called something like /home/laytonjb/ARCHIVE, keeping the directory structure intact. A SquashFS image can then be made of the ARCHIVE directory and mounted in the user’s account. This keeps the data on-line as the user requested while reducing the storage it consumes. The process could run as a cron job that scans users’ directories for data that has not been accessed in a long time (“long time” depends upon the user community and corporate policies). The files that are identified are moved to the ARCHIVE subdirectory, and symlinks are created that point from the original locations into the ARCHIVE. A SquashFS image is then created from the user’s ARCHIVE directory, the contents of ARCHIVE are erased, and the image is mounted in its place. The original locations now hold symlinks into the ARCHIVE directory, which in turn is a SquashFS file system mounted read-only. The user keeps the data on-line and useful (we presume), and the administrator has a way to reduce file system storage.

  • Scenario 3: HPC Images
    There are HPC cluster tools that create system images that are mounted on the compute nodes in ramdisks. These are referred to as “stateless” nodes, and the best example of this type of approach is Perceus. Perceus takes an image and installs it on a tmpfs ramdisk. The images range in size from 100 MB or so to gigabytes. These images are primarily pieces of the /usr and /lib directories from a full installation. Using SquashFS to turn these directories into mountable compressed images saves space, and because the images live in a ramdisk, that space is really memory (HPC users typically like lots of memory).
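The cron job described in Scenario 2 could start from something like the following sketch. The function name archive_old_files, the ARCHIVE location directly under the home directory, and the 365-day threshold in the usage comment are all assumptions for illustration. The sketch covers only the move-and-symlink pass and does not handle file names that contain newlines.

```shell
# Hypothetical helper: move files that have not been accessed in more than
# $2 days (as find counts whole days) from home directory $1 into $1/ARCHIVE,
# leaving a symlink at each original location.
# Pass the home directory as an absolute path without a trailing slash.
archive_old_files() {
    home_dir=$1
    days=$2
    archive_dir="$home_dir/ARCHIVE"

    mkdir -p "$archive_dir"
    # Skip anything already under ARCHIVE; consider regular files only.
    find "$home_dir" -path "$archive_dir" -prune -o \
        -type f -atime +"$days" -print |
    while IFS= read -r file; do
        rel=${file#"$home_dir"/}          # path relative to the home directory
        dest="$archive_dir/$rel"
        mkdir -p "$(dirname "$dest")"     # keep the directory structure
        mv "$file" "$dest"
        ln -s "$dest" "$file"             # symlink from the old location
    done
}

# Example call (not executed here):
#   archive_old_files /home/laytonjb 365
# Afterwards, build the SquashFS image from ARCHIVE, empty the directory,
# and mount the image on it, as described in the scenario.
```

Absolute symlinks are used so that they keep resolving once the SquashFS image is mounted on the ARCHIVE directory.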

The moral of these scenarios is that SquashFS is not limited to embedded systems – there are many other scenarios – as many as there are administrators. Perhaps these three scenarios have started you thinking about how you might use SquashFS on your systems.

SquashFS – Next Generation

There is a project underway that is porting SquashFS to use LZMA to get greater levels of compression. Currently LZMA is not in the Linux kernel so the SquashFS-LZMA project cannot merge their patches with SquashFS, but the development team hopes that LZMA will be included in the Linux kernel at some point allowing them to add their patches to SquashFS.

On the SquashFS-LZMA homepage is a table that illustrates what kind of compression you can get with SquashFS using gzip and various block sizes along with SquashFS and LZMA using various block sizes. The table is reproduced here:

Method            Block Size   Slax Data Size   Percent of Original
uncompressed      -            668 MB           100%
mksquashfs+gzip   64KB         227 MB           34%
mksquashfs+gzip   1024KB       222 MB           33%
mksquashfs+lzma   64KB         191 MB           28%
mksquashfs+lzma   128KB        184 MB           27%
mksquashfs+lzma   512KB        172 MB           26%
mksquashfs+lzma   1024KB       167 MB           25%

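With 1024KB blocks, LZMA takes the compressed image from 33% of the original data size (gzip) to 25% of the original data size. While 8 percentage points may not sound like a large amount, it makes the image about a quarter smaller than the gzip version (167 MB versus 222 MB), and there are situations where that is important.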

SquashFS – Not Just for Embedded Systems

This article is intended as a gentle introduction to SquashFS and hopefully has convinced you that it’s not just a compressed file system for embedded devices. It’s fairly easy to construct a SquashFS image, but it does take some time. However, the image only has to be created once, with additional data being added to the image as needed. The image is mounted read-only, but the simple technique of using a union mount allows it to appear as a read/write file system to the user.

There are many ways to use SquashFS, of which three were briefly mentioned. Of course, the precise way it is used depends upon the particular system and processes, but if you are open to new possibilities then SquashFS can save you storage space and provide you with an easy way to enforce a read-only file system.

Keep an eye on SquashFS. Features are still being added, and the SquashFS-LZMA project illustrates that there is still room for improvements and additions that achieve even higher levels of compression.
