SquashFS: Not Just for Embedded Systems

Who knew that compression could be so useful in file systems? SquashFS, typically used for embedded systems, can be a great fit for laptops, desktops and, yes, even servers.

Once the file system image has been created (which may take some time), it can be mounted in the user’s home directory. The first step is to move the original directory aside, just in case.

$ mv Documents Documents.original
$ mkdir /home/laytonjb/Documents

The next step is to mount the image on the new, empty directory.

$ sudo mount -t squashfs /squashfs/storage/Documents.sqsh /home/laytonjb/Documents -o loop

To get an idea of how much space is used, or saved, by SquashFS, one can look at the image file:

$ ls -lsah /squashfs/storage/Documents.sqsh
192M -rwx------ 1 root root 192M 2009-06-06 17:32 /squashfs/storage/Documents.sqsh

So the compression ratio is 314 MB / 192 MB, or about 1.64:1.
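The same arithmetic can be scripted. The following is a minimal sketch, not part of the original procedure: the function names are illustrative, and `image_ratio` assumes GNU `du` and GNU `stat`.

```shell
#!/bin/sh
# Print the ratio original:compressed to two decimal places.
# Both arguments must be in the same unit (bytes, MB, ...).
compression_ratio() {
    awk -v orig="$1" -v comp="$2" 'BEGIN { printf "%.2f\n", orig / comp }'
}

# Hypothetical helper: compare a directory with its SquashFS image.
# Assumes GNU du (-sb = apparent size in bytes) and GNU stat (-c %s = size in bytes).
image_ratio() {
    dir_bytes=$(du -sb "$1" | cut -f1)
    img_bytes=$(stat -c %s "$2")
    compression_ratio "$dir_bytes" "$img_bytes"
}

# Figures from the text: 314 MB of original data, 192 MB image.
compression_ratio 314 192
```

With the figures above this prints 1.64 (1.635 rounded to two places). Something like `image_ratio Documents.original /squashfs/storage/Documents.sqsh` would measure the two directly.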

The exact amount of compression one can achieve with SquashFS really depends upon the data. If the data is primarily binary, the compression ratio is not likely to be very large. But if the data is primarily text-based, a fairly large compression ratio can be obtained. There is an article that illustrates the compression that can be achieved with SquashFS by comparing it to other file systems. The table is reproduced here:

File System            Size
ext3 uncompressed      1.4 GB
ISO9660 uncompressed   1.3 GB
Zisofs                 589.81 MB
Cloop                  471.89 MB
SquashFS 2.0           448.58 MB
SquashFS 2.1           448.58 MB

Relative to the uncompressed sizes, some of these ratios are about 3:1, illustrating how much storage can be saved using a compressed file system.

There are a few trade-offs to keep in mind with compressed file systems. The first, and most obvious, is that the file system is read-only. The second, while not a big problem in some cases, is that creating a SquashFS image can take a fair amount of CPU time. However, the image only has to be created once, and additional data can be added later without rebuilding the entire image. The third is that compressed file systems are slower to read than typical file systems because the data has to be decompressed on access, which takes time and consumes more CPU cycles.

Adding Capability to SquashFS

Earlier in the article it was mentioned that you could combine SquashFS with a writable file system through a union file system. The basic premise is that you mount the SquashFS image alongside a typical file system such as ext3 and “combine” them with UnionFS, so that any writes to the combined file system go to the writable part. To the user, everything appears to be a single file system.

There is a very good example in the HOWTO that presents a simple approach to combining SquashFS with another file system through a union mount point. The steps are fairly straightforward.


  • Step 1 – Create a SquashFS file system image of the targeted directory. In the HOWTO this is a user’s home directory.
  • Step 2 – Mount the SquashFS image (for example on /mnt/squashfs1).
  • Step 3 – Create a directory that is read/write (this is the directory where file changes will end up). For example, this could be /home/user1.
  • Step 4 – Combine the two directories using unionfs:
    # mount -t unionfs -o dirs=/home/user1=rw:/mnt/squashfs1=ro unionfs /home/laytonjb2

These four basic steps combine a read/write directory (/home/user1) with a SquashFS image (/mnt/squashfs1). Once they are done, all write operations actually go to /home/user1. Be sure to read the HOWTO and walk through the example about “Making it writable.”
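The four steps can be sketched as a small script. This is a sketch under assumptions, not the HOWTO's own script: it assumes squashfs-tools (`mksquashfs`) and a kernel with unionfs support, the source directory /home/user1.orig and the image path /squashfs/storage/user1.sqsh are hypothetical, and the word `unionfs` before the mount point is the customary placeholder for mount's device argument. By default the script only prints the commands; set DRY_RUN=0 and run it as root to execute them.

```shell
#!/bin/sh
# Sketch of the four steps. Paths marked "hypothetical" are not from the article.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

union_setup() {
    src=$1      # directory to compress (hypothetical: /home/user1.orig)
    image=$2    # SquashFS image to create (hypothetical path)
    ro_mnt=$3   # where the image is mounted, e.g. /mnt/squashfs1
    rw_dir=$4   # writable directory, e.g. /home/user1
    union=$5    # combined view, e.g. /home/laytonjb2

    # Step 1: build the image.
    run mksquashfs "$src" "$image"
    # Step 3 (done early): create the writable directory and the mount points.
    run mkdir -p "$ro_mnt" "$rw_dir" "$union"
    # Step 2: mount the image through a loop device.
    run mount -t squashfs -o loop "$image" "$ro_mnt"
    # Step 4: stack the writable directory on top of the read-only image.
    run mount -t unionfs -o "dirs=$rw_dir=rw:$ro_mnt=ro" unionfs "$union"
}

union_setup /home/user1.orig /squashfs/storage/user1.sqsh \
    /mnt/squashfs1 /home/user1 /home/laytonjb2
```

Keeping the dry-run default makes it easy to review the exact commands before anything is mounted.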

The HOWTO also makes brief mention that you can take any files that have been written to the writable portion of the union and add them to the SquashFS image. This is very useful if you want to periodically update the image without having to recreate the entire image from scratch every time.
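One way this periodic update might look is sketched below. It relies on the fact that `mksquashfs` appends to an existing image unless `-noappend` is given; everything else is an assumption for illustration: the unmount/remount order, the image path /squashfs/storage/user1.sqsh, and the function name. As written it only prints the commands; set DRY_RUN=0 and run it as root to execute them.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

fold_into_image() {
    rw_dir=$1   # writable part of the union, e.g. /home/user1
    image=$2    # existing SquashFS image (hypothetical path)
    ro_mnt=$3   # where the image is mounted, e.g. /mnt/squashfs1
    union=$4    # union mount point, e.g. /home/laytonjb2

    # Assumption: take the union and the image off-line before touching the image file.
    run umount "$union"
    run umount "$ro_mnt"

    # Because $image already exists and -noappend is not given,
    # mksquashfs appends the contents of $rw_dir to it.
    run mksquashfs "$rw_dir" "$image"

    # Bring everything back in the same order as the original setup.
    run mount -t squashfs -o loop "$image" "$ro_mnt"
    run mount -t unionfs -o "dirs=$rw_dir=rw:$ro_mnt=ro" unionfs "$union"
}

fold_into_image /home/user1 /squashfs/storage/user1.sqsh /mnt/squashfs1 /home/laytonjb2
```

Appending is not a merge. How `mksquashfs` treats names that already exist in the image, and what to do with any bookkeeping files unionfs leaves in the writable directory, should be checked on a test image before the writable directory is emptied.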

Using SquashFS on Non-Embedded Systems

As previously mentioned, the most common use for compressed file systems is in embedded systems. But that doesn’t mean they have to be limited to embedded systems. They can be used in a number of ways for everyday work on non-embedded systems. Here are three possible scenarios:


  • Scenario 1: Old User Data
    This scenario is one administrators encounter fairly regularly: what to do with users’ data after they leave the company. Company directives usually dictate that the account be disabled and its data locked. However, other users may depend upon that data. At the same time, you should not leave the data read/write and grant permissions to other users, because the original data could be altered. One option that works well is to build a SquashFS image from the departed user’s home directory. That image can then be mounted in place of the user’s directory, but now it is read-only, so the data cannot be modified. The administrator can grant access to the image to groups or individuals, or even mount it in other users’ home directories. Regardless of who has permission to read the data, no one can change it (rather important).

  • Scenario 2: User Archives
    As mentioned in a previous article, studies have shown that users may insist that their data be on-line all of the time, yet a great deal of that data is accessed very infrequently. A simple way to reduce the storage requirements of this kind of data is to move it to a subdirectory called something like /home/laytonjb/ARCHIVE, preserving the directory structure, and then build a SquashFS image of the ARCHIVE directory and mount it in the user’s account. This keeps the data on-line, as the user requested, while reducing the space it occupies. It could be done by a cron job that scans users’ directories for data that has not been accessed in a long time (“long time” depends upon the user community and corporate policies). The files that are identified are moved to the ARCHIVE subdirectory, and symlinks pointing to the new location in ARCHIVE are left at the original locations. A SquashFS image is then created from the user’s ARCHIVE directory, the contents of ARCHIVE are erased, and the image is mounted in its place. The original locations now hold symlinks into the ARCHIVE directory, which is a SquashFS file system mounted read-only. The user keeps the data on-line and useful (we presume), and the administrator has a way to reduce file system storage.


  • Scenario 3: HPC Images
    There are HPC cluster tools that create system images that are mounted on the compute nodes in ramdisks. These are referred to as “stateless” nodes, and the best example of this type of approach is Perceus. Perceus takes an image and installs it on a tmpfs ramdisk. The images range in size from 100 MB or so to gigabytes. These images are primarily pieces of the /usr and /lib directories from a full installation. Using SquashFS to turn these directories into mountable compressed images saves space, which on a ramdisk really means memory (HPC users typically like lots of memory).
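The cron job described in Scenario 2 could start from something like the following. It is a minimal sketch, assuming GNU `find` and an absolute path for the home directory; the function name and the 180-day threshold are illustrative, and file names containing newlines are not handled.

```shell
#!/bin/sh
# Move files not accessed for more than N days into <home>/ARCHIVE,
# keeping the directory structure and leaving symlinks behind.
archive_stale_files() {
    home_dir=$1   # absolute path, e.g. /home/laytonjb
    days=$2       # "long time" threshold in days (site policy)
    archive_dir="$home_dir/ARCHIVE"

    mkdir -p "$archive_dir"

    # Skip ARCHIVE itself; symlinks left by earlier runs are not "-type f".
    find "$home_dir" -path "$archive_dir" -prune -o \
         -type f -atime +"$days" -print |
    while IFS= read -r file; do
        rel=${file#"$home_dir"/}                    # path relative to the home directory
        mkdir -p "$archive_dir/$(dirname "$rel")"   # recreate the directory structure
        mv "$file" "$archive_dir/$rel"
        ln -s "$archive_dir/$rel" "$file"           # symlink from the old location
    done
}

# Example (hypothetical): archive_stale_files /home/laytonjb 180
# Afterwards, build an image of ARCHIVE with mksquashfs, empty the directory,
# and mount the image on ARCHIVE so the symlinks resolve into the image.
```

Because the symlinks are absolute, the image has to be mounted at the same ARCHIVE path the files were moved to.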

The moral of these scenarios is that SquashFS is not limited to embedded systems – there are many other scenarios – as many as there are administrators. Perhaps these three scenarios have started you thinking about how you might use SquashFS on your systems.

SquashFS – Next Generation

There is a project underway that is porting SquashFS to use LZMA to get greater levels of compression. Currently LZMA is not in the Linux kernel so the SquashFS-LZMA project cannot merge their patches with SquashFS, but the development team hopes that LZMA will be included in the Linux kernel at some point allowing them to add their patches to SquashFS.

On the SquashFS-LZMA homepage is a table that illustrates what kind of compression you can get with SquashFS using gzip and various block sizes along with SquashFS and LZMA using various block sizes. The table is reproduced here:

Method            Block Size   Slax Data Size   Percent
uncompressed      -            668 MB           100%
mksquashfs+gzip   64KB         227 MB           34%
mksquashfs+gzip   1024KB       222 MB           33%
mksquashfs+lzma   64KB         191 MB           28%
mksquashfs+lzma   128KB        184 MB           27%
mksquashfs+lzma   512KB        172 MB           26%
mksquashfs+lzma   1024KB       167 MB           25%

With a 1024KB block size, LZMA reduces the image from 33% of the original data size (gzip) to 25%. While 8 percentage points may not sound like a large amount, it means the LZMA image is roughly a quarter smaller than the gzip image (167 MB versus 222 MB), and there are situations where that is important.
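To put the table's numbers in relative terms, the saving between any two rows can be computed directly. This is a small illustrative helper (the function name is made up here), using the sizes in MB from the table:

```shell
#!/bin/sh
# Percentage by which "new" is smaller than "old" (same units), one decimal place.
relative_saving() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (old - new) / old * 100 }'
}

relative_saving 222 167   # gzip 1024KB vs. LZMA 1024KB
relative_saving 668 167   # uncompressed vs. LZMA 1024KB
```

The first call prints 24.8 and the second 75.0: moving from gzip to LZMA at the largest block size trims about a quarter off the already compressed image.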

SquashFS – Not Just for Embedded Systems

This article is intended as a gentle introduction to SquashFS and hopefully has convinced you that it’s not just a compressed file system for embedded devices. It’s fairly easy to construct a SquashFS image, but it does take some time. However, the image only has to be created once, with additional data being added to it as needed. The image is mounted read-only, but the simple technique of using a union mount allows it to appear as a read/write file system to the user.

There are many ways to use SquashFS, of which three were briefly mentioned. Of course, the precise way it is used depends upon the particular system and processes, but if you are open to new possibilities then SquashFS can save you storage space and provide you with an easy way to enforce a read-only file system.

Keep an eye on SquashFS. It’s still having features added and the SquashFS-LZMA project illustrates that there is still room for improvement and/or additions to achieve even larger levels of compression.

Jeff Layton is an Enterprise Technologist for HPC at Dell. He can be found lounging around at a nearby Fry’s enjoying the coffee and waiting for sales (but never during working hours).

Comments on "SquashFS: Not Just for Embedded Systems"

x95tobos

Nice article, I enjoyed it. As a former Computer Engineering major (in a galaxy far far away, ahemm, Kalamazoo MI, many many years ago..) with no clue about SquashFS, my guess is it is necessary/useful only when the actual “overhead” of the FS structures is significant- you have a lot of small files with really long path names: mp3 players anyone?

Otherwise why not just use tar gzip or bzip? Lately file managers are really good at using these just as regular directory files- you can browse web pages, play movies/songs, etc. And I am saying this, because why in the world would you want to do compression on sensitive data such as inodes and such, if you do not absolutely have to? In this light, examples 1 and 2 are a bit contrived: you always can mount a partition read-only in the Unix world ! And if the user insists on online data access, the problem is completely different, rather a moral debate: if he is willing to pay for that, he should have that and he should have it with no “strings attached”. There is always a tradeoff between redundancy and space, sometimes you want one, sometimes the other.

sysadmn

One advantage a compressing fs has over .tgz or .bz2 is memory footprint. It’s probably more of a factor in embedded systems. Letting the fs driver manage the data means that all consumers (file managers, mp3 clients, etc.) share the cached, uncompressed data and metadata.

m0n0

Hmm, there is a mistake in this part of the table:

SquashFS 2.0 448.58 GB
SquahsFS 2.1 448.58 MB

Is it really 448.58 GB? It must be MB as well.

leepatrick

It’s also a good FS for SSD since it can save space for relatively small size SSD and it’s readonly. For the writing, create the unionfs in ramdisk and write the update back to SSD periodically to reduce write cycles.

dwolsten

It’s a good thing they released SquashFS v2.1, as 2.0 obviously had very poor performance, turning 1.3GB of uncompressed data into a whopping 448GB! Luckily, v2.1 achieves a 1000-fold improvement over this figure.

vonchilliman

Puppy Linux uses SquashFS and UnionFS to incredibly good effect.

rexterd

SquashFS is in almost every distro: Ubuntu, Slax, Puppy Linux, SliTaz, etc. Puppy Linux and Slax use the layered file systems unionfs or aufs. A layered file system plus SquashFS with LZMA compression makes a system more secure because the SquashFS part is read-only, lets us easily add and delete branches, and saves memory/storage space through compression.

laytonjb

Well, the first two examples are contrived, but the point is that I can mount any combination of directories or files read-only without having to mount the entire partition as read-only. There are some advantages in this. The one I think is interesting is to run a cron job periodically that scans a user’s directory for files that have not been accessed in months. Then you move them to an “ARCHIVE” subdirectory in their account, leaving symlinks behind so the user retains the directory structure. Then you use SquashFS on the ARCHIVE directory and remount it for the user. With the symlinks the user finds the files where they expect, but you also save some space.

You have a couple of choices on mounting the SquashFS image as well. You can mount it read-only so that if the user wants to actually edit the data, they will have to copy the data to a new directory to work on it. While this uses more space, it also keeps a copy of the data (which hasn’t been used in a while) in case the user screws something up (I’m quite good at that).

The other option is to use a union mount with the SquashFS image so the user can “change” the data. But the changes are written to the r/w part of the union so you still retain a copy of the data :)

The second option is the most transparent from a user perspective, but I kind of like the first approach (partly because it’s a little bit easier :) ).

Kind of cool isn’t it? I really like this idea and I’m getting ready to hack up some scripts to do this (both options). Just need to get the editor off my back for a week or so :)

BTW – thanks for the comments and compliments. Glad it helped.

Jeff

laytonjb

LOL!!! I’m sorry I missed that. But you are correct – it’s MB instead of GB.

Thanks!

Jeff

laytonjb

Cool idea. I never thought about using a ramdisk and write back to the SSD. How would you do the writeback part? There were some patches floating around from Daniel Phillips called Ramback that did this. I want to test these patches at some point – pretty cool idea.

Thanks!

Jeff

laytonjb

Cool idea! I didn’t think about using a ramdisk and writing back to the SSD. Talk about really fast write performance!

How would you do the write back?

One thing I want to try is a set of patches from Daniel Phillips called Ramback. They do exactly what you say – write back from a ramdisk to device such as spinning disk, usb, or SSD. I just need to find some time to actually test it :)

Thanks!

Jeff

laytonjb

LOL, mea culpa. I screwed up the measurement. It should have been MB instead of GB.

laytonjb

Good observations. Thanks!

Jeff
