Cool User File Systems: ArchiveMount

Have you ever wanted to look inside a tar.gz file but without expanding it? Have you ever wanted to just dump files in a .tar.gz file without having to organize it and periodically tar and gzip this data? This article presents another REALLY useful user-space file system, archivemount. It allows you to mount archives such as .tar.gz files as a file system and interact with it using normal file/directory tools.

Introduction

I’m not sure about you, but when I examine a new tool or package that I’m building from source, I usually like to read the documentation first and perhaps examine the makefile or the output of “configure --help”. But I really want to do this without gunzip-ing and untarring the archive.

I also have the opposite problem in that I sometimes just want to grab various files and collect them into a .tar.gz archive without having to organize a complete tree and then tar it and gzip it. Ideally I would just have a directory where I could copy data and it would automatically create the archive for me. You can call me lazy if you like, but I started computing when storage space was at a premium so I tend to be conservative with my disk space. At the same time, however, I tend to be a pack rat with code, which has saved my bacon several times. So I look for something that allows me to efficiently archive my data.

Of course there are many ways to scratch this itch. I could create a directory for storing data and run a cron job that creates an archive from the directory, or I could use SquashFS (one of my personal favorites) coupled with a cron job and symlinks to do something similar. But these approaches don’t exactly make my itch disappear, and we all know that a small itch can become a rash pretty quickly. In keeping with the user-space file system theme of recent articles, I want to talk about archivemount, which cured my itch quite a while ago.

Archivemount

Another great example of what you can do with FUSE is something called archivemount. It allows you to mount an archive such as a .tar.gz file as though it were a file system. This means you can use common file system tools such as cp, mv, ls, etc., to manipulate the archive. For example, you could start with a minimal .tar.gz file and then copy in files as needed. This includes the ability to create subdirectories and put files in them – all inside the .tar.gz file without using tar or gzip.

You can also use archivemount for the opposite process – extracting files from a .tar.gz file. For example, if you have a large .tar.gz file and you only want to extract the README or the documentation, you can use archivemount to mount the .tar.gz file and then “cp” the file from the archive. Without archivemount you would either gunzip and untar the entire .tar.gz file just to get one file, or have to know the exact path of the member so you could ask tar for it by name.
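For comparison, here is a minimal sketch of peeking into a .tar.gz with plain tar, assuming GNU tar or bsdtar. The file names (demo/README, demo/big.dat) are made up for the illustration. Note that tar still has to decompress the stream up to the member you ask for, and you need to know the member's path in advance, which is exactly the browsing step that a mounted archive makes easier.

```shell
# Build a small throwaway archive so the example is self-contained.
# (demo/README and demo/big.dat are hypothetical file names.)
workdir=$(mktemp -d)
mkdir "$workdir/demo"
echo "read me first" > "$workdir/demo/README"
echo "bulk data" > "$workdir/demo/big.dat"
tar -czf "$workdir/demo.tar.gz" -C "$workdir" demo

# List the members without unpacking anything to disk.
listing=$(tar -tzf "$workdir/demo.tar.gz")
echo "$listing"

# Stream a single member to stdout; no other member is written to disk.
readme=$(tar -xzOf "$workdir/demo.tar.gz" demo/README)
echo "$readme"
```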

Archivemount Example

There is a good introductory article to using archivemount from which I will take the first simple example. To get started, we need to download and install libarchive. This library allows you to create and read streaming archive formats. It does the heavy-lifting for the archiving and unarchiving for archivemount. Building the library is rather easy – you just perform the “configure 3-step” that is so common in Linux.

[root@home8 libarchive-2.8.4]# ./configure
[root@home8 libarchive-2.8.4]# make
[root@home8 libarchive-2.8.4]# make install


For this version of libarchive (2.8.4), by default, the installation goes to /usr/local. Be sure you do the installation step as root.

After libarchive is installed, the next step is to install archivemount itself. It too follows the “configure 3-step”.

[root@home8 archivemount-0.6.1]# ./configure
[root@home8 archivemount-0.6.1]# make
[root@home8 archivemount-0.6.1]# make install


This version of archivemount (0.6.1) will install into /usr/local/ but be sure to do the installation as root.

The next steps depend upon the FUSE configuration on your system. I was testing a reasonably fresh CentOS 5.5 installation but the details depend upon your distribution and configuration. However, the goal is the same – to allow users to use archivemount in their home account. Consequently, we need to configure the system to allow users to use not only archivemount but the FUSE tool, fusermount, which is needed to mount and umount FUSE based file systems.

The first step is to make sure there is a FUSE group on the system.

[root@home8 etc]# grep -i fuse /etc/group
fuse:x:105:


So there is a fuse group on the system. If there isn’t one on your system, create one using the command “groupadd”. The next step is to add the desired user(s) to this group.
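If you do this on more than one machine, the check can be wrapped in a small helper. This is only a sketch: the function name group_exists is my own, it only looks at /etc/group (so groups served from LDAP or NIS would be missed), and it prints the groupadd command rather than running it, since that step needs root.

```shell
# Return success if the named group has an entry in /etc/group.
group_exists() {
    grep -q "^$1:" /etc/group
}

grp=fuse   # group name used in this article
if group_exists "$grp"; then
    echo "group $grp already exists"
else
    echo "group $grp is missing; as root, run: groupadd $grp"
fi
```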

[root@home8 etc]# /usr/sbin/usermod -a -G fuse laytonj
[root@home8 etc]# id laytonj
uid=500(laytonj) gid=500(laytonj) groups=500(laytonj),105(fuse)


Notice that I used the command “id” to check the groups for the user “laytonj” (that’s me). So I belong to the FUSE group at this point.

The next step is to check the FUSE device. Let’s start with the permissions on /dev/fuse.

[root@home8 etc]# ls -lsa /dev/fuse
0 crw-rw-rw- 1 root root 10, 229 Jul 14 22:04 /dev/fuse


Even though the permissions are already correct for users, if we needed to change them we would run the following command:

[root@home8 etc]# chmod g+rw /dev/fuse
[root@home8 etc]# ls -lsa /dev/fuse
0 crw-rw-rw- 1 root root 10, 229 Jul 14 22:08 /dev/fuse


Notice that the first command sets the desired permissions and the second command just checks them (no changes in this case).

We also need to change the group attribute of the /dev/fuse device so that the FUSE group can access it (in other words, so users belonging to the FUSE group can access the device). This is easy to do as well.

[root@home8 etc]# chgrp fuse /dev/fuse
[root@home8 etc]# ls -lsa /dev/fuse
0 crw-rw-rw- 1 root fuse 10, 229 Jul 14 22:04 /dev/fuse


The first command changes the group on /dev/fuse to FUSE and the second command just checks it.

The final step is to change the permissions on "fusermount".

[root@home8 tmp]# chmod +x /bin/fusermount
[root@home8 tmp]# ls -lsa /bin/fusermount
24 -rwsr-x--x 1 root fuse 23692 Sep  3  2009 /bin/fusermount


In this case both the FUSE group and any other user have execute permission on "fusermount". This may or may not be what you want (you can always change the permissions to restrict it to the binary owner and the group, and exclude "the world" from using the binary). Be sure that this binary has a group attribute of FUSE.
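Before moving on, it can be worth confirming the setup from the ordinary user's side rather than as root. The following is a rough preflight sketch, not part of archivemount or FUSE; the helper name can_rw is my own. Keep in mind that a new group membership usually only shows up after the user logs in again.

```shell
# Report whether the current user can read and write a path.
can_rw() {
    [ -r "$1" ] && [ -w "$1" ]
}

if can_rw /dev/fuse; then
    echo "/dev/fuse: read/write OK"
else
    echo "/dev/fuse: not accessible for $(id -un)"
fi

# fusermount must be on the PATH and executable by this user.
if command -v fusermount >/dev/null 2>&1; then
    echo "fusermount: found at $(command -v fusermount)"
else
    echo "fusermount: not found or not executable"
fi

# The user's groups; look for the fuse group in this list.
id -nG
```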

At this point we're all set to begin some quick testing with archivemount! The first thing I'll do is run through the example from the Linux.com article to make sure it works correctly. This example creates a directory tree in /tmp, creates a .tar.gz from it, mounts it with archivemount, and then tries a few file manipulation commands (including adding a file to it). Let's get cooking.

[laytonj@home8 tmp]$ mkdir -p /tmp/archivetest
[laytonj@home8 tmp]$ cd /tmp/archivetest/
[laytonj@home8 archivetest]$ date > datefile1
[laytonj@home8 archivetest]$ date > datefile2
[laytonj@home8 archivetest]$ mkdir subA
[laytonj@home8 archivetest]$ date > subA/foobar
[laytonj@home8 archivetest]$ cd /tmp
[laytonj@home8 tmp]$ tar czvf archivetest.tar.gz archivetest
archivetest/
archivetest/subA/
archivetest/subA/foobar
archivetest/datefile1
archivetest/datefile2


These steps create the directory structure in /tmp/archivetest and populate it with some simple files containing "dates". Notice that the directory has a subdirectory, "subA". Finally, on the last line, a .tar.gz file is created.

Now we can create a mount point, /tmp/testing, for the .tar.gz file and mount it.

[laytonj@home8 tmp]$ mkdir testing
[laytonj@home8 tmp]$ /usr/local/bin/archivemount archivetest.tar.gz testing
[laytonj@home8 tmp]$ ls -l testing/archivetest/
total 3655287589024971187
-rw-rw-r-- 0 laytonj laytonj 29 Jul 14 22:43 datefile1
-rw-rw-r-- 0 laytonj laytonj 29 Jul 14 22:43 datefile2
drwxrwxr-x 0 laytonj laytonj  0 Jul 14 22:43 subA/


The first command creates the mount point (/tmp/testing) and the second command uses archivemount to mount the archive, archivetest.tar.gz, to the mount point /tmp/testing. Then a simple list of the directory is performed. The one issue I encountered, and I hope you have spotted it already, is that the total space reported for the mounted archive is 3655287589024971187, which is obviously incorrect. There must be a bug somewhere. However, you can still see the files and all of the data is there.

Now that the archive is mounted we can "manipulate it". The first thing in the example from the article is to create a new file.

[laytonj@home8 tmp]$ date > testing/archivetest/new-file1
[laytonj@home8 tmp]$ cat testing/archivetest/new-file1
Wed Jul 14 22:50:10 EDT 2010


The first command creates the file and the second command "cats" the file to confirm that it was created. However, if you look at the .tar.gz file you will see that the new file actually isn't in the archive yet.

[laytonj@home8 tmp]$ tar tzvf archivetest.tar.gz
drwxrwxr-x laytonj/laytonj   0 2010-07-14 22:43:49 archivetest/
drwxrwxr-x laytonj/laytonj   0 2010-07-14 22:43:56 archivetest/subA/
-rw-rw-r-- laytonj/laytonj  29 2010-07-14 22:43:56 archivetest/subA/foobar
-rw-rw-r-- laytonj/laytonj  29 2010-07-14 22:43:43 archivetest/datefile1
-rw-rw-r-- laytonj/laytonj  29 2010-07-14 22:43:47 archivetest/datefile2


Where is the file, "new-file1"? The answer is that archivemount delays all writes to the archive until it is unmounted. If we use "fusermount" to unmount the archive and then examine the archive we will see "new-file1".

[laytonj@home8 tmp]$ fusermount -u testing
[laytonj@home8 tmp]$ tar tzvf archivetest.tar.gz
drwxrwxr-x laytonj/laytonj   0 2010-07-14 22:43:49 archivetest/
drwxrwxr-x laytonj/laytonj   0 2010-07-14 22:43:56 archivetest/subA/
-rw-rw-r-- laytonj/laytonj  29 2010-07-14 22:43:56 archivetest/subA/foobar
-rw-rw-r-- laytonj/laytonj  29 2010-07-14 22:43:43 archivetest/datefile1
-rw-rw-r-- laytonj/laytonj  29 2010-07-14 22:43:47 archivetest/datefile2
-rw-rw-r-- laytonj/laytonj  29 2010-07-14 22:50:10 archivetest/new-file1


Now we can see the additional file is in the archive.

Archivemount Limitations and Warnings

Delaying writes until the archive is unmounted is a bit of an inconvenience, but according to the archivemount documentation, modifying an existing archive in place using libarchive is not possible. The README in archivemount states,

In order to provide write support thus the whole archive has to be recreated; this requires two things: space and time. To optimize at least the timely behaviour, archives are recreated only once: at the time of unmount.
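To picture why recreating the archive costs both space and time, here is a plain-tar sketch of the same idea. It is not archivemount's actual code, just an illustration under my own assumptions; the file names follow the earlier example. A gzip-compressed tar cannot simply be appended to (GNU tar, for instance, refuses to update a compressed archive), so adding one file means rebuilding the whole thing.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Seed archive, as in the earlier example (names are illustrative).
mkdir archivetest
date > archivetest/datefile1
tar -czf archivetest.tar.gz archivetest
rm -r archivetest

# 1. Keep a copy of the original, mimicking the .orig backup the README mentions.
cp archivetest.tar.gz archivetest.tar.gz.orig

# 2. Unpack everything into scratch space (the "space" cost).
mkdir scratch
tar -xzf archivetest.tar.gz -C scratch

# 3. Apply the change to the unpacked tree.
date > scratch/archivetest/new-file1

# 4. Recreate the whole archive (the "time" cost) and swap it in.
tar -czf archivetest.tar.gz.new -C scratch archivetest
mv archivetest.tar.gz.new archivetest.tar.gz

tar -tzf archivetest.tar.gz
```

Doing step 4 once, at unmount time, rather than after every single write is the optimization the README describes.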


This makes sense, but it is limiting. A word of caution as well: archivemount, like virtually all file systems, tries its best to ensure the integrity of your data, but the README also contains this warning.

If there are any problems creating the new archive - bad luck, the changes are lost. Some checks are run when mounting the archive to determine if it can be mounted writeable, but there is no guarantee.



Also note that unmounting a fuse filesystem is NOT necessarily completed when the unmount command returns. Although unmounting takes a long time already, fuse backgrounds the process and lets the unmount command return early. You can check on the real state of unmounting by checking the process list for archivemount.


Important tip - check the process table ("top" or "ps -ef | grep -i archivemount") to look for archivemount. When you don't see the process, archivemount is done.
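That check is easy to script. The sketch below assumes pgrep (from procps) is installed; the function name wait_for_exit is my own. It polls once a second until no process named archivemount is left.

```shell
# Block until no process with exactly this name remains.
# Note: this waits for every archivemount instance, not just one mount.
wait_for_exit() {
    while pgrep -x "$1" >/dev/null 2>&1; do
        sleep 1
    done
}

# Typical use after unmounting (the mount point name is illustrative):
#   fusermount -u testing && wait_for_exit archivemount
wait_for_exit archivemount
echo "archivemount has exited; the archive is safe to copy"
```

If you keep several archives mounted at once, the loop will not return until all of them are unmounted, so in that case checking the process list by hand is the safer route.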

The README goes on to say the following.

THERE IS ALSO NO GUARANTEE THAT DATA IS WRITTEN CORRECTLY. DO NOT TRUST THIS SOFTWARE! A backup is made of the original archive (with .orig appended to the name), but please understand that I, the author of archivemount, do not guarantee anything at all about the state of your data and I am not responsible if you lose vital information by using this software. YOU HAVE BEEN WARNED!


While this sounds a little bit apocalyptic, I think the author is just being cautious considering that the software package deals with data and people get really upset if their data goes missing (don't forget to do backups!).

However, I've been using archivemount for a while to manipulate archives, including one experiment of creating basically a blank archive and then dumping files into it as needed, and it has worked fine. I have also done some experiments using the Linux kernel source and dumping it into a blank archive that illustrate some of the limitations of archivemount.

First, let's create a kernel archive directory and dump a simple file that tells us that the archive was created using archivemount.

[laytonj@home8 ~]$ echo "Created with archivemount-0.6.1" > KERNEL_ARCHIVE/layton_notice.txt
[laytonj@home8 ~]$ tar czvf kernel_archive.tar.gz KERNEL_ARCHIVE
KERNEL_ARCHIVE/
KERNEL_ARCHIVE/layton_notice.txt


Then we can create a mount point and mount the archive.

[laytonj@home8 ~]$ mkdir -p KERNEL_ARCHIVE_MP
[laytonj@home8 ~]$ archivemount kernel_archive.tar.gz KERNEL_ARCHIVE_MP
[laytonj@home8 ~]$ ls -s KERNEL_ARCHIVE_MP/
total 0
0 KERNEL_ARCHIVE/
[laytonj@home8 ~]$ cat KERNEL_ARCHIVE_MP/KERNEL_ARCHIVE/layton_notice.txt
Created with archivemount-0.6.1

Now let's take the latest kernel source (2.6.35-rc5) and try to copy it into the mounted archive.

[laytonj@home8 TMP]$ ls -s
total 403564
     4 linux-2.6.35-rc5/  403560 linux-2.6.35-rc5.tar
[laytonj@home8 TMP]$ cp -r linux-2.6.35-rc5 /home/laytonj/KERNEL_ARCHIVE_MP/KERNEL_ARCHIVE/
cp: writing `/home/laytonj/KERNEL_ARCHIVE_MP/KERNEL_ARCHIVE/linux-2.6.35-rc5/sound/pci/asihpi/hpidspcd.c': Too many open files
cp: cannot create regular file `/home/laytonj/KERNEL_ARCHIVE_MP/KERNEL_ARCHIVE/linux-2.6.35-rc5/sound/pci/asihpi/hpicmn.h': Numerical result out of range
cp: cannot create regular file `/home/laytonj/KERNEL_ARCHIVE_MP/KERNEL_ARCHIVE/linux-2.6.35-rc5/sound/pci/asihpi/hpifunc.c': Numerical result out of range
...


So we see lots of errors. While I haven't dug into the source of the problem, the first error message - too many open files - is probably the root cause. But let's see if we can save whatever data made it to the archive.
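One quick sanity check before a large copy is to compare the per-process open-file limit with the number of files you are about to copy. This is only a sketch: that archivemount 0.6.1 runs into this particular limit is my guess from the error message, and I have not tested whether raising the limit helps. A tiny throwaway tree stands in for the kernel source here.

```shell
# Soft limit on open file descriptors for processes started from this shell.
fd_limit=$(ulimit -n)
echo "open-file limit: $fd_limit"

# Count the regular files in a tree before copying it into a mounted archive.
# (The throwaway tree below is a stand-in for the real source directory.)
tree=$(mktemp -d)
mkdir "$tree/sub"
touch "$tree/a" "$tree/b" "$tree/sub/c"
file_count=$(find "$tree" -type f | wc -l | tr -d ' ')
echo "files to copy: $file_count"
```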

[laytonj@home8 TMP]$ fusermount -u ~/KERNEL_ARCHIVE_MP/
[laytonj@home8 ~]$ tar tzvf kernel_archive.tar.gz
drwxrwxr-x laytonj/laytonj   0 2010-07-17 09:01:59 KERNEL_ARCHIVE/
-rw-rw-r-- laytonj/laytonj  32 2010-07-17 09:02:07 KERNEL_ARCHIVE/layton_notice.txt


So it looks like nothing made it into the archive, perhaps because of the errors or the limitation of libarchive.

This example does illustrate that there are some limitations to archivemount and what it can do. But dumping over 33,000 files into a mounted archive is a bit rough in my opinion.

Summary

You may have a pessimistic view of archivemount, both because of the last example, where it failed when I tried to push too many files (33,312) into a mounted archive, and because of the dire warning the author has in the README. However, a quick glance using Google shows that there aren't too many problems reported with archivemount. I have been using it for a while now and have never lost data or had difficulty. In fact, I have found it to be a reliable tool for examining .tar.gz files and for dumping data into an archive.

I use archivemount in my everyday work when I'm doing some coding or writing. When I start, I just create an archive with my "notice" file that I used in the kernel example, although I also put a date stamp in the file contents. Then while I'm working I periodically copy data to the archive mount point using a directory that has a time stamp name. I've found it to be a simple, easy way to save multiple copies of data so I can roll back if I need a previous version of something. Then when I have to look up from the screen about every hour or so to uncross my eyes (just don't tell my eye doctor I wait that long), I just run the command "fusermount -u [mount_point]", copy the .tar.gz file to somewhere else, giving me a second copy, and then just remount the .tar.gz file using archivemount and continue working. Then at the end of the day I label the final .tar.gz with the date and copy it somewhere safe.
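The hourly routine can be sketched as a pair of shell functions. Everything here is illustrative: the paths, the backup directory, and the function names stamped_name and checkpoint are my own choices, and pgrep is assumed to be available. The functions are only defined, not run, so nothing is unmounted until you call checkpoint yourself.

```shell
# Hypothetical names: adjust to your own archive and mount point.
archive="$HOME/work.tar.gz"
mount_point="$HOME/work_mp"
backup_dir="$HOME/backups"

# Build a time-stamped copy name such as work-20100714-225010.tar.gz.
stamped_name() {
    base=$(basename "$1" .tar.gz)
    printf '%s-%s.tar.gz\n' "$base" "$(date +%Y%m%d-%H%M%S)"
}

# Unmount, wait for archivemount to finish rewriting the archive,
# copy the archive under a time-stamped name, then remount it.
checkpoint() {
    fusermount -u "$mount_point" || return 1
    # Waits for every archivemount process, not just this mount.
    while pgrep -x archivemount >/dev/null 2>&1; do sleep 1; done
    mkdir -p "$backup_dir" &&
    cp "$archive" "$backup_dir/$(stamped_name "$archive")" &&
    archivemount "$archive" "$mount_point"
}
```

Waiting for the archivemount process to disappear before the copy matters here, because the unmount command can return before the archive has been rewritten.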

FUSE file systems tend to be very niche file systems, scratching a very specific itch. Archivemount is no exception but it can be used in a variety of ways (one article on the web discusses how to use it in conjunction with CIFS of all things!). Give it a whirl and see how it can be used to solve your archive manipulation needs.
