
I Like My File Systems Chunky: UnionFS and ChunkFS

Diving deeper into UnionFS: walking through how to create and manage large file systems using the principles of ChunkFS and UnionFS.

For a file system built from four chunks that have identical times between mandatory fsck runs, all four chunks will be checked at the same time. One of the key principles behind ChunkFS is to improve check and repair time, and having all of the chunks fsck-ed at once is something of an antithesis. How is that problem fixed?

There is a simple way to change the maximum mount count and the time between checks using tune2fs. Below, both values are staggered across the four chunks.

[root@test64 laytonjb]# /sbin/tune2fs -c 10 /dev/sdb1
tune2fs 1.41.7 (29-June-2009)
Setting maximal mount count to 10
[root@test64 laytonjb]# /sbin/tune2fs -i 60d /dev/sdb1
tune2fs 1.41.7 (29-June-2009)
Setting interval between checks to 5184000 seconds
[root@test64 laytonjb]# /sbin/tune2fs -c 12 /dev/sdb2
tune2fs 1.41.7 (29-June-2009)
Setting maximal mount count to 12
[root@test64 laytonjb]# /sbin/tune2fs -i 90d /dev/sdb2
tune2fs 1.41.7 (29-June-2009)
Setting interval between checks to 7776000 seconds
[root@test64 laytonjb]# /sbin/tune2fs -c 14 /dev/sdc1
tune2fs 1.41.7 (29-June-2009)
Setting maximal mount count to 14
[root@test64 laytonjb]# /sbin/tune2fs -i 120d /dev/sdc1
tune2fs 1.41.7 (29-June-2009)
Setting interval between checks to 10368000 seconds
[root@test64 laytonjb]# /sbin/tune2fs -c 16 /dev/sdc2
tune2fs 1.41.7 (29-June-2009)
Setting maximal mount count to 16
[root@test64 laytonjb]# /sbin/tune2fs -i 150d /dev/sdc2
tune2fs 1.41.7 (29-June-2009)
Setting interval between checks to 12960000 seconds

The maximum mount counts are staggered by two mounts across the partitions (10, 12, 14, 16) and the check intervals by 30 days (60, 90, 120, 150). This way, during a reboot or a remount, only one partition at a time will have a forced fsck. If admins don’t like the forced fsck process, it can easily be disabled, but then it’s entirely up to the administrator to periodically perform an fsck if so desired.
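Disabling takes a single command per partition: “tune2fs -c 0 -i 0 /dev/sdb1” turns off both the mount-count and the time-based triggers. Conversely, if you have more than four chunks, setting staggered values by hand gets tedious. Here is a minimal sketch that staggers both parameters over a list of partitions; the device names and step sizes simply match this example.

#!/bin/bash
# Sketch: stagger fsck triggers across the chunks so they never all
# come due at once. Devices and step sizes match this example.
PARTS="/dev/sdb1 /dev/sdb2 /dev/sdc1 /dev/sdc2"
count=10   # starting maximal mount count
days=60    # starting interval between checks

for p in $PARTS; do
    /sbin/tune2fs -c $count -i ${days}d $p
    count=$((count + 2))
    days=$((days + 30))
done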

For this example, the union is intended to serve as /home. On this test system /home is already in use, so the union mount point will be /BIGhome instead. However, before creating the union mount, the four chunks must be mounted. First, the mount points for the four chunks are created.

[root@test64 laytonjb]# mkdir /mnt/home1
[root@test64 laytonjb]# mkdir /mnt/home2
[root@test64 laytonjb]# mkdir /mnt/home3
[root@test64 laytonjb]# mkdir /mnt/home4

Then /etc/fstab is modified as shown below:

LABEL=/                 /                       ext3    defaults        1 1
LABEL=/home             /home                   ext3    defaults        1 2
LABEL=/boot1            /boot                   ext2    defaults        1 2
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
LABEL=SWAP-hda2         swap                    swap    defaults        0 0
/dev/sdb1               /mnt/home1              ext3    defaults,data=ordered   0 0
/dev/sdb2               /mnt/home2              ext3    defaults,data=ordered   0 0
/dev/sdc1               /mnt/home3              ext3    defaults,data=ordered   0 0
/dev/sdc2               /mnt/home4              ext3    defaults,data=ordered   0 0

Note that the mounts use the “data=ordered” option on the recommendation of Valerie Aurora for the new 2.6.30 kernel. Then the chunks are mounted.

[root@test64 laytonjb]# mount -a
[root@test64 laytonjb]# mount
/dev/hda3 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /home type ext3 (rw)
/dev/hda1 on /boot type ext2 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/sdb1 on /mnt/home1 type ext3 (rw,data=ordered)
/dev/sdb2 on /mnt/home2 type ext3 (rw,data=ordered)
/dev/sdc1 on /mnt/home3 type ext3 (rw,data=ordered)
/dev/sdc2 on /mnt/home4 type ext3 (rw,data=ordered)

Finally, the union mount is created in /etc/fstab as shown below.

LABEL=/                 /                       ext3    defaults        1 1
LABEL=/home             /home                   ext3    defaults        1 2
LABEL=/boot1            /boot                   ext2    defaults        1 2
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
LABEL=SWAP-hda2         swap                    swap    defaults        0 0
/dev/sdb1               /mnt/home1              ext3    defaults,data=ordered   0 0
/dev/sdb2               /mnt/home2              ext3    defaults,data=ordered   0 0
/dev/sdc1               /mnt/home3              ext3    defaults,data=ordered   0 0
/dev/sdc2               /mnt/home4              ext3    defaults,data=ordered   0 0
unionfs                 /BIGhome                unionfs dirs=/mnt/home1=rw:/mnt/home2=rw:/mnt/home3=rw:/mnt/home4=rw  0 0

Notice that all four chunks are mounted read-write since they are intended for /home. The order of the chunks (/mnt/home1, /mnt/home2, /mnt/home3, /mnt/home4) is completely arbitrary. Finally, the union is mounted.

[root@test64 laytonjb]# mount -a
[root@test64 laytonjb]# mount
/dev/hda3 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /home type ext3 (rw)
/dev/hda1 on /boot type ext2 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/sdb1 on /mnt/home1 type ext3 (rw,data=ordered)
/dev/sdb2 on /mnt/home2 type ext3 (rw,data=ordered)
/dev/sdc1 on /mnt/home3 type ext3 (rw,data=ordered)
/dev/sdc2 on /mnt/home4 type ext3 (rw,data=ordered)
unionfs on /BIGhome type unionfs (rw,dirs=/mnt/home1=rw:/mnt/home2=rw:/mnt/home3=rw:/mnt/home4=rw)
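As a side note, the same union can be mounted by hand rather than through /etc/fstab; a one-line sketch using the same branch list:

mount -t unionfs -o dirs=/mnt/home1=rw:/mnt/home2=rw:/mnt/home3=rw:/mnt/home4=rw unionfs /BIGhome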

One More Step – Adding Users

Now that the “chunky” file system has been created, there remains the small step of adding users. So, being the good admins that we are, we use the useradd command to add a user. For this example, user2 is added to the system.

[root@test64 laytonjb]# /usr/sbin/useradd -d /BIGhome/user2 -m -s /bin/bash user2
[root@test64 laytonjb]# su user2
[user2@test64 laytonjb]$ cd
[user2@test64 ~]$ ls -s
total 0
[user2@test64 ~]$ pwd
/BIGhome/user2
[user2@test64 ~]$

Notice that the home directory for the user is specified as /BIGhome/user2, and changing to the home directory confirms it. But which partition actually holds the data?

[root@test64 laytonjb]# ls -s /mnt/home1
total 20
16 lost+found   4 user2
[root@test64 laytonjb]# ls -s /mnt/home2
total 16
16 lost+found
[root@test64 laytonjb]# ls -s /mnt/home3
total 16
16 lost+found
[root@test64 laytonjb]# ls -s /mnt/home4
total 16
16 lost+found

The home directory for user2 is actually in /mnt/home1, the first partition. A second user would also land in the first directory listed in the union mount (/mnt/home1), as would a third. This continues until the first directory fills, after which new files spill over to the second directory (/mnt/home2). For this example, however, we would rather spread the users across all four directories. So how can this be done?

A better approach may be to specify the user’s home directory as one of the directories in the union. Since the first user ended up in the first directory (/mnt/home1), it seems logical to put the next user in one of the other directories (e.g., /mnt/home3).

[root@test64 laytonjb]# /usr/sbin/useradd -d /mnt/home3/user3 -m -s /bin/bash user3
[root@test64 laytonjb]# su user3
[user3@test64 laytonjb]$ cd
[user3@test64 ~]$ pwd
/mnt/home3/user3
[root@test64 laytonjb]# ls -s /BIGhome/
total 24
16 lost+found   4 user2   4 user3

What is interesting is that the home directory is reported as /mnt/home3/user3 rather than /BIGhome/user3.
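If many users need to be added, hand-picking a chunk gets tedious. Here is a minimal sketch, not part of the original example, that places each new user’s home directory in the chunk with the most free space; the mount points are this article’s, and the script name is hypothetical.

#!/bin/bash
# adduser-chunky.sh (hypothetical name): create a user's home
# directory in the chunk with the most free space.
CHUNKS="/mnt/home1 /mnt/home2 /mnt/home3 /mnt/home4"
NEWUSER="$1"

best=""
best_avail=0
for c in $CHUNKS; do
    # Available 1K-blocks on this chunk (POSIX df output, line 2, field 4)
    avail=$(df -P "$c" | awk 'NR==2 {print $4}')
    if [ "$avail" -gt "$best_avail" ]; then
        best_avail=$avail
        best=$c
    fi
done

echo "Placing $NEWUSER in $best (${best_avail} KB available)"
/usr/sbin/useradd -d "$best/$NEWUSER" -m -s /bin/bash "$NEWUSER"

Run as root, e.g., “./adduser-chunky.sh user4”.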

To better understand how files are created for user3 within the “chunky” union, let’s create a couple of simple test files. First, let’s cd to the user’s home directory and create a simple file, junk.

[user3@test64 ~]$ pwd
/mnt/home3/user3
[user3@test64 ~]$ vi junk
[user3@test64 ~]$ ls -s
total 4
4 junk

Notice that pwd reports /mnt/home3/user3. Then a second file, junk2, is created, but this time after changing directories to /BIGhome/user3.

[user3@test64 ~]$ cd /BIGhome/user3
[user3@test64 user3]$ pwd
/BIGhome/user3
[user3@test64 user3]$ vi junk2
[user3@test64 user3]$ ls -s
total 8
4 junk  4 junk2

So with the union mount, the user can write both to the directory where their home directory physically resides (in this case, /mnt/home3/user3) and to the union directory, /BIGhome/user3.
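A quick sanity check (a sketch; the file name is arbitrary) confirms that the two paths reach the same underlying file:

# Write through the union path...
echo "test data" > /BIGhome/user3/check.txt
# ...and read it back through the branch path; both see the same file.
cat /mnt/home3/user3/check.txt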

Final Comments

As capacities increase, the time to perform an fsck is predicted to grow at a dramatic rate. The concepts behind ChunkFS were developed in direct response to this growth in check and repair times. Fundamentally, ChunkFS breaks the file system into “chunks” that can be checked and repaired independently while still allowing data files to extend across the chunks.

This article takes some of the principles of ChunkFS and uses UnionFS to create “large” file systems that are easier to fsck than a typical single file system. While ChunkFS allows data to extend across chunks as needed, UnionFS restricts each file to the chunk where it resides. If you can accept that restriction, along with a little extra planning of where data is located, this approach can be used to great advantage.
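To see the payoff, here is a sketch of checking a single chunk independently. It assumes the union can be taken offline briefly and relies on the /etc/fstab entries from this article.

# Take the union down first, then the one chunk to be checked.
umount /BIGhome
umount /mnt/home2
fsck.ext3 -f /dev/sdb2    # check/repair only this chunk
# Remount the chunk and reassemble the union (both are in /etc/fstab).
mount /mnt/home2
mount /BIGhome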

As homework for this article, think about the following problem: Using this example, how could you adapt the layout if a user exceeded the space in their chunk? I have my own solution(s) to this problem but I’m interested in your solutions. Please submit them to me at jlayton _at_ linux-mag.com and I will publish them next week. Until then, just remember, file systems are good when they are chunky.
