The world is awash in data, and that puts more and more pressure on file systems to scale efficiently to ever larger amounts of data. Recently, Ric Wheeler from Red Hat experimented with putting 1 billion files in a single file system to understand what problems the Linux community might face in the future. Let's see what happened...
As Ric pointed out in a presentation he made at LinuxCon 2010, 1 billion files is very conceivable from a capacity perspective. If you use 1KB files, then 1 billion files (1,000,000,000) take up only 1TB. If you use 10KB files, then you need 10TB to accommodate 1 billion files (not too difficult to imagine, even in a home system). If you use 100KB files, then you need 100TB to hold 1 billion files. Again, it’s not hard to imagine 100TB in a storage array. The point is that with smaller files, current storage devices can easily accommodate 1 billion files from a capacity perspective.
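The capacity arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes), and it ignores file system metadata and block-size rounding, so real usage would be somewhat higher. The function name `capacity_tb` is made up for illustration.

```python
# Assumption: decimal units, i.e. 1 KB = 1,000 bytes and 1 TB = 10**12 bytes.
KB = 1_000
TB = 10**12


def capacity_tb(num_files: int, file_size_kb: int) -> float:
    """Return the raw data capacity, in TB, needed to hold num_files files.

    Metadata overhead and block-size rounding are deliberately ignored.
    """
    return num_files * file_size_kb * KB / TB


if __name__ == "__main__":
    one_billion = 1_000_000_000
    for size_kb in (1, 10, 100):
        print(f"{size_kb:>3} KB files -> {capacity_tb(one_billion, size_kb):.0f} TB")
```

The three file sizes are the ones used in the text; any other size can be passed in the same way.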
Ric built a 100TB storage array (raw capacity) for performing some tests. But, as previously mentioned, you don’t need much capacity for performing these experiments. According to Ric, the life cycle of a file system has several stages:
- Create the file system (mkfs)
- Fill the file system
- Iterate over the files (basically, use it)
- Repair the file system (fsck)
- Remove files
Ric created 1-million-file file systems and experimented with each of these steps to understand how they performed. He examined four file systems (ext3, ext4, XFS, and btrfs) at each of these stages and recorded the results.
To understand the amount of time it takes to create file systems, Ric performed a simple experiment, first creating a 1TB file system with each of the four file systems on a SATA disk. To understand whether the bottleneck was the performance of the storage media itself, he also built a 75GB file system on a PCIe SSD. Figure 1 below, from his LinuxCon talk, plots the amount of time it took to create each file system.
Figure 1: File System make (mkfs) for four file systems and two hardware devices
Remember that he’s focusing on 1-million-file file systems for these experiments. Notice that ext3 and ext4 took a long time to create because of the need to create static inode tables. XFS, with dynamic inode allocation, was much faster, taking only about 20 seconds (ext3 took approximately 275 seconds). Creating the file systems on the PCIe-based SSD was considerably faster than on the SATA drive. Even there, however, ext3 and ext4 took noticeably longer than XFS and btrfs to create a file system.
The second phase of the file system’s life cycle is to fill the file system with data. Recall that Ric is using 1 million files to fill up the file systems. He used 1,000 directories with 1,000 files each. Figure 2 below shows the amount of time it took to create 1 million files on the two types of hardware: a 1TB file system on a SATA drive and a 75GB file system on a PCIe SSD.
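To make the directory layout concrete, here is a minimal Python sketch of a fill step with the same shape (a number of directories, each holding the same number of files). It is not the tool Ric used, which the text does not name. The function name `fill_tree`, the naming scheme, and the use of empty files are all assumptions for illustration, and the demo uses a tiny tree in a temporary directory rather than 1,000 x 1,000.

```python
import os
import tempfile
import time


def fill_tree(root: str, num_dirs: int, files_per_dir: int) -> int:
    """Create num_dirs directories under root, each holding files_per_dir
    empty files, and return the number of files created."""
    created = 0
    for d in range(num_dirs):
        dir_path = os.path.join(root, f"dir{d:04d}")
        os.makedirs(dir_path, exist_ok=True)
        for f in range(files_per_dir):
            # Opening in "w" mode creates a zero-length file (an assumption;
            # the file sizes used in the original tests are not given here).
            with open(os.path.join(dir_path, f"file{f:04d}"), "w"):
                pass
            created += 1
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        start = time.perf_counter()
        count = fill_tree(root, num_dirs=10, files_per_dir=10)
        elapsed = time.perf_counter() - start
        print(f"created {count} files in {elapsed:.3f} s")
```

Calling `fill_tree(root, 1000, 1000)` on a mounted test file system would reproduce the 1,000-by-1,000 layout described above, though timings from a script like this are not comparable to the figures in the talk.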
Figure 2: File System file create (fill file system) for four file systems and two hardware devices
You can see in the figure that ext3 and XFS took the longest time to fill the file system on the SATA drive. It took ext3 about 9,700 seconds or 2 hours and 42 minutes, XFS took about 7,200 seconds or right around 2 hours, ext4 took about 800 seconds (a little over 13 minutes), and btrfs took about 600 seconds (10 minutes). On the PCIe SSD, while difficult to tell in the figure, XFS took slightly longer than ext3, ext4, or btrfs. Ext4 was the fastest file system to fill up on the SSD, but all four file systems take so little time that it’s difficult to differentiate between them.
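The SATA fill times quoted above can also be read as file creation rates. The sketch below does that arithmetic in Python. The input values are the approximate times given in the text, and the helper names `files_per_second` and `as_h_m_s` are made up for illustration.

```python
# Approximate SATA fill times, in seconds, as quoted in the text.
SATA_FILL_SECONDS = {"ext3": 9_700, "XFS": 7_200, "ext4": 800, "btrfs": 600}
NUM_FILES = 1_000_000


def files_per_second(num_files: int, seconds: int) -> float:
    """Average creation rate over the whole fill."""
    return num_files / seconds


def as_h_m_s(seconds: int) -> str:
    """Format a whole number of seconds as hours, minutes, and seconds."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


if __name__ == "__main__":
    for name, seconds in SATA_FILL_SECONDS.items():
        rate = files_per_second(NUM_FILES, seconds)
        print(f"{name:>5}: {as_h_m_s(seconds)}  (~{rate:,.0f} files/s)")
```

On these numbers, ext3 averages roughly 100 files per second on the SATA drive, while btrfs averages roughly 1,700.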
The fourth phase in the life cycle is to repair the file system. Figure 3 below plots the amount of time it takes to repair the 1-million-file configurations.
Figure 3: File system check/repair (fsck) for four file systems and two hardware devices
It’s pretty obvious that ext3 is much slower than the other file systems on both storage media, but it’s really noticeable on the SATA drive. There, ext3 took about 1,040 seconds to repair the file system, while btrfs, which had the second-worst time, took only about 90 seconds. However, notice how fast the file system was repaired on the PCIe SSD. Even ext3 took only about 80 seconds to repair 1 million files.
The final life cycle phase is to remove files from the file systems. Figure 4 below plots the time it took to remove all 1 million files from each file system for both storage media.
Figure 4: File remove for four file systems and two hardware devices
Notice that XFS is much slower than even ext3 on the SATA drive. It took XFS about 3,800 seconds (a little over an hour) to remove all 1 million files. The next slowest file system was ext3, which took about 875 seconds (not quite 15 minutes) to remove the files. Ext4 was the fastest on the SATA drive, but btrfs wasn’t too far behind. On the PCIe SSD, by contrast, the slowest was btrfs, followed by ext3, XFS, and then ext4. But the differences in time are very small.
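For readers who want to try a removal pass themselves, the following Python sketch times the deletion of every file under a directory tree. It is an illustration only, not the method used in the talk. The function name `remove_all_files` is hypothetical, and the demo builds a very small tree in a temporary directory first so that it is safe to run.

```python
import os
import tempfile
import time


def remove_all_files(root: str) -> int:
    """Delete every regular file under root, leaving the directories in
    place, and return the number of files removed."""
    removed = 0
    for dir_path, _dir_names, file_names in os.walk(root):
        for name in file_names:
            os.remove(os.path.join(dir_path, name))
            removed += 1
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Build a small 10 x 10 tree of empty files to delete.
        for d in range(10):
            dir_path = os.path.join(root, f"dir{d:04d}")
            os.makedirs(dir_path)
            for f in range(10):
                with open(os.path.join(dir_path, f"file{f:04d}"), "w"):
                    pass

        start = time.perf_counter()
        count = remove_all_files(root)
        elapsed = time.perf_counter() - start
        print(f"removed {count} files in {elapsed:.3f} s")
```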
Hey everybody – watch this!