Improving Metadata Performance of the Ext4 Journaling Device

In the never-ending quest for more performance, we examine three different journaling device options for ext4 with an eye toward improving metadata performance. Who doesn't like speed?

The ramdisk devices available on the test system are listed below.
# ls -lsa /dev/ram*
0 lrwxrwxrwx 1 root root     4 Dec  6 17:27 /dev/ram -> ram1
0 brw-r----- 1 root disk 1,  0 Dec  6 17:27 /dev/ram0
0 brw-r----- 1 root disk 1,  1 Dec  6 17:27 /dev/ram1
0 brw-r----- 1 root disk 1, 10 Dec  6 17:27 /dev/ram10
0 brw-r----- 1 root disk 1, 11 Dec  6 17:27 /dev/ram11
0 brw-r----- 1 root disk 1, 12 Dec  6 17:27 /dev/ram12
0 brw-r----- 1 root disk 1, 13 Dec  6 17:27 /dev/ram13
0 brw-r----- 1 root disk 1, 14 Dec  6 17:27 /dev/ram14
0 brw-r----- 1 root disk 1, 15 Dec  6 17:27 /dev/ram15
0 brw-r----- 1 root disk 1,  2 Dec  6 17:27 /dev/ram2
0 brw-r----- 1 root disk 1,  3 Dec  6 17:27 /dev/ram3
0 brw-r----- 1 root disk 1,  4 Dec  6 17:27 /dev/ram4
0 brw-r----- 1 root disk 1,  5 Dec  6 17:27 /dev/ram5
0 brw-r----- 1 root disk 1,  6 Dec  6 17:27 /dev/ram6
0 brw-r----- 1 root disk 1,  7 Dec  6 17:27 /dev/ram7
0 brw-r----- 1 root disk 1,  8 Dec  6 17:27 /dev/ram8
0 brw-r----- 1 root disk 1,  9 Dec  6 17:27 /dev/ram9
0 lrwxrwxrwx 1 root root     4 Dec  6 17:27 /dev/ramdisk -> ram0

For this simple example, the ramdisk /dev/ram0 was used. The first step is to use the “dd” command to expand it to its maximum size, which is 16 MB; this does not require rebooting the kernel.

# dd if=/dev/zero of=/dev/ram0 bs=1k count=16000
16000+0 records in
16000+0 records out
16384000 bytes (16 MB) copied, 0.0411906 seconds, 398 MB/s

The second step is to create an external journal on the expanded ramdisk.

# mke2fs -O journal_dev /dev/ram0
mke2fs 1.41.9 (22-Aug-2009)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
0 inodes, 4096 blocks
0 blocks (0.00%) reserved for the super user
First data block=0
0 block group
32768 blocks per group, 32768 fragments per group
0 inodes per group
Superblock backups stored on blocks:

Zeroing journal device: done

The third step is to remove the existing internal journal from the file system on /dev/sdb1.

# tune2fs -O ^has_journal /dev/sdb1
tune2fs 1.41.9 (22-Aug-2009)

The final step is to tell the file system that it has an external journal on a specific device, in this case /dev/ram0.

# tune2fs -o journal_data -j -J device=/dev/ram0 /dev/sdb1
tune2fs 1.41.9 (22-Aug-2009)
Creating journal on device /dev/ram0: done
This filesystem will be automatically checked every 31 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

The resulting file system settings can be listed with “tune2fs -l”:

# tune2fs -l /dev/sdb1
tune2fs 1.41.9 (22-Aug-2009)
Filesystem volume name:   
Last mounted on:          
Filesystem UUID:          7438d86f-7e12-4208-ad52-36de72591e0a
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    journal_data
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              30531584
Block count:              122096000
Reserved block count:     6104800
Free blocks:              120161866
Free inodes:              30531573
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      994
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Sat Dec  5 20:15:20 2009
Last mount time:          n/a
Last write time:          Sat Dec  5 20:35:12 2009
Mount count:              0
Maximum mount count:      31
Last checked:             Sat Dec  5 20:15:20 2009
Check interval:           15552000 (6 months)
Next check after:         Thu Jun  3 21:15:20 2010
Lifetime writes:          7590 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal UUID:             d19989da-109e-4fbc-abc5-dc42ce5da249
Journal device:           0x0100
Default directory hash:   half_md4
Directory Hash Seed:      278035d2-49a3-474c-bb13-5174d44fec51
Journal backup:           inode blocks
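
The “Journal device” field is printed as a raw device number. As a small sketch, the shell function below decodes it into a major:minor pair. It assumes the value uses the Linux kernel's “new” dev_t encoding (the layout undone by the kernel's new_decode_dev() helper), and the function name decode_dev is hypothetical. For 0x0100 it yields 1:0, which matches /dev/ram0 in the ls listing earlier.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decode a raw device number (as printed in the
# "Journal device" field of tune2fs -l) into major:minor.
# Assumes the Linux "new" dev_t encoding: minor in bits 0-7 and 20-31,
# major in bits 8-19.
decode_dev() {
    local dev=$(( $1 ))                                   # accepts hex such as 0x0100
    local major=$(( (dev & 0xfff00) >> 8 ))
    local minor=$(( (dev & 0xff) | ((dev >> 12) & 0xfff00) ))
    printf '%d:%d\n' "$major" "$minor"
}

decode_dev 0x0100   # journal device from the tune2fs -l output; prints 1:0
```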

For all three journal device options, the file system is mounted with the “data=ordered” option. An explicit data= mount option overrides the journal_data default that was set with tune2fs above.
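
For repeat runs, the four steps and the mount can be collected into one script. The sketch below is a hypothetical helper, not the script used for these tests: the function name setup_ext_journal, the DRY_RUN switch and the mount point are assumptions, while the commands themselves are the ones shown above. With DRY_RUN=1 it only prints the commands, which is a safe way to review them before running anything as root.

```bash
#!/usr/bin/env bash
# Hypothetical helper that repeats the steps above to move an ext4
# journal to an external device (not the author's script).
#   $1: journal device      (e.g. /dev/ram0)
#   $2: file system device  (e.g. /dev/sdb1)
#   $3: mount point         (hypothetical, e.g. /mnt/test)
# With DRY_RUN=1 the commands are printed instead of executed.
set -euo pipefail

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"       # show the command only
    else
        "$@"                     # actually run it (needs root)
    fi
}

setup_ext_journal() {
    local jdev=$1 fsdev=$2 mnt=$3

    # Step 1: write zeros over the ramdisk (16000 x 1 KiB, as above)
    run dd if=/dev/zero of="$jdev" bs=1k count=16000
    # Step 2: format the ramdisk as an external journal device
    run mke2fs -O journal_dev "$jdev"
    # Step 3: drop the internal journal; tune2fs allows this only when the
    # file system is unmounted or mounted read-only
    run tune2fs -O ^has_journal "$fsdev"
    # Step 4: attach the external journal; -o journal_data only sets the
    # default mount option, exactly as in the command shown above
    run tune2fs -o journal_data -j -J device="$jdev" "$fsdev"
    # Mount with data=ordered, as done for all three journal options
    run mount -o data=ordered "$fsdev" "$mnt"
}
```

For example, DRY_RUN=1 setup_ext_journal /dev/ram0 /dev/sdb1 /mnt/test prints the five commands without executing them. The dd and mke2fs steps overwrite the journal device, so double-check the device names before running the script for real.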

Benchmark Results

The first combination tested was for small files (4 KiB) with a shallow directory structure. Table 1 below lists the benchmark times, with the standard deviation listed just below each average value.

Table 1 – Benchmark Times Small Files (4 KiB) – Shallow Directory Structure

Journal Location      Directory Create  File Create       File Remove       Directory Remove
                      (secs.)           (secs.)           (secs.)           (secs.)
Same Disk Journal     31.10             355.70            76.70             6.40
  (std. dev.)         0.83              5.39              0.90              0.92
Second Disk Journal   28.40             346.70            70.90             6.80
  (std. dev.)         1.28              2.53              0.94              3.89
Ramdisk Journal       26.30             351.11            70.50             15.70
  (std. dev.)         0.46              3.33              1.02              0.64

The first test, directory create, averaged only about 26 to 31 seconds for all three journal devices, well under 60 seconds, so its results may not be very meaningful. The directory remove test was even shorter, averaging roughly 6 to 16 seconds, so it may have little value as well.

Table 2 below lists the performance results, with the standard deviation listed just below each average value.

Table 2 – Performance Results of Small Files (4 KiB) – Shallow Directory Structure

Journal Location      Directory Create  File Create       File Create       File Remove       Directory Remove
                      (Dirs/sec)        (Files/sec)       (KiB/sec)         (Files/sec)       (Dirs/sec)
Same Disk Journal     270.60            946.70            3,788.30          4,391.90          1,353.20
  (std. dev.)         7.32              14.59             58.72             51.82             266.08
Second Disk Journal   296.50            971.10            3,885.90          4,751.50          1,529.50
  (std. dev.)         13.30             7.06              28.73             63.66             513.69
Ramdisk Journal       319.40            959.00            3,827.20          4,778.60          536.90
  (std. dev.)         5.50              9.25              36.44             69.29             21.62

The second combination tested was for small files (4 KiB) with a deep directory structure. Table 3 below lists the benchmark times, with the standard deviation listed just below each average value.

Table 3 – Benchmark Times Small Files (4 KiB) – Deep Directory Structure

Journal Location      Directory Create  File Create       File Remove       Directory Remove
                      (secs.)           (secs.)           (secs.)           (secs.)
Same Disk Journal     335.90            627.60            343.30            202.00
  (std. dev.)         8.93              10.36             6.78              3.58
Second Disk Journal   324.50            633.30            330.60            214.40
  (std. dev.)         3.17              7.09              2.15              1.36
Ramdisk Journal       312.40            624.80            333.00            253.00
  (std. dev.)         3.56              4.66              3.07              25.78

All four tests were longer than 60 seconds so they are valid for examination.

Table 4 below lists the performance results, with the standard deviation listed just below each average value.

Table 4 – Performance Results of Small Files (4 KiB) – Deep Directory Structure

Journal Location      Directory Create  File Create       File Create       File Remove       Directory Remove
                      (Dirs/sec)        (Files/sec)       (KiB/sec)         (Files/sec)       (Dirs/sec)
Same Disk Journal     263.40            564.20            2,258.10          1,031.90          438.10
  (std. dev.)         7.05              9.20              37.07             20.40             7.63
Second Disk Journal   272.50            559.00            2,237.70          1,071.30          412.40
  (std. dev.)         2.80              6.36              25.28             6.96              2.42
Ramdisk Journal       282.80            566.60            2,267.80          1,063.60          362.60
  (std. dev.)         2.99              4.25              17.08             9.62              3.95

The third combination tested was for medium files (4 MiB) with a shallow directory structure. Table 5 below lists the benchmark times, with the standard deviation listed just below each average value.

Table 5 – Benchmark Times Medium Files (4 MiB) – Shallow Directory Structure

Journal Location      Directory Create  File Create       File Remove       Directory Remove
                      (secs.)           (secs.)           (secs.)           (secs.)
Same Disk Journal     0.40              155.80            13.20             0.00
  (std. dev.)         0.49              3.25              3.25              0.00
Second Disk Journal   0.20              154.40            12.20             0.10
  (std. dev.)         0.40              3.41              3.49              0.30
Ramdisk Journal       0.40              153.40            13.20             0.10
  (std. dev.)         0.49              2.06              3.25              0.30

For this combination, the first test, directory create, took less than 1 second. This time is very small, so these results are less meaningful than those of the other tests. The file remove test took about 10-15 seconds; again, this is a very short time, so its results may not be meaningful either. The last test, directory remove, averaged only 0 to 0.1 seconds, which is also very short.

Table 6 below lists the performance results, with the standard deviation listed just below each average value.

Table 6 – Performance Results of Medium Files (4 MiB) – Shallow Directory Structure

Journal Location      Directory Create  File Create       File Create       File Remove       Directory Remove
                      (Dirs/sec)        (Files/sec)       (KiB/sec)         (Files/sec)       (Dirs/sec)
Same Disk Journal     122.80            19.30             79,344.30         250.40            0.00
  (std. dev.)         150.40            0.46              1,133.38          69.83             0.00
Second Disk Journal   61.40             19.40             79,570.40         271.60            30.70
  (std. dev.)         122.80            0.66              1,682.43          72.43             92.10
Ramdisk Journal       122.80            19.80             80,065.70         252.30            30.70
  (std. dev.)         150.40            0.40              1,047.94          84.27             92.10

The fourth and final combination tested was for medium files (4 MiB) with a deep directory structure. Table 7 below lists the benchmark times, with the standard deviation listed just below each average value.

Table 7 – Benchmark Times Medium Files (4 MiB) – Deep Directory Structure

Journal Location      Directory Create  File Create       File Remove       Directory Remove
                      (secs.)           (secs.)           (secs.)           (secs.)
Same Disk Journal     4.20              228.30            16.30             2.30
  (std. dev.)         0.60              1.35              3.47              0.78
Second Disk Journal   4.20              225.90            15.30             1.50
  (std. dev.)         0.60              1.58              2.69              0.50
Ramdisk Journal       5.50              225.90            14.90             2.40
  (std. dev.)         0.50              1.51              3.86              0.49

The first test, directory create, took about 4 to 6 seconds on average, which is very short. The third test, file remove, was also fairly short at 11-19 seconds. The last test, directory remove, was extremely fast, averaging under 2.5 seconds. Because of their short run times, these three results are somewhat suspect.

Table 8 below lists the performance results, with the standard deviation listed just below each average value.

Table 8 – Performance Results of Medium Files (4 MiB) – Deep Directory Structure

Journal Location      Directory Create  File Create       File Create       File Remove       Directory Remove
                      (Dirs/sec)        (Files/sec)       (KiB/sec)         (Files/sec)       (Dirs/sec)
Same Disk Journal     497.50            17.40             71,731.90         265.30            1,006.00
  (std. dev.)         76.57             0.49              422.40            69.85             392.48
Second Disk Journal   497.50            17.80             72,495.30         225.90            1,535.00
  (std. dev.)         76.57             0.40              507.54            1.58              512.00
Ramdisk Journal       375.00            18.00             72,495.00         225.90            886.60
  (std. dev.)         34.00             0.45              485.26            1.51              167.06

Benchmark Observations

The first thing you should check when examining the results is the time each test took to complete. If a test does not run longer than 60 seconds, its result is suspect because the test has not run long enough to produce meaningful results. Only after that check should you compare or contrast the three journal device options.
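
The 60-second rule is easy to apply mechanically. The awk-based sketch below marks each test as OK or SUSPECT from its average run time; the function name check_runtime and the two-column input format are assumptions made for illustration. The usage shown feeds it the same-disk journal times from Table 1.

```bash
#!/usr/bin/env bash
# Hypothetical helper: flag tests whose average run time is too short.
# Input lines: "<test_name> <average_seconds>"
# A test is OK only if it ran strictly longer than the threshold.
check_runtime() {
    local threshold=${1:-60}   # minimum meaningful run time in seconds
    awk -v t="$threshold" '{ print $1, ($2 > t ? "OK" : "SUSPECT") }'
}

# Same-disk journal times from Table 1 (small files, shallow directories)
printf '%s\n' \
    "dir_create 31.10" \
    "file_create 355.70" \
    "file_remove 76.70" \
    "dir_remove 6.40" | check_runtime 60
```

Only file_create and file_remove come out as OK, which agrees with the observations for the first combination below.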

For the first combination, small files with a shallow directory structure (Tables 1 and 2), only the file create and file remove tests ran longer than 60 seconds. Examining the results for these two tests leads to the following observations:

  • File Creation:
    • Putting the journal on a second disk is slightly faster than having it on the same disk.
    • Putting the journal on the ramdisk improved metadata performance compared to having it on the same drive. However, it was very slightly slower than putting the journal on a second drive.

  • File Removal:
    • Putting the journal on the same disk is roughly 8% slower than putting it on either a second disk or a ramdisk.
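
Percentages such as these can be checked directly against the rates in Table 2. The sketch below computes how much lower one rate a is than another rate b, as 100 * (b - a) / b; the function name pct_slower is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: percentage by which rate a is lower than rate b,
# computed as 100 * (b - a) / b and printed with one decimal place.
pct_slower() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * (b - a) / b }'
}

# File removal rates (Files/sec) from Table 2
pct_slower 4391.9 4751.5   # same disk vs. second disk
pct_slower 4391.9 4778.6   # same disk vs. ramdisk
```

With the Table 2 file removal rates, the same-disk journal comes out about 7.6% slower than the second-disk journal and about 8.1% slower than the ramdisk journal.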

The second combination, small files with a deep directory structure, produced longer run times for the various metadata tests. All four tests ran longer than 60 seconds, so all of the data can be compared. Tables 3 and 4 are used to compare the results for the three journal devices:

  • Directory Creation:
    • Putting the journal on a second disk is faster than putting the journal on the same disk.
    • Putting the journal on a ramdisk is even faster than putting it on a second disk. It is about 7% faster than putting the journal on the same disk.

  • File Creation:
    • All three journal device options produce about the same results.

  • File Removal:
    • All three journal device options produce about the same results.

  • Directory Removal:
    • Unexpectedly, putting the journal on the same disk is faster than putting it on a second disk, by about 6%.
    • Perhaps even more unexpectedly, putting the journal on a ramdisk is slower than putting it on a second disk. Moreover, the ramdisk journal has about 17% lower throughput than having the journal on the same disk.
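
Whether a difference of a few percent is meaningful depends on the run-to-run scatter. One rough check, which was not part of this study, is Welch's t statistic, (m1 - m2) / sqrt(s1^2/n + s2^2/n), computed from the means and standard deviations in the tables. The number of runs per configuration, n, is not stated here, so it is an assumption you must supply; as a rule of thumb, |t| well above 2 suggests the difference is larger than the run-to-run noise. The function name welch_t is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: Welch's t statistic for two means, assuming both
# configurations were run the same number of times (n).
# Args: mean1 sd1 mean2 sd2 n
welch_t() {
    awk -v m1="$1" -v s1="$2" -v m2="$3" -v s2="$4" -v n="$5" \
        'BEGIN { printf "%.1f\n", (m1 - m2) / sqrt(s1 * s1 / n + s2 * s2 / n) }'
}
```

For example, welch_t 438.10 7.63 362.60 3.95 10 compares the same-disk and ramdisk directory removal rates from Table 4 under the assumption of 10 runs each; the result is roughly 27.8, far above 2.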

The third combination, medium files (4 MiB) with a shallow directory structure, had only one test that ran longer than 60 seconds: file create. Tables 5 and 6 contain the results for this test. Comparing the results for the three journal locations leads to the following observation:

  • File Create:
    • All three journal device options produce about the same results.

The last combination is medium files with a deep directory structure. Tables 7 and 8 contain the times and results for all three journal devices. As with the previous combination (medium files, shallow directory structure), only one test, file create, ran longer than 60 seconds. Comparing the results for the three devices leads to the following observation:

  • File Creation:
    • Putting the journal on a second hard drive or a ramdisk produced slightly faster results than putting it on the same disk.

Summary

The journal is an important aspect of a file system from both a data integrity perspective and a performance perspective. Many file systems in Linux allow you to put the journal on a different device. This flexibility gives you the opportunity to use various block devices to improve performance.

This article examines three options for placing the ext4 journal: (1) on the same disk as the file system, (2) on a second drive, and (3) on a ramdisk. To compare the three options, metadata tests were run using the fdtree benchmark. This metadata benchmark is easy to run (it requires only bash) and has been used before for metadata testing.

The results are mixed, with no clear winner among the journal locations. One would have expected the ramdisk to produce the fastest metadata performance, but the ramdisk was faster than the other two options in only one or two instances. In a larger set of cases, using a second hard drive was just as fast as, or faster than, using a ramdisk.

The reasons the ramdisk did not produce faster results are not known at this time. Further testing will be needed, but one possibility is that the size of the journal played a role. If you compare the results in this article to those in a previous article, you will see that the results here are much slower. It is presumed that this is because the journal was artificially constrained to 16 MB. Future testing will focus on determining whether this is the cause.
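
The 16 MB journal size can be read straight from the mke2fs output for the journal device: 4,096 blocks of 4,096 bytes each. The sketch below parses that output and prints the journal size in MiB, which may help when repeating the tests with larger journals. The function name journal_size_mib is hypothetical, and it assumes the mke2fs 1.41.9 output format shown earlier.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read mke2fs output on stdin and print
# (block count * block size) in MiB. Assumes the mke2fs 1.41.9 format:
#   "Block size=4096 (log=2)" and "0 inodes, 4096 blocks"
journal_size_mib() {
    awk -F'[=, ]+' '
        /^Block size=/ { bs = $3 }
        / blocks$/     { for (i = 1; i <= NF; i++) if ($i == "blocks") nb = $(i - 1) }
        END            { printf "%d\n", nb * bs / 1048576 }'
}
```

Feeding it a saved copy of the mke2fs output shown earlier (for example, journal_size_mib < mke2fs.log, where mke2fs.log is a hypothetical file name) prints 16.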
