Metadata Performance of Four Linux File Systems

Using the principles of good benchmarking, we explore the metadata performance of four Linux file systems with a simple benchmark, fdtree.

In a previous article, the case was made for how low file system benchmarks have fallen: they have become a marketing tool, reduced to mere numbers of little practical use. The article reviewed a paper that examined nine years of storage and file system benchmarking and made some excellent observations. The paper also made some recommendations for improving benchmarks.

This article isn’t so much about benchmarks as a product; rather, it is an exploration in search of interesting observations and trends, or the lack thereof. In particular, it examines whether there are any metadata performance differences between four Linux file systems (ext3, ext4, btrfs, and nilfs) using a metadata micro-benchmark called fdtree. So now it’s time to eat our own dog food and benchmark according to the recommendations previously mentioned.

Start at the Beginning – Why?

The previous article made several observations about benchmarking, one of which is that storage and file system benchmarks seldom, if ever, explain why they are being performed. This point is not to be underestimated: if the reason the benchmark was performed cannot be adequately explained, then the benchmark itself becomes suspect (it may just be pure marketing material).

Given this point, the reason the benchmark in this article is being performed is to examine if, and by how much, the metadata performance of four Linux file systems differs under a single metadata benchmark. The goal is not to find the best file system; a single benchmark, fdtree, cannot support that conclusion. Rather, it is to search for differences and contrast the metadata performance of the file systems.

Why is examining metadata performance a worthwhile exploration? Glad you asked. A number of applications, workloads, and classes of applications are metadata intensive. Mail servers can be very metadata intensive because of the need to read and write very small files. Some database workloads do a great deal of reading and writing of small files. In the world of technical computing, many bioinformatics applications, such as gene sequencing codes, perform a great deal of small reads and writes.

The metadata benchmark used in this article is called fdtree. It is a simple bash script that stresses the metadata aspects of a file system using standard *nix commands. While it is not the best-known benchmark in the storage and file system world, it is somewhat better known in the HPC (High Performance Computing) world.

An Examination of fdtree

Before jumping into the results, it is appropriate and highly recommended to examine the benchmark itself. fdtree is a simple bash script that performs four different metadata tests:


  • Directory creation
  • File creation
  • File removal
  • Directory removal

It creates a specified number of files of a given size (in blocks) in a top-level directory. It then creates a specified number of subdirectories, which are in turn recursively populated with subdirectories and files down to a specified number of levels.

Directory Creation

This phase of the benchmark begins by creating the specified number of directories in the main directory using the simple “mkdir” command in a bash function, “create_dirs”.

mkdir $base_name"L"$nl"D"$nd"/"

The bash variables specify the details of the directory names. The next step is to call the “create_dirs” function recursively with a different “base name” (directory) to create all of the required directories.

create_dirs $((nl-1)) $base_name"L"$nl"D"$nd"/"
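
Putting the two pieces together, the heart of the function looks roughly like this (a simplified sketch, not the script verbatim; $ndirs stands in here for the value of the “-d” option):

create_dirs () {
    local nl=$1          # levels remaining below this point
    local base_name=$2   # directory being populated
    [ "$nl" -le 0 ] && return
    local nd
    for nd in $(seq 1 "$ndirs"); do
        mkdir $base_name"L"$nl"D"$nd"/"
        create_dirs $((nl-1)) $base_name"L"$nl"D"$nd"/"
    done
}

In this sketch the recursion proceeds depth-first: each new directory is fully populated with its own subdirectories before the loop moves on to the next sibling.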

File Creation

This step of the benchmark creates the required number of files using the “dd” command in a bash function, “create_files”.

dd if=/dev/zero bs=4096 count=$fsize of=$file_name > /dev/null 2>&1

To create files in the subdirectories, the bash function is called recursively. The number of 4 KiB blocks per file is specified by $fsize.
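
A simplified sketch of the file-creation loop (again, not the script verbatim; $nfiles stands in for the “-f” option, and the file-naming pattern is illustrative):

create_files () {
    local base_name=$1   # directory to fill with files
    local nf file_name
    for nf in $(seq 1 "$nfiles"); do
        file_name=$base_name"file"$nf
        # Write $fsize blocks of 4 KiB from /dev/zero into each file.
        dd if=/dev/zero bs=4096 count=$fsize of=$file_name > /dev/null 2>&1
    done
}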

File Removal

The third phase of the benchmark removes the files that were created. This is done with the standard “rm” command in a function called “remove_files”.

rm -f $file_name

The function “remove_files” is called recursively to remove all of the files.
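
A sketch of the corresponding loop (same caveats and illustrative names as above):

remove_files () {
    local base_name=$1
    local nf
    for nf in $(seq 1 "$nfiles"); do
        # Each file is removed individually -- there is no recursive "rm -rf".
        rm -f $base_name"file"$nf
    done
}

Because there is no recursive “rm -rf”, every file removal is an individual metadata operation.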

Directory Removal

The fourth and final function in the benchmark removes the directories. This is done in a bash function, “remove_dirs”, using the *nix “rmdir” command.

rmdir $dir_names

The function “remove_dirs” is called recursively to remove all of the directories.
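
A sketch of how this can work (simplified; $ndirs is again illustrative, and the files are assumed to have been removed by the preceding phase). Because “rmdir” only removes empty directories, the recursion has to descend and empty each subdirectory before the current level can be removed:

remove_dirs () {
    local nl=$1 base_name=$2
    [ "$nl" -le 0 ] && return
    local nd dir_names=""
    for nd in $(seq 1 "$ndirs"); do
        # Empty each subdirectory first; rmdir fails on non-empty directories.
        remove_dirs $((nl-1)) $base_name"L"$nl"D"$nd"/"
        dir_names=$dir_names" "$base_name"L"$nl"D"$nd"/"
    done
    rmdir $dir_names
}

Note that $dir_names is deliberately left unquoted in the rmdir call so the shell splits it into one argument per directory.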

Overall, the script uses standard *nix commands for the benchmark, and it does not use the recursive options of any of those commands. It stresses the metadata capabilities of the file system through the potentially large number of files and directories involved.

One interesting thing the test does is round its results, both times and rates, to integer values. Consequently, the reported time for a test can be 0 seconds, meaning the test ran in less than 1 second.
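
For illustration, bash arithmetic is integer-only, so a timing computed along the following lines truncates toward zero (this is a sketch of the pitfall, not fdtree’s exact code):

start=$SECONDS
# ... run one test phase ...
elapsed=$((SECONDS - start))   # integer seconds: a 0.9-second phase reports 0
echo "Test completed in $elapsed seconds"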

Running the Benchmark

In the benchmark exploration in this article, fdtree was used in four different ways to stress the metadata capability:


  • Small files (4 KiB)
    • Shallow directory structure
    • Deep directory structure
  • Medium files (4 MiB)
    • Shallow directory structure
    • Deep directory structure

The two file sizes, 4 KiB (1 block) and 4 MiB (1,000 blocks), were used to get a feel for the range of performance as a function of the amount of data. The two directory structures were used to stress the metadata in different ways, to discover whether the layout has any impact on metadata performance. In the shallow directory structure there are many directories per level but few levels; in the deep directory structure there are few directories per level but many levels.

In choosing the specific fdtree parameters for this exploration, there were three overall goals:


  • Keep the total run time to approximately 10-12 minutes at most
  • Keep the total data for the two directory structures approximately the same
  • Keep the run time for each of the four functions above 1 minute, if possible

Not all four functions always ran for at least 1 minute; some ran for only a few seconds. These cases are noted in the results.

The command lines for the four combinations are:

Small Files – Shallow Directory Structure

./fdtree.bash -d 20 -f 40 -s 1 -l 3

This command creates 20 sub-directories from each upper-level directory at each level (“-d 20”) and there are 3 levels (“-l 3”). It’s a basic tree structure. This is a total of 8,421 directories. In each directory there are 40 files (“-f 40”), each sized at 1 block (4 KiB), denoted by “-s 1”. This is a total of 336,840 files and 1,347,360 KiB of total data.

Small Files – Deep Directory Structure

./fdtree.bash -d 3 -f 4 -s 1 -l 10

This command creates 3 sub-directories from each upper-level directory at each level (“-d 3”) and there are 10 levels (“-l 10”). This is a total of 88,573 directories. In each directory there are 4 files, each sized at 1 block (4 KiB). This is a total of 354,292 files and 1,417,168 KiB of total data.

Medium Files – Shallow Directory Structure

./fdtree.bash -d 17 -f 10 -s 1000 -l 2

This command creates 17 sub-directories from each upper-level directory at each level (“-d 17”) and there are 2 levels (“-l 2”). This is a total of 307 directories. In each directory there are 10 files, each sized at 1,000 blocks (4 MiB). This is a total of 3,070 files and 12,280,000 KiB of total data.

Medium Files – Deep Directory Structure

./fdtree.bash -d 2 -f 2 -s 1000 -l 10

This command creates 2 sub-directories from each upper-level directory at each level (“-d 2”) and there are 10 levels (“-l 10”). This is a total of 2,047 directories. In each directory there are 2 files, each sized at 1,000 blocks (4 MiB). This is a total of 4,094 files and 16,376,000 KiB of total data.
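
All of the directory, file, and data totals quoted above follow from a geometric series: with “-d” subdirectories per directory and “-l” levels, the total number of directories is 1 + d + d^2 + … + d^l. A small bash helper (hypothetical, not part of fdtree) reproduces the totals for any parameter combination:

# Usage: totals <dirs-per-level> <files-per-dir> <blocks-per-file> <levels>
totals () {
    local d=$1 f=$2 s=$3 l=$4
    local i at_level=1 dirs=1       # start with the top-level directory
    for i in $(seq 1 "$l"); do
        at_level=$((at_level * d))  # d^i directories at level i
        dirs=$((dirs + at_level))
    done
    echo "directories: $dirs"
    echo "files:       $((dirs * f))"
    echo "data (KiB):  $((dirs * f * s * 4))"   # 4 KiB per block
}

totals 20 40 1 3      # small/shallow:   8,421 dirs, 336,840 files, 1,347,360 KiB
totals 3 4 1 10       # small/deep:     88,573 dirs, 354,292 files, 1,417,168 KiB
totals 17 10 1000 2   # medium/shallow:    307 dirs, 3,070 files, 12,280,000 KiB
totals 2 2 1000 10    # medium/deep:     2,047 dirs, 4,094 files, 16,376,000 KiB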

Comments on "Metadata Performance of Four Linux File Systems"

typhoidmary

I thought this was interesting and useful. I would like to know how much hardware makes a difference: SATA vs. PATA vs. SCSI. Do different chipsets perform differently? And of course different drives with different cache sizes, RPMs, etc.

mbainter

Definitely interesting, though I would’ve liked to have seen xfs and the reiser3/reiser4 filesystems compared as well. There are some significant differences there that are worth considering.

You should also include large files. Particularly with the advent of media servers and the like, being able to perform efficiently with large files is important, and that’s not covered here.

Last but not least, I’d like to see some comparison of storage efficiency for these different types of files. If you can move them fast, that’s great, but if you can’t store a particular type of file efficiently and I’m going to lose, say, 20% of my storage because of it, that’s important to consider when making the choice.

laytonjb

In general I agree with both your comments (typhoidmary and mbainter). But let me comment really quickly on the details.

@typhoidmary:
I would love to test different chipsets and different drives. I just need the money to buy it :)

BTW – thanks for all of your comments. I’ve noticed you read my articles and post comments. That’s always appreciated.

@mbainter:
I wanted to test xfs and the reiser file systems, but I ran out of time and the article was getting a little long. I will try to do a follow-up at some point with those numbers (maybe next week).

I also didn’t do large files (400 MiB+?) because of time, but I do want to do those runs.

For both of you – thanks for the comments.

Jeff

chrisjoelly

Thanks for that comparison.

Is it possible to include some other, not so often used filesystems as well? e.g. GFS or GFS2 with various storage systems below like DRBD? And tuning opportunities for filesystems in typical scenarios would be a great article too :-)

Chris

mdavid

hi Jeff
I have read the article carefully, and have also read the review article about 9 years of FS and Storage benchmarking.

Let me make some remarks which I hope are constructive criticism.
I have downloaded the fdtree source.
First off, it’s single-threaded.
Your machine has 8 GB of RAM.

In my opinion, and from experience (I have also done some FS benchmarking), tests with a total file size <= 1.5 x the amount of RAM can go to the caches first. Your tests with small sizes amount to around 1.3-1.4 GB, while for the tests with 4 MB files the total ranges between 12-16 GB.

I think the results of "creation" are not so bound by caches, while removal can be cached, and that’s why I think the removal of files and dirs has timings which are quite small, to the extent of not being able to draw conclusions in certain cases.

For metadata, and AFAIK, each file or dir has 4 KB for the inode (at least for ext3); I don’t know for the others. One could imagine testing "pure" metadata with the "touch" *nix command instead of dd, being careful to make a total of 12 GB / 4 KB = 3 million files+dirs, for example.

Furthermore, you mention in the beginning why you are benchmarking metadata, but fdtree completely misses one very important operation, which is "stat", or a read of the inode; there are some workloads where you write once and read many times (even with small files).

This leads to some suggestions:
I have recently used bonnie++ 1.03e, which also does metadata benchmarking, including the "stat" operation.
The iozone tarball includes an executable called fileop (though I have never tried it).

Though I imagine that you don’t have a long time to do the testing (as some of us can), just to give you an example: in one of my last tests with the above bonnie++ version, each run could take between 2 and 4 hours depending on the file system, and I also ran it 10 times.

Finally, if your future tests include the read/stat operation, just try mounting the FS both with and without atime and diratime.

OK, that’s it. Sorry if I was too obvious in some things, or too strong; as I said, I tried to be constructive. Continue your good work.

I read your other article at a time when I was starting some benchmarks, and I stopped to read it first.

regards

Mario David

laytonjb

@mdavid,

I think you have some interesting points but let me explain a few things.

fdtree, while a simple bash benchmark, also uses all of the cores on my test box. While I didn’t show the image, I have a picture of gkrellm while the benchmark is running; all 4 cores are being used. I’m not entirely sure how this works, but I think it’s because of the recursion in the script. But this shows how little I know about bash.

Second, fdtree is not an all-encompassing benchmark. It only tests file and directory creation and removal in a specific order. I’m hoping to test another benchmark named mdtree, which also stresses other aspects of metadata performance.

Third, to be honest, I’m not sure about the caching aspects of fdtree. Linux might cache the file operations, but since there are so many, I’m not sure whether it does or doesn’t. Perhaps the recursion affects the caching. Something to look into (thanks for pointing that out).

One thing I didn’t do, and should have done, was watch the CPU load during the runs. I sort of watched it using gkrellm, but I didn’t gather any statistics.

But you do correctly point out that almost any benchmark fails to stress all of the aspects you are interested in. As you note, fdtree doesn’t stress stat. Other benchmarks will stress the file systems in a different manner; for example, as you mention, Bonnie++ does stress metadata operations and is perhaps a reasonable benchmark to test.

Thanks for your comments. They are really appreciated. Don’t hesitate to post.

Thanks!

Jeff
