Metadata performance is perhaps the most neglected facet of storage performance. In previous articles we've looked into how best to improve metadata performance without too much luck. Could that be a function of the benchmark? Hmmm...
In previous articles about metadata performance, several file system journal options were examined. The last article about metadata performance examined how performance is influenced by the size of the journal relative to the size of the file system, and by where the journal is located, i.e. on which device (disk or ramdisk).
This article reuses the same journal size and location combinations so that the basic trends can be compared to those from the previous benchmark, fdtree. Two journal devices are used: (1) a second disk that is separate from the file system disk, and (2) a ramdisk. The ramdisk represents the fastest possible storage device for the journal, which helps show whether the speed of the journal device has an impact on performance. A ramdisk is probably not a sensible journal location for real production systems, but it provides data points for the fastest possible journal device.
The journal size/device combinations used in this article are:
- 16MB Disk (journal is located on a second disk)
- 64MB Disk
- 256MB Disk
- 1GB Disk
- 16MB Ramdisk (journal located on a ramdisk)
- 64MB Ramdisk
- 256MB Ramdisk
- 1GB Ramdisk
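To make the size of the test matrix concrete, here is a small Python sketch that enumerates the eight combinations listed above. It is only an illustration, not part of the actual test harness. The function names are hypothetical, and it assumes (my assumption, not stated explicitly) that the "at least 10 runs" described below apply separately to each of the NP=1, NP=2, and NP=4 settings.

```python
from itertools import product

# Journal sizes and devices taken from the list above.
JOURNAL_SIZES = ["16MB", "64MB", "256MB", "1GB"]
JOURNAL_DEVICES = ["Disk", "Ramdisk"]
# Assumption: the "at least 10" repetitions apply to each process count.
PROCESS_COUNTS = [1, 2, 4]
MIN_RUNS = 10


def journal_combinations():
    """Return every journal size/device combination as a 'SIZE DEVICE' label."""
    return [f"{size} {device}"
            for device, size in product(JOURNAL_DEVICES, JOURNAL_SIZES)]


def minimum_runs():
    """Lower bound on benchmark runs: combinations x NP settings x repetitions."""
    return len(journal_combinations()) * len(PROCESS_COUNTS) * MIN_RUNS


if __name__ == "__main__":
    for label in journal_combinations():
        print(label)
    print("minimum runs:", minimum_runs())
```

Iterating device-first reproduces the order of the list above, from "16MB Disk" to "1GB Ramdisk".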
As with the previous benchmarks, metarates (using the command line given in the previous section) is run at least 10 times for each journal size/device combination.
Enough with the test setup, let’s see some results! The following plots show the average of the results reported by metarates for each journal size/device combination, with the standard deviation of the runs shown as error bars.
The first chart, Figure 1 below, plots the file create/close rate (creates/closes per second) for the eight journal size/device combinations. This test uses only one process (NP=1, where NP is the Number of Processes).
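As a sketch of how each plotted point and its error bar can be computed from the repeated runs, the Python below uses the standard `statistics` module. The function name `summarize` is hypothetical, and the choice of the sample standard deviation (`statistics.stdev`) rather than the population form is an assumption; the article does not say which was used. The input rates are made-up numbers, not measured data.

```python
import statistics


def summarize(rates):
    """Return (mean, standard deviation) for repeated runs of one combination.

    Assumption: error bars use the sample standard deviation
    (statistics.stdev); the article does not say which form was used.
    """
    if len(rates) < 2:
        raise ValueError("need at least two runs to compute a standard deviation")
    return statistics.mean(rates), statistics.stdev(rates)


if __name__ == "__main__":
    # Hypothetical create/close rates (per second); NOT measured data.
    runs = [1000.0, 1100.0, 900.0, 1050.0, 950.0]
    mean, sd = summarize(runs)
    print(f"{mean:.1f} +/- {sd:.1f} creates/closes per second")
```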
Figure 1: Metarates performance for NP=1 for File Create and Close Rate (in number of file creates and closes per second).
The general trend is fairly obvious. As the journal size is increased, the file create/close rate increases dramatically, from a low of about 1,466 creates/closes per second for the 16MB Ramdisk combination to almost 54,000 creates/closes per second for the 1GB Ramdisk combination. Also notice that as the journal size increases, the ramdisk-based journal pulls ahead of the disk-based journal, but not by a large margin.
The second chart shows the file stat rate (stats per second) for the NP=1 case (one process) for the eight journal size/device combinations.
Figure 2: Metarates performance for NP=1 for File Stat Rate (in number of file stats per second).
This chart shows very little change in performance with either the size of the journal or its location.
Figure 3 below plots the utime rate (number of utime updates per second). This performance measure can be important for applications that update a large number of files, such as those in the bioinformatics world, or even databases (if keeping track of access and modification times is important). The figure contains the results for the NP=1 case.
Figure 3: Metarates performance for NP=1 for File Utime Rate (in number of file utimes per second).
These results are interesting in that the 16MB cases, for both the disk-based and ramdisk-based journals, are actually faster than the 64MB and 256MB cases. But for the 1GB journal cases the results improve dramatically. This general trend is definitely repeatable, although the standard deviation for the 1GB Disk combination is fairly large relative to the average. Also, the 1GB Ramdisk combination is almost 33% faster than the 1GB Disk combination, illustrating the importance of the IO performance of the device holding the journal (at least for this performance measure).
The next three plots are for the NP=2 case, in which two processes run at the same time against the same file system and the same directory. When more than one process is run, metarates outputs two numbers. The first is the “per-process” result, i.e. the average result for each process. The second is the total, i.e. the sum of the results of all the processes. Both results are potentially interesting. The per-process results can be compared to the NP=1 results to understand the impact of multiple processes using the same file system (i.e. contention). The total shows the general trend of performance with the number of processes: if the total doesn’t change much as the number of processes increases, then performance has likely reached its limit. Each of the following figures has two charts: the upper chart is the “per-process” result and the lower chart is the “total” result.
Figure 4 below plots the file create/close rate for NP=2 for the eight journal size/location combinations.
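The relationship between the two numbers can be written down in a few lines of Python. This is a hypothetical helper that follows the article's description of the output, not metarates' own code. One consequence of these definitions is that the total is always NP times the per-process value, so contention shows up when the per-process value at NP=2 or NP=4 is compared with the NP=1 result (or, equivalently, when the total grows by less than a factor of NP).

```python
def per_process_and_total(rates_by_process):
    """Combine one result per process into the two summary numbers.

    Following the article's description: the per-process figure is the
    average over processes and the total is the sum over processes.
    """
    if not rates_by_process:
        raise ValueError("need at least one process")
    total = sum(rates_by_process)
    per_process = total / len(rates_by_process)
    return per_process, total


if __name__ == "__main__":
    # Hypothetical rates for two processes (per second); NOT measured data.
    per_process, total = per_process_and_total([30000.0, 28000.0])
    # By construction, total == NP * per_process.
    print(per_process, total)
```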
Figure 4: Metarates performance for NP=2 for File Create and Close Rate (in number of file creates and closes per second).
These results are even more dramatic than for the NP=1 case. The per-process results at the top show a very large increase in performance as the journal size is increased. In addition, the ramdisk-based journal is about 20% faster than the disk-based journal (both at 1GB).
If you compare Figure 4 with Figure 1, you will see that the per-process results for NP=2 are actually a bit slower than for NP=1. This could indicate that the file system is already experiencing some contention from two processes performing IO at the same time. However, if you compare the total rates in the lower part of Figure 4 to Figure 1, you will see that NP=2 has a much larger create/close rate than NP=1 (approximately 59.5% better).
Figure 5 below plots the file stat results for NP=2. Because the stat rates are large, what is actually plotted is “thousands of stats per second” for each journal size/device combination.
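For readers who want to reproduce comparisons like "approximately 59.5% better" from their own numbers, here is a minimal Python helper (the name is hypothetical). A 59.5% improvement simply means the NP=2 total is about 1.595 times the NP=1 rate. The inputs in the usage line are illustrative ratios, not the article's data.

```python
def percent_improvement(new_rate, baseline_rate):
    """Percentage by which new_rate exceeds baseline_rate (negative if slower)."""
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return (new_rate / baseline_rate - 1.0) * 100.0


if __name__ == "__main__":
    # A total that is 1.595 times the baseline is about 59.5% better.
    print(round(percent_improvement(1.595, 1.0), 1))
```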
Figure 5: Metarates performance for NP=2 for File Stat Rate (in thousands of file stats per second).
As with the NP=1 case, there is little variation in performance as the journal size and device location change.
The third performance measure, file utime, is plotted in Figure 6 for NP=2. The file utime rate is large enough that what is actually plotted is “thousands of file utimes per second.”
Figure 6: Metarates performance for NP=2 for File Utime Rate (in thousands of file utimes per second).
The general trends for this case are fairly similar to those for the NP=1 case. There is a slight decrease in performance as the journal size is increased from 16MB to 64MB and 256MB. But when a 1GB journal is used, performance jumps dramatically. For both the disk and ramdisk cases, the performance at 1GB is about twice the performance at 16MB.
The next set of results is for the NP=4 case, i.e. 4 processes running at the same time against the same file system and the same directory. Recall that the test system is a quad-core system, so there is one process per core. As with the NP=2 case, there are two charts in each figure: the top chart shows the per-process results and the bottom chart shows the total results.
Figure 7 below plots the file create/close rate for NP=4 for the eight journal size/location combinations. The results are now large enough that what is actually plotted is “thousands of file creates/closes per second.”
Figure 7: Metarates performance for NP=4 for File Create and Close Rate (in thousands of file creates and closes per second).
Again, we see the general trend that performance improves dramatically as the size of the journal increases. It is also interesting to compare the top chart with the bottom chart: the bottom chart shows almost 4 times the performance of the top chart. So while we started to see some contention in this performance measure for NP=2, it appears that the contention isn’t as dramatic for NP=4.
Figure 8 below plots the file stat results for NP=4. Because the stat rates are large, what is actually plotted is “thousands of stats per second” for each journal size/device combination.
Figure 8: Metarates performance for NP=4 for File Stat Rate (in thousands of file stats per second).
Once more, there is not much change in performance as the journal size/device is varied.
The third performance measure, file utime, is plotted in Figure 9 for NP=4. The file utime rate is large enough that what is actually plotted is “thousands of file utimes per second.”
Figure 9: Metarates performance for NP=4 for File Utime Rate (in thousands of file utimes per second).
The trend in these results is slightly different from that for NP=1 and NP=2. In those cases the result at 16MB was better than at 64MB and 256MB. However, for NP=4, the 16MB case is worse than the 64MB case and close to the 256MB result.
The figure also shows the huge impact the 1GB journal size has on performance. From the 16MB Disk case to the 1GB Ramdisk case, performance improves by a factor of 30. It’s also obvious that at 1GB, the Ramdisk-based journal is much faster than the Disk-based journal, by a factor of about 3.5.
Metadata Trends with Number of Processes
One of the more interesting things we can do is to track performance as the number of processes is increased from NP=1 to NP=2 to NP=4. The three figures below do this for the three performance measures: (1) file create/close, (2) file stat, and (3) file utime. Only the 1GB Ramdisk case is examined because it has the highest overall performance of the eight journal size/device combinations.
Figure 10 compares the per-process file create/close performance (top chart) and the total create/close performance (bottom chart) for the three process counts. Performance is plotted in thousands of file creates/closes per second.
Figure 10: Comparison of Per-Process and Total File Create/Close Rates (K per second) for the 1GB Ramdisk case.
From the top chart we can see that the per-process performance decreases as processes are added, indicating that contention is in effect. However, even with this contention, total performance improves as the number of processes increases (see the bottom chart).
Figure 11 compares the per-process and total stat performance in thousands of file stats per second (K per second) for NP=1, NP=2, and NP=4.
Figure 11: Comparison of Per-Process and Total File Stat Rates (K per second) for the 1GB Ramdisk case.
In the earlier stat results there was little change with journal size or location. But in this figure we can see that contention does affect stat performance: the per-process performance drops by almost half going from NP=1 to NP=4. However, as with file create/close performance, the total stat performance keeps increasing, as seen in the bottom chart.
The last comparison, shown in Figure 12, is of the utime performance in thousands of file utimes per second (K per second) for NP=1, NP=2, and NP=4.
Figure 12: Comparison of Per-Process and Total File Utime Rates (K per second) for the 1GB Ramdisk case.
For this performance measure, the per-process performance decreases by almost 50% going from NP=1 to NP=4. But again, as with the other two performance measures, the total performance keeps increasing.
A simple way to measure the contention is to compute the “efficiency” of each performance measure as the number of processes is increased. This is found by dividing the total performance at NP=2 and NP=4 by the performance at NP=1. Ideally, if there is no contention, performance should increase linearly with the number of processes, so NP=2 would have twice the performance of NP=1 and NP=4 would have four times the performance of NP=1.
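The efficiency calculation described above can be sketched in Python as follows. The function name is hypothetical, and the example totals are made-up numbers chosen only to show the arithmetic; they are not the values in Table 1.

```python
def scaling_efficiency(total_rates):
    """Compare measured scaling with ideal (linear) scaling.

    total_rates maps a process count NP to the total rate at that NP.
    Returns {NP: (measured, ideal)} where measured = total(NP) / total(1)
    and ideal = NP, i.e. what a contention-free file system would give.
    """
    if 1 not in total_rates:
        raise ValueError("an NP=1 baseline is required")
    baseline = total_rates[1]
    return {n: (rate / baseline, float(n))
            for n, rate in sorted(total_rates.items())}


if __name__ == "__main__":
    # Hypothetical totals (thousands per second); NOT the article's Table 1 data.
    results = scaling_efficiency({1: 10.0, 2: 18.0, 4: 30.0})
    for n, (measured, ideal) in results.items():
        print(f"NP={n}: measured {measured:.2f}, ideal {ideal:.0f}")
```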
Table 1 below contains the results for the three performance measures as well as the ideal (used as a comparison).
Table 1: Performance Efficiency as a Function of the Number of Processes
Contention is visible even at NP=2, where the efficiencies are below ideal. The gap between ideal and actual performance is even more pronounced at NP=4.
The results in the table are very interesting because they highlight how performance changes as the number of processes increases. File create/close efficiency scales better than the other two measures, but it is still about 25% below ideal at NP=4. File utime performance suffers badly as the number of processes increases: at NP=4 it is more than 50% below ideal.
There are likely many reasons for the contention, and determining the exact cause of the performance degradation would require a great deal of study. However, this article is already long enough, and that is a topic for another day.
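To relate statements like "about 25% below ideal" to the efficiency numbers, here is a small Python sketch. It assumes "below ideal" is measured relative to the ideal value, so a shortfall of about 25% at NP=4 corresponds to a measured speedup of roughly 3 instead of 4, and a shortfall of more than 50% to a speedup below 2. The function name and inputs are hypothetical.

```python
def shortfall_from_ideal(measured, ideal):
    """Percent by which a measured speedup falls short of the ideal speedup.

    Assumption: the shortfall is expressed relative to the ideal value.
    """
    if ideal <= 0:
        raise ValueError("ideal speedup must be positive")
    return (1.0 - measured / ideal) * 100.0


if __name__ == "__main__":
    # Illustrative speedups at NP=4 (ideal = 4); NOT the article's data.
    print(shortfall_from_ideal(3.0, 4.0))
    print(shortfall_from_ideal(1.9, 4.0))
```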
Comparison between fdtree and metarates
Let’s compare the general trends from metarates with those from fdtree by looking at some highlights from both, starting with fdtree.
- Small shallow case:
  - Not much variation in file create performance as a function of journal size or journal device
  - Larger journals are better for file removes (40% better going from 16MB to 1GB)
- Small deep case:
  - Larger journals are better for directory creates (factor of 4 going from 16MB to 1GB)
  - Ramdisk-based journals are a bit faster than disk-based journals (but not by much)
  - Larger journals are better for file creates; ramdisks are a little better than disk-based journals
  - Larger journals are better for file remove performance (factor of 2)
  - Larger journals are better for directory removes (factor of 2)
- Medium shallow case:
  - File creates: not much change based on journal size or journal device
- Medium deep case:
  - File creates: not much change based on journal size or journal device
Given these four cases, we can generalize the observations into the following statements:
- Larger journal sizes are better
- Ramdisks do not make an appreciable performance difference
Now let’s summarize the results from metarates:
- For file create/close performance, larger journals are much better. Going from 16MB to 1GB can improve performance by over a factor of 20! Ramdisk-based journals can improve performance by about 25%, especially as the number of processes is increased.
- For file stat performance, neither journal size nor journal device made a discernible change in performance.
- For file utime performance, a 1GB journal can be over 30 times faster than a 16MB journal. Perhaps even more importantly, a ramdisk-based journal can be 2-3 times faster than a disk-based journal, especially as the number of processes is increased.
As with fdtree, we can reduce the observations down to two:
- Larger journal sizes are better
- Ramdisks are better
One goal of this article was to examine metadata performance with the metarates benchmark across combinations of journal size and device, and as a function of the number of processes. But perhaps more importantly, another goal was to see how the trends in metadata performance change when a different benchmark is used.
The many figures in this article show that for file create/close performance and file utime performance, the journal size/device combination can make a big difference in performance. File stat performance was not really affected by the journal size/device combination. We also saw that in some cases ramdisks can really help performance, particularly as the number of processes is increased.
Comparing the metarates results to those of fdtree, we see much more variation in metarates performance as a function of the journal size/device combination. This can be useful because it lets us see performance differences as we tune the system. One trend that metarates highlighted but fdtree did not is that a ramdisk-based journal can help performance, particularly as the number of processes run against the file system increases. However, don’t rush out and start building file systems with ramdisk-based journals just yet. There are a number of considerations that need to be addressed, not least that a ramdisk loses its contents when the system loses power. But it does suggest that something like an SSD could be a serious option for file system journals.
The final question I wanted to address is which benchmark is “better”: fdtree or metarates? Choosing one over the other is difficult, but if you forced me to pick one, I would pick metarates. Because it is MPI-based, it can put more pressure on a file system by running more processes. In addition, it appears that you can discern performance differences for various “tweaks” more easily than you can with fdtree. However, I must also say that I like fdtree because it is a simple shell script (infinitely portable) and it lets you exercise various directory/file scenarios that metarates cannot.
Jeff Layton is an Enterprise Technologist for HPC at Dell. He can be found lounging around at a nearby Fry’s enjoying the coffee and waiting for sales (but never during working hours).