SandForce 1222 SSD Testing, Part 1: Initial Throughput Results

SandForce has developed a very interesting and unique SSD controller that uses real-time data compression. This can improve performance (throughput) and extend the life of the SSD but it hinges upon the compressibility of your data. This article is the first part in a series that examines the performance of a SandForce 1222-based SSD and the impact of data compressibility.

The tests were run on the same system as previous tests but with a slightly different kernel. The highlights of the system are:


  • CentOS 5.4 with a 2.6.32 kernel
  • GigaByte MAA78GM-US2H motherboard
  • An AMD Phenom II X4 920 CPU
  • 8GB of memory (DDR2-800)
  • The OS and boot drive are on an IBM DTLA-307020 (20GB drive at Ultra ATA/100)
  • /home is on a Seagate ST1360827AS
  • Micro Center SandForce 1222, 64GB SSD. This is mounted as /dev/sdd
  • ext4 is used as the file system with the default options

For IOzone the system specifications are fairly important. In particular, the amount of system memory matters because it largely determines caching effects. If the problem size is small enough to fit (even partially) into the system or file-system cache, the results can be skewed. Comparing results from a system where cache effects are large to results from a system where they are small is comparing the proverbial apples to oranges. For example, running the same problem size on a system with 1GB of memory and on a system with 8GB will give very different results.

With that in mind, the next section presents the actual IOzone commands used in the testing.

IOzone Command Parameters

As mentioned previously, IOzone has a huge number of options (one reason it is so popular and powerful). For this exploration, the basic tests run are: write, re-write, read, re-read, random read, random write, backwards read, record re-write, strided read, fwrite, frewrite, fread, and refread.

One of the most important considerations for this test is whether cache effects should be included in the results. Including cache effects can be very useful because it can reveal certain aspects of the OS and file-system cache sizes and how the caches function. On the other hand, including cache effects limits how usefully the data can be compared to other results.

For this article, cache effects will be limited as much as possible so that the impact of data compressibility on performance can be better observed. Cache effects cannot be eliminated entirely: even with extremely large problem sizes that force data out of the OS and file-system caches, hardware caches such as those in the CPU remain, so eliminating all cache effects is virtually impossible (but never say never). However, one of the best ways to minimize cache effects is to make the file size much larger than main memory. For this article, the file size is chosen to be 16GB, twice the size of main memory. The factor of two is somewhat arbitrary, based on experience and some urban legends floating around the Internet.
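One quick way to check this "twice main memory" rule on a given machine is to read the installed memory from /proc/meminfo and double it. This is only an illustrative sketch, not part of the original test procedure; it merely suggests a value for the IOzone -s option.

# Read total memory (reported in kB) and suggest an IOzone file size of
# roughly twice main memory. Integer division rounds down slightly because
# MemTotal is a little less than the installed amount.
memkb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
echo "Suggested IOzone file size: $(( memkb * 2 / 1024 / 1024 )) GB"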

Recall that most of the IOzone tests break a file up into records of a specific length. For example, a 1GB file can be broken into 1MB records, giving roughly 1,000 records in the file. IOzone can either run an automatic sweep of record sizes or the user can fix the record size. If done automatically, IOzone starts at 1KB (1,024 bytes) and doubles the record size until it reaches a maximum of 16MB (16,777,216 bytes). Optionally, the user can specify the lower and upper record sizes and IOzone will vary the record sizes in between. With a 16GB file and a 1KB record size, roughly 16 million records would be used for each of the 13 tests, and the run times for such a test are very, very long (days). Using good benchmarking practice, where each test is run at least 10 times, the total run time would be so large that, perhaps, only one benchmark every 2-4 weeks could be published. Consequently, to meet editorial deadlines (and you don't want to be late for the editor), larger record sizes are used. For this article, only four record sizes are tested: (1) 1MB, (2) 4MB, (3) 8MB, and (4) 16MB. For a file size of 16GB that is (1) 16,000 records, (2) 4,000 records, (3) 2,000 records, and (4) 1,000 records. These record sizes and record counts correspond to those of a number of real applications, so they produce relevant results.

The command line for the first record size (1MB) is,

./IOzone -Rb spreadsheet_output_1M.wks -s 16G -+w 98 -+y 98 -+C 98 -r 1M > output_1M.txt


The command line for the second record size (4MB) is,

./IOzone -Rb spreadsheet_output_4M.wks -s 16G -+w 98 -+y 98 -+C 98 -r 4M > output_4M.txt


The command line for the third record size (8MB) is,

./IOzone -Rb spreadsheet_output_8M.wks -s 16G -+w 98 -+y 98 -+C 98 -r 8M > output_8M.txt


The command line for the fourth record size (16MB) is,

./IOzone -Rb spreadsheet_output_16M.wks -s 16G -+w 98 -+y 98 -+C 98 -r 16M > output_16M.txt


The options “-+w”, “-+y” and “-+C” define how much “dedupability” there is in the data and buffers. In the command lines above, “98” means that the data can be deduplicated (compressed) by 98%. Note that these options are only available in more recent versions of IOzone.

For this article I examine three levels of compression: (1) 98%, (2) 50%, and (3) 2%. The first level corresponds to data that is very compressible (98%), the second to moderately compressible data (50%), and the last to data that is barely compressible (2%). A simple driver script for the full test matrix is sketched below.
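The four command lines above can be combined with the three dedup levels and the ten repetitions into a simple driver script. This is a minimal sketch, not the exact script used for these tests; it assumes the IOzone binary sits in the current directory and that the current directory is on the SSD under test, and the output file names are hypothetical.

#!/bin/bash
# Sketch: run the full matrix of record sizes, dedup levels, and repetitions.
# Assumes ./IOzone is the benchmark binary and the working directory is on
# the SSD being tested, so the 16GB test file lands on /dev/sdd.
RUNS=10
for dedup in 98 50 2; do
    for rec in 1M 4M 8M 16M; do
        for run in $(seq 1 $RUNS); do
            ./IOzone -Rb spreadsheet_${rec}_${dedup}pct_run${run}.wks \
                     -s 16G -+w $dedup -+y $dedup -+C $dedup -r $rec \
                     > output_${rec}_${dedup}pct_run${run}.txt
        done
    done
done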

Building ext4

For the tests in this article I used CentOS 5.4 but with my own kernel, 2.6.32. I was already using this kernel for other testing, so I decided to stick with it here. Following my own advice from a previous article, I did not partition the drive and used the whole device for the tests. I used the default options when building and mounting the ext4 file system.

[root@test64 ~]# mke2fs -t ext4 /dev/sdd


Please note that building ext4 in this fashion puts the journal on the SSD as well.
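For completeness, here is a minimal sketch of mounting the freshly built file system with the default mount options; the mount point shown is an assumption for illustration, not necessarily the one used for these tests.

# Mount the ext4 file system on /dev/sdd with default mount options.
# The mount point /mnt/sdd is hypothetical.
mkdir -p /mnt/sdd
mount -t ext4 /dev/sdd /mnt/sdd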

Results

The results are plotted using a bar chart to make them easier to compare. However, carefully examine the y-axis since the major and minor divisions are not the same for every graph.

The plots show the average values with error bars representing the standard deviation over 10 runs of the same test for each of the four record sizes. Each plot has three groups of four bars. Each group corresponds to a level of “dedup”: 98% dedupable (very compressible), 50% dedupable, and 2% dedupable (very little compression). Within a group, the four bars are the four record sizes, and the legend tells you which color corresponds to which record size. Finally, each chart represents one of the 13 tests: the first 6 charts are for the write tests and the last 7 charts are for read tests.
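The averages and standard deviations are straightforward to reproduce. Here is a minimal sketch, assuming the ten per-run throughput values (in KB/s) for a given test and record size have already been pulled out of the IOzone output into a plain text file, one value per line; the file name is hypothetical.

# Compute the mean and sample standard deviation of one value per line.
awk '{ sum += $1; sumsq += $1*$1; n++ }
     END {
         mean = sum / n
         sd = sqrt((sumsq - n * mean * mean) / (n - 1))
         printf "mean = %.1f KB/s, stddev = %.1f KB/s\n", mean, sd
     }' write_1M_98pct.dat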

The three levels of dedupability were chosen arbitrarily. I didn’t want to test 100% compressible data since that is a bit unrealistic, and I didn’t want to test something that is 100% incompressible (no compression) since that too is very uncommon. So I chose to test something close to those extremes (98% and 2%) and one in the middle (50% dedupability).

In the figures and commentary below, dedupability and compressibility are used somewhat interchangeably even though it can be confusing. For example, if data can be deduplicated very easily then it has a “high dedupable” percentage within IOzone. We also say that this data is very compressible or has a “high compressibility” percentage. In essence,


  • High Dedupability, High Compressibility
  • Low Dedupability, Low Compressibility

I apologize for any confusion; in what follows, dedupability and compressibility are used to mean the same thing.

Figure 1 below is the write test for the four record sizes for the three levels of dedupability (compressibility).

iozone_write.png

Figure 1: Average Write Throughput (KB per second) for the Four Block Sizes for the Three Levels of Dedupability (Compressibility)

The first thing to notice is that as the level of compressibility decreases (this corresponds to a smaller level of “dedupability”) the performance goes down. For a block size of 1MB with very compressible data (very dedupable data), the write performance is quite good reaching about 260 MB/s. But at 50% dedupable data, the performance drops to about 128 MB/s (still faster than most 7,200 rpm hard drives) and at 2% dedupability, the performance only reaches a little over 97 MB/s (less than half the performance for very compressible data).

SandForce designs their controllers to compress data in real-time. If the data is very compressible you will get great performance. If the data has low compressibility, then the performance will drop.

Also notice in Figure 1 that for 98% dedupable data, the performance drops off for larger record sizes. But as the data becomes less compressible, the performance stays essentially the same across the record sizes tested (1MB, 4MB, 8MB, and 16MB) for a fixed level of dedupability (compressibility). In other words, as compressibility decreases, the write performance becomes roughly independent of the record size over the range tested.

Finally, notice that the standard deviation is fairly small for all of the tests except the 8MB and 16MB record sizes with 98% dedupable data. Moreover, the standard deviations become very small as the data becomes less compressible.

Figure 2 below is the re-write test for the four block sizes for the three levels of dedupability (compressibility).

iozone_rewrite.png

Figure 2: Re-Write Throughput (KB per second) for the Four Block Sizes and for the Three Levels of Dedupability (Compressibility)

The general performance trend as the data becomes increasingly less compressible (left to right in Figure 2) is the same for the rewrite test as for the write test. For smaller record sizes and very compressible data (98% dedupable), the performance is quite good, reaching about 262 MB/s for a 1MB record size. For 50% dedupable data, the performance for a 1MB record size is about 202 MB/s, and for 2% dedupable data it is about 97 MB/s.

As with the write performance, for the 8MB and 16MB record sizes at 98% dedupable data, the performance drops off a bit. But as the data compressibility decreases the performance is almost the same regardless of the record size.

Figure 3 below is the random write test for the four block sizes for the three levels of data dedupability (compressibility).

iozone_random_write.png

Figure 3: Random Write Throughput (KB per second) for the Four Block Sizes and the Three Levels of Dedupable Data (Compressibility)

Again, we see the same trend: for very compressible data (98% dedupable), the performance is very high but drops as the data becomes less compressible (dedupable). The same general trends appear here as in the write and rewrite tests.

Figure 4 below is the record rewrite test for the four block sizes for the three levels of data dedupability (compressibility).

iozone_record_rewrite.png

Figure 4: Record Rewrite Throughput (KB per second) for the Four Block Sizes for the Three Levels of Data Dedupability (Compressibility)

What is interesting is that in this test, the performance for 50% dedupable data and 2% dedupable data is better than that for 98% dedupable data (on average), but the standard deviation is quite large. This is the opposite of what is expected (decreasing performance as the data becomes less compressible). Recall that this test measures the performance when writing and re-writing a particular spot within a file.

Notice that there is a fairly sizable drop in performance when moving to the larger record sizes. Also notice that the standard deviation is fairly large relative to the average performance for the 1MB record size in the 50% dedupable and 2% dedupable tests.

Figure 5 below is the fwrite test for the four block sizes for the three levels of dedupable data (compressibility).

iozone_fwrite.png

Figure 5: Fwrite Throughput (KB per second) for the Four Block Sizes for the Three Levels of Dedupability (Compressibility)

The general observations from this test are the same as for the write test (Figure 1), and the rewrite tests (Figure 2).

Figure 6 below is the frewrite test for the four block sizes for the three levels of dedupability (compressibility).

iozone_frewrite.png

Figure 6: Frewrite Throughput (KB per second) for the Four Block Sizes and the Three Levels of Dedupability (Compressibility)

The general trends for this test are the same as for the write (Figure 1) and rewrite (Figure 2) tests.

Figure 7 below is the read test for the four block sizes for the three levels of dedupability (compressibility).

iozone_read.png

Figure 7: Read Throughput (KB per second) for the Four Block Sizes and for the Three Levels of Dedupable Data (Compressibility)

The best performance is for a record size of 1MB with 98% dedupable data (very compressible), where the read performance reached an average of about 225 MB/s.

Interestingly, and in contrast to the general trend for write performance, the read performance does not decrease much as the dedupability of the data decreases (making it less compressible). Performance stays about the same and, in fact, actually improves for larger record sizes and less dedupable data, where it stays near 190 MB/s.

Figure 8 below is the re-read test for the four block sizes for the three levels of dedupable data (compressibility).

iozone_reread.png

Figure 8: Re-Read Throughput (KB per second) for the Four Block Sizes and for the Three Levels of Dedupability (Compressibility)

The absolute values of throughput are lower for the re-read case than the read case, but the general trends are the same.

Figure 9 below is the random read test for the four block sizes for the three levels of dedupable data (compressibility).

iozone_random_read.png

Figure 9: Random Read Throughput (KB per second) for the Four Block Sizes and for the Three Levels of Dedupable Data (Compressibility)

The general trends for the random read test are the same as for the read test (Figure 7) and the re-read test (Figure 8).

Figure 10 below is the backward read test for the four block sizes for the three levels of dedupable data (compressibility).

iozone_backward_read.png

Figure 10: Backward Read Throughput (KB per second) for the Four Block Sizes and for the Three Levels of Dedupable Data (Compressibility)

The general trends for the backward read test are the same as for the read test (Figure 7) and the re-read test (Figure 8).

Figure 11 below is the stride read test for the four block sizes for the three levels of dedupable data (compressibility).

iozone_stride_read.png

Figure 11: Stride Read Throughput (KB per second) for the Four Block Sizes and for the Three Levels of Dedupable Data (Compressibility)

The general trends for the stride read test are the same as for the read test (Figure 7) and the re-read test (Figure 8).

Figure 12 below is the fread test for the four block sizes for the three levels of data dedupability (compressibility).

iozone_fread.png

Figure 12: Fread Throughput (KB per second) for the Four Block Sizes and for the Three Levels of Dedupable Data (Compressibility)

The general trends for the fread test are the same as for the read test (Figure 7) and the re-read test (Figure 8).

Figure 13 below is the freread test for the four block sizes for the three levels of dedupable data (compressibility).

iozone_freread.png

Figure 13: Freread Throughput (KB per second) for the Four Block Sizes and the Three Levels of Dedupable Data (Compressibility)

The general trends for the freread test are the same as for the read test (Figure 7) and the re-read test (Figure 8).


Summary

The SandForce 1222's real-time compression shows up clearly in the throughput results. The write, re-write, random write, fwrite, and frewrite tests all drop substantially as the data becomes less compressible (less dedupable), falling from roughly 260 MB/s for highly compressible data to under 100 MB/s for barely compressible data at a 1MB record size, while becoming largely insensitive to record size. Read performance, in contrast, is largely unaffected by compressibility, staying roughly in the 190-225 MB/s range. The one outlier is the record rewrite test, where the less compressible data actually performed better on average, albeit with large standard deviations. The next part of this series will look at initial IOPS performance.

Comments on "SandForce 1222 SSD Testing, Part 1: Initial Throughput Results"

solanum

Can you rerun the performance tests with a 2.6.37.x kernel? They made a number of changes to the block device layer in the kernel, and I am wondering how much that impacts the performance. :)

    laytonjb

    Stay tuned! That is my plan for the last part in this series.

    The next part will cover initial IOPS performance. Part 3 will cover a more in-depth throughput study (and comparison to an Intel SSD). Part 4 will do the same more in-depth study and comparison but for IOPS. Then Part 5 will compare the 2.6.32 kernel to the latest kernel (probably 2.6.37 but maybe 2.6.38 if it comes out).

    Jeff

pjwelsh

Ahh forget 2.6.37! Add the mainline kernel tracker repo with version 2.6.38 (currently) from the GREAT folks at ElRepo

sdanpo

Excellent article!

I liked the thoroughness of the test and the great data derived from it.
Looking forward to the coming parts.

Disclaimer: This comment is written by an Anobit employee.
Anobit is an enterprise SSD vendor whose drives have data-pattern-agnostic behavior.

storm

I’ve been reading about and testing SSDs for years and am finally leaving my first comment. I’m doing so because none of the benchmarks I’ve read test the Achilles heel of SSDs, which happens to be our production workload.

I would suggest doing a mixed random read/write workload with a 64GB file (the full extent of the drive) with a 4k write size that runs for a long time, e.g., a day, to arrive at steady-state behavior. When I was working with FusionIO’s engineers while beta’ing their IOdrive, they said this is the most torturesome workload they’ve ever seen. They had to make a number of changes to the driver for us as a result. Caches get quickly overwhelmed; wear leveling/grooming quickly gets pinned shuffling blocks around and can lead to huge periodic drops in performance unless they are amortized over time (SSDs are over-provisioned under the hood to help with this); block aligning/elevator algorithms don’t help due to the randomness; the small IO size kills throughput; the mixed nature of the r/w IOPS (especially when done in parallel) can cause havoc with the rewrite algorithm, etc. The dirty little secret in the industry is to quote inflated random IOPS performance using a file that is 1/4-1/3 the size of the drive.

Another surprise that we’ve found during testing is how drives perform as you increase the number of parallel read/write threads. With Fusion, for instance, it doesn’t make much of a difference positively or negatively. Virident’s tachIOn drive, however, tripled in performance! We were blown away. FYI this is the best SSD we’ve tested to date.

Ok, that was cathartic :) Thanks for letting me rant.

Thanks for the great article and I look forward to the rest.

detroitgeek

I have been looking at an SSD to put my OS on, and I plan on having my home directory on a standard drive. I worry about the lifetime of the SSD under these conditions because of all of the writing the OS does. My /var directory would also be on a standard drive. Is my concern realistic?

eoverton

I had issues with my drive going off-line randomly. It seems it was a BIOS issue. But which BIOS? See http://ssdtechnologyforum.com/threads/835-Sandforce-SSD-Firmware-Version-Confusion. So I upgraded my BIOS from the Adata site. The drive does not have the issue anymore.

    laytonjb

    Was this the same MicroCenter drive that I tested? I was told it was an Adata drive but I haven’t been able to confirm that.

    What were the symptoms of the drive going off-line? What distro/kernel were you using?

    Thanks!

    Jeff

      eoverton

      I was using windose at that time :(. The drive would go off-line and I would get a BSOD, or sometimes it would reboot and halt at “Could not find bootable drive”. I would power off, wait, then power on, and everything would be OK again. I confirmed mine was Adata from the small manual the drive came with and from googling. The issue did not look like it was OS related.

venugopalan

This article has given a very nice heads-up on IOPS and SSD controllers.
