The key to good SSD performance is the controller. One SSD controller that has received good reviews is the SandForce SF-1200. However, a recent test of an SF-1200 SSD reveals some interesting things about what this controller does and just how it does it. Depending upon your point of view and, critically, your data, performance can be amazing.
I’ve been writing about SSDs for a while now. It’s a cool technology that has great potential. However, given the universal law of TANSTAAFL (“There ain’t no such thing as a free lunch”), there are some trade-offs in the design of SSDs. One of the ways to better address some of the trade-offs is to put more compute power or more capability into the SSD controller. But even this idea is a trade-off (TANSTAAFL again).
Putting more capability into the controller allows the designers to improve many aspects of SSDs. It can reduce write amplification, which improves the life of the SSD. It can improve performance through various techniques and help data retention by checking for data problems. But all of this “gain” in capability comes at the price of a more complex controller, which probably adds cost. Is this trade a good one or a bad one?
But what if you could make the controller just a bit smarter and a bit more capable with very little increase in cost? Or what if you could devote some of the capability of existing controllers to something else? What would you do with that extra logic? Would you embed iSCSI on the controller and insert a network connection onto the SSD? Would you create an OSD (Object Storage Device) from the SSD? Would you devote the logic to a larger cache? Would you do more internal RAID-0 to improve throughput? Or would you do something different? (And, why am I writing a paragraph that is nothing but questions?)
The SandForce SF-1200, while appearing to be an ordinary SSD controller, falls into this category: part of its capability is devoted to something unusual that can potentially make your SSD performance just scream.
Testing of a SandForce SF-1200 Based SSD
A good friend named Dr. Joe Landman, who runs a great company named Scalable Informatics, posted some testing he did with a SandForce SF-1200 based SSD. While I don’t want to parrot everything he posted, the testing he did is definitely worth summarizing.
As a good benchmarker and system admin, Joe tested the SSD against the performance claims for SF-1200 based SSDs, which state about 260 MB/s for reads and 260 MB/s for writes (these numbers are typically for sequential reads and writes). Joe used a good benchmarking tool named fio to test the drives. He ran a simple uncached streaming write test and got about 65 MB/s (uncached typically means the file is larger than the memory capacity of the test system, which reduces the impact of OS caching effects). He then ran a streaming read test and got about 200 MB/s.
The read performance looked close to the listed performance, but the write performance was alarmingly low by comparison. Perhaps it was the benchmark? So Joe tried simply using dd with zeros as the input. Joe then states,
So I tried a simple dd, which uses zeros. And I got the marketing rated speed.
Just to be sure, Joe tried Bonnie++ and says he got the same performance as the general claims for SF-1200 based drives.
At this point Joe has two tests that resulted in the claimed speed of the controller: Bonnie++ and dd (both using zeros as input data). So Joe tried a switch in fio that uses zeros instead of random data and he reran his initial streaming read and write tests. He achieved the reported speed of the SSD.
To check again, he took two of the drives and put them in a RAID-0 (lucky dog) and reran the same streaming test. He used the same benchmark binary, same command line (except for using zeros instead of random data), same file system, same mount point, and so on. In the first test he ran a new benchmark, io-bm, using zeros as the input data.
[root@localhost ~]# mpirun -np 1 ./io-bm.exe -n 10 -f /data/d1/big.file -b 1 -w -d -Z
Thread=00000: host=localhost.localdomain time = 24.305 s IO bandwidth = 421.317 MB/s
Naive linear bandwidth summation = 421.317 MB/s
More precise calculation of Bandwidth = 421.317 MB/s
Then he ran the exact same test but used random data as the input to io-bm.
[root@localhost ~]# mpirun -np 1 ./io-bm.exe -n 10 -f /data/d1/big.file -b 1 -w -d
Thread=00000: host=localhost.localdomain time = 88.818 s IO bandwidth = 115.292 MB/s
Naive linear bandwidth summation = 115.292 MB/s
More precise calculation of Bandwidth = 115.292 MB/s
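As a quick sanity check on the two runs, the reported bandwidths are consistent with writing 10 GiB (10,240 MiB) in the reported elapsed times. The file size is an assumption inferred from the `-n 10` argument, not something stated in the output, and the helper name is made up for illustration.

```python
# Assumption: "-n 10" means a 10 GiB file, i.e. 10 * 1024 MiB.
FILE_SIZE_MIB = 10 * 1024


def bandwidth_mib_per_s(elapsed_s: float, size_mib: float = FILE_SIZE_MIB) -> float:
    """Average bandwidth for writing size_mib in elapsed_s seconds."""
    return size_mib / elapsed_s


zeros_bw = bandwidth_mib_per_s(24.305)   # run with -Z (zeros as input)
random_bw = bandwidth_mib_per_s(88.818)  # run without -Z (random input)

print(f"zeros:  {zeros_bw:.1f} MB/s")
print(f"random: {random_bw:.1f} MB/s")
print(f"ratio:  {zeros_bw / random_bw:.2f}x")
```

Under that assumption the computed values land within rounding of the 421.317 MB/s and 115.292 MB/s that io-bm printed, so the zeros run is roughly 3.7 times faster.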
What is causing the performance difference? The only difference between the two runs was that one used zeros and the other used random data.
The difference comes from data compression. The SandForce SF-1200 compresses data on the fly before writing it to the drive and expands (decompresses) it when reading.
In the design of the SandForce SF-1200 controller, some of the logic has been used for data compression/expansion and sits in the direct data path of the drive. Whether the logic has been added to a typical controller or carved out from a typical controller is unknown. However, it is pretty obvious that the controller is doing compression on the fly.
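To see why zeros and random data behave so differently under compression, here is a minimal sketch using Python's zlib. This is only a stand-in: the SandForce algorithm is proprietary, so zlib is an assumption chosen purely to illustrate compressibility, and the 1 MiB buffer size is arbitrary.

```python
import os
import zlib

CHUNK = 1024 * 1024  # 1 MiB test buffer (arbitrary size for illustration)


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller means more compressible)."""
    return len(zlib.compress(data)) / len(data)


zeros = bytes(CHUNK)             # all-zero buffer, like the input dd supplied
random_data = os.urandom(CHUNK)  # random bytes, effectively incompressible

print(f"zeros:  {compression_ratio(zeros):.4f}")
print(f"random: {compression_ratio(random_data):.4f}")
```

The zero buffer shrinks to a tiny fraction of its original size, while the random buffer does not shrink at all (it can even grow slightly because of format overhead).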
The technology in the SandForce controller is proprietary, so I can only speculate (guess) about what is going on inside the controller. The uncompressed data comes into the drive (controller) and is likely stored in a special buffer or even the cache. Then the controller compresses the data in some fashion. It might compress the data chunks individually. It might coalesce neighboring pieces of data before compressing them. It might plan data placement first, even if it is non-contiguous, then coalesce the data chunks and finally compress them. Or it might do something completely different. Regardless of the approach, the controller has to spend some time and computational power to examine the data and compress it prior to writing the data to the storage media.
Then the controller takes the compressed data and places it on blocks within the SSD. More than likely these are blocks from a pool of erased (unused) blocks. However, to ensure that the drive reports the correct amount of data stored, the uncompressed size of the data is presumably also stored in some sort of metadata format on the drive itself, perhaps within the compressed data. So if a request comes into the controller for the size of a particular data block, the actual size can be quickly reported.
However, remember that the amount of data being written to the NAND chips is smaller than the original data (except for the case where the data cannot be compressed). This means that the amount of time it takes to write the data is less and the apparent write throughput performance is better. However, the level of improvement is entirely dependent on the compressibility of the data.
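The dependence on compressibility can be sketched with a toy model. Everything in it is an assumption for illustration: the NAND write rate is taken to be Joe's 65 MB/s incompressible result, the ceiling is the 260 MB/s rated speed, and the time spent compressing is ignored. It is not a description of how the SF-1200 actually works.

```python
def apparent_write_mb_s(ratio: float,
                        nand_mb_s: float = 65.0,
                        cap_mb_s: float = 260.0) -> float:
    """Host-visible write throughput under a toy model.

    ratio     -- compressed size / original size, in (0, 1]
    nand_mb_s -- assumed raw NAND write rate (the incompressible result)
    cap_mb_s  -- assumed ceiling imposed by the interface or controller
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    # Less data hits the NAND, so the host sees a proportionally higher rate,
    # up to the assumed ceiling.
    return min(cap_mb_s, nand_mb_s / ratio)


for ratio in (1.0, 0.75, 0.5, 0.25, 0.01):
    print(f"ratio {ratio:4.2f} -> {apparent_write_mb_s(ratio):6.1f} MB/s")
```

In this model, data that compresses to half its size doubles the apparent write rate, and anything that compresses to a quarter or less simply hits the ceiling.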
During a read operation the compressed data is probably read into a cache, decompressed, and then sent on to the OS. But again, the amount of data read from the NAND is smaller, which should take less time, so the apparent throughput performance is better. As with write operations, the level of performance improvement depends upon the compressibility of the data.
There are several important and fundamentally cool implications of compressing the data before it is stored on the SSD.
If we use Joe’s original numbers for a single drive, with data that is almost infinitely compressible (just zeros), the throughput for reads and writes is about 260 MB/s. Very respectable performance. But if the data is virtually incompressible (very random), the throughput is 200 MB/s for reads and 65 MB/s for writes. That means write performance varies by about a factor of 4, and read performance by about 30%, depending upon the compressibility of the data.
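The arithmetic behind those two figures is short enough to spell out. The dictionary names below are just labels for the single-drive numbers quoted above.

```python
# Single-drive throughput figures quoted above, in MB/s.
compressible = {"read": 260.0, "write": 260.0}   # zeros
incompressible = {"read": 200.0, "write": 65.0}  # random data

for op in ("read", "write"):
    factor = compressible[op] / incompressible[op]
    gain_pct = (factor - 1.0) * 100.0
    print(f"{op}: {factor:.1f}x ({gain_pct:.0f}% faster with compressible data)")
```

Reads come out at 1.3x (30% faster) and writes at 4.0x.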