On-the-fly Data Compression for SSDs

The key to good SSD performance is the controller. One SSD controller that has received good reviews is the SandForce SF-1200. However, a recent test of an SF-1200 based SSD reveals some interesting things about what this controller does and just how it does it. Depending upon your point of view, and critically upon your data, performance can be amazing.

Introduction

I’ve been writing about SSDs for a while now. It’s a cool technology that has great potential. However, given the universal law of TANSTAAFL (“There ain’t no such thing as a free lunch”), there are some trade-offs in the design of SSDs. One way to better address some of the trade-offs is to put more compute power or more capability into the SSD controller. But even this idea is a trade-off (TANSTAAFL again).

Putting more capability into the controller allows the designers to improve many aspects of SSDs. It can reduce write amplification, which improves the life of the SSD. It can improve performance through various techniques and help data retention by checking for data problems. But all of this “gain” in capability comes at the price of a more complex controller, which probably adds cost. Is this trade a good one or a bad one?

But what if you could make the controller just a bit smarter and a bit more capable with very little increase in cost? Or what if you could devote some of the capability of existing controllers to something else? What would you do with that extra logic? Would you embed iSCSI on the controller and insert a network connection onto the SSD? Would you create an OSD (Object Storage Device) from the SSD? Would you devote the logic to a larger cache? Would you do more internal RAID-0 to improve throughput? Or would you do something different? (And, why am I writing a paragraph that is nothing but questions?)

The SandForce SF-1200, while appearing to be an ordinary SSD controller, has something very interesting in its design that falls into this category: capability devoted to something unique that can potentially make your SSD performance just scream.

Testing of a SandForce SF-1200 Based SSD

A good friend named Dr. Joe Landman, who runs a great company named Scalable Informatics, posted some testing he did with a SandForce SF-1200 based SSD. While I don’t want to parrot everything he posted, the testing he did is definitely worth summarizing.

As a good benchmarker and system admin, Joe tested the performance of the SSD against the performance claims, which state about 260 MB/s for reads and 260 MB/s for writes for SF-1200 based SSDs (these numbers are typically for sequential reads and writes). Joe uses a good benchmarking tool named fio to test drives. He ran a simple uncached streaming write test and got about 65 MB/s (uncached typically means the file is larger than the memory capacity of the test system, to reduce the impact of OS caching effects). He then ran a streaming read test and got about 200 MB/s.
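
For reference, a streaming test of this sort might look something like the following (a sketch, not Joe’s actual command lines; the file name is hypothetical, and direct I/O is used here to keep the page cache out of the measurement):

fio --name=stream-write --filename=/data/d1/fio.file --rw=write --bs=1M --size=16g --direct=1
fio --name=stream-read --filename=/data/d1/fio.file --rw=read --bs=1M --size=16g --direct=1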

The read performance looked close to the listed performance, but the write performance was alarmingly low compared to the listed performance. Perhaps it was the benchmark? So Joe tried simply using dd with zeros as the input. Joe then states,


So I tried a simple dd, which uses zeros. And I got the marketing rated speed.
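
Joe’s exact dd parameters weren’t given, but a run of this kind would look roughly like the following sketch, where /dev/zero supplies perfectly compressible data and oflag=direct bypasses the page cache (the file name and sizes are hypothetical):

dd if=/dev/zero of=/data/d1/big.file bs=1M count=16384 oflag=direct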

Just to be sure, Joe tried Bonnie++ and says he got the same performance as the general claims for SF-1200 based drives.

At this point Joe had two tests that resulted in the claimed speed of the controller: Bonnie++ and dd (both using zeros as input data). So Joe tried a switch in fio that uses zeros instead of random data and reran his initial streaming write and read tests. This time he achieved the reported speed of the SSD.
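
By default fio fills its write buffers with random data; the switch in question is zero_buffers, which fills them with zeros instead. A sketch (not Joe’s actual command line):

fio --name=stream-write-zeros --filename=/data/d1/fio.file --rw=write --bs=1M --size=16g --direct=1 --zero_buffers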

To check again, he took two of the drives, put them in a RAID-0 (lucky dog), and reran the streaming write test using a new benchmark, io-bm. Between the two runs everything was identical: same benchmark binary, same command line (except for using zeros instead of random data), same file system, same mount point, and so on. In the first run he used zeros as the input data.

[root@localhost ~]# mpirun -np 1 ./io-bm.exe -n 10 -f /data/d1/big.file -b 1 -w -d -Z
Thread=00000: host=localhost.localdomain time = 24.305 s IO bandwidth = 421.317 MB/s
Naive linear bandwidth summation = 421.317 MB/s
More precise calculation of Bandwidth = 421.317 MB/s


Then he ran the exact same test but used random data as the input to io-bm.

[root@localhost ~]# mpirun -np 1 ./io-bm.exe -n 10 -f /data/d1/big.file -b 1 -w -d
Thread=00000: host=localhost.localdomain time = 88.818 s IO bandwidth = 115.292 MB/s
Naive linear bandwidth summation = 115.292 MB/s
More precise calculation of Bandwidth = 115.292 MB/s


What is causing the performance difference? The only difference between the two runs was that one used zeros and the other used random data.

The difference comes from data compression. The SandForce SF-1200 compresses data on the fly before writing it to the drive and decompresses it when reading.

Closer Look

In the design of the SandForce SF-1200 controller, some of the logic is used for data compression/decompression and sits in the direct data path of the drive. Whether this logic was added to a typical controller design or carved out of one is unknown. However, it is pretty obvious that the controller is doing compression on the fly.

The technology in the SandForce controller is proprietary, so I can only speculate (guess) about what is going on inside. The uncompressed data comes into the drive (controller) and is likely stored in a special buffer or even the cache. Then the controller compresses the data in some fashion: perhaps it compresses the data chunks individually; perhaps it coalesces neighboring pieces of data before compressing them; or perhaps it plans data placement first, even if the placement is non-contiguous, then coalesces the data chunks and finally compresses them (or perhaps it does something completely different). Regardless of the approach, the controller has to spend some time and computational power examining the data and compressing it before writing it to the storage media.
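
You can get a feel for why the chunking strategy matters using ordinary gzip as a stand-in for whatever proprietary algorithm the controller actually uses (a rough illustration; the file name is hypothetical):

split -b 4096 sample.dat chunk.                     # break the file into 4 KiB chunks
for f in chunk.* ; do gzip -c "$f" ; done | wc -c   # total size when chunks are compressed individually
gzip -c sample.dat | wc -c                          # size when the coalesced stream is compressed

Compressing many small chunks individually usually yields a worse total than compressing one coalesced stream, which hints at why a controller might coalesce neighboring chunks before compressing.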

Then the controller takes the compressed data and places it on blocks within the SSD, more than likely blocks from a pool of erased (unused) blocks. However, to ensure that the drive reports the correct amount of data stored, the uncompressed size of the data is presumably also stored in some sort of metadata on the drive itself, perhaps within the compressed data. So if a request comes into the controller for the size of a particular piece of data, the logical (uncompressed) size can be quickly reported.

However, remember that the amount of data being written to the NAND chips is smaller than the original data (except in the case where the data cannot be compressed). This means that writing the data takes less time, so the apparent write throughput is better. The level of improvement, though, is entirely dependent on the compressibility of the data.
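
It is easy to see how wide that range can be with a quick host-side test, again with gzip as a stand-in (the ratios are approximate):

dd if=/dev/zero bs=1M count=100 2>/dev/null | gzip -c | wc -c      # zeros: ~100 KB out of 100 MB
dd if=/dev/urandom bs=1M count=100 2>/dev/null | gzip -c | wc -c   # random: slightly larger than 100 MB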

During a read operation the compressed data is probably read into a cache, decompressed, and then sent on to the OS. Again, the amount of data read from the NAND chips is smaller, which should take less time, so the apparent read throughput is better. As with write operations, the level of improvement depends upon the compressibility of the data.

There are several important and fundamentally cool implications of compressing the data before it is stored on the SSD.

Performance
If we use Joe’s original numbers for a single drive, then with data that is almost infinitely compressible (just zeros) the throughput for both reads and writes is about 260 MB/s. Very respectable performance. But if the data is virtually incompressible (very random), then the throughput was 200 MB/s for reads and 65 MB/s for writes. That means that write performance varies by about a factor of 4 and read performance varies by about 30% depending upon the compressibility of the data.
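
As a quick check of that arithmetic:

echo "scale=1; 260/65" | bc    # write spread, zeros vs. random: 4.0
echo "scale=2; 260/200" | bc   # read spread: 1.30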

Comments on "On-the-fly Data Compression for SSDs"

bugmenot3

Mr Layton,

Please I beg you, you must correct your rampant misuse of the innocent and worthy apostrophe. It is ssds, not ssd’s. The apostrophe means it is either a contraction, for example ssd is, or possessive, the ssd’s comic book collection. There is no apostrophe for the plural. We use ssd drives, not ssd drive’s. We have corn chips for a snack, not corn chip’s. You put on socks this morning, not sock’s. This article is several hundred words long, not word’s. I read several magazines, not magazine’s.

thank you, your high-performance article fan

Reply
    bryanjrichard

    It is ssds, not ssd’s.

    Unfortunately, the rules for constructing the plural form of an acronym aren’t quite as straightforward as one would assume. In fact, they can nearly be summed up as “whatever looks most correct and causes the least confusion.”

    While it may feel incorrect, CD’s is actually no less valid than CDs when constructing a plural. What matters is preference by the writer/editor and consistency.

    Frankly, I dislike the idea that we would use case to denote a plural when we deal with so much CamelCasing in this industry. Even some of the acronyms are CamelCased; SaaS, as an example.

    I can assure you we know the difference between plurals and possessives. If we used an apostrophe to denote a plural of SSD it was to reduce confusion. Not as a mistake or to incite a grammer riot. Clearly we failed at the latter.

    While the world seems content to try and batter our language into the space of 140 characters or less (dashed off as an afterthought from a smartphone), we’re trying to be a bit more thoughtful than that.

    Regardless, Jeff’s not really to blame for any of this. I am. Feel free to share your thoughts.

    Reply
      hemarcin

      Hi

      I’m not sure I agree.

      “While it may feel incorrect, CD’s is actually no less valid than CDs when constructing a plural. What matters is preference by the writer/editor and consistency.”

      Actually, the rules of English grammar (btw, it’s grammar, not grammer) state clearly that apostrophes have nothing to do with marking the plural form of a noun. Hence, the form CD’s is unacceptable, and it really has nothing to do with a writer’s preferences.

      Martin

      Reply
laytonjb

My profuse apologies. I can explain why I do this but it’s a long and boring story confounded by English professors and technical marketing people getting confused in my mind (which can be an unusually busy place – at least in my mind).

Thanks for feedback – appreciated.

Jeff

P.S. I will update the article but updating posted articles takes some time and effort (i.e. it’s not easy).

Reply
rattusrattus

So SandForce should be taken to task for false claims. Once manufacturers are permitted to make performance claims of “it go this fast (downhill with a following wind etc.)” then there is no level playing field.

Now I have some experience with SSDs (note to bugmenot3: SSD is capitalised because it is a TLA and not a word), having designed several over the last 3 years, and you are quite correct in your summation that you can gain performance by compressing your data because of the time taken to physically write blocks to NAND. Of course any good performance tester knows this, which is why you always use large pseudo-random data sets. Pseudo-random data sets are used to allow repeatable tests, and large ones because the data set must be bigger than all of the possible cache on the system; typically this means 2 or 3 times the total RAM installed in a machine, because the OS may well still have the file hanging around without the need to go and pull it back from backing store.
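
One common trick for generating a large but repeatable pseudo-random data set is to encrypt zeros under a fixed key (a sketch; the passphrase and output path are made up, and the byte stream will vary between openssl versions):

openssl enc -aes-128-cbc -nosalt -pass pass:benchseed < /dev/zero 2>/dev/null | head -c 8G > /data/d1/random.dat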

Wouldn’t it be nice to have manufacturers agree on a standard performance benchmark? The SD Card Association (memory cards are nothing more than removable SSDs after all) has attempted to achieve this by instigating card Classes. The Class of a card gives the minimum sustainable transfer speed in either direction.

However this doesn’t go far enough, because we have to take into account shuffling data around for static and dynamic wear levelling. Invariably this means that when writing data to a previously full drive (full minus enough space for our test write file, that is), the SSD must actually perform a minimum of 2 block writes and 1 block read for every block of data in the test file.

I have yet to find an SSD that gets anywhere near its advertised write speed in such tests. Most don’t come within 50% of the advertised figure.

Regards

Reply
    laytonjb

    Let me try to state things differently. There are literally thousands of applications that all have different IO patterns. Trying to develop an SSD that handles all of them well is impossible. So you have to select a target set of applications or perhaps a target market. Then you have to select representative applications and understand their IO pattern. Then you can tune the SSD firmware to perform well on those patterns.

    You also couple this with the reality of running several applications at the same time, all of which may have different IO patterns.

    The downside is that the performance may stink for other applications outside what is being targeted (as well as different loads).

    So how do you market such a device? Do you tell the world that on these types of applications it will work fantastic? Do you tell the world that it will stink on other applications so don’t buy it?

    My guess is that SandForce did the best they could and chose a representative set of applications that was part of their target market. Then they report the performance based on the IO patterns and applications they targeted. I don’t think there is anything wrong with that at all; in fact it is what I would have done as well.

    Benchmarks are a completely different subject. Do benchmarks represent real world applications? If so, which ones? Part of the difficulty is understanding the “compressibility” of the data for the applications versus the “compressibility” of the data in the benchmark. Are they comparable? Can you make any correlation between them? Is there a causal relationship?

    Lots of details in designing and testing SSDs. Personally I think the technology behind SandForce is really great. You now have the opportunity to change your applications (I tend to work with applications that have source or that I write myself) to allow them to run better with SandForce based SSDs. Think of it as tuning your application for a particular chip architecture. We’ve never before had the opportunity to tune an application for storage hardware (it’s usually for memory, processor, or NICs). I personally think this is pretty cool and applaud SandForce for doing this.

    Jeff

    Reply
gerhardk

I use btrfs with LZO compression on my SSD; that is, the compression is done before the data goes into the controller.

In the case of the SandForce controller, if I leave the compression to it, I wouldn’t gain the storage space that gets freed by the controller’s compression, whereas I do increase my storage volume with file system compression. I prefer the file system compression solution.
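
For example, the file system compression is just a mount option (the device and mount point are hypothetical):

mount -o compress=lzo /dev/sdb1 /mnt/ssd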

Regards

Reply
    laytonjb

    But there is a tradeoff in your approach. Allowing btrfs to compress the data for you uses CPU time whereas SandForce uses the processor on the SSD. If CPU time is important (it is in HPC), then you might want to offload the compression to the SSD.

    However, if you do this you may not save any space, since the SSD can’t report a variable capacity, whereas btrfs lets you save space before the data hits the storage device. But at the very least, the SandForce controller also increases the “life” of the SSD, depending upon your data.

    One other observation – there aren’t many file systems that do compression. If you rely on the file system then you are stuck with that file system. But SandForce based SSDs allow you to use any file system.

    It’s all a trade-off IMHO. The great thing SandForce has done is give us some cool technology that lets us do the things we like to do (more performance, more life in our SSDs, etc.). Pretty great idea IMHO.

    Jeff

    Reply
mikepurdie

Alternatively it could be implementing the ‘TRIM’ SATA command, also known as ‘write same’. This is a new command that basically allows the O/S to tell the SSD that it’s writing the same block again, and therefore the SSD can discard it. It’s a bit like deduplication. It also partly supports zero-page reclaim in file systems that support it, as a page of zero bytes is considered to be reclaimable and doesn’t need to be written.

What happens is that the SSD doesn’t have to do the read page/update block/write page cycle that it would normally have to do. It can simply mark a block as being empty.

Have a look at http://en.wikipedia.org/wiki/TRIM
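
You can check whether a drive advertises TRIM support with hdparm (the device name is hypothetical, and the exact wording of the output varies by drive):

hdparm -I /dev/sda | grep -i trim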

Reply
leo_fischer

Nice discovery? Somebody has been living under a rock.
The fact that the SandForce controller compresses the data in the controller has been clearly explained since the earliest reviews of the drives.

Reply
    joe landman

    @Leo

    The question I was asking in the original work was, what sort of bandwidth should we expect … how far off the marketing numbers should we expect performance to be?

    Remember, the marketing numbers call out very firm, very specific bandwidths. What we don’t see are the caveats that should go along with this. That is, we know that in a best case scenario, we should get “X”. So … how hard is it to get there?

    Jeff did an excellent job of introducing the info and discussing the tradeoffs.

    Reply
jlinton

I have a drive with one of these controllers, this was one of the first things I checked (compressed vs uncompressed data performance).

I’m not sure his results are 100% valid. What he is describing sounds more like a fragmentation problem caused by a lack of TRIM/secure erase in his setup.

I’m betting that if he secure erases the drive, creates/mounts a FS, and reruns the test, the performance between compressed and uncompressed will be similar until he reaches the native capacity of the drive. Then he needs to run wiper.sh to restore it.

Basically, once the drive becomes full/fragmented, the write performance goes to hell. This fact is masked by writing 100% compressible data. He should rerun the tests with 2:1 data. I’m betting his numbers are still going to be significantly less than the advertised drive speeds until he wipes the drive. Once he wipes the drive and runs wiper.sh (the Linux TRIM workaround) on a frequent basis, the performance will stay steady regardless of the compression ratio.
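
Newer fio versions can generate write data with a target compressibility, which would make the 2:1 rerun straightforward (a sketch, assuming a fio build that supports buffer_compress_percentage; the file name is hypothetical):

fio --name=half-compressible --filename=/data/d1/fio.file --rw=write --bs=1M --size=16g --direct=1 --refill_buffers --buffer_compress_percentage=50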

Reply
cweberusa

For me these discussions miss the boat almost entirely. The real breakthrough with SSDs is their low latency, two orders of magnitude lower than for hard drives. Also, in my tests with a SandForce-based SSD and an Intel G2 SSD, the latency was much more consistent and predictable than for my various 7200 rpm SATA II drives. All told, SSDs address the age-old issue of slow disk vs. fast CPU and memory at the latency level, and as a result they universally speed up computational work regardless of the actual workload (except for very long running number crunching jobs). Everyone around me who has started to use an SSD won’t ever give it back, and it is not due to the overall read or write throughput (although we can quibble all day about nuances of such); it’s because of the much improved responsiveness of the system. We finally have a semblance of balanced systems back in our hands.
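
A simple way to see that latency gap is a 4 KiB random-read test at queue depth 1; fio reports completion latency (clat) statistics directly (a sketch; the file name is hypothetical):

fio --name=randread-lat --filename=/data/d1/fio.file --rw=randread --bs=4k --size=4g --direct=1 --ioengine=libaio --iodepth=1 --runtime=60 --time_based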

Reply
jc2it

So if my controller gets ruined, I would need to ensure I purchase the exact same controller in order to ensure reliable access to my data. Sounds like YAFOVLI (Yet Another Form Of Vendor Lock-in). Hmmm, perhaps it might even be necessary to purchase a spare for the shelf from the get-go.

Reply
    rattusrattus

    jc2it
    No, not a lock-in. SSDs use the same interface as their hard disk counterparts. The controllers we are talking about are on the SSD, just like there is a controller on every hard disk…

    Reply
