Caching is a concept used throughout computing. CPUs have several levels of cache, disk drives have cache, and the list goes on. Adding a small amount of high-speed storage in front of a large amount of slower storage can make huge improvements to performance. Enter two new kernel patches, bcache and flashcache, that leverage the power of SSDs.
The hard drives we use today are very complex beasts. Cramming that much data into a very small space, while ensuring that the data will be there when we need it, is neither easy nor simple. The pace of data density increase in hard drives has been nothing short of amazing. At the same time, we all want faster and faster performance from our drives. To help improve hard drive performance, manufacturers added a small amount of RAM to the drive. It can be used to hold data for a period of time before writing it to disk, allowing the drive to tell the operating system that the write is complete before the data actually reaches the platters, which improves application performance. It can also be used to pre-fetch data for reading. In either case, a small amount of RAM can make a very large improvement in performance. If you don’t believe me, run some benchmarks with the disk cache turned on and then off and compare the two results.
SSDs (Solid State Disks) are a competitor to hard drives that have become much more mainstream in the last few years. SSDs use a different storage technology, floating-gate transistors, to store data. They can be very, very fast relative to hard drives but are much smaller in capacity and much more expensive. At the same time, even SSD manufacturers have added RAM caches to SSDs to improve performance.
If we rank the performance of general storage media from fastest to slowest, we usually end up with something like the following.
- RAM
- SSDs with RAM caching
- SSDs with no RAM caching
- Hard drives with RAM cache
- Hard drives with no RAM cache
At the same time, if you ranked the same storage media from most expensive to cheapest for a given capacity, the ranking would look exactly the same as the performance ranking. That is, RAM is the most expensive for the same capacity, SSDs with RAM caching are next, and so on. So while we would like to use RAM to store all of our data, it would just be too expensive (never mind the headache of keeping the power on all of the time). So then we move down to SSDs. It would be lovely to store all of our data on SSDs, but they have limited capacity and are very expensive relative to hard drives. So then we move to hard drives and we arrive at today’s situation: very large capacity hard drives with low performance.
What people have realized is that using small amounts of RAM on drives as a caching mechanism has led to much better overall drive performance (once again, try running a hard drive with caching turned off to understand the impact of the cache on performance). Even the largest drives today (usually 500GB and up) only have about 64MB of RAM cache, so the amount of RAM is still tiny compared to the capacity of the drive.
But with the advent of mainstream SSDs, people have realized that SSDs perform much better than disk drives while offering more capacity than RAM for an equivalent price. So perhaps SSDs can also be used in a caching role to help performance. The performance gap between SSDs and hard drives can be very large, and so can the price gap, but if you could take a 32GB or 64GB SSD and use it to cache a 2TB drive or a many-TB disk array, perhaps you could get a performance improvement that would justify the price of the SSD.
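A quick back-of-the-envelope calculation shows why the cache hit ratio decides whether the SSD is worth its price. The sketch below uses the usual weighted-average formula for a two-level store. The function name and the latency figures (0.1 ms for the SSD, 10 ms for the disk) are round numbers I made up for illustration, not measurements.

```python
def effective_latency_ms(hit_ratio, cache_ms, backing_ms):
    """Average access time when a fraction `hit_ratio` of requests hit the cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * backing_ms


# Hypothetical round numbers, chosen only to make the arithmetic easy to follow
SSD_MS = 0.1
HDD_MS = 10.0

for h in (0.0, 0.5, 0.9, 0.99):
    avg = effective_latency_ms(h, SSD_MS, HDD_MS)
    print(f"hit ratio {h:4.0%}: {avg:.3f} ms average")
```

With these made-up numbers, a 90% hit ratio works out to about 1.09 ms per access: roughly nine times faster than the bare disk, yet still about eleven times slower than the SSD alone. The misses dominate quickly, which is why the IO pattern matters so much.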
This article talks about two new kernel patches that allow you to use SSDs as disk caches. Generically, these approaches are block caching, since what they really cache are block devices, which are typically hard drives (but don’t have to be). The first patch is called bcache, and the second patch is called flashcache.
Bcache takes one block device, preferably an SSD-based device, and uses it to cache another block device, typically a hard drive. Hence the name, “block cache” or bcache. It implicitly assumes that the caching device is SSD based, whether that is a single device or a RAID array of SSD devices (e.g. a RAID-0 array). This assumption affects how bcache works.
Recall from past articles that SSDs erase data in blocks. Even if a single cell in the block needs to be erased, the entire block is erased by first copying the data that is not being erased to some other storage, then erasing the entire block, and finally copying the saved data back to the original block. Current SSDs, for the most part, have controllers that use techniques to avoid this dramatic sequence of events and keep performance as high as possible. But fundamentally, erasing always takes place at the block level.
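The copy, erase, and copy-back sequence described above can be sketched in a few lines. This is a toy model only: the class, the function, and the four-page block size are all hypothetical, and it does not represent how any real SSD controller works.

```python
PAGES_PER_BLOCK = 4  # hypothetical; real erase blocks hold far more pages


class EraseBlock:
    """Toy model of one flash erase block."""

    def __init__(self):
        self.pages = [None] * PAGES_PER_BLOCK  # None means "erased"
        self.erase_count = 0

    def erase(self):
        # Erasing always clears the whole block, never a single page
        self.pages = [None] * PAGES_PER_BLOCK
        self.erase_count += 1

    def program(self, index, data):
        # A page can only be written while it is in the erased state
        if self.pages[index] is not None:
            raise RuntimeError("page must be erased before it is rewritten")
        self.pages[index] = data


def naive_overwrite(block, index, data):
    """Rewrite one page the slow way: save, erase the block, copy back."""
    saved = list(block.pages)       # 1. copy the block's contents elsewhere
    saved[index] = data             # 2. swap in the new data for one page
    block.erase()                   # 3. erase the entire block
    for i, page in enumerate(saved):
        if page is not None:
            block.program(i, page)  # 4. copy everything back


block = EraseBlock()
for i, d in enumerate(["a", "b", "c", "d"]):
    block.program(i, d)
naive_overwrite(block, 2, "C")
print(block.pages, "erases:", block.erase_count)
```

Changing one page cost a full block erase plus a rewrite of every other page in the block, which is exactly the overhead that controllers (and caches designed around erase blocks) try to avoid.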
Bcache is designed to work with buckets of data, as it refers to them, that are sized to match the SSD’s erase blocks. Even better, it fills them sequentially, so that random writes never really happen. When a bucket no longer caches current data, it is marked for deletion. Then a “lazy” garbage collection process erases the marked buckets. Because the buckets are block sized, erasing a bucket that is completely stale is very efficient.
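Here is a minimal sketch of that bucket idea: sequential fill, marking stale buckets, and erasing them lazily only when space is needed. It is my own illustration and not bcache's code. The class name, the method names, and the four-entry bucket size are all hypothetical.

```python
BUCKET_SIZE = 4  # hypothetical: entries per bucket (stands in for one erase block)


class BucketCache:
    """Toy bucket allocator: fill sequentially, mark stale, erase lazily."""

    def __init__(self, num_buckets):
        self.buckets = [[] for _ in range(num_buckets)]
        self.stale = set()                    # buckets marked for deletion
        self.free = list(range(num_buckets))  # empty buckets, in order
        self.current = self.free.pop(0)       # bucket currently being filled

    def append(self, entry):
        # Always write at the tail of the current bucket, never in the middle
        if len(self.buckets[self.current]) == BUCKET_SIZE:
            if not self.free:
                self.collect_garbage()        # lazy: only when space is needed
            if not self.free:
                raise RuntimeError("cache full and no bucket is marked stale")
            self.current = self.free.pop(0)
        self.buckets[self.current].append(entry)
        return self.current

    def mark_stale(self, bucket_id):
        self.stale.add(bucket_id)

    def collect_garbage(self):
        # Erase whole buckets only, skipping the one still being filled
        erased = [b for b in sorted(self.stale) if b != self.current]
        for bucket_id in erased:
            self.buckets[bucket_id] = []
            self.free.append(bucket_id)
            self.stale.discard(bucket_id)
        return erased


cache = BucketCache(num_buckets=2)
placements = [cache.append(i) for i in range(8)]  # fills bucket 0, then bucket 1
cache.mark_stale(0)                               # bucket 0 no longer needed
reused = cache.append(8)                          # triggers GC, reuses bucket 0
print(placements, reused)
```

Nothing is erased at the moment a bucket is marked stale. The erase happens later, in one whole-bucket operation, when the allocator runs out of free buckets.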
Even though bcache is a “cache”, it still has to keep track of where data lives, in much the same manner as a file system. Bcache uses a B-tree to track the cached data. The data structure is also designed to use the previously mentioned garbage collection to clean up stale pointers in the data structure, as well as freed buckets that no longer cache the latest data on the cached block device.
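To give a feel for what such an index does, here is a small sketch of an ordered map from sector ranges on the backing device to buckets on the cache device. Python's standard library has no B-tree, so I use a sorted list with `bisect` as a stand-in. It gives the same kind of ordered lookup but says nothing about bcache's real on-disk layout. The class and method names are hypothetical, and the sketch assumes extents never overlap.

```python
import bisect


class ExtentIndex:
    """Ordered map from backing-device sector ranges to cache buckets.

    Stand-in for a B-tree; assumes inserted extents do not overlap.
    """

    def __init__(self):
        self.starts = []   # sorted start sectors
        self.extents = []  # parallel list of (start, length, bucket_id)

    def insert(self, start, length, bucket_id):
        i = bisect.bisect_left(self.starts, start)
        self.starts.insert(i, start)
        self.extents.insert(i, (start, length, bucket_id))

    def lookup(self, sector):
        # Find the last extent that starts at or before `sector`
        i = bisect.bisect_right(self.starts, sector) - 1
        if i >= 0:
            start, length, bucket_id = self.extents[i]
            if sector < start + length:
                return bucket_id  # cache hit
        return None               # cache miss

    def drop_bucket(self, bucket_id):
        # What garbage collection would do: remove pointers into a freed bucket
        self.extents = [e for e in self.extents if e[2] != bucket_id]
        self.starts = [e[0] for e in self.extents]


index = ExtentIndex()
index.insert(start=100, length=8, bucket_id=0)
index.insert(start=0, length=16, bucket_id=1)
print(index.lookup(104), index.lookup(16))
```

The `drop_bucket` method is the part that ties back to garbage collection: once a bucket is erased, every pointer into it is stale and has to disappear from the index.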
There is another disk or block device caching patch in the wild. This patch, called flashcache, was developed by Facebook to help them scale the performance of InnoDB/MySQL. Flashcache is somewhat similar to bcache in that it is a write-back cache for block devices. It also assumes that the caching device is an SSD or a RAID array of SSD devices, such as RAID-0.
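If the term “write-back” is unfamiliar, the following sketch shows the general idea: a write is acknowledged as soon as it lands in the cache, and the slow device is updated later. This is a generic illustration, not flashcache's implementation. The class is hypothetical and plain dictionaries stand in for the two devices.

```python
class WriteBackCache:
    """Toy write-back cache in front of a slow backing store."""

    def __init__(self, backing):
        self.backing = backing  # dict standing in for the slow device
        self.cache = {}         # dict standing in for the fast device
        self.dirty = set()      # blocks newer in the cache than on the backing store

    def write(self, block, data):
        # Acknowledged immediately; the backing store is NOT touched here
        self.cache[block] = data
        self.dirty.add(block)

    def read(self, block):
        if block in self.cache:
            return self.cache[block]      # hit: served from the fast device
        data = self.backing.get(block)    # miss: go to the slow device
        if data is not None:
            self.cache[block] = data      # keep a copy for next time
        return data

    def flush(self):
        # Write dirty blocks back in block order, then mark everything clean
        for block in sorted(self.dirty):
            self.backing[block] = self.cache[block]
        self.dirty.clear()


disk = {1: b"old"}
cache = WriteBackCache(disk)
cache.write(7, b"new")
on_disk_before_flush = 7 in disk
cache.flush()
print(on_disk_before_flush, disk[7])
```

Until `flush` runs, block 7 exists only in the cache. That is the source of the speed, and also the reason a write-back cache has to be careful about power failures.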
As with bcache, flashcache assumes that data is cached on block boundaries to line up with the block size of SSDs. It too refers to these as “buckets of data”. But flashcache is different from bcache in how it implements the caching.
Flashcache is built using the Linux kernel Device Mapper (DM). Even if you’ve never heard of DM before, if you are using LVM (or DM-based software RAID) then you are using DM. Basically, DM is a way of mapping one block device onto another: it takes data passed to a virtual block device that DM provides and passes it on to another block device. For example, if you use LVM, DM provides, among other things, the mapping between the VGs and PVs, and ultimately the block devices.
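The mapping idea is easier to see with a tiny example. The sketch below models a virtual block device made of two segments, each of which lands on a different underlying device. It is only in the spirit of a simple linear mapping and does not use any real DM interface. The device names, sizes, and the `remap` function are made up.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: int    # first sector of this segment on the virtual device
    length: int   # number of sectors in the segment
    device: str   # underlying block device (names here are made up)
    offset: int   # first sector used on the underlying device


# Hypothetical table: a 1500-sector virtual device built from two pieces
TABLE = [
    Segment(start=0, length=1000, device="/dev/sdb1", offset=0),
    Segment(start=1000, length=500, device="/dev/sdc1", offset=200),
]


def remap(sector, table=TABLE):
    """Translate a virtual-device sector into (underlying device, sector)."""
    for seg in table:
        if seg.start <= sector < seg.start + seg.length:
            return seg.device, seg.offset + (sector - seg.start)
    raise ValueError(f"sector {sector} is outside the virtual device")


print(remap(10))
print(remap(1200))
```

A caching target works at the same layer. Instead of a fixed table, it decides per request whether the data should come from the cache device or the backing device.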
Then flashcache uses a “set associative hash” to cache data for the drives it is associated with. Without diving into details, the README associated with the patch says that this approach allows very simple invalidation of cache blocks to help improve performance. That is, it helps identify and flag buckets that are no longer needed and can be erased by garbage collection.
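For readers who have not run into set-associative caches before, here is a minimal sketch of the general technique. Each block number hashes to exactly one small set, so lookup and invalidation only ever search that one set. The modulo hash, the FIFO replacement within a set, and all of the names are my assumptions for illustration. The README does not say flashcache does it this way.

```python
class SetAssociativeCache:
    """Toy set-associative cache: `num_sets` sets of at most `ways` entries."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # Each set is a list of (block, data) pairs, oldest first
        self.sets = [[] for _ in range(num_sets)]

    def _set_for(self, block):
        # Assumed hash: plain modulo on the block number
        return self.sets[block % self.num_sets]

    def lookup(self, block):
        for b, data in self._set_for(block):
            if b == block:
                return data
        return None

    def insert(self, block, data):
        """Insert or update a block; return the evicted block number, if any."""
        entries = self._set_for(block)
        for i, (b, _) in enumerate(entries):
            if b == block:
                entries[i] = (block, data)  # update in place
                return None
        evicted = None
        if len(entries) == self.ways:
            evicted = entries.pop(0)[0]     # set is full: evict the oldest
        entries.append((block, data))
        return evicted

    def invalidate(self, block):
        # Only one small set has to be searched, which keeps this cheap
        entries = self._set_for(block)
        for i, (b, _) in enumerate(entries):
            if b == block:
                del entries[i]
                return True
        return False


cache = SetAssociativeCache(num_sets=4, ways=2)
cache.insert(1, "a")
cache.insert(5, "b")             # 1 and 5 share set 1
evicted = cache.insert(9, "c")   # set 1 is full, so block 1 is evicted
print(evicted, cache.lookup(5), cache.invalidate(5), cache.lookup(5))
```

The trade-off is that two busy blocks that hash to the same set can push each other out even when the rest of the cache is empty.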
Summary and Next Steps
This article is really an introduction to the concept of using SSDs for caching hard drives. In particular, it mentions two current patches that enable SSD caches for disks: bcache and flashcache. These are still patches that have not been incorporated into the kernel and are under development and testing. Given the popularity of SSDs and the mad desire for more storage performance (who doesn’t want more performance, after all), and despite their limitations in terms of price and capacity, it makes sense for SSDs to be used as caches for spinning media. These two patches represent what appears to be a push for using SSDs to cache block devices (really it’s using block devices to cache block devices).
However, just a word of caution. If you expect to use these patches with an SSD and a disk and immediately get almost SSD-like performance, you may be disappointed. The effectiveness of caching hard disks using SSDs depends upon a number of factors, but the IO pattern of the application is the biggest driver. If the application is doing mostly random data access (read or write), then it is unlikely that these two patches will help. They are not really designed for that. The good news is that not many applications exhibit truly random IO behaviour. There is always some underlying pattern in the IO of the application. The question becomes, does this application have an IO pattern that can take advantage of the SSD caching?
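To get a feel for how much the IO pattern matters, here is a small simulation. It is only a sketch: it models the cache as a simple LRU (I am not claiming either patch uses LRU), and the disk size, cache size, and the 90/10 hot-region split are numbers I picked for illustration.

```python
import random
from collections import OrderedDict


def hit_rate(accesses, cache_blocks):
    """Fraction of accesses served from an LRU cache of `cache_blocks` entries."""
    cache = OrderedDict()
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(accesses)


rng = random.Random(42)   # fixed seed so runs are repeatable
DISK_BLOCKS = 100_000     # hypothetical backing device size, in blocks
CACHE_BLOCKS = 2_000      # hypothetical cache: 2% of the disk
N = 50_000                # number of simulated accesses

# Pattern 1: every block is equally likely
uniform = [rng.randrange(DISK_BLOCKS) for _ in range(N)]

# Pattern 2: 90% of accesses go to a hot region of 1,000 blocks
skewed = [
    rng.randrange(1_000) if rng.random() < 0.9 else rng.randrange(DISK_BLOCKS)
    for _ in range(N)
]

uniform_rate = hit_rate(uniform, CACHE_BLOCKS)
skewed_rate = hit_rate(skewed, CACHE_BLOCKS)
print(f"uniform random: {uniform_rate:.1%} hits")
print(f"hot region:     {skewed_rate:.1%} hits")
```

With the same cache size, the uniformly random pattern should hit the cache only a few percent of the time, while the pattern with a hot region should hit it most of the time. The cache did not change between the two runs. Only the IO pattern did.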
Also remember that these are patches under heavy development and testing, so your performance may vary and there is even the possibility of data loss. For example, the current version of flashcache, as described in the README, has a “torn page problem”.
“It is important to note that in the first cut, cache writes are non-atomic, ie, the “Torn Page Problem” exists. In the event of a power failure or a failed write, part of the block could be written, resulting in a partial write. We have ideas on how to fix this and provide atomic cache writes (see the Futures section).”
This isn’t to say that flashcache is irrevocably broken, just that it is in development and has known problems.
But one of the coolest things is that if you try these patches, you have the opportunity to influence a patch that goes into the kernel. By running applications, benchmarks, etc., using these patches, and providing good constructive feedback to the developers, you can help them tweak/change/alter the patch for better performance for applications that you care about. That is not to be underestimated by any stretch.
So in future articles I will be testing both bcache and flashcache using the benchmarks that I have been using throughout my articles. I will be using IOzone for testing both throughput and IOPS, and I will be using metarates for testing metadata performance. Plus, don’t forget our good testing skills.
Keep an eye out for results in future articles. I think the results will be interesting and fun.
Jeff Layton is an Enterprise Technologist for HPC at Dell. He can be found lounging around at a nearby Frys enjoying the coffee and waiting for sales (but never during working hours).