Ramdisks can offer a level of performance that is simply amazing. They are more than just a tool for benchmarking: new devices now use RAM-based storage to deliver that ultra-high performance.
Uses for Ramdisks
As mentioned previously, when the subject of ramdisks comes up many sysadmins won’t even engage in a discussion, because all data on a ramdisk is lost when the power is shut off. So what good are ramdisks? The answer is that they are very good for certain applications.
One obvious and fairly easy step is to add a battery backup to the system that is using a ramdisk. Then you create a script to copy the data from the ramdisk to a hard drive or flash drive before shutting down, along with a companion script to copy the data from the hard drive back to the ramdisk when the system starts up. You have to do some testing to make sure that the battery capacity gives the system enough time to copy the data from the ramdisk to the hard drive. In addition, you have to make sure the battery is charged at all times (you can always connect several batteries to prevent a single point of failure). If this can be accomplished then you can use a ramdisk for almost any application. An example of this is an article that describes how to customize ramdisks for CentOS. It provides some simple scripts for setting up the ramdisk and copying data to/from permanent storage.
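The save and restore scripts described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scripts from the CentOS article: the function names and the example paths in the comments are hypothetical, and the copy does not remove files that have been deleted from the source.

```python
import os
import shutil
from pathlib import Path


def mirror(src: Path, dst: Path) -> int:
    """Copy every regular file under src into dst, keeping the directory
    layout and file metadata. Returns the number of files copied."""
    copied = 0
    for path in src.rglob("*"):
        target = dst / path.relative_to(src)
        if path.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        elif path.is_file():
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            copied += 1
    return copied


def restore_on_boot(persistent: Path, ramdisk: Path) -> int:
    """Run at startup: load the ramdisk from permanent storage."""
    return mirror(persistent, ramdisk)


def save_on_shutdown(ramdisk: Path, persistent: Path) -> int:
    """Run before shutdown or on a power-fail signal: save the ramdisk."""
    copied = mirror(ramdisk, persistent)
    if hasattr(os, "sync"):
        os.sync()  # ask the kernel to flush buffered writes to disk
    return copied


# Hypothetical usage (both paths are assumptions):
#   restore_on_boot(Path("/var/lib/ramdisk-backup"), Path("/mnt/ramdisk"))
#   save_on_shutdown(Path("/mnt/ramdisk"), Path("/var/lib/ramdisk-backup"))
```

Timing `save_on_shutdown` against a full ramdisk is one simple way to check whether the battery gives the system enough time to finish the copy.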
With a ramdisk you will get throughput that is at least an order of magnitude greater than that of a hard drive. Moreover, ramdisks typically deliver at least two orders of magnitude more IOPS than a hard drive. For some RAM-based storage devices, the difference in performance can reach four orders of magnitude!
One of the most common applications for ramdisks is databases. Databases do a great deal of IO, and they are particularly demanding in terms of IOPS. Ramdisks are a perfect complement to a database and in fact, many of the database benchmarks you will read about use one. Just do a Google search on MySQL and ramdisk and you will see a great number of entries. However, the trick with using a ramdisk with a database is to make sure that the data contained on the ramdisk is protected by a battery system and that it is copied to a permanent storage solution in the event of a power loss.
In a similar fashion, one could use a ramdisk for storing web pages for a web server. It’s very easy to do for static data that does not change. You have to make sure you have a copy of the data on permanent media and copy the data to the ramdisk before starting the web server. If you use a ramdisk for web data that is not static, then be sure you copy the data to permanent media and that you have a battery backup for the ramdisk.
Another application of a ramdisk is “scratch” data: data that is used only while an application runs and does not need to be kept after the application is done. A number of applications in the HPC or technical space fit this description. But the most common one, which everyone uses every day, is a web browser. The cache for a web browser is just temporary data that is intended to help web browsing performance. Firefox uses SQLite, which can be rather IO intensive, so a number of articles describe how to configure the browser to use a ramdisk-based cache.
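For an application's own scratch data, the same idea can be sketched in Python. The example assumes that `/dev/shm` is a RAM-backed (tmpfs) mount, which is common on Linux but not guaranteed, and the helper name `scratch_dir` is hypothetical. If no candidate path is usable, it falls back to the default temporary directory.

```python
import os
import tempfile


def scratch_dir(candidates=("/dev/shm",)):
    """Return a TemporaryDirectory created under the first writable
    candidate path, or in the default temp location if none is usable.
    Assumption: the candidate paths are RAM-backed mounts."""
    for base in candidates:
        if os.path.isdir(base) and os.access(base, os.W_OK):
            return tempfile.TemporaryDirectory(dir=base)
    return tempfile.TemporaryDirectory()


if __name__ == "__main__":
    with scratch_dir() as work:
        # Intermediate files written here never need to reach a hard drive.
        scratch_file = os.path.join(work, "intermediate.dat")
        with open(scratch_file, "wb") as fh:
            fh.write(b"temporary results")
        print("scratch data written under", work)
    # The directory and its contents are removed when the block exits.
```

Because the directory is deleted automatically, nothing in it should be data you expect to keep.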
Another fairly common application of a ramdisk is on netbooks, which have limited memory but even more limited storage performance. A number of articles describe how to use ramdisks to help with IO performance on netbooks. In addition, you can use one to reduce the number of writes and rewrites to SSDs, improving the life of the drive.
One common area where ramdisks are encountered is HPC. In a cluster there are hundreds of compute nodes, and trying to maintain a common consistent image across the nodes can become a problem. In the case where the OS is installed on a drive in the node you have to worry about a consistent image across all nodes. If a node is down while all other nodes are updated, then you have to make sure that the node that was down is updated to match. This is referred to as “OS skew” and can become a big problem in clusters.
Cluster tools such as Scyld Clusterware or Perceus can create a small ramdisk on each node and install a stripped-down OS (perhaps 100-200 MB) into it. Then when the node loses power or is rebooted, it has to go back to the master node to get a new image. This forces the nodes to get the latest image when they reboot and ensures that it is the same image for all nodes. This process is usually referred to as “diskless provisioning”. In addition, if the cluster has to be reimaged, it can be done much faster than installing an entire OS on the hard drive in each node.
Ramdisk Storage Devices
The performance of a ramdisk is far and away better than that of any hard drive (or even hundreds of hard drives). To harness this performance, some companies have created RAM based storage devices. These devices use RAM as the storage media and are attached to host systems via a SAS/SATA cable or a PCIe connector, or are attached directly to a network or a SAN.
Probably the highest-end RAM based storage box is from Texas Memory Systems. There are three primary models. The RamSan 300 has a 16-32 GB capacity in a 3U box with a performance of 1.5 GB/s and 200,000 IOPS. It has Fibre Channel or InfiniBand connections. The mid-range box, the RamSan 400, has 32-128 GB of storage in a 3U box with a performance of 3.0 GB/s and 400,000 IOPS. It too has either Fibre Channel or InfiniBand ports. The top model, the RamSan 440, has 128-256 GB of capacity with a performance of 4.5 GB/s and 600,000 IOPS. It is a 4U box that also has Fibre Channel connections.
All three models have battery backups inside the box. In the event of the loss of power, the units will copy their data to either on-board flash storage or hard drives. The systems are designed to dump data very quickly to the storage and copy it back to the RAM storage when power is restored.
Violin Memory has a new RAM based storage unit, the Violin 1010, that can have up to 500 GB in a 2U box. It is capable of 1.7 GB/s of bandwidth per PCIe port and 3,000,000 IOPS. It connects to one or two host systems via a PCIe connector. To the OS it appears as a PCIe device, and Violin provides a driver so that you can create a file system on it. Since each PCIe port has a 1.7 GB/s throughput capability, the total with two ports is around 3.4 GB/s of bandwidth.
However, these devices are rather expensive (really expensive). But there have been two devices that are designed for desktops. The first one is the Gigabyte i-RAM PCI ramdisk card. It has 4 DIMM slots (DDR DIMMs up to 400 MHz) and can accommodate up to 4GB total (1GB DIMMs). It connects to the motherboard via a 1.5 Gb/s SATA connector. It has a battery backup on the card that can last 4-16 hours depending upon the configuration and the motherboard. It is capable of about 12,500 IOPS. But it only works with certain chipsets.
More recently, Acard has released the ANS-9010, a unit resembling a CD-RW drive that fits into a 5.25 inch drive bay. A number of reviews of the unit are available.
The unit has 8 DDR2 DIMM slots and a battery backup. It accommodates DDR2-800 DIMMs of up to 8 GB each, for a maximum capacity of 64 GB. It also has two SATA ports (3.0 Gb/s) that can be used. If two ports are used, the capacity is split between the two ports (you can easily create a RAID-0 across the two ports).
Figure 1 below is an image of the unit without the cover courtesy of Tech Report.
Figure 1 – Acard ANS-9010 SATA-to-DDR2 Storage Device
It has a battery (7.4V, 2400 mAh) inside the unit for supporting the DIMM storage when the system power fails (similar to the Texas Memory units). The box also contains a Compact Flash (CF) slot. The CF card can be used to back up the RAM-based storage or restore data to the storage at the press of a button (see Figure 2 below). In a recent review the author tested a 16GB unit and the battery lasted more than four hours.
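The battery figures above allow a rough sanity check. The sketch below converts the rated 7.4V and 2400 mAh into watt-hours and derives the average power draw implied by a four hour run time. It assumes the battery started fully charged and delivered its whole nominal capacity, which real batteries rarely do, so treat the result as an upper bound rather than a measurement. The function names are hypothetical.

```python
def battery_energy_wh(volts: float, milliamp_hours: float) -> float:
    """Nominal stored energy in watt-hours (V x Ah)."""
    return volts * milliamp_hours / 1000.0


def max_average_draw_w(energy_wh: float, runtime_hours: float) -> float:
    """Average power draw that would drain energy_wh in runtime_hours."""
    if runtime_hours <= 0:
        raise ValueError("runtime_hours must be positive")
    return energy_wh / runtime_hours


if __name__ == "__main__":
    energy = battery_energy_wh(7.4, 2400)    # battery rating quoted above
    draw = max_average_draw_w(energy, 4.0)   # the review saw more than 4 hours
    print(f"{energy:.1f} Wh, at most about {draw:.1f} W on average")
```

That works out to about 17.8 Wh of nominal energy. Since the battery lasted more than four hours in the review, the 16GB configuration would have drawn less than roughly 4.4 W on average under these assumptions.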
Figure 2 – Front of Acard ANS-9010
There are also indicator lights on the front of the unit to tell you the status, including the battery capacity. Also notice that the button on the right allows you to dump the data from the ramdisk to the CF card. The button on the left allows you to restore data from the CF card to the ramdisk (be sure not to confuse them). You can also pop out the CF card.
The two SATA ports on the Acard ANS-9010 are each rated at about 300 MB/s. With both ports in use, you can split the storage between them and use software RAID-0 if you want even higher performance. Acard says that you can achieve 400 MB/s using both ports and up to 130,000 IOPS per SATA port. That makes for a fairly inexpensive, extremely high performance storage device.
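The 300 MB/s per port figure follows from the SATA line rate. SATA uses 8b/10b encoding, so every byte of data takes 10 bits on the wire, and a 3.0 Gb/s link carries at most 300 MB/s. The short sketch below does the arithmetic; the function name is hypothetical, and the result is a theoretical ceiling that ignores protocol overhead.

```python
def sata_payload_mb_per_s(line_rate_gbit: float) -> float:
    """Upper bound on payload bandwidth in MB/s for a SATA link.
    8b/10b encoding puts 10 line bits on the wire per data byte."""
    bits_per_second = line_rate_gbit * 1e9
    bytes_per_second = bits_per_second / 10
    return bytes_per_second / 1e6


if __name__ == "__main__":
    per_port = sata_payload_mb_per_s(3.0)
    print(f"one port:  {per_port:.0f} MB/s ceiling")
    print(f"two ports: {2 * per_port:.0f} MB/s ceiling")
```

Two ports therefore have a combined ceiling of 600 MB/s, so the 400 MB/s that Acard quotes for both ports sits comfortably below what the links themselves can carry.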
Ramback – The Future of Ramdisks?
It is fairly obvious that the biggest problem with ramdisks is that when the power is lost, the data disappears. Consequently, batteries are needed to maintain power to the ramdisk and software logic is added to dump the contents of the ramdisk to some sort of permanent storage such as a hard drive or a flash drive. However, it can take a fair amount of time to dump the entire contents of the ramdisk to the permanent storage on a power outage and restore it when the power comes back on line. The longer the dump takes, the larger the battery has to be (adding cost as well).
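The link between dump time and battery size is simple arithmetic: the battery has to keep the system running for at least the ramdisk size divided by the sustained write speed of the permanent storage. A minimal sketch follows; the capacity, the write speed, and the 1.5x safety margin in the example are illustrative assumptions, not measurements of any device.

```python
def dump_seconds(ramdisk_gb: float, write_mb_per_s: float) -> float:
    """Time to write the whole ramdisk to permanent storage,
    using decimal units (1 GB = 1000 MB)."""
    if write_mb_per_s <= 0:
        raise ValueError("write_mb_per_s must be positive")
    return ramdisk_gb * 1000.0 / write_mb_per_s


def required_holdup_seconds(ramdisk_gb: float, write_mb_per_s: float,
                            margin: float = 1.5) -> float:
    """Battery hold-up time needed, padded by a safety margin."""
    return dump_seconds(ramdisk_gb, write_mb_per_s) * margin


if __name__ == "__main__":
    # Hypothetical example: a 64 GB ramdisk dumped to a drive that
    # sustains 100 MB/s of sequential writes.
    seconds = required_holdup_seconds(64, 100)
    print(f"battery must last at least {seconds / 60:.0f} minutes")
```

With these example numbers the dump alone takes 640 seconds, and the padded hold-up time is 16 minutes. Doubling the ramdisk size or halving the write speed doubles the time the battery has to cover.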
Another approach to overcoming the limitations of ramdisks is to continuously dump the ramdisk to a storage device. Daniel Phillips recently posted a patch for a new virtual device called Ramback that can continuously back a ramdisk with a real block device (e.g. a hard drive). Of course, if power is lost there is still a window of time in which the file system on the backing store is not in a consistent state matching the ramdisk. But this window is much, much smaller than in the case where the backing store has none of the data from the ramdisk. Consequently, the battery can be much smaller.
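The idea of continuous write-back can be illustrated with a toy model. This is not Ramback's code or design, which lives in the kernel; it is a user-space sketch with hypothetical names that only shows why the amount of data at risk shrinks when dirty blocks are flushed in the background.

```python
class WriteBackRamdisk:
    """Toy model: blocks live in RAM, and dirty blocks are copied to a
    backing store a few at a time. Dictionaries stand in for devices."""

    def __init__(self):
        self.ram = {}        # block number -> data held in RAM
        self.backing = {}    # block number -> data on the backing store
        self.dirty = set()   # blocks changed since they were last flushed

    def write(self, block_no: int, data: bytes) -> None:
        """Writes complete at RAM speed; the block is only marked dirty."""
        self.ram[block_no] = data
        self.dirty.add(block_no)

    def read(self, block_no: int) -> bytes:
        return self.ram[block_no]

    def flush(self, max_blocks=None) -> int:
        """Copy up to max_blocks dirty blocks to the backing store.
        Returns the number of blocks flushed."""
        todo = sorted(self.dirty)
        if max_blocks is not None:
            todo = todo[:max_blocks]
        for block_no in todo:
            self.backing[block_no] = self.ram[block_no]
            self.dirty.discard(block_no)
        return len(todo)

    def blocks_at_risk(self) -> int:
        """Blocks that would be lost if power failed right now."""
        return len(self.dirty)


if __name__ == "__main__":
    disk = WriteBackRamdisk()
    for n in range(100):
        disk.write(n, b"x" * 512)
    disk.flush(max_blocks=90)  # background flushing has mostly kept up
    print("blocks at risk:", disk.blocks_at_risk())
```

In this model the battery only has to cover the time needed to flush the remaining dirty blocks, rather than the entire contents of the ramdisk.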
LWN posted an article about the patch and there was a great deal of discussion about it. One of the fundamental concerns was about the assumption of the reliability of the battery, the file system, and Linux. The combination has to work correctly or you might lose data. Some people are reluctant to rely on the combination working correctly, but as with everything in life there is a trade-off in this situation. On the plus side is the tremendous performance from ramdisks. On the minus side is the possibility of the loss of data in the event of a power failure. The choice is up to you of course, but given the explosive growth in IO requirements this is definitely something that should be tested.
Ramdisks are typically a technology that people love or hate. They can provide orders of magnitude more performance than hard drives (or hundreds of hard drives). But with great performance come some additional requirements. Systems using ramdisks where the data is important need a battery backup on the system along with scripts to copy the data from the ramdisk to permanent media in the event of a power failure. Some admins do not like to depend upon the combination of battery backup and scripts to ensure that no data is lost. However, the temptation of unbelievable performance is always there.
Linux has two primary ramdisk file systems, RamFS and tmpfs. RamFS is very easy to use but it has no file system size limit, so it’s possible to exhaust memory and force the OS to lock up. Consequently, RamFS should only be used by root and not by ordinary users.
The second ramdisk file system is tmpfs. It is based on RamFS but adds file system size limits and can also use swap space. Consequently, it is a more desirable file system for users than RamFS.
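To see which RAM-backed file systems a Linux machine already has mounted, you can scan the mount table. The sketch below parses text in the `/proc/mounts` format (device, mount point, file system type, options, and two numeric fields) and reports the tmpfs and ramfs entries. The function name is hypothetical, and the script assumes a Linux system that exposes `/proc/mounts`.

```python
RAM_FS_TYPES = {"tmpfs", "ramfs"}


def ram_backed_mounts(mounts_text: str):
    """Return (mount_point, fs_type) pairs for tmpfs and ramfs entries
    found in text that follows the /proc/mounts layout."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in RAM_FS_TYPES:
            found.append((fields[1], fields[2]))
    return found


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as fh:
            for mount_point, fs_type in ram_backed_mounts(fh.read()):
                print(f"{fs_type:6s} {mount_point}")
    except FileNotFoundError:
        print("/proc/mounts not available on this system")
```

Any ramfs entry that shows up deserves a second look, since nothing limits how much memory it can consume.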
Finally, there are some efforts to create the next generation of file systems for RAM based devices. Daniel Phillips has created a set of patches called Ramback. These patches allow a RAM based storage device to use a permanent storage device as a backing store. Ramback continually updates the backing store so that if power is lost, the window of time where the file system on the backing store is not consistent with the RAM based file system is small. Consequently, the battery requirements are fairly small and in the event of an extreme set of failures (e.g. battery, software, OS), a good portion of the data is retained on the backing store device. Without something like Ramback, all data is lost.
Despite the additional hardware requirements for supporting ramdisks and the potential for losing data if the hardware/software combination fails, ramdisks can be considered serious storage devices. There are companies selling extremely high performance RAM based storage devices. There are also consumer level RAM based storage devices that give performance levels that are amazing.
Ramdisks are another arrow in your quiver of file system and storage choices to match application requirements or to solve problems. Don’t be afraid to consider them, but be sure you understand them and their implications.