<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Size Can Matter: Would You Prefer the Hard Drive or the Ramdisk this Evening? Part 3</title>
	<atom:link href="http://www.linux-mag.com/id/7682/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.linux-mag.com/id/7682/</link>
	<description>Open Source, Open Standards</description>
	<lastBuildDate>Sat, 05 Oct 2013 13:48:18 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
	<item>
		<title>By: genghiskhat</title>
		<link>http://www.linux-mag.com/id/7682/#comment-7760</link>
		<dc:creator>genghiskhat</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7682/#comment-7760</guid>
		<description>&lt;p&gt;In your summary charts, shouldn&#039;t the 6th bar be 256 MB, not 64?
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>In your summary charts, shouldn&#8217;t the 6th bar be 256 MB, not 64?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ironarmadillo</title>
		<link>http://www.linux-mag.com/id/7682/#comment-7761</link>
		<dc:creator>ironarmadillo</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7682/#comment-7761</guid>
		<description>&lt;p&gt;Don&#039;t you think you should go ahead and add SSDs to this comparison?  After all, they are growing in both capacity and performance, and you mentioned them as a future player in this article.  Just because you didn&#039;t start out with an SSD in these articles doesn&#039;t mean you can&#039;t add one now.  Whenever I&#039;ve read these articles, I&#039;ve wondered where an SSD fits into these comparisons.
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Don&#8217;t you think you should go ahead and add SSDs to this comparison?  After all, they are growing in both capacity and performance, and you mentioned them as a future player in this article.  Just because you didn&#8217;t start out with an SSD in these articles doesn&#8217;t mean you can&#8217;t add one now.  Whenever I&#8217;ve read these articles, I&#8217;ve wondered where an SSD fits into these comparisons.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ctryon</title>
		<link>http://www.linux-mag.com/id/7682/#comment-7762</link>
		<dc:creator>ctryon</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7682/#comment-7762</guid>
		<description>&lt;p&gt;Isn&#039;t putting the journal on something like a ramdisk sort of antithetical to the whole idea of a journaling file system, where one of the main purposes is to make the file system more robust in the case of a crash?  If you&#039;ve gone down hard for any reason, there is no way to recover and replay the journal to ensure that at least the file system is intact, even if you might lose a few of the last file writes before the crash.  Putting a journal on a different disk might still carry some of those risks, but it seems like you&#039;re going to be a lot better off.&lt;/p&gt;
&lt;p&gt;Not much sense writing the data Really Really Fast if you end up losing the entire file system and &lt;em&gt;all the data&lt;/em&gt; when someone trips over the power cord...
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Isn&#8217;t putting the journal on something like a ramdisk sort of antithetical to the whole idea of a journaling file system, where one of the main purposes is to make the file system more robust in the case of a crash?  If you&#8217;ve gone down hard for any reason, there is no way to recover and replay the journal to ensure that at least the file system is intact, even if you might lose a few of the last file writes before the crash.  Putting a journal on a different disk might still carry some of those risks, but it seems like you&#8217;re going to be a lot better off.</p>
<p>Not much sense writing the data Really Really Fast if you end up losing the entire file system and <em>all the data</em> when someone trips over the power cord&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jmoondoggie</title>
		<link>http://www.linux-mag.com/id/7682/#comment-7763</link>
		<dc:creator>jmoondoggie</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7682/#comment-7763</guid>
		<description>&lt;p&gt;I am not surprised by the results.  I would expect small-block, deep-file-system results to be worse than disk because of the amount of metadata created by the write process.&lt;/p&gt;
&lt;p&gt;The manipulation of this data adds to the latency of the SSD.  The larger the block size, the better the write latency results, because the control information is relatively low.&lt;/p&gt;
&lt;p&gt;Reads, of course, remain almost instantaneous.&lt;/p&gt;
&lt;p&gt;Since SSDs are still very expensive, a customer needs to correctly characterize his transaction processing to make sure solid state storage will pay for itself in the long run.&lt;/p&gt;
&lt;p&gt;Some popular SSD manufacturers don&#039;t tell you up front that for every 40 GB of SSD storage, you need to allocate 4 GB of system RAM to process the metadata overhead of small block writes.&lt;/p&gt;
&lt;p&gt;On top of that, systems are tuned to allow for the latency of mechanical disk drives.  So, routines for buffering need to be identified and turned off for SSDs.  My question is: Was that done for these tests?
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I am not surprised by the results.  I would expect small-block, deep-file-system results to be worse than disk because of the amount of metadata created by the write process.</p>
<p>The manipulation of this data adds to the latency of the SSD.  The larger the block size, the better the write latency results, because the control information is relatively low.</p>
<p>Reads, of course, remain almost instantaneous.</p>
<p>Since SSDs are still very expensive, a customer needs to correctly characterize his transaction processing to make sure solid state storage will pay for itself in the long run.</p>
<p>Some popular SSD manufacturers don&#8217;t tell you up front that for every 40 GB of SSD storage, you need to allocate 4 GB of system RAM to process the metadata overhead of small block writes.</p>
<p>On top of that, systems are tuned to allow for the latency of mechanical disk drives.  So, routines for buffering need to be identified and turned off for SSDs.  My question is: Was that done for these tests?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jmoondoggie</title>
		<link>http://www.linux-mag.com/id/7682/#comment-7764</link>
		<dc:creator>jmoondoggie</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7682/#comment-7764</guid>
		<description>&lt;p&gt;My mistake, SSDs were not used, but the principles are still similar.  In addition, I agree with ctryon about the safety of putting transaction data in RAM.
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>My mistake, SSDs were not used, but the principles are still similar.  In addition, I agree with ctryon about the safety of putting transaction data in RAM.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: laytonjb</title>
		<link>http://www.linux-mag.com/id/7682/#comment-7765</link>
		<dc:creator>laytonjb</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7682/#comment-7765</guid>
		<description>&lt;p&gt;If you go back and read the original chain of articles, the intent of using the ramdisk is to &quot;bound&quot; the performance. Ramdisks have theoretically better performance than anything else and are used to bound the upper end of journal device performance. So at the &quot;high-end&quot; you have ramdisks and at the &quot;low-end&quot; you have a plain disk. SSDs should be somewhere in between.&lt;/p&gt;
&lt;p&gt;I&#039;m not advocating using ramdisks for the journal. You can do it if you like, and there are things you need to do to ensure data integrity if you do it. But it is possible. There are some DRAM devices you can use for this approach: ACARD has a cool box, Texas Memory, Violin Memory, etc. All of them should be coupled with a UPS and a mechanism that, in the event of a power failure, flushes the journal completely, unmounts the file system, and then dumps the contents of the DRAM device to permanent media. Reversing the process on start up involves bringing up the DRAM device, restoring the journal contents from permanent media to the DRAM device, bringing up the file system, and mounting it. I&#039;ve tried this process once as an experiment and ext4 didn&#039;t mind, so I&#039;m assuming everything went correctly (I didn&#039;t do an fsck but I should have).&lt;/p&gt;
&lt;p&gt;But again, it is possible to use a DRAM device. As with anything, though, there are tradeoffs: you can potentially get better performance but it&#039;s much more of an administrative task.&lt;/p&gt;
&lt;p&gt;The reason I haven&#039;t tested an SSD is, well, I don&#039;t have one. I have looked at buying one but my price range is fairly low right now and I didn&#039;t want to test a substandard SSD (then we get the ensuing argument about &quot;... that SSD is a piece of garbage and doesn&#039;t reflect what a REAL SSD can do... blah, blah&quot;).&lt;/p&gt;
&lt;p&gt;Some specific answers:&lt;/p&gt;
&lt;p&gt;@genghiskhat - you are correct. I&#039;m surprised that slipped by - my bad. I will get those fixed.&lt;/p&gt;
&lt;p&gt;@jmoondoggie - I don&#039;t know of any file systems where the buffering is tuned for mechanical drives. I don&#039;t think they get that specific, although I can ask Eric Sandeen or Theodore Ts&#039;o. Just remember that there are a number of layers and buffers between the application and the actual drive. Flipping these buffers on/off is not always trivial and does not always produce the desired effects (unless you are a kernel hacker and can read the code - Larry McVoy is quite good at doing this). There are buffers in the file system, buffers in the VFS as controlled by the kernel, buffers in the IO scheduler, potentially buffers in the driver layer, and buffers (cache) in the drives themselves. Determining how they all interact is, well, difficult.&lt;/p&gt;
&lt;p&gt;Also, I wanted to examine the impact on performance using just the default options. Trying to determine the impact of &quot;tuning&quot; is load dependent and difficult to analyze a priori.&lt;/p&gt;
&lt;p&gt;@jmoondoggie: I&#039;m not aware of any SSD drives using system RAM for buffering. I&#039;m not exactly sure how they would do that because it would have to be in the driver, and I haven&#039;t seen any drivers with buffering, but I could definitely be wrong. Can you point to some examples of this behavior?&lt;/p&gt;
&lt;p&gt;Jeff
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>If you go back and read the original chain of articles, the intent of using the ramdisk is to &#8220;bound&#8221; the performance. Ramdisks have theoretically better performance than anything else and are used to bound the upper end of journal device performance. So at the &#8220;high-end&#8221; you have ramdisks and at the &#8220;low-end&#8221; you have a plain disk. SSDs should be somewhere in between.</p>
<p>I&#8217;m not advocating using ramdisks for the journal. You can do it if you like, and there are things you need to do to ensure data integrity if you do it. But it is possible. There are some DRAM devices you can use for this approach: ACARD has a cool box, Texas Memory, Violin Memory, etc. All of them should be coupled with a UPS and a mechanism that, in the event of a power failure, flushes the journal completely, unmounts the file system, and then dumps the contents of the DRAM device to permanent media. Reversing the process on start up involves bringing up the DRAM device, restoring the journal contents from permanent media to the DRAM device, bringing up the file system, and mounting it. I&#8217;ve tried this process once as an experiment and ext4 didn&#8217;t mind, so I&#8217;m assuming everything went correctly (I didn&#8217;t do an fsck but I should have).</p>
<p>But again, it is possible to use a DRAM device. As with anything, though, there are tradeoffs: you can potentially get better performance but it&#8217;s much more of an administrative task.</p>
<p>The reason I haven&#8217;t tested an SSD is, well, I don&#8217;t have one. I have looked at buying one but my price range is fairly low right now and I didn&#8217;t want to test a substandard SSD (then we get the ensuing argument about &#8220;&#8230; that SSD is a piece of garbage and doesn&#8217;t reflect what a REAL SSD can do&#8230; blah, blah&#8221;).</p>
<p>Some specific answers:</p>
<p>@genghiskhat &#8211; you are correct. I&#8217;m surprised that slipped by &#8211; my bad. I will get those fixed.</p>
<p>@jmoondoggie &#8211; I don&#8217;t know of any file systems where the buffering is tuned for mechanical drives. I don&#8217;t think they get that specific, although I can ask Eric Sandeen or Theodore Ts&#8217;o. Just remember that there are a number of layers and buffers between the application and the actual drive. Flipping these buffers on/off is not always trivial and does not always produce the desired effects (unless you are a kernel hacker and can read the code &#8211; Larry McVoy is quite good at doing this). There are buffers in the file system, buffers in the VFS as controlled by the kernel, buffers in the IO scheduler, potentially buffers in the driver layer, and buffers (cache) in the drives themselves. Determining how they all interact is, well, difficult.</p>
<p>Also, I wanted to examine the impact on performance using just the default options. Trying to determine the impact of &#8220;tuning&#8221; is load dependent and difficult to analyze a priori.</p>
<p>@jmoondoggie: I&#8217;m not aware of any SSD drives using system RAM for buffering. I&#8217;m not exactly sure how they would do that because it would have to be in the driver, and I haven&#8217;t seen any drivers with buffering, but I could definitely be wrong. Can you point to some examples of this behavior?</p>
<p>Jeff</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jmoondoggie</title>
		<link>http://www.linux-mag.com/id/7682/#comment-7766</link>
		<dc:creator>jmoondoggie</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7682/#comment-7766</guid>
		<description>&lt;p&gt;As far as Linux system tuning for SSDs, it is good to turn on O_DIRECT, which bypasses the page/buffer cache.  Normally this would decrease performance on a relatively slow mechanical drive, but in the case of NAND-flash-based SSDs it significantly INCREASES performance, on reads and writes.&lt;/p&gt;
&lt;p&gt;Fusion-io is such a product.  But the problem with NAND flash is that it inherently has latency on writes due to the way it commits the write internally.  So, the smaller the write block, the more control overhead to handle, thus the need to overflow into system resources because it can&#039;t handle it on such a small card.  Again, reads are not a problem.  It is the write process inherent in NAND flash.&lt;/p&gt;
&lt;p&gt;One area Fusion-io is different is that they don&#039;t use a standard disk drive protocol like SCSI or SATA.  They pass data directly from the &quot;drive&quot; to the PCIe bus, so there is no latency injected from the standard disk channel protocol.  To take advantage of the increased speed through PCIe, it is necessary to set O_DIRECT.&lt;/p&gt;
&lt;p&gt;This is a double-edged sword, because although the performance is fast, the drive is not SNIA-compliant or SMART-compliant, and a disk controller card such as Promise can&#039;t manage it or use it for hardware-based RAID.  It also means it&#039;s not bootable (it can&#039;t be used to load an OS).
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>As far as Linux system tuning for SSDs, it is good to turn on O_DIRECT, which bypasses the page/buffer cache.  Normally this would decrease performance on a relatively slow mechanical drive, but in the case of NAND-flash-based SSDs it significantly INCREASES performance, on reads and writes.</p>
<p>Fusion-io is such a product.  But the problem with NAND flash is that it inherently has latency on writes due to the way it commits the write internally.  So, the smaller the write block, the more control overhead to handle, thus the need to overflow into system resources because it can&#8217;t handle it on such a small card.  Again, reads are not a problem.  It is the write process inherent in NAND flash.</p>
<p>One area Fusion-io is different is that they don&#8217;t use a standard disk drive protocol like SCSI or SATA.  They pass data directly from the &#8220;drive&#8221; to the PCIe bus, so there is no latency injected from the standard disk channel protocol.  To take advantage of the increased speed through PCIe, it is necessary to set O_DIRECT.</p>
<p>This is a double-edged sword, because although the performance is fast, the drive is not SNIA-compliant or SMART-compliant, and a disk controller card such as Promise can&#8217;t manage it or use it for hardware-based RAID.  It also means it&#8217;s not bootable (it can&#8217;t be used to load an OS).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: markseger</title>
		<link>http://www.linux-mag.com/id/7682/#comment-7767</link>
		<dc:creator>markseger</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7682/#comment-7767</guid>
		<description>&lt;p&gt;I fear you&#039;ve made a common basic mistake many people seem to make, and that is evaluating a benchmark on its runtime alone.  While those numbers certainly provide a good first-level approximation, they only tell part of the story.  For example, were there any unexpected spikes in CPU during the test?  Maybe there were unexpected stalls in the disk I/O during the run?  Or maybe something else that was unexpected.&lt;/p&gt;
&lt;p&gt;Whenever I run an I/O benchmark I always run collectl in parallel, measuring a wide variety of metrics every 10 seconds, and graph the output.  Oftentimes something jumps out to indicate an invalid test that might reveal a kernel bug or a mistuned system.  If you can&#039;t get a relatively smooth I/O rate, you&#039;re either not reporting valid numbers in your result OR simply identifying a system limitation that is itself important to note.&lt;/p&gt;
&lt;p&gt;-mark
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I fear you&#8217;ve made a common basic mistake many people seem to make, and that is evaluating a benchmark on its runtime alone.  While those numbers certainly provide a good first-level approximation, they only tell part of the story.  For example, were there any unexpected spikes in CPU during the test?  Maybe there were unexpected stalls in the disk I/O during the run?  Or maybe something else that was unexpected.</p>
<p>Whenever I run an I/O benchmark I always run collectl in parallel, measuring a wide variety of metrics every 10 seconds, and graph the output.  Oftentimes something jumps out to indicate an invalid test that might reveal a kernel bug or a mistuned system.  If you can&#8217;t get a relatively smooth I/O rate, you&#8217;re either not reporting valid numbers in your result OR simply identifying a system limitation that is itself important to note.</p>
<p>-mark</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jmoondoggie</title>
		<link>http://www.linux-mag.com/id/7682/#comment-7768</link>
		<dc:creator>jmoondoggie</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7682/#comment-7768</guid>
		<description>&lt;p&gt;Mark,&lt;br /&gt;
Thanks for the collectl tip.  I immediately downloaded it and ran it while running a fio benchmark in another terminal.  Watching them side by side was very enlightening.  Fio is about the most flexible open source benchmarking tool I&#039;ve seen.  With collectl I can see a lot more metrics at play.  I can&#039;t wait to set up Samba and start watching some differences in network load.&lt;br /&gt;
But I do agree that runtime alone doesn&#039;t tell the whole story.  Thanks again.
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Mark,<br />
Thanks for the collectl tip.  I immediately downloaded it and ran it while running a fio benchmark in another terminal.  Watching them side by side was very enlightening.  Fio is about the most flexible open source benchmarking tool I&#8217;ve seen.  With collectl I can see a lot more metrics at play.  I can&#8217;t wait to set up Samba and start watching some differences in network load.<br />
But I do agree that runtime alone doesn&#8217;t tell the whole story.  Thanks again.</p>
]]></content:encoded>
	</item>
</channel>
</rss>