<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Saving Your Data Bacon with Write Barriers and Journal Check Summing</title>
	<atom:link href="http://www.linux-mag.com/id/7773/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.linux-mag.com/id/7773/</link>
	<description>Open Source, Open Standards</description>
	<lastBuildDate>Sat, 05 Oct 2013 13:48:18 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
	<item>
		<title>By: stevemadere</title>
		<link>http://www.linux-mag.com/id/7773/#comment-8332</link>
		<dc:creator>stevemadere</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7773/#comment-8332</guid>
		<description>&lt;p&gt;How about putting the journal on a separate disk with the write cache turned off? Performance should be better than a single drive with write cache turned on, because the journal disk only writes linearly (typically 80 MB/sec sustained) and the data disk can cache to its heart&#039;s content. This seems like it might provide a &#039;perfect&#039; solution if it weren&#039;t for that &#039;little&#039; problem of using up one of your disk drive bays and one of your controller ports (SATA or SCSI).&lt;/p&gt;
&lt;p&gt;Yet another arrangement that can be a practical compromise is to place any cold storage (rarely accessed, and typically in large contiguous blocks when it is accessed) on the same physical drive as the journal&lt;br /&gt;
for your hot-storage FS. The data for the hot-storage FS is on a separate dedicated drive. This way, you&#039;re not wasting drive space or bays, but you do have to do the homework up front of figuring out which of your data fits the hot access profile and which fits the cold access profile.
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>How about putting the journal on a separate disk with the write cache turned off? Performance should be better than a single drive with write cache turned on, because the journal disk only writes linearly (typically 80 MB/sec sustained) and the data disk can cache to its heart&#8217;s content. This seems like it might provide a &#8216;perfect&#8217; solution if it weren&#8217;t for that &#8216;little&#8217; problem of using up one of your disk drive bays and one of your controller ports (SATA or SCSI).</p>
<p>Yet another arrangement that can be a practical compromise is to place any cold storage (rarely accessed, and typically in large contiguous blocks when it is accessed) on the same physical drive as the journal<br />
for your hot-storage FS. The data for the hot-storage FS is on a separate dedicated drive. This way, you&#8217;re not wasting drive space or bays, but you do have to do the homework up front of figuring out which of your data fits the hot access profile and which fits the cold access profile.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: aotto</title>
		<link>http://www.linux-mag.com/id/7773/#comment-8333</link>
		<dc:creator>aotto</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7773/#comment-8333</guid>
		<description>&lt;p&gt;Activating write barriers or disabling the disk cache are both things that most readers would be reluctant to do because of the reduced performance that will almost certainly result.&lt;/p&gt;
&lt;p&gt;Am I wrong to assume that any hard drive worth its salt has a capacitor on it that can power the drive long enough to commit any uncommitted parts of the local drive cache upon power failure?&lt;/p&gt;
&lt;p&gt;With a properly designed hardware device, write barriers should be unnecessary. With fewer cache flushes on the device, sustained performance should be much higher.&lt;/p&gt;
&lt;p&gt;Also, using SSDs could be another way to get around this; even a single small SSD holding the filesystem journal boosts performance a whole lot.
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Activating write barriers or disabling the disk cache are both things that most readers would be reluctant to do because of the reduced performance that will almost certainly result.</p>
<p>Am I wrong to assume that any hard drive worth its salt has a capacitor on it that can power the drive long enough to commit any uncommitted parts of the local drive cache upon power failure?</p>
<p>With a properly designed hardware device, write barriers should be unnecessary. With fewer cache flushes on the device, sustained performance should be much higher.</p>
<p>Also, using SSDs could be another way to get around this; even a single small SSD holding the filesystem journal boosts performance a whole lot.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jafcobend</title>
		<link>http://www.linux-mag.com/id/7773/#comment-8334</link>
		<dc:creator>jafcobend</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7773/#comment-8334</guid>
		<description>&lt;p&gt;Thank you for the article. It has caused me to think quite a bit. Before I begin I would ask, &quot;PLEASE, PLEASE, PLEASE proofread!&quot; I&#039;m still not sure how a couple of the sentences were supposed to be read. I haven&#039;t spent any time investigating how the on-drive caches work, so some of my thoughts may be from pure ignorance.&lt;/p&gt;
&lt;p&gt;1. Does the on-board drive cache add that much performance? Does the extra 3 GB of system RAM that I have not trump whatever cache improvements I would get from the on-drive cache? I thought the schedulers, especially with the ability to pick the one that best fits your workload, would do a better job of caching and ordering writes than whatever mechanism is on the drive. I think I&#039;m going to investigate turning off the on-drive caches on my system just to find out.&lt;/p&gt;
&lt;p&gt;2. It seems from a theoretical standpoint that caching should not be done on the drive if you have an OS that is well designed at all, which I&#039;m fairly confident Linux is. :-) The drive can never know the correct ordering required to push the data to the platter to maintain data integrity. Nor can drives bring the kind of cache sizes to bear that a computer can.&lt;/p&gt;
&lt;p&gt;3. Putting a file system log on another drive without caching ensures the integrity of the log but not the data that it represents, specifically in the case of expanding files or new files. The data written to the file may not have made it to the data drive while the log says it did. So even though you have space allocated, the data within it is erroneous. And then again, there are no guarantees with the log on the same drive and on-drive caching enabled.&lt;/p&gt;
&lt;p&gt;4. Are the drives guaranteed to flush the caches before the system powers off? I would assume that the drive would flush the cache when it receives a power-down signal. But that also assumes that the drive is given enough time to do so before the system power goes off. Perhaps the caps mentioned by a previous poster really do exist?&lt;/p&gt;
&lt;p&gt;5. Do all drives with cache support write barriers and cache control?&lt;/p&gt;
&lt;p&gt;I hope this sparks some interesting discussion. But it seems to me that regardless of whatever slowdown you might get, you are better off letting Linux figure out the best way to order writes than letting the drive do it, except for the case of battery-backed caches.&lt;/p&gt;
&lt;p&gt;I am a firm believer in decently long-lived battery backups regardless. Any computer handling data that is worth anything, and especially if it&#039;s shared on a network, &lt;strong&gt;MUST&lt;/strong&gt; have a decent battery backup, which is checked regularly to make sure the battery is still viable. But then, during the spring and fall in my area the 60 MPH winds will cause numerous power failures in a single day.&lt;/p&gt;
&lt;p&gt;Thanks again for the article.
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Thank you for the article. It has caused me to think quite a bit. Before I begin I would ask, &#8220;PLEASE, PLEASE, PLEASE proofread!&#8221; I&#8217;m still not sure how a couple of the sentences were supposed to be read. I haven&#8217;t spent any time investigating how the on-drive caches work, so some of my thoughts may be from pure ignorance.</p>
<p>1. Does the on-board drive cache add that much performance? Does the extra 3 GB of system RAM that I have not trump whatever cache improvements I would get from the on-drive cache? I thought the schedulers, especially with the ability to pick the one that best fits your workload, would do a better job of caching and ordering writes than whatever mechanism is on the drive. I think I&#8217;m going to investigate turning off the on-drive caches on my system just to find out.</p>
<p>2. It seems from a theoretical standpoint that caching should not be done on the drive if you have an OS that is well designed at all, which I&#8217;m fairly confident Linux is. :-) The drive can never know the correct ordering required to push the data to the platter to maintain data integrity. Nor can drives bring the kind of cache sizes to bear that a computer can.</p>
<p>3. Putting a file system log on another drive without caching ensures the integrity of the log but not the data that it represents, specifically in the case of expanding files or new files. The data written to the file may not have made it to the data drive while the log says it did. So even though you have space allocated, the data within it is erroneous. And then again, there are no guarantees with the log on the same drive and on-drive caching enabled.</p>
<p>4. Are the drives guaranteed to flush the caches before the system powers off? I would assume that the drive would flush the cache when it receives a power-down signal. But that also assumes that the drive is given enough time to do so before the system power goes off. Perhaps the caps mentioned by a previous poster really do exist?</p>
<p>5. Do all drives with cache support write barriers and cache control?</p>
<p>I hope this sparks some interesting discussion. But it seems to me that regardless of whatever slowdown you might get, you are better off letting Linux figure out the best way to order writes than letting the drive do it, except for the case of battery-backed caches.</p>
<p>I am a firm believer in decently long-lived battery backups regardless. Any computer handling data that is worth anything, and especially if it&#8217;s shared on a network, <strong>MUST</strong> have a decent battery backup, which is checked regularly to make sure the battery is still viable. But then, during the spring and fall in my area the 60 MPH winds will cause numerous power failures in a single day.</p>
<p>Thanks again for the article.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: laytonjb</title>
		<link>http://www.linux-mag.com/id/7773/#comment-8335</link>
		<dc:creator>laytonjb</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7773/#comment-8335</guid>
		<description>&lt;p&gt;@jafcobend:&lt;br /&gt;
I have to admit that this article was difficult to write because it is so technical and so involved. But I felt it was an important topic to at least broach, so I went for it. If you have problems with any particular sentence or paragraph, please point it out. I did my best to proof the article - I can&#039;t tell you how many times I re-read it :)  But I&#039;m sure I missed something (actually one person just pointed out a typo in a past article).&lt;/p&gt;
&lt;p&gt;Now on to the questions... But before I answer them, let me say that I don&#039;t know that much about how drives work internally. I don&#039;t know how the cache interacts with the kernel, etc. (What is really interesting is that the new Seagate drive with the built-in SSD apparently can &quot;learn&quot; and move certain blocks onto the SSD, so drives have far more intelligence than I knew about.)&lt;/p&gt;
&lt;p&gt;1. It&#039;s always good to experiment with various options (write barriers, disabling the drive cache). It helps us understand the performance implications, and then we can make an informed decision about the choices we make. But to be honest, I don&#039;t know the relative impact of drive cache vs. system memory. The only &quot;gotcha&quot; is that system memory can&#039;t generally be allocated for IO - the kernel will do what it wants with it (I wish there were a way to force that - need to ask the kernel gods about it). BTW - if you have any data to share, let&#039;s hear about it! You can always write a quick article for Linux Mag :)&lt;/p&gt;
&lt;p&gt;2. Try turning off the drive cache. Performance isn&#039;t always that great. Plus, if the cache is disabled then the kernel will have to &quot;pause&quot; more often for the drive to return. This will definitely impact performance.&lt;/p&gt;
&lt;p&gt;3. Great observation.&lt;/p&gt;
&lt;p&gt;4. The drives will flush as part of the shutdown. I forget the details, but there was some discussion on the ext4 mailing list where Larry McVoy (I hope I got his name correct from memory) pointed out that until the file system is unmounted, the drive may not truly flush its cache. I admit that I don&#039;t know the details of sync() or fsync() well enough to know when data is truly flushed from cache.&lt;/p&gt;
&lt;p&gt;5. I believe that all drives (within reason) obey the write barrier.&lt;/p&gt;
&lt;p&gt;Thanks for the comments! Greatly appreciated. And again, if you have particular sentences or passages that seem goofy, let me know. Either I didn&#039;t explain things well or I made mistakes.&lt;/p&gt;
&lt;p&gt;Jeff
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>@jafcobend:<br />
I have to admit that this article was difficult to write because it is so technical and so involved. But I felt it was an important topic to at least broach, so I went for it. If you have problems with any particular sentence or paragraph, please point it out. I did my best to proof the article &#8211; I can&#8217;t tell you how many times I re-read it :)  But I&#8217;m sure I missed something (actually one person just pointed out a typo in a past article).</p>
<p>Now on to the questions&#8230; But before I answer them, let me say that I don&#8217;t know that much about how drives work internally. I don&#8217;t know how the cache interacts with the kernel, etc. (What is really interesting is that the new Seagate drive with the built-in SSD apparently can &#8220;learn&#8221; and move certain blocks onto the SSD, so drives have far more intelligence than I knew about.)</p>
<p>1. It&#8217;s always good to experiment with various options (write barriers, disabling the drive cache). It helps us understand the performance implications, and then we can make an informed decision about the choices we make. But to be honest, I don&#8217;t know the relative impact of drive cache vs. system memory. The only &#8220;gotcha&#8221; is that system memory can&#8217;t generally be allocated for IO &#8211; the kernel will do what it wants with it (I wish there were a way to force that &#8211; need to ask the kernel gods about it). BTW &#8211; if you have any data to share, let&#8217;s hear about it! You can always write a quick article for Linux Mag :)</p>
<p>2. Try turning off the drive cache. Performance isn&#8217;t always that great. Plus, if the cache is disabled then the kernel will have to &#8220;pause&#8221; more often for the drive to return. This will definitely impact performance.</p>
<p>3. Great observation.</p>
<p>4. The drives will flush as part of the shutdown. I forget the details, but there was some discussion on the ext4 mailing list where Larry McVoy (I hope I got his name correct from memory) pointed out that until the file system is unmounted, the drive may not truly flush its cache. I admit that I don&#8217;t know the details of sync() or fsync() well enough to know when data is truly flushed from cache.</p>
<p>5. I believe that all drives (within reason) obey the write barrier.</p>
<p>Thanks for the comments! Greatly appreciated. And again, if you have particular sentences or passages that seem goofy, let me know. Either I didn&#8217;t explain things well or I made mistakes.</p>
<p>Jeff</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jtmcdole</title>
		<link>http://www.linux-mag.com/id/7773/#comment-8336</link>
		<dc:creator>jtmcdole</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7773/#comment-8336</guid>
		<description>&lt;p&gt;@jafcobend&lt;/p&gt;
&lt;p&gt;#3: There are different levels of journaling. The basic level of journaling is to log the file system metadata; you take a small performance hit for some security. You&#039;re asking about full data journaling, in which ALL user data is written to the log first. This can be done with some file systems, but you take a much bigger hit, as the data has to be written to the disk twice.&lt;/p&gt;
&lt;p&gt;For people who are really worried about their data, you could invest some time in parity files (PAR2, for example). I&#039;m really shocked that the ext3 journal didn&#039;t have checksums!
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>@jafcobend</p>
<p>#3: There are different levels of journaling. The basic level of journaling is to log the file system metadata; you take a small performance hit for some security. You&#8217;re asking about full data journaling, in which ALL user data is written to the log first. This can be done with some file systems, but you take a much bigger hit, as the data has to be written to the disk twice.</p>
<p>For people who are really worried about their data, you could invest some time in parity files (PAR2, for example). I&#8217;m really shocked that the ext3 journal didn&#8217;t have checksums!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jab1</title>
		<link>http://www.linux-mag.com/id/7773/#comment-8337</link>
		<dc:creator>jab1</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7773/#comment-8337</guid>
		<description>&lt;p&gt;(I just found this interesting article!)&lt;/p&gt;
&lt;p&gt;In a slightly ironic twist, not long after they finally got the I/O barriers also working with LVM/MD in 2.6.33, now they&#039;re getting rid of them. See http://lkml.org/lkml/2010/9/3/199&lt;/p&gt;
&lt;p&gt;No need to worry about data integrity with volatile write caches, though; the barriers are being replaced with a (simpler) interface to issue cache flush and FUA commands to the devices. Beyond being simpler, the big win with the new code is avoiding I/O queue draining. In the current ordered barrier code, ordering is ensured by draining the queue before flushing the cache, but in practice all file systems (well, reiserfs could optionally use the ordered barrier code for ordering) already take care of ordering themselves by waiting for completion before submitting dependent I/Os. So the draining turns out to be unnecessary, and it can have a rather big performance impact.
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>(I just found this interesting article!)</p>
<p>In a slightly ironic twist, not long after they finally got the I/O barriers also working with LVM/MD in 2.6.33, now they&#8217;re getting rid of them. See <a href="http://lkml.org/lkml/2010/9/3/199" rel="nofollow">http://lkml.org/lkml/2010/9/3/199</a></p>
<p>No need to worry about data integrity with volatile write caches, though; the barriers are being replaced with a (simpler) interface to issue cache flush and FUA commands to the devices. Beyond being simpler, the big win with the new code is avoiding I/O queue draining. In the current ordered barrier code, ordering is ensured by draining the queue before flushing the cache, but in practice all file systems (well, reiserfs could optionally use the ordered barrier code for ordering) already take care of ordering themselves by waiting for completion before submitting dependent I/Os. So the draining turns out to be unnecessary, and it can have a rather big performance impact.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jab1</title>
		<link>http://www.linux-mag.com/id/7773/#comment-8338</link>
		<dc:creator>jab1</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7773/#comment-8338</guid>
		<description>&lt;p&gt;@jafcobend:&lt;/p&gt;
&lt;p&gt;1) It&#039;s not so much about increasing the amount of cache memory, which as you note is rather insignificant compared to all the RAM the kernel can use for the page cache. Rather, it&#039;s about reducing command latency. I.e., instead of &quot;kernel issues write -&gt; device writes stuff to disk -&gt; device signals completion&quot; you have &quot;kernel issues write -&gt; device writes to cache memory -&gt; device signals completion&quot;. Also, presumably the drive has the best information about where its head is at the moment, and is thus in the best position to decide in which order to serve I/O commands. Another way to reduce the impact of command latency is to have multiple outstanding commands; SCSI has had this since, well, forever with something called TCQ. Which perhaps explains why SCSI devices typically don&#039;t have volatile write caches. SATA nowadays has something roughly equivalent called NCQ, but it came on the scene after volatile write caches were already the norm in the (S)ATA world.&lt;/p&gt;
&lt;p&gt;2) In Linux, ordering is handled by the filesystems waiting for I/Os to complete before issuing dependent I/Os. The block layer, and the device command queue for devices supporting such a thing, are free to reorder I/Os in any way they see fit.&lt;/p&gt;
&lt;p&gt;4) Sadly, I know of no traditional drive with capacitors. If they had that, it would be awesome; we could safely mount our filesystems with barrier=0 and still be safe. But I&#039;m sure there are mechanisms to ensure caches are flushed when shutting down; I don&#039;t know if the OS explicitly has to do that when unmounting, or if the hardware handles it itself.&lt;/p&gt;
&lt;p&gt;5) There are rumors about drives which don&#039;t honor cache flush commands, but AFAIK more or less all drives nowadays do honor them. Also, to be pedantic, neither the SCSI nor SATA standards know anything about write barriers; they are purely a software concept in the Linux kernel (implemented via queue draining and cache flushing). However, see my previous post about how they are being replaced.
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>@jafcobend:</p>
<p>1) It&#8217;s not so much about increasing the amount of cache memory, which as you note is rather insignificant compared to all the RAM the kernel can use for the page cache. Rather, it&#8217;s about reducing command latency. I.e., instead of &#8220;kernel issues write -&gt; device writes stuff to disk -&gt; device signals completion&#8221; you have &#8220;kernel issues write -&gt; device writes to cache memory -&gt; device signals completion&#8221;. Also, presumably the drive has the best information about where its head is at the moment, and is thus in the best position to decide in which order to serve I/O commands. Another way to reduce the impact of command latency is to have multiple outstanding commands; SCSI has had this since, well, forever with something called TCQ. Which perhaps explains why SCSI devices typically don&#8217;t have volatile write caches. SATA nowadays has something roughly equivalent called NCQ, but it came on the scene after volatile write caches were already the norm in the (S)ATA world.</p>
<p>2) In Linux, ordering is handled by the filesystems waiting for I/Os to complete before issuing dependent I/Os. The block layer, and the device command queue for devices supporting such a thing, are free to reorder I/Os in any way they see fit.</p>
<p>4) Sadly, I know of no traditional drive with capacitors. If they had that, it would be awesome; we could safely mount our filesystems with barrier=0 and still be safe. But I&#8217;m sure there are mechanisms to ensure caches are flushed when shutting down; I don&#8217;t know if the OS explicitly has to do that when unmounting, or if the hardware handles it itself.</p>
<p>5) There are rumors about drives which don&#8217;t honor cache flush commands, but AFAIK more or less all drives nowadays do honor them. Also, to be pedantic, neither the SCSI nor SATA standards know anything about write barriers; they are purely a software concept in the Linux kernel (implemented via queue draining and cache flushing). However, see my previous post about how they are being replaced.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: grabur</title>
		<link>http://www.linux-mag.com/id/7773/#comment-8339</link>
		<dc:creator>grabur</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7773/#comment-8339</guid>
		<description>&lt;p&gt;Thank you.  I always enjoy your articles.
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Thank you.  I always enjoy your articles.</p>
]]></content:encoded>
	</item>
</channel>
</rss>