<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Expecting to Fail</title>
	<atom:link href="http://www.linux-mag.com/id/7543/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.linux-mag.com/id/7543/</link>
	<description>Open Source, Open Standards</description>
	<lastBuildDate>Sat, 05 Oct 2013 13:48:18 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
	<item>
		<title>By: loesprite</title>
		<link>http://www.linux-mag.com/id/7543/#comment-7080</link>
		<dc:creator>loesprite</dc:creator>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7543/#comment-7080</guid>
		<description>&lt;p&gt;I&#039;d like to say that we all know there are many ways to keep data and services safe. But at some point, you may not have enough budget for redundancy, locality and caching. Maybe you have all your services running on a few servers in your house, and the data would easily be destroyed if two or three disks died at the same time (in a fire or earthquake, say). Given this, what we may need to think about is something like priorities. We also need some cheap and effective solutions for keeping data safe.
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I&#8217;d like to say that we all know there are many ways to keep data and services safe. But at some point, you may not have enough budget for redundancy, locality and caching. Maybe you have all your services running on a few servers in your house, and the data would easily be destroyed if two or three disks died at the same time (in a fire or earthquake, say). Given this, what we may need to think about is something like priorities. We also need some cheap and effective solutions for keeping data safe.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: markseger</title>
		<link>http://www.linux-mag.com/id/7543/#comment-7081</link>
		<dc:creator>markseger</dc:creator>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7543/#comment-7081</guid>
		<description>&lt;p&gt;First of all, I agree completely with the point about proactive monitoring.  I&#039;d further stress the importance of fine-grained monitoring.  So many people run sar at the default interval of 10 minutes and don&#039;t realize the data they&#039;re collecting is mush!  I&#039;d also say it&#039;s critical that all systems in a cluster are synchronized with NTP and that your monitoring samples all systems as close to the same moment as possible (within a few msecs), so when there is a problem you can look at the logs on all your systems to see what happened and in what order.&lt;/p&gt;
&lt;p&gt;That said, I&#039;d suggest you check out collectl, the open source monitoring tool I wrote a number of years ago (see: http://collectl.sourceforge.net/), which does everything I described above and more.  Just start it and it will collect samples every 10 seconds to the nearest msec, saving very detailed logs for a week (or more if you like).  Collectl runs on some of the largest clusters in the world, many of which are on the Top500 list.&lt;/p&gt;
&lt;p&gt;Rather than ramble on, I&#039;ll just say check it out: the next time you have a failure and don&#039;t know the reason, there&#039;s a good chance collectl will.&lt;/p&gt;
&lt;p&gt;-mark
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>First of all, I agree completely with the point about proactive monitoring.  I&#8217;d further stress the importance of fine-grained monitoring.  So many people run sar at the default interval of 10 minutes and don&#8217;t realize the data they&#8217;re collecting is mush!  I&#8217;d also say it&#8217;s critical that all systems in a cluster are synchronized with NTP and that your monitoring samples all systems as close to the same moment as possible (within a few msecs), so when there is a problem you can look at the logs on all your systems to see what happened and in what order.</p>
<p>That said, I&#8217;d suggest you check out collectl, the open source monitoring tool I wrote a number of years ago (see: <a href="http://collectl.sourceforge.net/" rel="nofollow">http://collectl.sourceforge.net/</a>), which does everything I described above and more.  Just start it and it will collect samples every 10 seconds to the nearest msec, saving very detailed logs for a week (or more if you like).  Collectl runs on some of the largest clusters in the world, many of which are on the Top500 list.</p>
<p>Rather than ramble on, I&#8217;ll just say check it out: the next time you have a failure and don&#8217;t know the reason, there&#8217;s a good chance collectl will.</p>
<p>-mark</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: amax</title>
		<link>http://www.linux-mag.com/id/7543/#comment-7082</link>
		<dc:creator>amax</dc:creator>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7543/#comment-7082</guid>
		<description>&lt;p&gt;As an &quot;Old Guy&quot; who started a while ago, I can say networking sure has gotten complicated over time.  Some of us have had certain &#039;rules&#039; drilled into our heads in training and have not bothered to update them, regardless of popular opinion.&lt;/p&gt;
&lt;p&gt;This is a great article that all networking folks should have a gawk at; it could save their jobs!
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>As an &#8220;Old Guy&#8221; who started a while ago, I can say networking sure has gotten complicated over time.  Some of us have had certain &#8216;rules&#8217; drilled into our heads in training and have not bothered to update them, regardless of popular opinion.</p>
<p>This is a great article that all networking folks should have a gawk at; it could save their jobs!</p>
]]></content:encoded>
	</item>
</channel>
</rss>