<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Deduping Storage Deduplication</title>
	<atom:link href="http://www.linux-mag.com/id/7535/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.linux-mag.com/id/7535/</link>
	<description>Open Source, Open Standards</description>
	<lastBuildDate>Sat, 05 Oct 2013 13:48:18 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
	<item>
		<title>By: Ryan Covietz</title>
		<link>http://www.linux-mag.com/id/7535/#comment-366901</link>
		<dc:creator>Ryan Covietz</dc:creator>
		<pubDate>Fri, 31 Aug 2012 17:26:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7535/#comment-366901</guid>
		<description>ttsiodras, I know this thread is almost 3 years old now but I had a question regarding your comment:

&quot;Notice that when you use &quot;--inplace&quot;, rsync writes directly on-top of the already existing file in the destination filesystem, and ONLY in the places that changed! This means that by using ZFS (which is copy-on-write) you get the BLOCK-level deduplication...&quot;

Have you actually observed this in practice or is this just theory?

I am attempting a similar backup scheme using rsync --inplace with the destination file residing on a NetApp WAFL filesystem (which is also copy-on-write like ZFS). What I have found is that even with the --inplace option rsync still rewrites the entire file on the destination block by block (note I am also using the --no-whole-file option). It is better than without --inplace, as by default rsync would create an entirely new temporary copy of the file on the destination before overwriting the original (thus causing the COW filesystem to incur a 200% penalty in snapshot space utilization). However, I find that rsync --inplace does not update only the changed blocks on the destination file as you described, rather it still rewrites the whole file &quot;inplace&quot;.

The only advantage to the --inplace option that I see so far is that you don&#039;t need double the storage space on the destination to temporarily keep 2 copies of the file being rsync&#039;ed.</description>
		<content:encoded><![CDATA[<p>ttsiodras, I know this thread is almost 3 years old now but I had a question regarding your comment:</p>
<p>&#8220;Notice that when you use &#8220;--inplace&#8221;, rsync writes directly on-top of the already existing file in the destination filesystem, and ONLY in the places that changed! This means that by using ZFS (which is copy-on-write) you get the BLOCK-level deduplication&#8230;&#8221;</p>
<p>Have you actually observed this in practice or is this just theory?</p>
<p>I am attempting a similar backup scheme using rsync --inplace with the destination file residing on a NetApp WAFL filesystem (which is also copy-on-write like ZFS). What I have found is that even with the --inplace option rsync still rewrites the entire file on the destination block by block (note I am also using the --no-whole-file option). It is better than without --inplace, as by default rsync would create an entirely new temporary copy of the file on the destination before overwriting the original (thus causing the COW filesystem to incur a 200% penalty in snapshot space utilization). However, I find that rsync --inplace does not update only the changed blocks on the destination file as you described, rather it still rewrites the whole file &#8220;inplace&#8221;.</p>
<p>The only advantage to the --inplace option that I see so far is that you don&#8217;t need double the storage space on the destination to temporarily keep 2 copies of the file being rsync&#8217;ed.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: p g</title>
		<link>http://www.linux-mag.com/id/7535/#comment-9794</link>
		<dc:creator>p g</dc:creator>
		<pubDate>Tue, 26 Jul 2011 17:44:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7535/#comment-9794</guid>
		<description>ttsiodras:

I&#039;ve been running &quot;almost&quot; this exact backup scheme for my clients for years.  A major limitation is exposed when a client renames a directory: the backup (in this case rsync) will see this as a new directory, and it will needlessly copy it over and delete the old directory, essentially doubling the space used by that directory.

Example: Client renames /home/user/15TB_Folder  to  /home/user/15Terabyte_Folder.

What would happen then?  No, seriously.  I&#039;m wondering how your snapshot scenario would deal with that, since I don&#039;t use snapshots, I use hardlinks, much like BackupPC and Rsnapshot.

NOTE: I&#039;ve also used the nilfs2 LFS combined with the --inplace rsync option with very good results.</description>
		<content:encoded><![CDATA[<p>ttsiodras:</p>
<p>I&#8217;ve been running &#8220;almost&#8221; this exact backup scheme for my clients for years.  A major limitation is exposed when a client renames a directory: the backup (in this case rsync) will see this as a new directory, and it will needlessly copy it over and delete the old directory, essentially doubling the space used by that directory.</p>
<p>Example: Client renames /home/user/15TB_Folder  to  /home/user/15Terabyte_Folder.</p>
<p>What would happen then?  No, seriously.  I&#8217;m wondering how your snapshot scenario would deal with that, since I don&#8217;t use snapshots, I use hardlinks, much like BackupPC and Rsnapshot.</p>
<p>NOTE: I&#8217;ve also used the nilfs2 LFS combined with the --inplace rsync option with very good results.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: psevetson</title>
		<link>http://www.linux-mag.com/id/7535/#comment-7033</link>
		<dc:creator>psevetson</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7535/#comment-7033</guid>
<description>&lt;p&gt;You might want to fix your title: I don&#039;t think you meant to say &quot;Depulication&quot;, when you&#039;re talking about &quot;Deduplication.&quot;
&lt;/p&gt;
</description>
<content:encoded><![CDATA[<p>You might want to fix your title: I don&#8217;t think you meant to say &#8220;Depulication&#8221;, when you&#8217;re talking about &#8220;Deduplication.&#8221;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: psevetson</title>
		<link>http://www.linux-mag.com/id/7535/#comment-7034</link>
		<dc:creator>psevetson</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7535/#comment-7034</guid>
<description>&lt;p&gt;Or maybe you meant to say &quot;Duplication?&quot;
&lt;/p&gt;
</description>
<content:encoded><![CDATA[<p>Or maybe you meant to say &#8220;Duplication?&#8221;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: pittendrigh</title>
		<link>http://www.linux-mag.com/id/7535/#comment-7035</link>
		<dc:creator>pittendrigh</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7535/#comment-7035</guid>
<description>&lt;p&gt;fdupes -r /wherever &gt; /tmp/dupeslog&lt;/p&gt;
&lt;p&gt;deduper.pl /tmp/dupeslog&lt;/p&gt;
&lt;p&gt;.......where deduper.pl is:&lt;br /&gt;
#!/usr/bin/perl&lt;/p&gt;
&lt;p&gt;$file = shift;&lt;br /&gt;
open FILE, $file or die &quot;no good $file open \n&quot;;&lt;br /&gt;
$mode = shift;&lt;/p&gt;
&lt;p&gt;$cnt=0;&lt;br /&gt;
while (&lt;FILE&gt;) {&lt;br /&gt;
    chomp;   # just because!&lt;br /&gt;
    if ( /^\s*$/ ) {&lt;br /&gt;
       print &quot;save: &quot;, $paths[0], &quot;\n&quot;;&lt;br /&gt;
       for($i=1; $i&lt;$cnt;$i++){&lt;br /&gt;
           if($mode eq &#039;delete&#039;){ unlink ($paths[$i]); }&lt;br /&gt;
           else { print &quot;delete: &quot; , $paths[$i], &quot;\n&quot;; }&lt;br /&gt;
       }&lt;br /&gt;
       print &quot;\n\n&quot;;&lt;br /&gt;
       $cnt=0;&lt;br /&gt;
    }&lt;br /&gt;
    else{&lt;br /&gt;
       $paths[$cnt] = $_ ;&lt;br /&gt;
       $cnt++;&lt;br /&gt;
    }&lt;br /&gt;
}
&lt;/p&gt;
</description>
<content:encoded><![CDATA[<p>fdupes -r /wherever &gt; /tmp/dupeslog</p>
<p>deduper.pl /tmp/dupeslog</p>
<p>.......where deduper.pl is:<br />
#!/usr/bin/perl</p>
<p>$file = shift;<br />
open FILE, $file or die "no good $file open \n";<br />
$mode = shift;</p>
<p>$cnt=0;<br />
while (&lt;FILE&gt;) {<br />
    chomp;   # just because!<br />
    if ( /^\s*$/ ) {<br />
       print "save: ", $paths[0], "\n";<br />
       for($i=1; $i&lt;$cnt;$i++){<br />
           if($mode eq 'delete'){ unlink ($paths[$i]); }<br />
           else { print "delete: " , $paths[$i], "\n"; }<br />
       }<br />
       print "\n\n";<br />
       $cnt=0;<br />
    }<br />
    else{<br />
       $paths[$cnt] = $_ ;<br />
       $cnt++;<br />
    }<br />
}</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: webmanaus</title>
		<link>http://www.linux-mag.com/id/7535/#comment-7036</link>
		<dc:creator>webmanaus</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7535/#comment-7036</guid>
		<description>&lt;p&gt;Ugh, backuppc uses de-duplication (of sorts) to store multiple versions of identical files on multiple remote systems with hard-links and is open source.&lt;/p&gt;
&lt;p&gt;Also, rsync uses some form of de-duplication at the file level when transferring files between remote systems, frequently used for backups as well, and also open source.&lt;/p&gt;
&lt;p&gt;Just two open source apps that I&#039;ve been using for years, very mature, and work exceptionally well....
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Ugh, backuppc uses de-duplication (of sorts) to store multiple versions of identical files on multiple remote systems with hard-links and is open source.</p>
<p>Also, rsync uses some form of de-duplication at the file level when transferring files between remote systems, frequently used for backups as well, and also open source.</p>
<p>Just two open source apps that I&#8217;ve been using for years, very mature, and work exceptionally well&#8230;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: cringer</title>
		<link>http://www.linux-mag.com/id/7535/#comment-7037</link>
		<dc:creator>cringer</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7535/#comment-7037</guid>
<description>&lt;p&gt;How about the open source BackupPC (backuppc.sourceforge.net)? It has file-level deduplication and compression, allowing me to back up over 8.5TB of data on a 1TB drive.
&lt;/p&gt;
</description>
<content:encoded><![CDATA[<p>How about the open source BackupPC (backuppc.sourceforge.net)? It has file-level deduplication and compression, allowing me to back up over 8.5TB of data on a 1TB drive.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: greimer</title>
		<link>http://www.linux-mag.com/id/7535/#comment-7038</link>
		<dc:creator>greimer</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7535/#comment-7038</guid>
<description>&lt;p&gt;What&#039;s the difference between deduplication and compression?
&lt;/p&gt;
</description>
<content:encoded><![CDATA[<p>What&#8217;s the difference between deduplication and compression?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: bofh999</title>
		<link>http://www.linux-mag.com/id/7535/#comment-7039</link>
		<dc:creator>bofh999</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7535/#comment-7039</guid>
<description>&lt;p&gt;Ahm, and what about the downsides?&lt;br /&gt;
I mean, as a professional you always have to consider the downsides too, but I can never read about them in any article here (especially the virtualisation nonsense).&lt;/p&gt;
&lt;p&gt;The downsides here (especially for an fs implementation)... you get a higher (sometimes much higher) system load. Since HDD capacity is very cheap, it might not be worthwhile to trade HDD space for CPU and RAM load.&lt;/p&gt;
&lt;p&gt;Second, think about an FS crash.&lt;br /&gt;
The chance of losing more data than with traditional methods is clearly much higher.&lt;/p&gt;
&lt;p&gt;Third, what if you have to split your services onto another server?&lt;br /&gt;
Then you have to rethink your HDD needs, even for backup.&lt;/p&gt;
&lt;p&gt;Let&#039;s say you have 100GB of backup space. Now you split the server and you&#039;ve got a second backup set... and surprise, you need 150GB now, because you had many common files which are now on different backup sets.&lt;/p&gt;
&lt;p&gt;Only some quick assumptions..&lt;br /&gt;
Sure, the idea isn&#039;t new (Windows has had such a feature for a long time),&lt;br /&gt;
but I&#039;m not a real fan of making much more complex changes so deep in the system,&lt;br /&gt;
especially for HDD capacity, which is unbelievably cheap, when it will slow down performance and may complicate the management and failure procedures
&lt;/p&gt;
</description>
<content:encoded><![CDATA[<p>Ahm, and what about the downsides?<br />
I mean, as a professional you always have to consider the downsides too, but I can never read about them in any article here (especially the virtualisation nonsense).</p>
<p>The downsides here (especially for an fs implementation)&#8230; you get a higher (sometimes much higher) system load. Since HDD capacity is very cheap, it might not be worthwhile to trade HDD space for CPU and RAM load.</p>
<p>Second, think about an FS crash.<br />
The chance of losing more data than with traditional methods is clearly much higher.</p>
<p>Third, what if you have to split your services onto another server?<br />
Then you have to rethink your HDD needs, even for backup.</p>
<p>Let&#8217;s say you have 100GB of backup space. Now you split the server and you&#8217;ve got a second backup set&#8230; and surprise, you need 150GB now, because you had many common files which are now on different backup sets.</p>
<p>Only some quick assumptions..<br />
Sure, the idea isn&#8217;t new (Windows has had such a feature for a long time),<br />
but I&#8217;m not a real fan of making much more complex changes so deep in the system,<br />
especially for HDD capacity, which is unbelievably cheap, when it will slow down performance and may complicate the management and failure procedures</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ttsiodras</title>
		<link>http://www.linux-mag.com/id/7535/#comment-7040</link>
		<dc:creator>ttsiodras</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7535/#comment-7040</guid>
		<description>&lt;p&gt;There is a way to implement deduplication at BLOCK-level using three open source technologies (used in my company, for daily backups)&lt;/p&gt;
&lt;p&gt;1. OpenSolaris backup server&lt;br /&gt;
2. ZFS snapshots&lt;br /&gt;
3. Rsync --inplace&lt;/p&gt;
&lt;p&gt;Notice that when you use &quot;--inplace&quot;, rsync writes directly on-top&lt;br /&gt;
of the already existing file in the destination filesystem, and ONLY&lt;br /&gt;
in the places that changed! This means that by using ZFS (which is&lt;br /&gt;
copy-on-write) you get the BLOCK-level deduplication that you are&lt;br /&gt;
talking about... Taking a cron-based daily ZFS snapshot completes&lt;br /&gt;
the picture.&lt;/p&gt;
&lt;p&gt;Using these tools, we are taking daily snapshots of HUGE VMWARE vmdk files that change in less than 1% of their contents on a daily basis,&lt;br /&gt;
using amazingly trivial space requirements (something like 3% of the size of the original vmdk is used for one month of daily backups).&lt;/p&gt;
&lt;p&gt;I believe that OpenSolaris/ZFS/&quot;rsync --inplace&quot; is a combination&lt;br /&gt;
that merits a place in your article.&lt;/p&gt;
&lt;p&gt;Kind regards,&lt;br /&gt;
Thanassis Tsiodras, Dr.-Ing.
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>There is a way to implement deduplication at BLOCK-level using three open source technologies (used in my company, for daily backups)</p>
<p>1. OpenSolaris backup server<br />
2. ZFS snapshots<br />
3. Rsync --inplace</p>
<p>Notice that when you use &#8220;--inplace&#8221;, rsync writes directly on-top<br />
of the already existing file in the destination filesystem, and ONLY<br />
in the places that changed! This means that by using ZFS (which is<br />
copy-on-write) you get the BLOCK-level deduplication that you are<br />
talking about&#8230; Taking a cron-based daily ZFS snapshot completes<br />
the picture.</p>
<p>Using these tools, we are taking daily snapshots of HUGE VMWARE vmdk files that change in less than 1% of their contents on a daily basis,<br />
using amazingly trivial space requirements (something like 3% of the size of the original vmdk is used for one month of daily backups).</p>
<p>I believe that OpenSolaris/ZFS/&#8220;rsync --inplace&#8221; is a combination<br />
that merits a place in your article.</p>
<p>Kind regards,<br />
Thanassis Tsiodras, Dr.-Ing.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: hjmangalam</title>
		<link>http://www.linux-mag.com/id/7535/#comment-7041</link>
		<dc:creator>hjmangalam</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7535/#comment-7041</guid>
<description>&lt;p&gt;Good intro article, with the exception of: &quot;wet your appetite&quot; should be &#039;whet your appetite&#039;, as in to sharpen it. &#039;wet&#039; implies to dampen or lessen.&lt;/p&gt;
&lt;p&gt;The OSS BackupPC provides a crude level of dedupe via filesystem hard links.  Therefore it only works on the file level and only across a single file system (but note that cheap single filesystems easily range into the 10s-100s of TB). For small to medium installations, BackupPC and the like can work well.  It also can use rsync to transfer only changed blocks over the wire, which decreases bandwidth requirements.&lt;/p&gt;
&lt;p&gt;You might note that all this proprietary dedupe technology effectively locks you to a vendor-specific implementation, which reduces your ability to escape when the vendor decides to jack prices.&lt;/p&gt;
&lt;p&gt;Also notable is the falling-off-the-cliff price of disk.  It might take more disk to ignore dedupe, but if it can be addressed by very cheap, flexible storage, that may weigh in its favor, especially if using a no-cost (tho admittedly less efficient) mechanism like hard links and rsync. &lt;/p&gt;
&lt;p&gt;hjm
&lt;/p&gt;
</description>
<content:encoded><![CDATA[<p>Good intro article, with the exception of: &#8220;wet your appetite&#8221; should be &#8216;whet your appetite&#8217;, as in to sharpen it. &#8216;wet&#8217; implies to dampen or lessen.</p>
<p>The OSS BackupPC provides a crude level of dedupe via filesystem hard links.  Therefore it only works on the file level and only across a single file system (but note that cheap single filesystems easily range into the 10s-100s of TB). For small to medium installations, BackupPC and the like can work well.  It also can use rsync to transfer only changed blocks over the wire, which decreases bandwidth requirements.</p>
<p>You might note that all this proprietary dedupe technology effectively locks you to a vendor-specific implementation, which reduces your ability to escape when the vendor decides to jack prices.</p>
<p>Also notable is the falling-off-the-cliff price of disk.  It might take more disk to ignore dedupe, but if it can be addressed by very cheap, flexible storage, that may weigh in its favor, especially if using a no-cost (tho admittedly less efficient) mechanism like hard links and rsync. </p>
<p>hjm</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: mat_pass</title>
		<link>http://www.linux-mag.com/id/7535/#comment-7042</link>
		<dc:creator>mat_pass</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7535/#comment-7042</guid>
<description>&lt;p&gt;Hi,&lt;br /&gt;
I have already worked on such a project; I have published all my Java sources in a repository: http://code.google.com/p/deduplication/
&lt;/p&gt;
</description>
<content:encoded><![CDATA[<p>Hi,<br />
I have already worked on such a project; I have published all my Java sources in a repository: <a href="http://code.google.com/p/deduplication/" rel="nofollow">http://code.google.com/p/deduplication/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: lescoke</title>
		<link>http://www.linux-mag.com/id/7535/#comment-7043</link>
		<dc:creator>lescoke</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7535/#comment-7043</guid>
<description>&lt;p&gt;A hash collision in two different hash algorithms at the same time is highly unlikely.  Using two or more hash signatures would be slower, but would go a long way towards avoiding a false file match.&lt;/p&gt;
&lt;p&gt;Les
&lt;/p&gt;
</description>
<content:encoded><![CDATA[<p>A hash collision in two different hash algorithms at the same time is highly unlikely.  Using two or more hash signatures would be slower, but would go a long way towards avoiding a false file match.</p>
<p>Les</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: johneeboy3</title>
		<link>http://www.linux-mag.com/id/7535/#comment-7044</link>
		<dc:creator>johneeboy3</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7535/#comment-7044</guid>
		<description>&lt;p&gt;I second previous comments about BackupPC. Whenever I see an article on such subjects (backup or deduping), it never ceases to amaze me how the awesome BackupPC project continually gets overlooked. &lt;/p&gt;
&lt;p&gt;It has been backing up 10+ systems to a central backup store here at our small business for years, and has compressed/deduped 1.7TB of backups into 220GB.&lt;/p&gt;
&lt;p&gt;Better yet, I&#039;ve never been able to fault it.&lt;/p&gt;
&lt;p&gt;I&#039;m intrigued by that other poster&#039;s OpenSolaris+ZFS+rsync solution too. Very clever!
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I second previous comments about BackupPC. Whenever I see an article on such subjects (backup or deduping), it never ceases to amaze me how the awesome BackupPC project continually gets overlooked. </p>
<p>It has been backing up 10+ systems to a central backup store here at our small business for years, and has compressed/deduped 1.7TB of backups into 220GB.</p>
<p>Better yet, I&#8217;ve never been able to fault it.</p>
<p>I&#8217;m intrigued by that other poster&#8217;s OpenSolaris+ZFS+rsync solution too. Very clever!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ryannnnn</title>
		<link>http://www.linux-mag.com/id/7535/#comment-7045</link>
		<dc:creator>ryannnnn</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7535/#comment-7045</guid>
<description>&lt;p&gt;Actually the concept of deduplication at the file system level is not that new. Plan 9 from Bell Labs had this concept in 1995 with their Fossil and Venti file system components. Actually you can still use Venti in Linux as part of the Plan 9 user space tools.
&lt;/p&gt;
</description>
<content:encoded><![CDATA[<p>Actually the concept of deduplication at the file system level is not that new. Plan 9 from Bell Labs had this concept in 1995 with their Fossil and Venti file system components. Actually you can still use Venti in Linux as part of the Plan 9 user space tools.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: nikratio</title>
		<link>http://www.linux-mag.com/id/7535/#comment-7046</link>
		<dc:creator>nikratio</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7535/#comment-7046</guid>
<description>&lt;p&gt;S3QL (http://code.google.com/p/s3ql/) is another open source, de-duplicating file system. It&#039;s designed for online storage, but can also store locally if one is just interested in deduplication.
&lt;/p&gt;
</description>
<content:encoded><![CDATA[<p>S3QL (<a href="http://code.google.com/p/s3ql/" rel="nofollow">http://code.google.com/p/s3ql/</a>) is another open source, de-duplicating file system. It&#8217;s designed for online storage, but can also store locally if one is just interested in deduplication.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: indulis</title>
		<link>http://www.linux-mag.com/id/7535/#comment-7047</link>
		<dc:creator>indulis</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7535/#comment-7047</guid>
<description>&lt;p&gt;One overlooked part of deduplication is recovery from backup.  If you have (say) 500 x 10GB files on a 1TB disk, and they are all identical, then you only use up 10GB = 99% free space.  When you restore your whole system from backups, then either you need a backup/restore program that is dedup aware and does the dedupe as it restores, or you have to restore some of your files (in this example you can only restore 20% before you fill up your 1TB), run the dedupe software over the files you&#039;ve restored, then restore some more, run the dedupe again.  Repeat.  In other words, you would have to iterate your restore process.  Many technologies which save time/space in normal operations can have a large and negative effect during restores. People rarely think about the effect of their idea on system recovery.  Restoring from backups may actually turn out to be close to impossible without installing sufficient disk to store the full amount of data that you originally had (i.e. the &quot;raw&quot; undeduplicated data size = 5TB).
&lt;/p&gt;
</description>
<content:encoded><![CDATA[<p>One overlooked part of deduplication is recovery from backup.  If you have (say) 500 x 10GB files on a 1TB disk, and they are all identical, then you only use up 10GB = 99% free space.  When you restore your whole system from backups, then either you need a backup/restore program that is dedup aware and does the dedupe as it restores, or you have to restore some of your files (in this example you can only restore 20% before you fill up your 1TB), run the dedupe software over the files you&#8217;ve restored, then restore some more, run the dedupe again.  Repeat.  In other words, you would have to iterate your restore process.  Many technologies which save time/space in normal operations can have a large and negative effect during restores. People rarely think about the effect of their idea on system recovery.  Restoring from backups may actually turn out to be close to impossible without installing sufficient disk to store the full amount of data that you originally had (i.e. the &#8220;raw&#8221; undeduplicated data size = 5TB).</p>
]]></content:encoded>
	</item>
</channel>
</rss>