
Saving Yourself with Data Replication

Data can be the currency, intellectual property, and lifeblood of many a company. One technique to make sure that your data is readily available is data replication. It is not quite the same as data backup, but it can be equally important.

One of the best tools in a system administrator’s arsenal is rsync. It was developed by Andrew Tridgell of Samba fame. Rsync is a very cool tool that synchronizes files and directories between storage pools using minimal data transfers between the pools. It uses what is called delta encoding to send only the parts of a file that have changed, and it can also use compression and recursion to make the synchronization very efficient. The key point is that rsync synchronizes file data between storage pools. This means that the underlying storage doesn’t have to be the same between the different storage pools.
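To make the idea of delta encoding concrete, here is a minimal Python sketch. It is a deliberate simplification and not rsync’s actual algorithm: rsync uses a rolling checksum so it can match blocks at any offset, while this sketch only compares fixed-position blocks. The function names and the tiny block size are assumptions chosen for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small so the example is easy to follow


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one SHA-256 digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def make_delta(source: bytes, dest_digests: list[str],
               block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Collect only the source blocks whose digest differs on the destination."""
    delta = {}
    for index, start in enumerate(range(0, len(source), block_size)):
        block = source[start:start + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(dest_digests) or dest_digests[index] != digest:
            delta[index] = block
    return delta


def apply_delta(dest: bytes, delta: dict[int, bytes], new_length: int,
                block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the source on the destination from its old data plus the delta."""
    blocks = [dest[i:i + block_size] for i in range(0, len(dest), block_size)]
    num_blocks = (new_length + block_size - 1) // block_size
    blocks = blocks[:num_blocks]                       # source shrank
    blocks += [b""] * (num_blocks - len(blocks))       # source grew
    for index, block in delta.items():
        blocks[index] = block
    return b"".join(blocks)[:new_length]


primary = b"aaaabbbbccccdddd"
secondary = b"aaaaXXXXccccdddd"
delta = make_delta(primary, block_digests(secondary))
secondary = apply_delta(secondary, delta, len(primary))
```

In this example only one of the four blocks crosses the “network”, which is the whole point: the less that has changed, the less that is sent.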

Configuring rsync is very easy, and there are lots of tutorials and articles on the web that show how to integrate it with various authentication and transport mechanisms such as ssh, PAM, and Kerberos, and how to encrypt the transfer. The first step in using rsync is to create two storage pools that are separated by some distance (that’s the whole point of replication, even if the distance is just a few feet). Ideally the two pools should be identical, but with rsync they don’t have to be because replication happens at the file level (one advantage of rsync). All you really need is at least as much capacity in the secondary storage pool as in the primary pool.

The next step is to choose the primary storage, which will run the rsync server (daemon). The secondary storage runs an rsync client. Based on the exact rsync command, the rsync server will work out the needed data and send it to the rsync client. How often the synchronization runs is up to you. You can put the rsync command in a cron file and choose the time interval between rsync operations. If no data has changed on the server since the last rsync then no data is sent (this could happen in the evening when most people, except for us technical types, are home watching The Big Bang Theory).
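As a rough illustration of what the client side might look like, the Python sketch below builds an rsync command that pulls a module from the daemon on the primary into a local directory on the secondary. The host name, module name, and destination path are hypothetical placeholders; the `-a`, `-z`, and `--delete` options and the `rsync://` URL form are standard rsync usage.

```python
import subprocess


def build_rsync_command(host, module, dest, compress=True, delete=False):
    """Build the argument list for pulling a daemon module into a local directory."""
    cmd = ["rsync", "-a"]      # -a: archive mode (recursive, keeps permissions and times)
    if compress:
        cmd.append("-z")       # -z: compress file data during the transfer
    if delete:
        cmd.append("--delete") # remove files on the replica that vanished on the primary
    cmd.append(f"rsync://{host}/{module}/")  # trailing slash: copy the contents
    cmd.append(dest)
    return cmd


def run_sync(cmd):
    """Run the command and raise CalledProcessError if rsync exits non-zero."""
    subprocess.run(cmd, check=True)


# Hypothetical names, for illustration only.
cmd = build_rsync_command("primary.example.com", "data", "/srv/replica/")
```

A script like this could be called from a cron entry on the secondary, for example once an hour. Whether to enable `--delete` is a judgment call: it keeps the replica an exact mirror, but it also faithfully replicates accidental deletions.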

The third step is to configure the rsyncd.conf file on the rsync server and schedule the rsync process (e.g., as a cron job). There are plenty of tutorials that describe this, including how to use rsync with various security tools and encryption, so I won’t discuss it in this article, but it’s actually very easy to do.

DRBD

DRBD (Distributed Replicated Block Device) is a mechanism for what is basically RAID-1 across a network. It takes the blocks from one storage pool and mirrors them on a different storage pool across a TCP-based network. One of the key aspects of DRBD is that it is block based. Figure 1 below, taken from the DRBD website, gives a fairly detailed overview of how it works.

Figure 1: Overview of DRBD (from www.drbd.org)

The left-hand side of the diagram is the primary storage pool. You can follow the data flow by tracking the orange arrows. As the data is written from the service at the top toward the actual disk at the bottom left, DRBD copies the data to the TCP/IP stack, where it is sent to the secondary storage pool on the right-hand side. There it is picked up by DRBD on the secondary storage pool and sent to the actual storage devices. After the data is copied to the network by DRBD on the left-hand side, it continues normally to the I/O scheduler and ultimately the disk.
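The data flow described above can be mimicked with a small in-memory Python simulation. This is only a toy model of the RAID-1-over-a-network idea, not DRBD’s API or implementation: the class names are made up for illustration, and a direct method call stands in for the TCP connection between the two nodes.

```python
class BlockDevice:
    """A toy block device: a fixed number of fixed-size blocks held in memory."""

    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]

    def write(self, index, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.blocks[index] = bytes(data)

    def read(self, index):
        return self.blocks[index]


class MirroredDevice:
    """Sits on top of a local device and copies every write to a peer device."""

    def __init__(self, local, peer):
        self.local = local
        self.peer = peer

    def write(self, index, data):
        self.peer.write(index, data)   # stands in for sending the block over TCP
        self.local.write(index, data)  # then the write continues to the local disk

    def read(self, index):
        return self.local.read(index)  # reads are served from the local copy


primary = BlockDevice(num_blocks=4, block_size=8)
secondary = BlockDevice(num_blocks=4, block_size=8)
mirror = MirroredDevice(primary, secondary)
mirror.write(2, b"ABCDEFGH")
```

Because the mirror only sees numbered blocks, it has no idea which files they belong to. That is the practical meaning of “block based”.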

If you haven’t noticed in Figure 1, all of these operations happen in kernel space (inside the box). For a long time DRBD existed as a set of patches outside the kernel. However, in version 2.6.33, DRBD was included in the mainline kernel.

In operation, DRBD layers block devices over existing block devices. For example, it layers a DRBD block device such as /dev/drbd1 over a physical or logical device such as /dev/sdb1. Then you use the DRBD block device for the file system. It is recommended that the underlying block device (e.g. /dev/sdb1) be a logical volume that is built using LVM. This allows you to grow the storage on the primary to meet capacity needs, but remember that you will have to also increase the capacity on the secondary storage pool to match the primary.

There are a number of tutorials on the web showing how to set up DRBD between two storage pools. Despite the fact that replication happens in the kernel, it’s actually fairly easy to configure and use DRBD.

Summary

This article has just been a quick overview of replication in Linux. Replication is the mechanism for making a copy of data from a primary storage pool to a geographically distant secondary storage pool. If you like, you can think of it as mirroring data over a network. The goal is to have a secondary storage pool that is an exact copy of the current set of data and can be used in the event that the primary storage pool becomes unavailable. As a result, replication is often used for disaster recovery.

A key point is that replication is fundamentally different from backups. A backup is designed to keep past versions of data that are available for restoration if needed. Plus a backup may not have the latest copy of the data whereas replication is designed to have a copy of the data that is as close as possible to the original. Theoretically you could use a backup to restore a complete storage pool but it would take a great deal of time and would be missing any changes in the data from when the storage pool went down relative to the last backup.

I briefly mentioned two replication options in Linux: (1) rsync, and (2) DRBD. Both are fairly easy to configure but they differ in one fundamental way: rsync is file based and DRBD is block based. Both accomplish the same goal of replication, but DRBD, since it sits in the kernel and mirrors writes as they happen rather than on a schedule, has a smaller data “gap” than rsync. This means that the difference between the data in the primary storage pool and the secondary storage pool is smaller for DRBD-based replication than for rsync replication. How much smaller depends upon how rsync is configured, how much data is being replicated, and the network characteristics between the storage pools.
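For a feel of how those three factors combine, here is a back-of-the-envelope Python estimate of the rsync gap. The model is an assumption made for illustration, not a measured result: it supposes that a write landing just after a sync begins has to wait a full interval for the next run and then for that run’s transfer to finish, and it ignores scanning overhead, compression, and latency.

```python
def worst_case_gap_seconds(interval_s, changed_bytes, bandwidth_bytes_per_s):
    """Rough upper bound on how stale the replica can be under the stated model."""
    transfer_s = changed_bytes / bandwidth_bytes_per_s
    return interval_s + transfer_s


# Hourly sync, 1 GB of changed data, about 100 MB/s of usable bandwidth.
gap = worst_case_gap_seconds(3600, 1_000_000_000, 100_000_000)
```

With these numbers the schedule dominates: shortening the rsync interval shrinks the gap far more than a faster network would.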

Replication is one of the mechanisms storage management relies on when data must always be available. That is, you have a copy of the data readily available so that if the first copy is lost you can still function. Replication is used a great deal in enterprise storage and can even be used at home. If the data on your desktop or home server is important to you and you need to make sure that the current state of the data is available, then replication is something you can easily configure. Given the price of home storage and networks, it is fairly easy to set up a secondary storage pool for your home server. As a system administrator told me when I first became an admin, “It’s best to wear a belt and a pair of suspenders.”
