Replication in MySQL

Master/slave replication first appeared in a beta release of MySQL back in 2000. In the three or so years since then, replication has become an essential feature for most of MySQL's high-end users. And contrary to many assumptions, MySQL's replication is quite easy to use, especially when compared to the replication systems that are part of high-end commercial databases. This month and next, let's take a look at MySQL's replication feature and the various ways you can put it to use.

Master/slave replication first appeared in a beta release of MySQL back in 2000. In the three or so years since then, replication has become an essential feature for most of MySQL’s high-end users. And contrary to many assumptions, MySQL’s replication is quite easy to use, especially when compared to the replication systems that are part of high-end commercial databases. This month and next, let’s take a look at MySQL’s replication feature and the various ways you can put it to use.

The Basics

MySQL’s replication systems allows multiple slave servers to stay in synch with changes to a single master server. Replicating your database has many advantages. For example, backup and recovery become easier. Rather than shutting down your master server to back it up, you can simply back up a slave. More importantly, replication makes it easier to scale large applications. By sending all write queries (INSERT, UPDATE, DELETE) to the master and using slaves for most of the read queries (SELECT), it’s possible to achieve nearly linear scalability for read-intensive applications.

There are two common techniques for implementing replication in a relational database server: copying logs or copying data. When copying logs, the master logs queries and sends them to the slave where they are replayed.

In contrast, data copying involves sending the changed data blocks from the master to the slaves. MySQL uses the log-based replication scheme.

Either method can be implemented either synchronously or asynchronously. Synchronous replication means that queries do not complete until the changes have been applied to all the servers involved. This has the benefit of guaranteeing that all servers have consistent copies of the data, but it can be a real performance problem if you need to replicate across long distances or slow network links. MySQL employs asynchronous log-based replication.

On the Master

To replicate, a MySQL master records all queries that change data to its binary log. The binary log is simply that — a non-ASCII log file in which queries are recorded along with other details about them, such as the database used, the return code, and so on.

To enable binary logging in MySQL, you simply add two lines to the [mysqld] section of your my.cnf file …

server-id = 1

… and then restart mysqld. (But don’t restart it just yet.)

The log-bin line tells MySQL to use the binary log. The server-id line assigns a numeric ID to the server. It’s not required, and if you don’t assign one, it will default to 1. However, it’s a good idea to make the server ID explicit on the master and slave(s).

That’s really all there is to it on the master for a basic replication setup. Next month, we’ll look at some of the more advanced options, including how MySQL uses the server ID.

On the Slave

The master has the easy job: it simply makes a note of anything it’s asked to do that results in a change to the data. The slave is responsible for connecting to the master, retrieving copies of the events recorded in the binary log and executing them.

There are two pieces to this process on the slave and each runs as a separate thread within MySQL. The I/O thread (sometimes called the relay thread) has one goal: to copy log events from the master. It connects to the master, reads events from the master’s binary log, and writes them locally to disk in the slave’s relay log. Because that’s all this thread does, it runs very efficiently and is virtually always up-to-date with respect to the master if your network is fast and healthy. Generally, it reads new entries from the master’s binary log milliseconds after they’re written.

The slave’s SQL thread picks up where the I/O thread leaves off. It reads the relay log and executes each query it finds, one at a time.

This dual-threaded design helps to ensure that the slave always has the most recent data, even if it hasn’t had a chance to process it. (In MySQL version 3.23, there was only a single replication thread on the slave. It would fetch each event from the master and then execute it locally. There was no relay log. The problem with that original approach was that a single slow query combined with a crash or shutdown of the master could result in the slave being out of date, with no hope of catching up until the master returns.)

That’s really all there is to it. The master records queries in its binary log, the slave’s I/O thread reads entries from the master’s binary log and records them in the relay log, and the slave’s SQL thread executes the queries which have been recorded in the relay log.

Slave Configuration

Setting up a slave can be a bit more involved than the master setup. We’ll look at two different scenarios here, but first we need to cover the basic configuration options.

There are several options required in the [mysqld] section of the configuration file. Here’s a sample:

server-id = 2
master-host = db-master.example.com
master-user = repl
master-password = c0pyING
master-connect-retry = 15

The settings are pretty self-explanatory. You need to assign a unique server ID to the slave, tell it the name or IP address of the master, and then specify the credentials needed to connect.

For this to work, you also need to have a user defined on the master with the necessary privileges, FILE and REPLICATION SLAVE. This SQL command provides the necessary rights:

-> ON *.* TO repl@”%”

Of course, to provide better security, replace % with the hostname or IP address of the slave.

The final option, master-connect-retry, tells the slave how often (in seconds) to attempt to reconnect to the master if it ever loses contact.

Starting from Scratch

It’s easiest to get replication running with a freshly installed MySQL master and slave. Before populating the servers with any data, take the following steps.

On the master: Add the replication user, as shown above. Add the master configuration file entries, as shown above. Then restart mysqld.

On the slave: Add the slave configuration file entries, as shown above. Restart mysqld.

After the slave restarts, things should work just fine. To verify that, connect to the slave and execute SHOW SLAVE STATUS. If the output looks like this…

Master_Host: db-master.example.com
Master_User: repl
Master_Port: 3306
Connect_retry: 15
Master_Log_File: binary-log.001
Read_Master_Log_Pos: 241452666
Relay_Log_File: relay-log.001
Relay_Log_Pos: 113733106
Relay_Master_Log_File: binary-log.001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_errno: 0
Skip_counter: 0
Exec_master_log_pos: 241452666
Relay_log_space: 113733106

… then all is well. The important values there are the two Yes entries, one for Slave_IO_Running and the other after Slave_SQL_Running. They indicate that both threads are running normally.

Adding Replication to an Existing Setup

Setting up a slave to replicate from a server that already contains data it a little more tricky. If you simply enable the binary log on the master and tell the slave to begin replicating, you won’t get far. The slave will attempt to execute queries against tables that don’t yet exist from its point of view.

To work properly, the slave needs to be given a copy of the data that existed on the master at the time the binary log was enabled. There are two basic ways to achieve this. First, if the master is idle (no write queries are being executed against it), you can execute a LOAD DATA FROM MASTER command on the slave after setting up the replication options. That instructs the slave to copy all of the tables from the master. Once this process completes, it’s safe to begin running write queries against the master. The only tables that are not replicated as part of the LOAD DATA process are those in the mysql database — the grant tables. If the slave needs those, you should replicate them manually.

Another option is to shut down the master, which you’ll need to do anyway to enable the binary log, and create a snapshot of the data. In other words, while the server is down, tar up the data directory and copy it to the slave. Once the copy is complete, restart the master to begin logging queries. The slave will then replicate queries from the master that were recorded just after the snapshot was created.

Those are the basics of MySQL replication. Next month we’ll look at more advanced topics, including replication topologies, filtering options, and digging into the logs.

Jeremy Zawodny plays with MySQL by day and spends his spare time flying gliders in California and is the author of High Performance MySQL. Reach him at: jeremy@zawodny.com.

Comments are closed.