In March's "LAMP Post" column, we started to look at MySQL's replication subsystem. We covered how replication works, as well as putting it to use by configuring the master and slave(s). This month, in this inaugural "MySQL" column, let's spend some time looking at the lesser known aspects of MySQL replication, including filtering and log inspection.
In March’s “LAMP Post” column, we started to look at MySQL’s replication subsystem. We covered how replication works, as well as putting it to use by configuring the master and slave(s). This month, in this inaugural “MySQL” column, let’s spend some time looking at the lesser known aspects of MySQL replication, including filtering and log inspection.
Filtering on the Master
There are times when you’d rather not replicate all the data from a master to its slave(s). For instance, your master database server may contain test or pre-production databases and typically these databases, much like the /tmp directory on Linux, really don’t need to survive after a disaster. Rather than wasting the bandwidth and disk space to replicate them, you can tell MySQL to simply ignore them. You can accomplish this by adding one or more configuration directives to the my.cnf file on the master.
Let’s assume that you’re using the test database on your master, but don’t wish to log any changes to tables in test to the master’s binary log. Simple add a line like binlog-ignore -db=test to my.conf file. Of course, you’ll need to restart MySQL for the change to take effect. If you want to ignore multiple databases, either specify a list…
… or add one entry per database:
If, on the other hand, you’d rather provide a list of databases to replicate and let MySQL ignore the rest, use the binlog-do-db directive to specify them.
binlog-do-db tends to be the most practical if you have a small, fixed number of databases to replicate or when it’s likely that someone will be creating temporary databases on the master that need not be replicated.
Now, you may be wondering how MySQL decides what to do when filtering is involved. Typically, you’re not going to specify a database in every query you issue. You can do so by fully qualifying every table or column in dbname.tablename.colname format, but that’s tedious and not necessary.
Instead, most software connects to a specific database and uses that one database for the duration of the connection. In PHP, for example, you might issue a mysql_connect() call followed by a mysql_select_db() to set the default database for that connection. Or, if you’re using the mysql command-line client, you may specify a database name on the command-line or employ the USE command to switch databases after connecting, as in USE testdb;.
No matter how you do it, MySQL considers only the database you’ve selected as the default database when evaluating binary log filtering rules. While this works quite well 98 percent of the time, there may be times when you need to perform cross-database operations or when you’re lazy and change data in one database while actually using a different one as your default.
To make this more concrete, let’s assume you’ve connected to MySQL and are using the sessions database. If the last three binlog-do-db rules (above) are in effect and you issue a query like this…
DELETE FROM oldstuff.user_list
… MySQL will still write the query to the binary log. Why? Because you were connected to (or USEing) the sessions database, which is in the list of database you asked it to log changes to.
MySQL does not evaluate the query you’re executing when deciding whether to log it. Even though this query affected only a database that you asked it not to log, it doesn’t bother to notice.
The correct approach is to use the correct database when issuing queries that change data:
DELETE FROM user_list
Otherwise you’ll end up with a nasty surprise. Because the query was logged to the binary log, the slave’s I/O thread copies it to its relay log and the slave’s SQL thread attempts to execute it. However, odds are that the slave doesn’t have a copy of the oldstuff database (otherwise you’d probably have asked for it to be replicated), so the SQL thread will record an error in the MySQL error log and then stop executing queries from the relay log. You’ll then need to fix the problem.
The easiest solution is to ask the slave to skip ahead to the next query in the relay log and then continue executing:
SET SQL_SLAVE_SKIP_COUNTER = 1;
SLAVE START SQL_THREAD;
Then, of course, go find out what (or who) was too lazy to change databases when executing the DELETE in the first place.
Filtering on the Slave
So far, we’ve only considered filtering on the master’s end of replication. However, it’s also possible to apply filtering on the slave. This is important because you may need to have different policies for different slaves.
For example, your backup master may need to replicate everything that the master has, while your slaves that live 1,500 miles away need only a subset of the data. The good news is that you have quite a bit more flexibility on the slave’s side of things.
To start with, you can perform per-database filtering, just like on the master. The difference is that the directives begin with replicate rather than binlog. So, to allow the database users to replicate from a master, you could specify:
Or, to prevent replication of temporary, you could use:
But on the slave end you have even more granularity. Filtering (inclusive and exclusive) can be done down to the table level using these two directives…
… and to make things even more interesting, you can provide wildcard base “do” and “ignore” rules too.
In the wildcard directives, you may use the standard SQL wildcard characters (_ (“underscore”) for a single character and % for multiple characters) in both the database and table names.
Finally, you even have the option of rewriting the database name on the fly:
That tells the slave that any queries run in the oldname database on the master should be executed against newname on the slave.
One important aspect of slave-side filtering is that the slave has to see the queries before it can decide to either skip or execute them. That means slave filtering still uses up some of your network bandwidth. The slave’s I/O thread must copy all of the queries from the master’s binary log into the slave’s relay log. Only then can the slave’s SQL thread check each query against the “do” and “ignore” rules.
Have you ever wondered what’s inside the binary log? In addition to the SQL for each query, the master records a few extra bits of metadata as well. You can use the mysqlbinlog tool to convert binary logs back into human- (and machine-) readable format.
$ mysqlbinlog binary-log.002
That command might produce output that looks like this (wrapped and reformatted for readability):
#040304 5:34:04 server id 103 log_pos
That first entry provides, as a comment, the date and time of the query, the id of the server it was first executed on, the log position, and the type of event. In this case, Intvar means that one of MySQL’s internal variables is being set — the INSERT_ID is used to tell a slave the auto-increment ID to be used in the next query it sees.
#040304 5:34:04 server id 103 log_pos
insert into history values (NULL,
This second entry again provides some interesting metadata as a comment before the query.
In this case, since it’s a Query event, we also see the id of the thread that originally executed it, as well as the execution time (in seconds) and the error code.
Then we see a timestamp being set. The server records the time at which every query is executed. Finally, we see the actual query. The NULL is presumably going to be replaced by the insert id seen in the first entry.
Jeremy Zawodny plays with MySQL by day and spends his spare time flying gliders in California. He is also the author of
High Performance MySQL (O’Reilly and Associates, 2004). Reach him at Jeremy@Zawodny.com.