In September 2002's "LAMP Post" column (http://www.linux-mag.com/2002-09/lamp_01.html), we briefly touched on the idea of using multiple storage backends (table types) in MySQL:
In September 2002′s “LAMP Post” column (http://www.linux-mag.com/2002-09/lamp_01.html), we briefly touched on the idea of using multiple storage backends (table types) in MySQL:
MySQL, like the Linux kernel, is modular. You can disable and even remove pieces that you have no need for. Modularity affords MySQL a very important advantage over many other database management systems: you can choose the type of each table at the time it is created. If a few of your tables need fine-grained locking or transactions, you can choose the table type that best suits those needs. You don’t need to have the overhead of transactions on all of your tables. Few other relational database systems offer multiple table types.
That column went on to discuss the various features of InnoDB tables — one of the most important additions to MySQL version 4.0. However, we never spent much time really looking at the variations of MyISAM tables that exist in MySQL. So, let’s do that here.
As the default table type in MySQL, MyISAM tables probably persist more than 95% of all MySQL data worldwide. (Version 3.23 of MySQL introduced MyISAM tables as a replacement for the older ISAM table format.) MyISAM is now the default because its combination of speed and simplicity works remarkably well for the vast majority of applications (often web sites) that are built using MySQL.
MyISAM tables have very little storage overhead. If you create a table to store 600 bytes per row, you’ll find that MySQL adds only a few extra bytes to each row. The underlying storage for MyISAM tables is quite transparent, too. Every table is compose of three files. So, if you create a table named mytable, MySQL produces:
* mytable.frm to record the definition of the table, such as column names, types, and indexes
* mytable.MYD to store the rows (or data) of the table; and
* mytable.MYI to maintain the indexes for the table
This makes it very easy to see what’s going on: how quickly tables are growing, which ones have changed recently, and so on.
MyISAM tables also have some handy features that make building web applications easier. For example, SELECT COUNT(*) FROM mytable runs in constant time, making it very easy to tell a user, “You’re viewing entries 1-10 of 6,213,225.” Of course, you’d then subsequently use MySQL’s LIMIT clause to move thru the results one page at a time:
SELECT * FROM mytable ORDER BY foo LIMIT 0, 10
SELECT * FROM mytable ORDER BY foo LIMIT 10, 10
SELECT * FROM mytable ORDER BY foo LIMIT 20, 10
Full-text search is also available only in MyISAM tables. Adding efficient keyword searches to your MySQL applications has never been easier (See the feature “MySQL 4.x” in the January 2003 issue for more details. It’s available online at http://www.linux-mag.com/2003-01/mysql_01.html.)
Speaking of indexes, MyISAM’s indexes are surprisingly compact. The MySQL developers went to great effort to make sure you can fit information about the maximum possible number of rows in memory at a given time.
MyISAM’s compact row storage and indexes yield MySQL’s most important feature: speed. MySQL spends very little time shuffling bytes around or reading unnecessary data when you ask it to find a row in a MyISAM table. As a result, MyISAM tables are blazingly fast compared to most other disk-based relational database engines — and even some commercial in-memory databases.
Two of the downsides of MyISAM tables (which result in a move to InnoDB for some applications) are locking and the lack of transactions.
MyISAM tables use table-level locking. SELECT queries use shared locks, while all write queries (INSERT, UPDATE, and DELETE) set exclusive locks.
While this may sound like a disaster waiting to happen, remember that MyISAM tables are fast. That means most locks are held only for very short periods of time — unless you happen to run a very poorly optimized query. Lock contention really doesn’t become an issue until the load on the server gets quite high or the ratio of read queries to write queries begins to even out.
The first variation of MyISAM tables is primarily useful to archive or distribute a database. Compressed (or packed) MyISAM tables are read-only tables that have been converted using the myisampack command-line utility. Here’s an example of how to compress a table:
Since compressed tables are read-only, the only way to change a compressed table is to dump and reload the data, make changes, and then re-compress it. Typical size reductions for MyISAM packed tables range from 20% to 60% depending on the data being stored.
If you have a collection of data you’d like to shrink and archive for very occasional use, compressed MyISAM tables are ideal. Yes, you can get better compression from gzip or bzip2, but you’d also have to manually uncompress the files every time you wanted to query the tables.
When distributing MySQL data on CD-ROM or DVD-ROM, compressed MyISAM tables are perfect. End users can use the data without needing to copy it to their hard disk and decompress it. In fact, it’s possible to distribute a fully running MySQL distribution on CD if necessary.
MyISAM Merge Tables
Sometimes you find yourself using MySQL for heavy duty logging applications. MyISAM tables are great for that. Their efficient storage means you won’t waste a lot of disk space, and their innate speed means you can handle thousands of INSERTs per second. But managing that growing mass of data can be problematic.
Rather than keep a single table for a year’s worth of data, it’s a lot easier to manage one table per month, week, or even day. Merge tables provide the best of both worlds. A merge table is essentially a “virtual table” that represents the combined data from two or more tables with identical structures.
Merge tables are best illustrated with an example. Let’s assume you have an application than logs events to MySQL. You’d like to be able to query a single week of data at once, but you’d like each day to have its own table. Merge tables let you do that:
CREATE TABLE day_01 … TYPE=MyISAM
CREATE TABLE day_02 … TYPE=MyISAM
CREATE TABLE day_03 … TYPE=MyISAM
CREATE TABLE day_04 … TYPE=MyISAM
CREATE TABLE day_05 … TYPE=MyISAM
CREATE TABLE day_06 … TYPE=MyISAM
CREATE TABLE day_07 … TYPE=MyISAM
CREATE TABLE week_01 … TYPE=MERGE
UNION=(day_01, day_02, day_03, day_04,
day_05, day_06, day_07, )
(Of course, you should substitute the actual column definitions for ….)
With that in place, you can build your application with the logic necessary to insert into the proper daily table, while running weekly reporting queries against the weekly table. A single MyISAM table can even belong to multiple merge tables, so there’s no reason you can’t also create monthly or even quarterly merge tables as well.
Now, when it comes time to remove or archive old data, you can easily DROP tables containing old data or use myisampack to compress them. MySQL will happily combine regular or compressed MyISAM tables into a merge table. Had you kept all of your data in a single table, deleting a day’s worth of data could take quite a bit of time and would likely result in fragmentation of the data within the table. It’d also cause the application to stop and wait while it waited for the DELETE query to complete.
Other database servers let you do something similar to this using views. But until MySQL merge tables work just as well.
The final variety of MyISAM tables isn’t widely known, but can be useful in some circumstances. RAID tables are simply MyISAM tables whose data (.MYD) files have been broken into multiple files.
CREATE TABLE mytable … TYPE=MyISAM
Running that SQL command breaks the MyISAM table into 4 separate files (chunks) that are written to in a round-robin fashion in 8 KB stripes.
Why do that? If you’re using a filesystem that places a limit on how large a file can be (2 GB or 4 GB), RAID tables work around that limitation. (However, file size limits are becoming more and more rare, as recent Linux kernels don’t have size limits anywhere near that low.)
Performance is another reason for RAID tables. By putting each chunk on a separate physical disk, you can spread the I/O work out more evenly. In most MySQL installations, the main bottleneck is disk I/O, so this isn’t very far fetched. However, if you have the option of using hardware RAID or even software RAID at the OS level, you’re probably better off doing so.
Jeremy Zawodny plays with MySQL by day and spends his spare time flying gliders in California and writing a MySQL book for O’Reilly & Associates. Reach him at Jeremy@ Zawodny.com.