Looking for a mature, open source database that's not owned by Oracle? Whether you're already a fan or are looking for alternatives, now might be a good time to take a look PostgreSQL. The latest release of Postgres has a slew of built-in features that allow you to improve both performance and systems administration. Learn how you take advantage of them.
The roller coaster announcements coming from Oracle and Sun this week have a number of people questioning the fate of MySQL. While the GPL’d database probably isn’t going anyplace soon, now might be a good time to remind yourself that alternatives to MySQL do exist. And that some, in the case of PostgreSQL, are incredibly feature-rich.
With the beta version of PostgreSQL 8.4 just announced, testers are already downloading and trying out some of the newest features. As with every new release of PostgreSQL, there are performance improvements that you should be aware of, as well as new and useful configuration parameters available that will make your system administrator’s life easier. Let’s dive in.
If you’ve never used PostgreSQL before, now is a great time to try it out. PostgreSQL is a powerful, open source object-relational database system with more than than 15 years of active development. PostgreSQL is also known as ‘Postgres’, and will be referred to by that nickname for the rest of the article.
New Free Space Map and Visibility Map
The Free Space Map (FSM) improvements and the new Visibility Map will both significantly improve VACUUM performance. The FSM was re-implemented, and now does not require configuration, as in releases past. This data structure keeps track of the available free space inside of a table due to UPDATEs and DELETEs on your tables. You can throw out your pre-8.4 max_fsm_pages and max_fsm_relations settings, and relax while Postgres keeps track of this information for you!
The new Visibility Map provides information about which tuples were actually modified since the last time VACUUM was run. Now VACUUM can skip over tuples the Visibility Map knows have not changed. For infrequently updated tables, this results in fewer I/O operations per VACUUM, and much faster VACUUMs overall.
Heikki Linnakangas presented a talk about this feature at FOSDEM 2009 and his slides are available on the Postgres wiki.
How often you VACUUM can have a huge impact on the performance of your database. One important distinction that beginning administrators struggle with is the difference between a VACUUM FULL and a plain VACUUM. For regular maintenance, the autovacuum daemon should be used (enabled by default since version 8.2). Autovacuum will run regular VACUUMs on all your tables that take advantage of the FSM improvements.
A related tool is autoanalyze (also enabled by default), which keeps statistics used by Postgres to craft the best possible plans for your database queries. When these statistics get out of date, the planner has trouble choosing the best way of executing a query. On busy systems, increasing the frequency of autoanalyze by adjusting the autovacuum_vacuum_scale_factor down to ‘.1′ (default is ‘.2′) will keep your table statistics more up-to-date. Other adjustments are possible, including setting parameters for individual tables.
You only need VACUUM FULL when you determine that you have â€œbloat” in a table. This can happen because of a high number of updates, inserts and deletes that a typical schedule of VACUUMs can’t take care of. Most Postgres databases do not need this treatment regularly. In particular, scheduling daily VACUUM FULLs is not only not necessary, it can have a terrible performance impact on a busy system.
Dipping a toe slightly into the developer pool, a built-in function was added called suppress_redundant_updates_trigger(). When used with an UPDATE trigger, this function will prevent an update from occurring if what is being written is exactly the same as what is already there.
If the write occurs, a value was just updated unnecessarily and Postgres keeps (essentially) a record of that change until the next VACUUM.
You can use this like so:
CREATE TRIGGER your_trigger BEFORE UPDATE on your_table
FOR EACH ROW
EXECUTE procedure suppress_redundant_updates_trigger();'
You may see this kind of application behavior from an object relational mapping (ORM) tool.
One final note on bloat: there is a very useful tool called check_postgres that has a built-in bloat check. It works on its own, or as part of an overall monitoring or graphing system (such as Nagios or MRTG). There are many other checks included in that tool worth checking out related to performance monitoring and tuning.
Next: System Administration Features