Harder, Better, Faster, Stronger: New Features in Postgres 8.4

Looking for a mature, open source database that's not owned by Oracle? Whether you're already a fan or are looking for alternatives, now might be a good time to take a look PostgreSQL. The latest release of Postgres has a slew of built-in features that allow you to improve both performance and systems administration. Learn how you take advantage of them.

The roller coaster announcements coming from Oracle and Sun this week have a number of people questioning the fate of MySQL. While the GPL’d database probably isn’t going anyplace soon, now might be a good time to remind yourself that alternatives to MySQL do exist. And that some, in the case of PostgreSQL, are incredibly feature-rich.

With the beta version of PostgreSQL 8.4 just announced, testers are already downloading and trying out some of the newest features. As with every new release of PostgreSQL, there are performance improvements that you should be aware of, as well as new and useful configuration parameters available that will make your system administrator’s life easier. Let’s dive in.

If you’ve never used PostgreSQL before, now is a great time to try it out. PostgreSQL is a powerful, open source object-relational database system with more than than 15 years of active development. PostgreSQL is also known as ‘Postgres’, and will be referred to by that nickname for the rest of the article.

New Free Space Map and Visibility Map

The Free Space Map (FSM) improvements and the new Visibility Map will both significantly improve VACUUM performance. The FSM was re-implemented, and now does not require configuration, as in releases past. This data structure keeps track of the available free space inside of a table due to UPDATEs and DELETEs on your tables. You can throw out your pre-8.4 max_fsm_pages and max_fsm_relations settings, and relax while Postgres keeps track of this information for you!

The new Visibility Map provides information about which tuples were actually modified since the last time VACUUM was run. Now VACUUM can skip over tuples the Visibility Map knows have not changed. For infrequently updated tables, this results in fewer I/O operations per VACUUM, and much faster VACUUMs overall.

Heikki Linnakangas presented a talk about this feature at FOSDEM 2009 and his slides are available on the Postgres wiki.

How often you VACUUM can have a huge impact on the performance of your database. One important distinction that beginning administrators struggle with is the difference between a VACUUM FULL and a plain VACUUM. For regular maintenance, the autovacuum daemon should be used (enabled by default since version 8.2). Autovacuum will run regular VACUUMs on all your tables that take advantage of the FSM improvements.

A related tool is autoanalyze (also enabled by default), which keeps statistics used by Postgres to craft the best possible plans for your database queries. When these statistics get out of date, the planner has trouble choosing the best way of executing a query. On busy systems, increasing the frequency of autoanalyze by adjusting the autovacuum_vacuum_scale_factor down to ‘.1′ (default is ‘.2′) will keep your table statistics more up-to-date. Other adjustments are possible, including setting parameters for individual tables.

You only need VACUUM FULL when you determine that you have “bloat” in a table. This can happen because of a high number of updates, inserts and deletes that a typical schedule of VACUUMs can’t take care of. Most Postgres databases do not need this treatment regularly. In particular, scheduling daily VACUUM FULLs is not only not necessary, it can have a terrible performance impact on a busy system.

suppress_redundant_updates_trigger()

Dipping a toe slightly into the developer pool, a built-in function was added called suppress_redundant_updates_trigger(). When used with an UPDATE trigger, this function will prevent an update from occurring if what is being written is exactly the same as what is already there.

If the write occurs, a value was just updated unnecessarily and Postgres keeps (essentially) a record of that change until the next VACUUM.

You can use this like so:

CREATE TRIGGER your_trigger BEFORE UPDATE on your_table
FOR EACH ROW
EXECUTE procedure suppress_redundant_updates_trigger();'

You may see this kind of application behavior from an object relational mapping (ORM) tool.

One final note on bloat: there is a very useful tool called check_postgres that has a built-in bloat check. It works on its own, or as part of an overall monitoring or graphing system (such as Nagios or MRTG). There are many other checks included in that tool worth checking out related to performance monitoring and tuning.

Next: System Administration Features

Comments on "Harder, Better, Faster, Stronger: New Features in Postgres 8.4"

zx5000

My 5 years of experience with pgsql is that mysql beats the pants off it in production. Vacuum and pg_dump suck tremendous amounts of CPU and the sql syntax for show databases and tables is cryptic. Mysql’s select syntax is also much more intuitive. With pgsql I always have to refer back to the documentation to get anything done.

Reply
gromm

Mmm. Yes, because I wasn’t going to be doing anything important for the next year. I could just spend all my time converting our customers’ databases from MySQL to… *anything* else.

Are you daft? Switching databases in anything but a trivial application is a tremendous amount of work. Nevermind the enormous amount of downtime and trouble it would cause your users. I wouldn’t recommend it to anyone for any reason but for the most dire. But hey, if you’re just trying to keep your DVD collection at home and like to tinker, I say go for it.

Reply
unclesmrgol

My ten years of experience with both pgsql and mysql are somewhat the opposite of what zx5000 says — and I’m speaking from the standpoints of geo-location systems and web enterprise systems. vacuums can be deferred to a period of low usage — they are a lightweight form of database reorganization; mysql has the same requirements for all but one (NDB) of its backend engines — the others all need an occasional “OPTIMIZE TABLE” executed on high-use tables, at about the same frequency as a “VACUUM” or “VACUUM FULL” are needed (or, rather, desired) in postgres. postgres has had geocoordinate friendly datatypes for far longer than mysql has had them, and the inherent optimizations of postgres over mysql in searching a geo-polygon shows.

postgres also has built-in types for manipulating both IPV4 and IPV6 addresses — a terrific boon if you are using the database to drive things like a firewall. Mysql doesn’t — you use the “2xBIGINT” approach for IPV6 and BIGINT for IPV4, which means that you have to write your own netmask manipulation code for mysql, while that code is provided “for free” with postgres.

If you carefully examine the respective licenses, the mysql one is far closer to Stallman’s “all your code is owned by the community” ethos (unless you pay mysql through both nostrils for a certain type of unrestricted license), while the postgres code base is uniformly under the far less restrictive UC Regents “you can use it any way you want but don’t blame us if your critical application fails” license.

An examination of the referential integrity of foreign keys under the two database implementations is also instructive; PostgreSQL has it now, while MySQL has it slated for a future release.

In addition, the postgres people are much more careful about backwards compatibility. One of the applications I integrated which used mysql had a variable named “key” in a certain table, and that database functioned quite well for several years, until I updated mysql and painfully discovered (using the traditional database-dump-using-old-and-reload-using-new dance) that they’d swapped out the old sql interpreter for something which made the variable “key” invalid — via a rather cryptic syntax error when the table containing the variable was defined. Since the java and c/c++ DAO code had “key” liberally sprinkled throughout, it was a large task to relabel the variable and refactor the code which used it. I’ve never had anything like that happen with postgres — database dumps and reloads just plain work.

All that said, when you go to the databases section at the bookstore, expect to find twenty books on mysql for every one book on postgresql. Mysql is more widely used, even if they can’t figure out whether they ought to use the word “cluster” or the word “database”.

Even the speed title zx5000 touts is up for grabs — read this for a counter to your assertion.

Reply
artacus

> With pgsql I always have to refer back to the documentation to get anything done.

People come into open source RDBMS development from several paths. Some folks start out with Access or FileMaker and out grow it. Others come from the Oracle or DB2 side looking for a cheaper alternative.

The first group has a tendency to gravitate toward MySQL. They tend not to appreciate the value views, stored procedures, transactions, or triggers; and instead do the bulk of their “database coding” on the middle tier (read PHP). MySQL doesn’t follow the SQL standard as closely. They’ve got a lot of shortcuts. They’re helpful, but if you only learn the shortcuts and not the “standards” way, you end up with a skillset that isn’t transferable to other RDBMS’s… or end up always looking up the documentation, as you put it.

The developers coming from Oracle/DB2/MSSQL are going to gravitate toward Postgres. They wouldn’t consider writing an application without views and stored procedures. Postgres generally follows the SQL pretty closely.

I’ve used a dozen or more different RDBMS’s over the years. But I develop mostly on Oracle. When I’m consulting and the client can’t afford Oracle, it’s Postgres, end of discussion. Oracle and Postgres are the most alike of any two databases I’ve used.

We do a lot of advanced stuff with XML so the only databases that can even handle it are Oracle, DB2 and Postgres. (Technically MS SQL can do it… if you can stand the masochistic syntax)

As for the speed difference, there is not much if you are using version 8 or better of Postgres. And lets be sure you are comparing apples to apples. Use a storage engine that supports transactions, foreign key constraints and full text search… Oh wait, MySQL doesn’t have that apple.

Reply
swlgc

Postgres is for serious applications that require transaction, xml, stored procedures, etc. When you like to do the data handling work within database system instead of your c/c++/php code, Postgres is the one you need.

For simplicity and ease of use, mysql or sqlite is the choice. imho

Reply

Postgres is for serious applications that require transaction, xml, stored procedures, etc. When you like Cueb Answers to do the data handling work within database system instead of your c/c++/php code, Postgres is the one you need.

Reply

Definitely would love to start a website like yours. Wish I had the time. My site is so amateurish compared to yours, feel free to check it out: http://tinyurl.com/o55af8p Alex :)

Reply

Then, first off, make sure you’ve chosen a mature and interesting username.

If you ever wanted to send a video greeting or add
visual effect to your webcam videos, then Web
- Cam – Max is the software program for you. It’s easier
to find thoughts and such to the other side now using web-cam chat because then you
can just speak it out to your other much like you need
to do in person.

Reply

i just havee a pair of pairs ! short-term chestnut & bailey link chocolatee & i want elder scrolls online gold < A variety of . they always keep people design and then very good all day long ! presently there value the money ! simply put i plann to gain land violet or simply a red partners in any number monthss . : )

Reply

Sustainable energy Solutions Intended for Homes

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>