Harder, Better, Faster, Stronger: New Features in Postgres 8.4

Looking for a mature, open source database that's not owned by Oracle? Whether you're already a fan or are looking for alternatives, now might be a good time to take a look PostgreSQL. The latest release of Postgres has a slew of built-in features that allow you to improve both performance and systems administration. Learn how you take advantage of them.

pg_restore -j

Systems administrators will rejoice in the new ‘-j’ option for pg_restore. Now you can specify the number of jobs that pg_restore will start in parallel for a restore. This is a brand new feature, so the authors are still working on figuring out what level of parallelization makes the most sense for a given workload. The rule of thumb for now is one job per processor.

One trick with a large database is to remove the constraints on tables before you restore, and then reapply them after, so that restore jobs don’t block against each other while the constraints are being checked. Remember to test this before you actually need to perform a production restore!

stats_temp_directory

Another useful new feature is the ability to specify a location for the statistics file. Previously, moving this file required a patch to the PostgreSQL source code and a re-compile. Now, there is a configuration parameter: stats_temp_directory. This allows you to relocate where statistics are written to a swap or memory filesystem, thereby reducing the disk I/O required for statistics gathering.

Determining that your statistics should be relocated could be determined through an application like atop, which can show per-process I/O utilization. Taking a sample of your I/O utilization during busy times could show you that your statistics writer is consuming more I/O than you’d like.

Statistics writer enhancement

Another statistics-related patch enables writing of statistics information only when a Postgres backend requests information from statistics tables. For example: through an interactive query of a statistics table or maybe an AUTOVACUUM. Previous versions of Postgres write this file every 500 ms. On a large, and very busy server, this overhead can be significant. No configuration is required to take advantage of the new behavior.

default_statistics_target default increased

A small change is that the default_statistics_target parameter was increased to 100 by default. Previously, this parameter was set to 10. ANALYZE uses this parameter to generate estimates of the selectivity of various WHERE clauses. Increasing this parameter too high can make ANALYZE very expensive, but testing and feedback from the PostgreSQL community convinced the development team to make this change.

The benefit is that your database will automatically be generating more detailed statistics for the query planner, and likely will make the query planner more informed and hopefully better at choosing the most efficient plan when executing your queries.

This was only a small slice of the new performance features in 8.4. There are more! Plus there are lots of new SQL and developer features like Common Table Expressions (recursive queries) and Windowing Functions.

More Information

Looking for more info on the latest version of PostgreSQL? Here are a few presentations about new features in 8.4 you might be interested in:

Magnus Hagander – “What’s new in 8.4
Robert Treat – “Intro to Postgres 8.4 Tutorial

Download the beta and try it out today!

Selena Deckelmann is a system administrator and co-organizer of the PostgreSQL Conference and PostgreSQL Conference East scheduled for this year on March 29 and 30, 2008.

Comments on "Harder, Better, Faster, Stronger: New Features in Postgres 8.4"

zx5000

My 5 years of experience with pgsql is that mysql beats the pants off it in production. Vacuum and pg_dump suck tremendous amounts of CPU and the sql syntax for show databases and tables is cryptic. Mysql’s select syntax is also much more intuitive. With pgsql I always have to refer back to the documentation to get anything done.

Reply
gromm

Mmm. Yes, because I wasn’t going to be doing anything important for the next year. I could just spend all my time converting our customers’ databases from MySQL to… *anything* else.

Are you daft? Switching databases in anything but a trivial application is a tremendous amount of work. Nevermind the enormous amount of downtime and trouble it would cause your users. I wouldn’t recommend it to anyone for any reason but for the most dire. But hey, if you’re just trying to keep your DVD collection at home and like to tinker, I say go for it.

Reply
unclesmrgol

My ten years of experience with both pgsql and mysql are somewhat the opposite of what zx5000 says — and I’m speaking from the standpoints of geo-location systems and web enterprise systems. vacuums can be deferred to a period of low usage — they are a lightweight form of database reorganization; mysql has the same requirements for all but one (NDB) of its backend engines — the others all need an occasional “OPTIMIZE TABLE” executed on high-use tables, at about the same frequency as a “VACUUM” or “VACUUM FULL” are needed (or, rather, desired) in postgres. postgres has had geocoordinate friendly datatypes for far longer than mysql has had them, and the inherent optimizations of postgres over mysql in searching a geo-polygon shows.

postgres also has built-in types for manipulating both IPV4 and IPV6 addresses — a terrific boon if you are using the database to drive things like a firewall. Mysql doesn’t — you use the “2xBIGINT” approach for IPV6 and BIGINT for IPV4, which means that you have to write your own netmask manipulation code for mysql, while that code is provided “for free” with postgres.

If you carefully examine the respective licenses, the mysql one is far closer to Stallman’s “all your code is owned by the community” ethos (unless you pay mysql through both nostrils for a certain type of unrestricted license), while the postgres code base is uniformly under the far less restrictive UC Regents “you can use it any way you want but don’t blame us if your critical application fails” license.

An examination of the referential integrity of foreign keys under the two database implementations is also instructive; PostgreSQL has it now, while MySQL has it slated for a future release.

In addition, the postgres people are much more careful about backwards compatibility. One of the applications I integrated which used mysql had a variable named “key” in a certain table, and that database functioned quite well for several years, until I updated mysql and painfully discovered (using the traditional database-dump-using-old-and-reload-using-new dance) that they’d swapped out the old sql interpreter for something which made the variable “key” invalid — via a rather cryptic syntax error when the table containing the variable was defined. Since the java and c/c++ DAO code had “key” liberally sprinkled throughout, it was a large task to relabel the variable and refactor the code which used it. I’ve never had anything like that happen with postgres — database dumps and reloads just plain work.

All that said, when you go to the databases section at the bookstore, expect to find twenty books on mysql for every one book on postgresql. Mysql is more widely used, even if they can’t figure out whether they ought to use the word “cluster” or the word “database”.

Even the speed title zx5000 touts is up for grabs — read this for a counter to your assertion.

Reply
artacus

> With pgsql I always have to refer back to the documentation to get anything done.

People come into open source RDBMS development from several paths. Some folks start out with Access or FileMaker and out grow it. Others come from the Oracle or DB2 side looking for a cheaper alternative.

The first group has a tendency to gravitate toward MySQL. They tend not to appreciate the value views, stored procedures, transactions, or triggers; and instead do the bulk of their “database coding” on the middle tier (read PHP). MySQL doesn’t follow the SQL standard as closely. They’ve got a lot of shortcuts. They’re helpful, but if you only learn the shortcuts and not the “standards” way, you end up with a skillset that isn’t transferable to other RDBMS’s… or end up always looking up the documentation, as you put it.

The developers coming from Oracle/DB2/MSSQL are going to gravitate toward Postgres. They wouldn’t consider writing an application without views and stored procedures. Postgres generally follows the SQL pretty closely.

I’ve used a dozen or more different RDBMS’s over the years. But I develop mostly on Oracle. When I’m consulting and the client can’t afford Oracle, it’s Postgres, end of discussion. Oracle and Postgres are the most alike of any two databases I’ve used.

We do a lot of advanced stuff with XML so the only databases that can even handle it are Oracle, DB2 and Postgres. (Technically MS SQL can do it… if you can stand the masochistic syntax)

As for the speed difference, there is not much if you are using version 8 or better of Postgres. And lets be sure you are comparing apples to apples. Use a storage engine that supports transactions, foreign key constraints and full text search… Oh wait, MySQL doesn’t have that apple.

Reply
swlgc

Postgres is for serious applications that require transaction, xml, stored procedures, etc. When you like to do the data handling work within database system instead of your c/c++/php code, Postgres is the one you need.

For simplicity and ease of use, mysql or sqlite is the choice. imho

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>