The folks at Red Hat recently selected the open source PostgreSQL database as the foundation for their commercial Red Hat Database product. This decision, however, was not made without a good deal of whining from the ranks of the MySQL faithful, who weren’t able to fully comprehend why it was that their baby had been passed over.
After all, MySQL was faster, better, cheaper, easier to install, and all those other important buzzwords. PostgreSQL was an oddity: buggy, unsupported, no longer in development, unable to easily deal with large data, and (gasp) requiring a separate process for each connection instead of being nicely threaded.
It’s obvious to me that those who discount PostgreSQL in favor of MySQL haven’t taken a look at PostgreSQL recently. I too was a “MySQL all the way” person until some fellow Perl hackers convinced me to take another look. Making one last comparison before getting to the meat of this month’s column, my conclusion is that MySQL is “putting a bit of structure onto a flat file,” where PostgreSQL is “an open source Oracle replacement at a fraction of the cost.”
To commemorate the successful installation of PostgreSQL on my system, I wanted to tackle a little project. While racking my brain for a worthy project, someone on the Perl IRC channel mentioned a “word of the day” program; this inspired me to create one with PostgreSQL.
A DBM hash could easily hold the words and their definitions, but a database could also easily hold a per-user “I’ve already seen that word” table. That way, we wouldn’t end up seeing the same word twice. In addition, someone recently showed me the trick:
SELECT word FROM list ORDER BY random() LIMIT 1;
to get a random entry, so I was hoping to put that into a program at some point. This works by first sorting the list according to a random value, effectively shuffling it, and then selecting just the first entry.
Of course, after tinkering with it a bit, I decided that there was absolutely no reason not to make it a CGI script and a crontab-able program, so I went ahead and threw in HTML-cleaning of the definitions.
My next step was then to figure out which dictionary I should use. I was originally pointed at Lingua::Wordnet and got detoured for an hour, trying to install and understand that. I discussed this online, and someone mentioned FOLDOC, the dictionary of computing terms (both historical and current), as a good source of pertinent information. So I checked FOLDOC, and it indeed turned out to be a very good source for this little project.
The program flows as follows: On each invocation, the current FOLDOC flat file is mirrored into a local cache. If the local cache was updated, the file is parsed into words and definitions and placed in the PostgreSQL database in a simple two-column table.
Next, a “deck” is consulted for every user (keyed by Unix userid) who invokes the program. Initially, the deck is empty, so a new wordlist is created in a random order by “shuffling” a copy of the terms from the dictionary. This is done using SQL similar to (for the user “merlyn”):
INSERT INTO deck (word, person)
SELECT word, merlyn
FROM foldoc
ORDER BY random()
The first entry for this user is pulled off and deleted from the deck. The definition is then pulled from the dictionary table. If the invocation was from the command line, the word and definition are displayed with minimal reformatting. However, if CGI invocation is detected, then an HTML massaging locates URLs and e-mail addresses, as well as fixes up HTML entities.
Now comes the really cool part of PostgreSQL. All this shuffling and updating of the dictionary is being done within transactions that do not block the other readers! If the dictionary is stale, the other readers see the dictionary instantaneously change from the old dictionary to the new dictionary, without blocking. Try that with MySQL. PostgreSQL provides the concurrent, consistent views that only big guys like Oracle and Interbase have been able to provide in the past.
Enough on that? Let’s look at the code in Listing One.
Linux Magazine /
December 2001 / PERL OF WISDOM
Putting PostgreSQL Through its Paces