A Linux Virtual Server cluster is a highly scalable and highly available network service cluster built on a set of real servers. Here's how such clusters work, and how you can set one up yourself.
I woke up today with recall on my mind. No, not the drama playing out in California these days -- the recall I have in mind has much more significance. Hear me out.
In Paul Graham's now-famous article, "A Plan for Spam" (http://www.paulgraham.com/spam.html), Graham argued for a very different and radically simplified approach to spam filtering. Instead of using extensive rule-based schemes, Graham suggested using a statistical approach that learns from your e-mail. Shortly after, Bayesian mail filters began popping up everywhere. This month, let's look at SpamBayes, one of the most popular and effective Bayesian tools around.
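To make the statistical idea concrete, here is a minimal sketch of the combining rule Graham describes in "A Plan for Spam": each interesting token carries its own spam probability, and the per-token values are merged into one score for the message. This illustrates Graham's formula only and is not a description of how SpamBayes itself scores mail. The function name and the clamping bounds are assumptions made for the example.

```python
from math import prod


def combined_spam_probability(token_probs):
    """Combine per-token spam probabilities into a single message score.

    Uses the naive Bayes combination from Graham's article:
        P = (p1 * p2 * ... * pn) / (p1 * ... * pn + (1 - p1) * ... * (1 - pn))
    """
    # Clamp each value away from 0 and 1 (bounds assumed for this sketch)
    # so that no single token can force a division by zero.
    clamped = [min(0.99, max(0.01, p)) for p in token_probs]
    spam = prod(clamped)
    ham = prod(1.0 - p for p in clamped)
    return spam / (spam + ham)


if __name__ == "__main__":
    # Two strongly "spammy" tokens and one neutral token.
    print(combined_spam_probability([0.99, 0.95, 0.5]))
```

A score near 1 suggests spam and a score near 0 suggests legitimate mail; the per-token probabilities would come from counting how often each token appears in your own spam and non-spam messages.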
Linux systems use text pervasively and provide an almost infinite number of tools to manipulate it. This month, let's look at three lesser-known text handling tools: the line editors ex (which is usually part of the vi editor) and ed, and the stream editor, sed.
Last month's column introduced Condor and presented a sample installation of the software package in a cluster environment. Condor is a system that creates a "high-throughput computing" environment by effectively utilizing computing resources from a pool of cluster nodes and disparate workstations distributed around a network. Like many batch queuing systems, Condor provides a queuing mechanism, scheduling policy, job priority scheme, and resource classification. Unlike most other batch systems, Condor doesn't require dedicated compute servers.
Qt is a set of C++ libraries and development tools commercially developed and distributed by Trolltech of Norway. Qt is available for different platforms (Win32, most flavors of Unix/Linux, as well as Mac OS X), and in several editions, including the Free Edition that's free of charge for the development of free and open source software. Trolltech's Free Edition has all of the same features as its Enterprise Edition -- at no cost!
PHP (a recursive acronym for "PHP: Hypertext Preprocessor") is an open source language platform widely used for web development. Thanks in part to its simplicity, power, and compatibility with a wide variety of web servers and operating systems, PHP has rapidly become one of the most popular scripting environments in the world.
These days, it's increasingly rare to begin a new Java project without looking at one or more pre-existing frameworks, collections of class libraries that provide the underlying structure for an application and enforce good programming practices. There are frameworks for enterprise computing (EJB), Web applications (Struts), business process modeling (Naked Objects), graphical user interfaces (JFace), and many other areas of development. A well-designed framework provides a sturdy structure upon which to build applications.
If you run an "always on" e-commerce site (perhaps using some of the high-availability tricks described in this issue), you must ensure that search forms really work and that the pages they point to have reasonable content. Validation is vital for dynamic web sites, especially those that return an "everything's OK" 200 status even when the content of the page contains a Java traceback from a failed database connection. To truly have high availability, you have to watch the associated programs and databases -- not just check that the links on your pages all go somewhere reasonable.
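The point about a misleading 200 status can be sketched as a small content check: a response counts as healthy only if the status is 200, the body contains some text you expect, and nothing in the body looks like a Java stack trace. This is an illustrative sketch, not the column's own tool; the function name, the expected-text parameter, and the stack-frame pattern are all assumptions made for the example.

```python
import re

# Assumed pattern for a Java stack frame such as
# "at com.example.Db.connect(Db.java:42)".
JAVA_FRAME = re.compile(r"\bat [\w.$<>]+\([\w.$]+\.java:\d+\)")


def looks_healthy(status, body, required_text):
    """Return True only if the response is a 200 with sane content."""
    if status != 200:
        return False
    # A traceback in the page means the backend failed, whatever the status.
    if JAVA_FRAME.search(body):
        return False
    # The page must also contain the content we expect to see.
    return required_text in body


if __name__ == "__main__":
    broken = ("<html>java.sql.SQLException: Connection refused\n"
              "\tat com.example.Db.connect(Db.java:42)</html>")
    print(looks_healthy(200, "<html>Results for widgets</html>", "Results"))
    print(looks_healthy(200, broken, "Results"))
```

In practice the status and body would come from whatever HTTP client you already use to fetch the page; keeping the check separate from the fetch makes it easy to test against saved pages.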
If you move a text file from Windows to Linux and your ftp program doesn't convert Windows line endings to Unix ones, the file will have a CONTROL-M (^M) character at the end of each line. In many cases, the control characters don't hurt anything, because CONTROL-M, or carriage return, is treated as white space and ignored by many programs. However, some programs, such as the Perl interpreter, are adversely affected.
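One way to remove the stray carriage returns is to rewrite each CRLF pair as a single line feed. The sketch below does that on raw bytes so that no text encoding has to be guessed; the function names are made up for this example, and the file is rewritten in place, so try it on a copy first.

```python
def dos_to_unix(data):
    """Replace Windows CRLF line endings with Unix LF line endings."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path):
    """Rewrite the file at 'path' in place with Unix line endings."""
    # Binary mode keeps Python from translating line endings itself.
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(dos_to_unix(data))


if __name__ == "__main__":
    sample = b"#!/usr/bin/perl\r\nprint 1;\r\n"
    print(dos_to_unix(sample))
```

Only carriage returns that are immediately followed by a line feed are removed, so a lone CONTROL-M elsewhere in the file is left alone.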