From predicting the weather to keeping your e-commerce Web site running 24/7, nothing does the trick like a cluster of Linux systems. We show you what clusters can do.
Some problems are just hard to solve. If your job requirements include things like predicting the weather or simulating nuclear blasts without actually launching any warheads, then your problems fit neatly into this category.
Or, maybe you have more practical concerns. Maybe you are running a Web-based business that receives millions of hits per day, and it’s your job to make sure that your server response times stay lightning fast.
Of course, making sure that your company’s mission-critical applications are available 100 percent of the time is just as important as ensuring a snappy Web server response to your customers…
If any of these situations apply to you, a cluster of Linux computers might just be the answer to your prayers.
But what exactly is a cluster, and why would you want to set one up? What are the potential benefits and pitfalls of deploying a Linux cluster? This article addresses those questions, provides background on clustering technology and how it works, and discusses the tools that are available for building clusters on the Linux platform.
What is Clustering?
The term “clustering” actually refers to a number of different technologies and configurations.
When many people hear the words “clustering” or “server cluster,” they think of high performance groups of computers used for scientific research. However, this is just one of the types of clustering available. The basic idea behind the “performance clustering” approach is to make a large number of individual machines act like a single very powerful machine. This type of cluster is best applied to large and complex problems that require tons of computing horsepower. Applications such as weather prediction, astronomy, and cryptographic research are prime candidates for high-performance clusters.
A second type of clustering technology allows a network of servers to share the load of traffic from clients. By load balancing this traffic across an array of servers, access times improve and reliability increases. Additionally, since many servers are handling the work, one failure will not cause a catastrophic breakdown. This kind of service has tremendous value to companies with extremely high-traffic Web sites.
The last major type of clustering involves having the servers act as live backups of each other. This is called “high availability clustering” (HA clustering) or “redundancy clustering.” By constantly tracking the performance and stability of the other servers, a high availability cluster allows for greatly improved system uptimes. This can be crucial in high traffic e-business sites and for other mission critical applications. Load balancing and high availability clusters share many common components, and some clusters make use of both types of clustering.
How Does it Work?
Figure One: By sharing a common data source, the backup server can instantly take over for the primary server in case of failure. If the backup server doesn’t receive a pulse signal from the primary server, it will instantly reconfigure its network interface and take over the traffic load. For extra redundancy, the data source can also be set up as a cluster.
At its core, clustering technology has two basic parts. The first component, made up of a customized operating system (such as the kernel modifications made to Linux), special compilers, and applications, allows programs to take full advantage of clustering.
The second component is the hardware interconnect between machines (nodes) in the server cluster. These interconnects are often highly specialized interfaces. In some cases, the hardware is designed specifically for clustered systems. But in most common Linux cluster implementations, this interconnect is handled by a dedicated fast Ethernet or gigabit Ethernet network.
Assignment of tasks, status updates and program data can be shared between machines across this interface, while a separate network is used to connect the cluster to the outside world. The same network infrastructure can often be used for both of these duties. However, doing this can cause performance to suffer when network use is high.
By breaking the problem down into tasks that can be done in parallel, the computers in a high performance cluster can share the load and complete the problem more quickly. Performance clustering works in a similar manner to traditional symmetric multiprocessor (SMP) servers.
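As a rough illustration of the divide-and-combine idea described above, the following Python sketch splits a job into chunks, hands each chunk to a worker, and adds up the partial results. The function names and the sum-of-squares workload are made up for illustration, and threads stand in for cluster nodes. It is not how Beowulf or any other package actually distributes work.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, parts):
    """Divide a list into `parts` nearly equal contiguous chunks."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks each take one leftover item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def node_work(chunk):
    """Work done by one hypothetical node: sum of squares over its chunk."""
    return sum(x * x for x in chunk)


def cluster_sum_of_squares(items, nodes=4):
    """Run node_work on every chunk in parallel and combine the results."""
    # Threads are only a stand-in for nodes. A real cluster would ship
    # each chunk to another machine over the interconnect.
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(node_work, split(items, nodes)))
    return sum(partials)
```

The problem only speeds up if the chunks are independent of each other, which is why not every program benefits from a performance cluster.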
The most widely known high-performance clustering solution for Linux is Beowulf. It grew out of research at NASA and can provide supercomputer-class processing power for the cost of run-of-the-mill PC hardware. Connected through a fast Ethernet network, those PCs combine their power into a penguin-powered supercomputer.
The servers of a high availability cluster don’t normally share the processing load that performance cluster servers do. Nor do they share the traffic load as load-balancing clusters do. Instead, they sit at the ready, able to instantly take over for a failed server. Although you won’t get increased performance from a high availability cluster, the increased flexibility and reliability of these clusters make them necessary in today’s information-intensive business environment.
High availability clustering also allows for easier server maintenance. One machine from a cluster of servers can be taken off line, shut down, and upgraded or worked on without taking the services provided out of commission. When that server is all set, it can be brought up, and the next one in the group can undergo maintenance.
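The standby behavior described above can be sketched as a simple heartbeat check: the backup notes when it last heard from the primary and takes over once that silence exceeds a timeout. This is a toy model under stated assumptions. The class name, the timeout parameter, and the use of plain numbers for time are all hypothetical, and real HA software also has to reconfigure network interfaces and guard against both nodes going active at once.

```python
class BackupNode:
    """Toy model of a backup server watching a primary's heartbeat."""

    def __init__(self, timeout_seconds, started_at=0.0):
        self.timeout_seconds = timeout_seconds  # hypothetical tuning knob
        self.last_pulse = started_at            # time of the last heartbeat
        self.active = False                     # True once we have taken over

    def record_pulse(self, now):
        """Call whenever a heartbeat arrives from the primary."""
        self.last_pulse = now

    def check(self, now):
        """Take over if the primary has been silent longer than the timeout."""
        if now - self.last_pulse > self.timeout_seconds:
            self.active = True
        return self.active
```

For example, with a six-second timeout, a pulse at time 2 keeps the backup passive at time 5, but a check at time 9 finds seven seconds of silence and triggers the takeover.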
MOSIX is a package for Linux that does load balancing for particular processes across a cluster. It works like a high performance cluster in that it’s designed to take advantage of the fastest available hardware in the cluster for whatever task is thrown at it. But, it actually does this by load-balancing the various tasks across multiple machines.
The huge advantage of MOSIX is that there is no need for customized software (Beowulf-style clusters do require software customization). Users can simply run their program as normal, and MOSIX handles the rest. In fact, if you launch an application and then run ps, the process will appear to be running locally even if it’s really running on the idle quad-Xeon server in the corner.
Another package, called Piranha, allows Linux servers to provide high availability without the need for expensive hardware. The cluster is totally software-based, with the servers communicating over a high-speed network. It can be set up to work as either a high availability or a load-balancing cluster.
Piranha can be configured for failover, where a backup server sits at the ready to take over for its failed counterpart. It can also work to make the cluster appear as one “virtual server,” balancing the load of traffic between the servers in the cluster.
The outside world sees only one address for the virtual server. This helps to insulate the individual machines within the cluster, and allows the cluster to dole out traffic to the least busy servers (in the case of load-balancing clusters) or send traffic to whichever server is up and running (in the case of high availability clusters).
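To make the two dispatch policies concrete, here is a minimal Python sketch of how a virtual server might choose a real server for an incoming request. The data layout and function names are assumptions made for illustration; this is not Piranha's actual scheduling code.

```python
def pick_least_busy(servers):
    """Load-balancing policy: choose the live server with the fewest connections.

    `servers` maps a server name to {"up": bool, "connections": int}.
    """
    live = [name for name, state in servers.items() if state["up"]]
    if not live:
        raise RuntimeError("no real servers available")
    return min(live, key=lambda name: servers[name]["connections"])


def pick_failover(servers, preference):
    """High availability policy: choose the first server in `preference` that is up."""
    for name in preference:
        if servers[name]["up"]:
            return name
    raise RuntimeError("no real servers available")
```

In both cases the client only ever talks to the virtual address; which machine answers is decided behind it.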
To anyone who has worked as a network or system administrator, some of the benefits of clustering will be immediately apparent. The increased processing speed offered by performance clusters, increased transaction or response speed offered by load-balancing clusters, or the increased reliability offered by high availability clusters can be vital in a variety of applications and environments.
Take, for example, weather research and prediction. These fields require massive amounts of data and very complex calculations. By combining the power of many workstation-class or server-class machines, performance levels can be made to reach supercomputer levels (and for a much lower price than the traditional supercomputer offerings, which require extremely expensive specialized hardware and software, as well as dedicated support staff in many cases).
Or, think of a high-traffic Web site. Without a high-availability plan in place, a minor hardware problem (such as a failed $50 network card) can bring an $80,000 Web server to its knees. But, with redundant servers and instant failover the network administrator can repair the problem while the site runs on its backup, saving the company untold losses in revenue.
One doesn’t need to look far to find examples of organizations and businesses that are using clustering technology to solve their problems. NASA uses Beowulf, which was started in a project headed up by CESDIS (The Center for Excellence in Space Data and Information Sciences) under contract in 1994. Since then, it has spread to many other research houses and university labs.
NOAA (The National Oceanic and Atmospheric Administration) uses several different clustering technologies in their projects. Their HPCC (High Performance Computing and Communications) group performs research in many areas of supercomputing, load balancing, and high performance clustering.
Google.com made news earlier this year with the largest-ever Linux cluster, which powers its popular Web search engine site. Google.com, which uses more than 4000 servers running Red Hat Linux, was also chosen by Yahoo! to replace Inktomi as its search provider.
Now that we’ve presented the benefits and advantages that clustering technology offers, it’s time to touch on the flip side. What happens when things go wrong? What problems might you have when configuring and maintaining your own cluster?
One of the most difficult problems facing anyone building a cluster is finding and eliminating all single points of failure. If you have a high-performance cluster that relies on a single central server to dole out tasks, failure of that node will render the entire cluster useless.
Similarly, with a load-balancing or high availability cluster, you may have a great set-up that guarantees that your servers will be running. But, if they all connect to the corporate network (or to the Internet) over a single pipe, you’re asking for trouble.
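A little arithmetic shows why a single pipe undermines redundant servers. The sketch below uses the standard reliability rules for independent components: redundant components fail only if all of them fail, while components in series fail if any one fails. The 99 percent figures are hypothetical, and the assumption that failures are independent is a simplification.

```python
def redundant(*availabilities):
    """Availability of components in parallel: down only if all are down."""
    all_down = 1.0
    for a in availabilities:
        all_down *= (1.0 - a)
    return 1.0 - all_down


def in_series(*availabilities):
    """Availability of a chain: up only if every link is up."""
    all_up = 1.0
    for a in availabilities:
        all_up *= a
    return all_up


# Hypothetical numbers: two servers at 99% each behind one 99% uplink.
server_pair = redundant(0.99, 0.99)          # about 0.9999
whole_site = in_series(server_pair, 0.99)    # about 0.9899
```

Under these assumptions the site as a whole is slightly less available than the single uplink, no matter how good the server cluster behind it is.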
Building in redundancy is the key for supporting the hardware infrastructure and clustering software. One important pitfall to watch out for is the interconnect speed between nodes of the cluster. Trying to run a cluster across a 10 megabit Ethernet may work fine for a few nodes, but as soon as the cluster grows you may find that the network is such a bottleneck that servers are sitting idle waiting for data.
Of course, this may not be as big a concern if your site is being hosted over a few T1 lines. But if you’ve got a T3 or higher, 10 Mbps just won’t cut it when the traffic starts pumping.
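The comparison above is easy to check with the standard line rates of 1.544 Mbps for a T1 and 44.736 Mbps for a T3. The helper name below is made up for illustration, and the check ignores protocol overhead and any traffic between the nodes themselves.

```python
T1_MBPS = 1.544    # standard T1 line rate
T3_MBPS = 44.736   # standard T3 line rate


def interconnect_is_bottleneck(uplink_mbps, interconnect_mbps=10.0):
    """True if outside traffic alone could saturate the cluster interconnect."""
    return uplink_mbps > interconnect_mbps


three_t1 = 3 * T1_MBPS  # 4.632 Mbps, comfortably under 10 Mbps
```

Three T1 lines total well under 10 Mbps, while a single T3 can deliver more than four times what a 10 megabit Ethernet can carry.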
As is the case with any other technology, a cluster is only as good as its implementation. A poorly configured high availability cluster may just provide you with a false sense of security and then fail when it’s called on. Also, having a good clustering solution doesn’t negate the need for daily backups and common-sense server monitoring and maintenance.
Who You Gonna Call?
If you’ve decided that you want to build a cluster for your enterprise, there is a wide variety of choices available to you. The options range from the unsupported and absolutely free (Beowulf) to the commercially supported and absolutely expensive (Red Hat High Availability Server, or RHHAS, which has a price tag of $1,995). See the Clustering Alternatives sidebar for a brief summary of the different systems that are available.
Of course, if you don’t want to shell out two grand for “free software,” you can always download one of the freely available clustering solutions and configure it yourself. You’ll forfeit the support that comes with the commercial version, but you’ll gain the satisfaction of knowing that you are staying true to the open source spirit.
For the brave souls who have decided they can forgo the commercially supported clustering solutions, the Hands on Clustering sidebar offers a step by step guide to getting the free (unsupported) version of Red Hat’s Piranha software up and running to create your own high availability server cluster.
A Linux-based cluster can do a tremendous amount to help enhance the stability, performance, and throughput of your Web site or server farm. If you run your own Web site, develop mission-critical applications, or just love to play with cool technology, you should definitely check out the clustering options available for Linux.
Clustering Alternatives
The Beowulf project, originally started as a NASA endeavor, brings supercomputer-class performance to Linux clustering. Although Beowulf requires special software modifications for programs to utilize the cluster, the performance it offers for computer-intensive tasks is really quite staggering. For more information on Beowulf, see http://www.beowulf.org.
Linux Networx Evolocity is a commercial clustering solution. Linux Networx packages the hardware along with software and cluster management tools (otherwise known as ClusterWorx) for enterprises that prefer a completely integrated solution. For more information on Evolocity, see http://www.linuxnetworx.com.
SteelEye Lifekeeper is a high availability clustering solution available for Linux, Unix, and Windows NT servers. It provides both fault tolerance and load balancing for enterprise environments. Detailed information and white papers are available at http://www.steeleye.com.
Linux Virtual Server
Linux Virtual Server is a project to provide a common set of kernel patches and management products to create a load-balancing cluster solution for Linux. It can be used to build highly scalable and very reliable clusters of servers. For more information see http://www.linuxvirtualserver.org.
MOSIX is a solution that allows all of your users to make the most of the computing resources you have available. By taking their requests and then actually running their program on the quickest machine available in the cluster, each user gets the best possible performance. And MOSIX doesn’t require that software be specially compiled to take advantage of its features… in fact users won’t even be aware of the fact that their programs aren’t running locally. For more information on MOSIX, see http://www.mosix.cs.huji.ac.il.
Piranha is the failover/virtual server offering developed by Red Hat. It is the core technology for their High Availability Server software package, and also comes with the standard Red Hat Linux 6.2 distribution. Setting up Piranha is easy (see the Hands on Clustering sidebar). For more information about Piranha, check out http://www.sources.redhat.com/piranha.
The TurboCluster distribution, from TurboLinux, is a highly integrated clustering package. It is designed around the TurboLinux distribution with high availability in mind. It can also integrate Solaris and Windows NT servers into the cluster.
TurboLinux also offers a product called EnFuzion for providing clustered application execution for existing servers and workstations. For more information see http://www.turbolinux.com.
VA Cluster Manager (VACM) is a system for monitoring and controlling Intel Intelligent Platform Management Interface (IPMI)-based cluster systems. These can be Beowulf, Piranha, or other types of clusters. VA Linux Systems also offers a product named ClusterCity for those who want a fully integrated hardware and software system. Details about both VACM and ClusterCity can be found at http://www.valinux.com.
Hands on Clustering
So, you want your site to have the advantages of high-availability clustering without the steep price tag? You’ve come to the right place! We’ll go step by step through setting up and configuring your very own Piranha-based cluster.
Like all the best things in life, Piranha is free. In fact, it even comes as an installation option on some Linux distributions (such as Red Hat 6.2). If your favorite distro doesn’t come with it, it’s a quick download from the Piranha Web site.
Preparing Your Server
Before you jump into installing Piranha, take a minute to make sure your server is ready for the task. If this is an existing server, check the RAM and disk space. If your server is low on either of these, now is the time to do an upgrade.
In order to increase system security and reliability, make sure you’re only running the services and software you’ll need for the cluster. There’s no need to install SAMBA or Sendmail if you’ll be creating a high-availability Web server cluster. By carefully choosing which packages to install on your cluster server (or any server for that matter), you can save yourself a lot of trouble in the long run.
If you’re using an existing server, now is also a good time to make sure your server has the latest versions of (and security patches for) the packages you’ll be using with the cluster. You’ll also need to update to kernel 2.2.14 or later. During the installation, you’ll need to patch the kernel with the ipvs patch in order to enable clustering in the kernel. The more you prep the server now, the easier your later installation will be.
Figure One: The Web-based Piranha configuration tool provides an easy interface for setting up clustered servers and configuring failover for specific services.
If you are setting up a new server, and you’re using Red Hat Linux 6.2, the process should be fairly straightforward. Simply choose the Custom installation; pick the packages you want and then make sure to include the Clustering option at the bottom of the list.
This will install Piranha and the patched kernel on your server during the OS installation process. Once your server is up and running, you can then proceed to configuring your cluster. If you have an existing server that you want to configure with Piranha, follow these steps:
STEP 1 Update the kernel to 2.2.14 or later.
STEP 2 Download and apply the ipvs patch to the kernel and recompile. You can get this patch at http://www.linuxvirtualserver.org/software.
STEP 3 Download the latest version of Piranha (0.4.14 at the time of writing) from ftp.redhat.com, updates.redhat.com or a mirror site.
STEP 4 Install the piranha, piranha-docs and piranha-gui RPM packages. If there are any failed dependencies, update the indicated packages, and then complete the installation.
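Before starting on the steps above, it can help to confirm that an existing server already meets the kernel requirement from Step 1. The following Python sketch compares a kernel release string against 2.2.14. The function names are made up for illustration, and the parser assumes the release string begins with three dot-separated numbers, as in "2.2.14-5.0".

```python
import re


def parse_release(release):
    """Extract the leading (major, minor, patch) numbers from a release string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def kernel_at_least(release, minimum=(2, 2, 14)):
    """True if the release is at or above the minimum version."""
    # Tuples compare field by field, so (2, 2, 9) < (2, 2, 14) as intended.
    return parse_release(release) >= minimum
```

On a running Linux system, the string returned by Python's platform.release() (the same value uname -r prints) could be passed in as the release argument.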
In this walkthrough, we’re setting up failover services (FOS), as opposed to the Linux Virtual Server (LVS) setup that is used for load balancing.
STEP 1 First, we need to set up a password for the Piranha user to ensure security of the account. Do this by executing the passwd piranha command as root. Also, set a password for the Piranha Web-based configuration area by executing htpasswd /home/httpd/html/piranha/secure/passwords piranha.
Figure Two: IP addresses and cluster host names.
127.0.0.1 localhost localhost.localdomain
10.0.0.1 cluster1 cluster1.railsback.com
10.0.0.2 cluster2 cluster2.railsback.com
STEP 2 Next, add the cluster hosts’ names and IP addresses to the /etc/hosts file. Your file on both cluster servers should look like Figure Two.
STEP 3 Now edit the /etc/hosts.allow and /root/.rhosts files, to enable root access through rsh and rcp for the servers in the cluster.
STEP 4 Next, you’ll need to configure the Apache Web server on both nodes and start up the Web server. Then, use a client machine to access http://cluster1/piranha on the primary server in the cluster. You should see a screen very similar to what you see in Figure One. You’ll need to enter the Web password you set earlier.
STEP 5 The Web-based Piranha config tool will let us easily set up the cluster. First, click the GlobalSettings tab, set the primary server’s IP (10.0.0.1), and click the fos button to select failover services clustering. You can change to ssh sync if you have ssh set up. Then, click the accept button.
STEP 6 Next, click the Failover tab. Click the add button to add a server to the cluster. Then, click the edit button to configure the IP settings for the backup server. Enter the server name, IP, application port (80 for Web services), device (eth0 or eth1), and timeout. Then, click accept for those changes to be saved. Do the same to add both of your cluster servers to the configuration.
STEP 7 Then, just copy the /etc/lvs.cf file to the other cluster node. Check that all services are operational on the cluster nodes and then use the /etc/rc.d/init.d/pulse start command to start up the clustering software on each of the servers in the cluster.
Congratulations! Your Linux servers should now fail over and keep your site running if one of them needs to be taken down.
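Since Step 2 has to be done identically on both nodes, a quick sanity check of the hosts file can save some head-scratching. This Python sketch parses hosts-file text and reports any cluster name that is missing or mapped to the wrong address. The names and addresses come from Figure Two; the function names and the EXPECTED table are assumptions made for illustration.

```python
# Cluster names and addresses from Figure Two.
EXPECTED = {"cluster1": "10.0.0.1", "cluster2": "10.0.0.2"}


def parse_hosts(text):
    """Return a {hostname: ip} table from the contents of a hosts file."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table[name] = ip
    return table


def missing_or_wrong(text, expected=EXPECTED):
    """Return {name: ip_found_or_None} for every entry that does not match."""
    table = parse_hosts(text)
    return {
        name: table.get(name)
        for name, ip in expected.items()
        if table.get(name) != ip
    }
```

An empty result means both cluster names are present with the expected addresses. To check a real server, read the contents of /etc/hosts on each node and pass the text to missing_or_wrong.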
If you’d like to cluster more than just the standard Apache Web service (port 80), you can set that up with the Piranha configuration tool. Simply add more services to the Failover section, configuring the proper port numbers and startup calls.
The next step is to ensure that everything is working as planned by doing a little testing of your newly formed server cluster. First, turn off clustering by issuing the /etc/rc.d/init.d/pulse stop command. Then, make sure that both nodes are using the same cluster configuration by executing the command rcp /etc/lvs.cf cluster2:/etc/lvs.cf.
Disconnect both servers from the network and bring up the cluster software on each (with the /etc/rc.d/init.d/pulse start command). They should each think that the other has failed, so all of the services should be live. Check what is running on each server with the ps command.
You can now perform other tests to make sure that everything is working properly by connecting both machines to the network and bringing up Piranha. Disable a service manually and check to make sure the other server compensates correctly. If this is working in both directions, you should be all set!
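One simple way to run the checks described above from a client machine is to test whether the service port still accepts connections while you disable things by hand. This Python sketch uses only the standard socket module. The function name is made up for illustration, and a successful TCP connection only shows that something is listening, not that the Web server is returning correct pages.

```python
import socket


def service_is_up(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False
```

For the cluster in this walkthrough, calling service_is_up with the address that clients use and port 80, before and after stopping the Web server on one node, shows whether the other node has picked up the service.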
Kevin Railsback is the technical director for the InfoWorld Test Center, as well as a long-time Linux geek and freelancer. You can reach him at firstname.lastname@example.org.