Caos NSA and Perceus: All-in-one Cluster Software Stack

Silence the struggle around cluster software stack configuration. Caos NSA is a distribution that focuses on making things simple, easy to install and upgrade, and easy to manage.

Configuring a cluster software stack can be a struggle. Usually the stack is split into a distribution (the OS) and a cluster management tool. While this separation has some advantages, sometimes people want an all-in-one cluster software stack. Several packages fit this description, such as ROCKS and Clustermatic, but one combination is arguably the best: Caos NSA and Perceus.

The all-in-one packages tend to build on top of existing distributions, adding a cluster management system (CMS) and other tools to create the all-in-one package. One of the differentiators between these packages is how tightly the base distribution and the CMS are integrated. More integration requires more work, but it brings the total package closer to the proverbial “cluster in a box.” More importantly, it can make the configuration, deployment, updating, and monitoring of the cluster much easier.

However, many of these base distributions get larger and heavier with every release, even in their core components. Clusters typically don’t need this bloat (who needs four different music players on their nodes?); they just need a simple yet robust OS distribution. Caos NSA is a distribution that focuses on making things simple, easy to install and upgrade, and easy to manage.

Caos NSA

The Caos developers came together to form the Caos Foundation in 2002, with Greg Kurtzer as the leader and founder. The Caos Foundation went public in 2003 with the goal of putting together an RPM-based Linux distribution for the community, by the community. Originally they started both CentOS and Caos, with CentOS created as the build and development platform for Caos. Later, the CentOS project split off on its own, and the Caos team focused on developing Caos-Linux, which had different goals than CentOS.

Caos 1 was developed as an example of how an operating system can be managed at scale. It was a very usable Linux distribution, although it was fairly close to Red Hat or CentOS. However, it proved very useful in creating a new community-supported distribution, and it introduced some new ideas about how a distribution is built, maintained, and installed.

The Caos team took what it learned from Caos 1 and created Caos 2, a general-purpose distribution containing about 1,800 packages. It was targeted at the advanced user and had a certain Debian-esque lightweight feel and efficiency.

Around the time Caos 2 was released, the development team realized that the world of Linux distributions was becoming more varied. There were distributions that focused on the desktop, such as Ubuntu and Fedora Core, distributions that focused on servers, distributions that changed quickly, and distributions that changed very slowly. In many ways, distributions had become niche products focused on a particular “market.” Moreover, many distributions were becoming bloated and too “heavy,” even at the core. Linux has always naturally excelled at being the lightest, simplest, fastest and, depending on your perspective, most secure choice. But now these distributions were requiring new hardware and carrying overheads closer to those of Windows. Obviously, not all Linux users need that, much less want it, particularly for clusters.

So Caos NSA Linux was developed to focus on where Linux has always naturally excelled: simple, lightweight, fast, easy, and secure. NSA stands for Node, Server and Appliances. Caos NSA is focused on being a stable, relatively slowly changing Linux distribution that can serve not only as a great base server-like OS, but also as a cluster distribution.

If you ask cluster people which attributes they want in a distribution, you are likely to get a laundry list of requirements. But it’s fairly easy to reach consensus on a few. In no particular order, they typically are:

Requirements for a cluster distribution

  1. Stable
  2. Easy to install
  3. Easy to upgrade
  4. Robust
  5. Lightweight, with long-term support

So the Caos team focused on these things and developed a very stable, uncomplicated distribution that is easy to install and upgrade, which they call Caos NSA. They also included a new admin tool called Sidekick for administering the installed system. The distribution itself is installed with cinch, the Caos 2 installer. Sidekick is text-based, so you don’t have to worry about video cards and monitors (this keeps things simple; remember that this is a server-oriented distribution), which can greatly speed up configuration.

The packages that Caos NSA includes by default are very useful for servers but also have a cluster flavor. For example, at the time of this writing it includes OFED 1.2 (the OpenFabrics Enterprise Distribution, which provides the InfiniBand support common in HPC), Open MPI 1.2.5, Slurm 1.2.19, gcc 4.1.2, environment modules, Cports (the Caos source-based ports packaging system, which is integrated with environment modules and supports commercial compiler profiles), and others.

Caos NSA has other features that many other distributions lack. One important feature is that it uses a very recent kernel, 2.6.23.8. Some people don’t like to work with recent kernels because they want many more people to test them first, but recent kernels support recent chipsets and hardware and bring new capabilities in the kernel itself. I understand the desire for more “tested” kernels, but often those kernels cannot use recent hardware because it isn’t supported yet. In addition, the kernel is an exciting place, with lots of new developments that can help HPC performance. So I think having a recent kernel is important.

One feature that Caos NSA has and most Red Hat-based distributions don’t is support for XFS. XFS is important for clusters because it is arguably one of the fastest file systems, particularly for large files. This matters for codes that need high local file system performance.

Caos NSA also configures the Ethernet interfaces for IPv6, not just IPv4. While I’m not using IPv6 at the moment, I think a distribution that can configure IPv6 at this point is far ahead of others.

A fourth feature that I think is potentially important is a firewall. While many distributions have firewall capability, I’ve found setting it up to be less than easy. In Caos NSA, however, setting up a firewall on any or all network interfaces is very easy. You tell it which Ethernet interfaces should be firewalled, it configures the firewall automatically, and then you can open specific ports as desired. Note that Caos doesn’t make any guarantees about the firewall, for obvious legal reasons.
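Caos NSA’s own firewall configuration isn’t shown here, so the following is only a rough illustration of what “firewall these interfaces, then open some ports” typically boils down to on Linux. It is a minimal Python sketch that emits iptables commands for a default-deny inbound policy on the chosen interfaces; the function name, interface names, and port list are hypothetical, and this is not the rule set Caos NSA actually generates.

```python
def firewall_rules(protected, open_ports):
    """Return iptables commands for a simple default-deny inbound policy.

    protected  -- interface names to firewall (hypothetical, e.g. ["eth0"])
    open_ports -- dict mapping interface -> list of (protocol, port) to allow
    Interfaces not listed in `protected` get no rules at all.
    """
    rules = []
    for iface in protected:
        # Accept replies to connections this host initiated.
        rules.append(
            f"iptables -A INPUT -i {iface} -m state "
            "--state ESTABLISHED,RELATED -j ACCEPT"
        )
        # Ports explicitly opened on this interface.
        for proto, port in open_ports.get(iface, []):
            rules.append(
                f"iptables -A INPUT -i {iface} -p {proto} --dport {port} -j ACCEPT"
            )
        # Anything else arriving on this interface is dropped.
        rules.append(f"iptables -A INPUT -i {iface} -j DROP")
    return rules


if __name__ == "__main__":
    # Hypothetical: firewall only the outside-facing NIC and allow SSH in.
    for rule in firewall_rules(["eth0"], {"eth0": [("tcp", 22)]}):
        print(rule)
```

Interfaces that are left out, such as a private cluster network, are not filtered at all, which mirrors the per-interface choice described above.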

Finally, Caos NSA isn’t just a Linux distribution; it is also tightly integrated with Perceus, a fantastic cluster management system (CMS).

Perceus

In last month’s column I talked about building and installing Perceus on Fedora Core 8: how to build it from scratch, how to configure it, and how to get a small two-node cluster working. With Caos NSA, Perceus is already tightly integrated.

As part of the installation, Caos NSA asks whether you want to install Perceus. If you do (and who doesn’t?), it installs and configures Perceus for you after you answer a couple of questions. It asks which network interface you want to use for the cluster (this should be a private network), then which range of IP addresses you want to use, plus a couple of other simple questions, and bingo, Perceus is configured and ready to go.
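The questions about the cluster interface and the IP range boil down to a few checks that are easy to get wrong by hand: the network should be private, the node range must fall inside the interface’s subnet, and it must not collide with the master’s own address. A minimal Python sketch of those checks, using the standard `ipaddress` module, follows; the function name is hypothetical, the /24 netmask in the example is an assumption, and this is not how Perceus itself validates its configuration.

```python
import ipaddress


def check_node_range(master_if, first, last):
    """Sanity-check a node address range for a private cluster network.

    master_if   -- master's cluster interface in CIDR form (e.g. "10.1.0.253/24")
    first, last -- first and last addresses to hand out to nodes
    Returns the list of node addresses; raises ValueError on a bad range.
    """
    iface = ipaddress.ip_interface(master_if)
    net = iface.network
    lo, hi = ipaddress.ip_address(first), ipaddress.ip_address(last)
    if not net.is_private:
        raise ValueError(f"{net} is not a private network")
    if lo > hi:
        raise ValueError("first address comes after last address")
    for addr in (lo, hi):
        if addr not in net:
            raise ValueError(f"{addr} is outside {net}")
        if addr in (net.network_address, net.broadcast_address):
            raise ValueError(f"{addr} is the network or broadcast address")
    if lo <= iface.ip <= hi:
        raise ValueError("range includes the master node's own address")
    return [ipaddress.ip_address(n) for n in range(int(lo), int(hi) + 1)]


if __name__ == "__main__":
    # Hypothetical: master at 10.1.0.253 on an assumed /24, nodes .1 to .100.
    nodes = check_node_range("10.1.0.253/24", "10.1.0.1", "10.1.0.100")
    print(len(nodes), nodes[0], nodes[-1])  # 100 10.1.0.1 10.1.0.100
```

Checking only the two endpoints is enough here: if both lie strictly between the network and broadcast addresses, every address in between does too.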

At this point, you can follow the same basic steps as in the previous column: download a VNFS capsule, import it into Perceus, boot some nodes, and assign the imported capsule to those new nodes.

Now that we have the basics of what Perceus is doing, let’s actually install Caos NSA and Perceus on a small two-node cluster to walk through the process.

Caos NSA Installation and Configuration

I’m working with a small two-node cluster to demonstrate Caos NSA and Perceus. The two nodes are basic AMD Opteron nodes. The master node has a single 80 GB IDE drive and three NICs: (1) a Fast Ethernet NIC to the outside world (IP 192.168.0.20), (2) an Intel GigE NIC (IP 10.1.0.253), and (3) a D-Link Fast Ethernet NIC for management and storage (IP 10.1.1.253). I downloaded the Caos NSA image, burned it to a DVD, and booted from that DVD.
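Each of the master’s three NICs sits on its own subnet, which is what lets the kernel send public, compute, and management traffic out of the right interface. A quick way to sanity-check such an addressing plan is sketched below with Python’s standard `ipaddress` module; the /24 masks on the two 10.1.x.x networks are an assumption, since the article gives only the public network’s netmask, and the function name is hypothetical.

```python
import ipaddress
from itertools import combinations

# The master node's NICs as described above. The /24 prefixes on the two
# 10.1.x.x networks are assumed; only the public netmask is given.
MASTER_NICS = {
    "public (Fast Ethernet)": "192.168.0.20/24",
    "cluster (Intel GigE)": "10.1.0.253/24",
    "management/storage (D-Link)": "10.1.1.253/24",
}


def overlapping_subnets(nics):
    """Return a description of every pair of NICs whose subnets overlap."""
    nets = {name: ipaddress.ip_interface(cidr).network for name, cidr in nics.items()}
    problems = []
    for (name_a, net_a), (name_b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} ({net_a}) overlaps {name_b} ({net_b})")
    return problems


if __name__ == "__main__":
    print(overlapping_subnets(MASTER_NICS))  # prints [] (three distinct /24s)
    # Widening the cluster NIC to a /16 would swallow the management subnet.
    wide = {**MASTER_NICS, "cluster (Intel GigE)": "10.1.0.253/16"}
    print(overlapping_subnets(wide))
```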

To start the installation I typed,

install ipaddr=192.168.0.20 netmask=255.255.255.0 gw=192.168.0.1 \
  ifdev=eth0 fs=xfs ns=xx.xxx.xxx.xx hostname=home64 \
  layout=bigsrv disk=hda

and Caos NSA takes off. The options are: ipaddr, the IP address of the master node; netmask, its network mask; gw, the gateway (in this case my home router); ifdev, the Ethernet device to use for the installation; fs, the file system I want to use (thank God Caos NSA has XFS); ns, the nameserver, which I’ve masked out to protect the innocent; hostname, the master node’s host name; layout, the file system layout I wanted (other options include “manual,” “basic,” and a few others); and disk, the drive to install onto.
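Since every option on that line is a simple key=value pair, it is easy to check a command line before typing it at the installer. The Python sketch below is purely illustrative: it splits the line into options and annotates each with a short description. The function names are hypothetical, and the installer’s own parser may treat the line differently.

```python
# Short descriptions of the options used in the article's install line.
OPTION_HELP = {
    "ipaddr": "IP address of the master node",
    "netmask": "network mask for ipaddr",
    "gw": "default gateway",
    "ifdev": "Ethernet device used for the installation",
    "fs": "file system to create",
    "ns": "nameserver",
    "hostname": "host name of the master node",
    "layout": "file system layout (e.g. bigsrv, basic, manual)",
    "disk": "disk to install onto",
}


def parse_install_line(cmdline):
    """Split 'install key=value ...' into (command, {key: value}).

    Backslash-newline pairs (used to wrap the line in print) are joined first.
    """
    tokens = cmdline.replace("\\\n", " ").split()
    if not tokens:
        raise ValueError("empty command line")
    command, options = tokens[0], {}
    for tok in tokens[1:]:
        key, sep, value = tok.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {tok!r}")
        options[key] = value
    return command, options


def describe(cmdline):
    """Return one 'key=value  # meaning' string per option."""
    _, options = parse_install_line(cmdline)
    return [
        f"{key}={value}  # {OPTION_HELP.get(key, 'unknown option')}"
        for key, value in options.items()
    ]


if __name__ == "__main__":
    # The article's command, with the nameserver still masked out.
    line = (
        "install ipaddr=192.168.0.20 netmask=255.255.255.0 gw=192.168.0.1 \\\n"
        "  ifdev=eth0 fs=xfs ns=xx.xxx.xxx.xx hostname=home64 \\\n"
        "  layout=bigsrv disk=hda"
    )
    for entry in describe(line):
        print(entry)
```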

The installer then installs Caos NSA, including the kernel (2.6.23.8, thankfully a newer kernel), configures GRUB and a few other items, and reboots. After the reboot, Sidekick starts up to help you finish the installation (remember that Sidekick is also an admin tool you can use after the installation). The first screen that pops up is a registration request, which is optional.

The next few steps deal with some details about the installation. Here is the list of steps I took up to the point of configuring Perceus.

Caos NSA Installation steps:

  1. Registration
  2. Root password
  3. Time zone configuration
  4. NIC configuration (eth0, eth1, and eth2 in my case)
  5. Firewall configuration (including opening ports)
  6. System profile (which basic configuration you want)
    • Desktop
    • Cluster

The last step is to actually install the packages for the requested configuration. In this case I chose to install Caos NSA as both a desktop and a cluster (the desktop configuration installs XFCE and a basic VESA driver for those of us who, like me, want to use a web browser on the master node).

The next step in the installation asks if you want to configure Perceus. After answering yes, the next steps are fairly simple:

Comments on "Caos NSA and Perceus: All-in-one Cluster Software Stack"

spichardo

After an excellent experience using Warewulf in a previous job (as a post-doctoral fellow), I had no second thoughts about deploying Perceus on my new cluster, a 16-node diskless cluster. The Caos NSA distribution wasn’t ready at that time, so I stayed with CentOS 5.2, which to be honest has given me no trouble at all, quite the contrary.

Even though I found Perceus much simpler to administer (one command to rule them all, well, almost :), I was a little confused that some very basic modules (such as the ones that handle ipaddr assignment and host names) are not activated by default. The very first time I booted the nodes, they were not accessible from the master, and after activating the required modules I had to physically reboot the nodes. Some of the scripts (most of which are invoked through the main perceus command) have minor issues with access rights for non-root users to the Perceus databases. However, the dev team is simply amazing and answers promptly on the discussion list.

The new web portal in Perceus 1.5, although a little basic since it was only introduced in this release, lets you change a node’s VNFS capsule or update its description.

Anyway, Perceus simply rocks, and I keep praising it to my colleagues. One colleague just installed it on his own Beowulf cluster and is extremely satisfied with the results.

bainum

Perceus is magic. We built a cluster in the Fall 08 semester using Perceus, and the students had a great experience… as did I. We experimented with running both control and data networks to the compute nodes, and that worked fine with some changes to the ipaddr module. Students were able to write programs using Open MPI. As a system to let students build and program a cluster, Perceus is excellent. We started the semester with one version and did an upgrade midway through. I want to thank Greg Kurtzer and his team for developing a cluster management system that works so well. I also commend Jeff Layton on his articles on Perceus. The article in mid-2008 brought Perceus to my attention, and after working with it during the summer of 2008 I decided to use it in my cluster class in the Fall 08 semester. We continue to run our Perceus cluster with students.

kasozi5

Hi guys, I would like more detailed information about Perceus’s features and about the system requirements for deploying Perceus.
