Perceus/Warewulf: Tres Cool Cluster Tool

Using Perceus/Warewulf as your Cluster Management System (CMS) can speed cluster setup and deployment by automating a number of repetitive tasks.

If you want to build a cluster, at some point you are going to need a Cluster Management System (CMS), also called a Cluster Tool (CT). The CMS or CT is a software tool that allows you to perform the needed tasks, such as creating the image or OS that goes on the nodes, getting the image to the nodes, and monitoring the nodes. I want to discuss using Perceus/Warewulf for these tasks.

Before I jump into Perceus/Warewulf I want to talk about what constitutes a CMS. If you ask 10 experienced cluster people this question, then I bet you’ll get at least 10 different answers — and then an argument will break out about which one is better and why.

I’m going to try to stay above the fray and give you my ideas of what constitutes a “core” CMS: that is, the functions related to getting the nodes operating, to some extent, as a single unit. I will ignore such things as job scheduling, alerts (alerting the user when there is a problem), management of file systems, user management, and so on. I’m going to stick to the core CMS functions:

Core CMS Functions (a la Jeff Layton)

  1. Creating/managing the compute node images
  2. Getting those images to the compute nodes (provisioning), either stateless or stateful
  3. Monitoring the compute nodes

People associate a number of other tasks with CMS tools, but these are the ones that I think are at the core of any cluster. So without further ado, let’s take a look at Perceus/Warewulf.

Introduction to Perceus/Warewulf

One of the best Cluster Management Systems is Warewulf. It is a very robust CMS that creates a subset of the OS on the master node for the compute nodes. It uses DHCP and TFTP to get the compute node image to the node and installs it in a ramdisk (tmpfs). The image uses a small amount of memory, typically about 50-80 MB, and relies on the master node for OS functions that aren’t often used. Warewulf is a very successful CMS and is in use on a number of clusters throughout the world, including many on the Top500 list and even one in the top 10!

Perceus builds on the success of Warewulf but divides the Warewulf CMS into two distinct pieces to meet customer requirements. The first piece is an imaging/provisioning system that handles the management of the images; this piece is now called Perceus. The CMS portions, such as the monitoring of nodes, are incorporated into Warewulf 3.0, which is still in early development. Until the development of Warewulf 3.0 is complete, you can use some of the tools from Warewulf 2.x.

Perceus calls the image that goes to the node a VNFS (Virtual Node File System) Capsule. It contains the kernel, modules, and the parts of the OS you want to go to the compute nodes. In older versions of Warewulf this image was written to a ramdisk (tmpfs), treating each node as stateless. With Perceus you now have the option of doing the same thing (stateless) or having the image written to storage in the node (stateful). This answers some of the criticism of the earlier versions of Warewulf without compromising the stateless support.

Perceus has two stages. The first boots the compute node into what is called the “Perceus Bootstrap OS.” The second stage sends what is called the “runtime operating system” to the node and installs it as either a stateless or stateful OS. Then the Perceus Bootstrap OS goes away and the node runs normally (doesn’t sound too hard, does it?).

The first stage of the booting process currently uses PXE (Preboot Execution Environment), but Infiscale, the company behind the development of Perceus, has modified Perceus to also use Intel’s Rapid Boot (basically, you install the Perceus Bootstrap OS on flash storage on the board and boot from that).

Using PXE, the node requests an IP address via DHCP. The DHCP response contains the information to download the pxelinux and Perceus boot software via TFTP. The compute node then boots the Perceus Bootstrap OS and requests an IP address (again via DHCP). The node then talks to the Perceus server on the master node to tell it that it is in the “init” state. The Perceus server adds this compute node as a new node if it didn’t already exist, creates a command sequence that initiates the second stage of booting the node, and transmits it over an open socket.

The second stage then gets the VNFS and boots it. For a typical stateless node (and we should all do stateless; more on that another time), the command sequence transfers the VNFS to the node and prepares it to run stateless. Perceus then uses kexec to switch over to the kernel that came with the VNFS.
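
If you haven’t seen kexec before, here is a generic illustration of the kind of kernel handoff Perceus performs at this point. To be clear, this is not Perceus’s actual invocation; the kernel, initrd, and boot arguments below are placeholders:

# kexec -l /path/to/vnfs/vmlinuz \
   --initrd=/path/to/vnfs/initramfs.img \
   --append="root=/dev/ram0"
# kexec -e

The -l step loads the new kernel into memory, and -e jumps straight into it without going back through the BIOS, which is what lets the bootstrap OS vanish so cleanly.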

After the kernel has been started, the Perceus Bootstrap OS is purged from memory. Now the node is running the image you prepared for it without any traces of the bootstrapping OS. The OS in the VNFS capsule thinks it booted from a local disk. This means that Perceus is distribution neutral and can even be operating system neutral.

This explanation may be a bit long but I wanted to show you that careful thought has gone into the design of Perceus. In particular, the developers have paid close attention to making the whole imaging process very scalable.

Now that we have the basics of what Perceus is doing, let’s actually boot a small two-node cluster to walk through the process.

Let’s Take it for a Test Drive!

I’m working with a small two-node cluster to learn Perceus. The two nodes are basic AMD Opteron boxes. The master node has a single 80 GB IDE drive and three NICs: (1) a Fast Ethernet NIC to the outside world (IP: 192.168.0.20), (2) a GigE NIC for storage/management traffic (IP: 10.1.1.253), and (3) an Intel GigE NIC for computational traffic (IP: 10.1.0.253). I’ve installed Fedora Core 8 on the master node.

The Perceus documentation says that you need a few prerequisites on the master node. As you read through the documentation you learn there are a few other requirements as well. The total requirements for my system were:

Perceus Master Node Prerequisites

  1. Disable DHCP
  2. Disable tftp
  3. Install perl-DBI
  4. Make sure NFS is there
  5. Install NASM (required for pxelinux support)
  6. Make sure Perl is installed (almost everything in Perceus is written in Perl)
  7. Install perl-Unix-Syslog (Perceus dependency)
  8. Install perl-IO-Interface (Perceus dependency)
  9. Install perl-Net-ARP (Perceus dependency)
  10. Install bash_complete (Perceus dependency)

Some of these dependencies need to be installed from your OS. For example, using yum you can easily install the required packages and their dependencies. After grabbing the packages, just follow the instructions in the Perceus documentation on building the RPMs and then installing them.
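
To make that concrete, here is roughly what I ran on FC8. The package names are my best guesses for Fedora’s repositories (perl-Net-ARP, for example, may need to be built as an RPM per the documentation rather than pulled with yum):

# yum install perl-DBI nasm perl-Unix-Syslog perl-IO-Interface
# chkconfig dhcpd off
# chkconfig tftp off

The last two lines cover the “disable DHCP” and “disable tftp” items; as we’ll see below, Perceus uses dnsmasq to provide both of those services itself.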

The final step is to build the Perceus package itself. The instructions in the documentation are very good, and I’ve never had a problem with building it. One thing you will notice while Perceus is building is the kernel being built for the Perceus Bootstrap OS. Currently (as of the writing of this column), the Perceus Bootstrap OS uses the 2.6.21.5 kernel.
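
Since the end product is an RPM, the build boils down to something like the following sketch. This assumes the tarball carries its own spec file, which is what rpmbuild -ta expects; the version numbers are placeholders, and the output path depends on your rpmmacros settings (discussed next):

# rpmbuild -ta perceus-<version>.tar.gz
# rpm -ivh ~/rpmbuild/RPMS/x86_64/perceus-<version>.x86_64.rpm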

For FC8 I had a small problem with an include file that somehow didn’t make it into /usr/include/linux, but this was easily fixed. Be sure to follow all of the directions in the Perceus manual, particularly the one about rpmmacros; this will help you avoid some problems.
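
As a concrete example of that rpmmacros step (my guess at what the manual is after; check the documentation for the exact contents), a one-line ~/.rpmmacros that points rpmbuild at a writable build tree looks like:

%_topdir %(echo $HOME)/rpmbuild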

After installing Perceus itself from the RPM, the next steps are to modify just a couple of configuration files. The first file to modify is /etc/perceus/perceus.conf. In my case, the master node’s interface on the management network is 10.1.1.253. My perceus.conf file looks like this:

# This is the configuration file for Perceus

# Define the IP Address of the network file server
vnfs transfer master =  10.1.1.253

# What protocol should be used to retrieve the VNFS information. Generally
# supported options in this version of Perceus are: 'nfs' and 'http', but
# this may be overridden by particular VNFS capsules.
vnfs transfer method = nfs

# Define the VNFS transfer location if it is different from the default
# ('statedir'). This gets used differently for different transfer methods
# (e.g., with NFS this replaces the path to statedir, while with http it gets
# prepended to the "/perceus" path).
vnfs transfer prefix =

# How long should we wait before considering a node dead? Note that if
# you are not running node client daemons, then after provisioning the node
# will never check in, and will no doubt expire.
node timeout =

The second file you need to modify, or at least check, is /etc/perceus/defaults.conf. I wanted my compute node numbering to start at 1, so I modified the file accordingly. My defaults.conf file looks like this:

# This is the template name for all new nodes as they are configured.

# Define the node name range. The '#' characters symbolize the node number
# in the order the nodes are initialized. If you don't allocate enough digits
# here for what you defined in 'Total Nodes' then it will be automatically
# padded.
Node Name = n####

# What is the default group for new nodes? (This doesn't have to exist
# anywhere beforehand.)
Group Name = cluster

# Define the default VNFS image that should be assigned to new nodes
Vnfs Name =

# Are new nodes automatically enabled and provisioned?
Enabled = 1

# What is the first node number that we should count at?
First Node = 1

# This is the total node count that Perceus would ever try and allocate a
# node to. It is safe to make this big, so you should leave it big.
Total Nodes = 10000

Otherwise, I left everything at the default values. The next step is to run the command perceus to initialize it. It will ask you a few questions, including some registration questions. Be sure to answer these, because if you have problems, Perceus can dump information about your configuration to a file that you can send to the mailing list for help.
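
Initialization itself is nothing more exotic than running the command as root and answering the prompts:

# perceus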

During the initialization, a new configuration file, /etc/perceus/dnsmasq.conf, is created (Perceus uses dnsmasq to provide DHCP and TFTP). I checked this file to be sure everything looked correct. For my setup, the configuration file looked like this:

interface=eth1
enable-tftp
tftp-root=/var/lib/perceus//tftp
dhcp-boot=pxelinux.0
local=//
domain=cluster
expand-hosts
dhcp-range=10.1.1.2,10.1.1.252
dhcp-lease-max=21600
read-ethers
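
Before moving on, a quick sanity check is worthwhile (these commands simply follow from the settings above): confirm that dnsmasq is running and that the PXE boot loader is sitting in the tftp-root directory.

# ps ax | grep dnsmasq
# ls /var/lib/perceus/tftp

If pxelinux.0 isn’t there, the dhcp-boot line above has nothing to serve.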

Everything looked correct to me, so I didn’t change anything (which is a good thing). The next step is to either find a VNFS capsule or create one. For the purposes of this article, I just used an existing capsule that the Caos NSA developers created:

http://mirror.caoslinux.org/Caos-NSA-1.0/vnfs/ \
x86_64/caos-nsa-node-0.9-28-1.stateless.x86_64.vnfs

Even though this capsule was created for a different distribution, I was able to use it for this testing (I’ll have a column about Caos NSA and Perceus next month). This is one of the powerful capabilities of Perceus: you can use capsules from other distributions or even other operating systems (but be sure to test them to make sure they work correctly).
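
Grabbing the capsule is a plain HTTP download. I stashed mine in /root/CAPSULES, which is the path used in the import command below:

# wget -P /root/CAPSULES \
 http://mirror.caoslinux.org/Caos-NSA-1.0/vnfs/x86_64/caos-nsa-node-0.9-28-1.stateless.x86_64.vnfs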

After downloading the capsule, I needed to “import” it into Perceus so that it knows about it:

# perceus vnfs import \
 /root/CAPSULES/caos-nsa-node-0.9-28-1.stateless.x86_64.vnfs

During the import process, Perceus will ask you a few quick questions. For example, it asked what root password I wanted to use for the capsule, which Ethernet device was to be used for booting, and the address of the machine holding the capsules. After a few minutes, I was able to check whether the capsule had been imported:

# perceus vnfs list
caos-nsa-node-0.9-28-1.stateless.x86_64
#

So Perceus now knew about the capsule I wanted to use. Then I booted my compute node with a monitor attached to it. I watched the node grab the Perceus Bootstrap OS via PXE. It then said that no VNFS image had been defined for node n00001. This makes perfect sense: Perceus didn’t know anything about this compute node at that point, so it didn’t know which capsule I wanted to use. Next, I told Perceus to use a specific capsule:

# perceus node set vnfs \
 caos-nsa-node-0.9-28-1.stateless.x86_64 n0000[1-9]

This command tells Perceus to use that particular VNFS image on nodes n00001 through n00009. At this point, Perceus told n00001 about its VNFS image and sent it to the node. I watched the node boot the VNFS image, and the next thing I knew, the node was up and running! You can check whether a node is up by using Perceus:

[root@home64 ~]# perceus node status
HostName             Status         IpAddr                    Last Contact
n00001               ready          10.1.1.134                00:00:22

As you can see, the node is up and running. To make entirely sure, I sshed to it. At this point I could add as many nodes as I wanted (although the DHCP range I configured above only allows for about 250).
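
For the record, that “make entirely sure” step was nothing more than this, using the IP address reported by perceus node status:

# ssh 10.1.1.134 uptime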

Parting Comments

I’m a big fan of Warewulf. It’s easy to use and easy to administer. I had hoped Perceus would be even better, and I wasn’t disappointed. It’s very slick and easy to install, even on a distribution it hadn’t been tested on before (FC8). There were a few very small problems getting it built on FC8, but they were easily solved. After that, everything was easy.

Several features of Perceus particularly impressed me. The design of the provisioning allows it to be distribution neutral and even operating system neutral, which opens up the possibility of using a single master node for many operating systems. Using Perceus, you can easily assign various VNFS capsules to various nodes, and it should be possible to do this dynamically as well. The VNFS image management in Perceus is also much better than in Warewulf; it is truly easy to manage multiple capsules on the same master node. I highly recommend giving Perceus a try — you won’t be disappointed.
