Master of Puppet: System Management Made Easy

Want to build a better network? Start by building better administration tools. Puppet aims to spark a new generation of monitoring software.

As networks grow larger and larger, as services become increasingly complex, and as turnaround times dwindle, a for loop and ssh just don’t cut it anymore. To keep up, system administrators need better tools. To get ahead, system administrators must share tools.

Puppet, written in Ruby and released under the GNU General Public License, centralizes and unifies a significant number of system administration tasks. Better yet, Puppet is a toolkit for building other tools, all of them standardized to run within Puppet.

Puppet is a centralized system administration package. A typical Puppet deployment includes one Puppet server and many Puppet clients. The Puppet server, which is usually given a CNAME of puppet to make it easy to locate, is the central repository for system configurations. Each Puppet client draws instructions from the Puppet server, such as updating a system file, restarting a service, or creating a new user.

It’s most common to run Puppet in client/server mode; however, you can also use Puppet standalone to manage a single machine.

Pulling the Strings

Let’s jump in and create a client/server Puppet configuration. (See the Puppet Installation Guide for platform-specific installation instructions. Optionally, most distributions have packages for Puppet.)

Puppet configurations are built out of resources: all of the things you have to maintain on your network, such as packages, files, users, services, and so on. Resources are specified much like a hash would be in most languages:

user { luke:
    comment => "Luke Kanies",
    shell => "/usr/bin/bash",
    ensure => present
}

Believe it or not, this is a small Puppet script. If you run it on a node, it creates the user luke with the named shell and comment:

$ sudo puppet ~/bin/test.pp
notice: //User[luke]/ensure: created

puppet is a stand-alone interpreter. You can use puppet to test small chunks of Puppet code or to configure a system not running as part of a client/server installation.

If you run the previous script (as root, because it creates a user) and the user already exists, Puppet modifies the comment field to match the given comment element. Or, if the user exists and already has the same comment, the script does nothing.

In this way, all Puppet configurations are idempotent, meaning that they can be run multiple times with the same effect each time. If the host is already configured as specified, then nothing happens, but if there are differences, the host is modified as necessary.

Multiple Resources

No configuration is so simple as to consist of a single resource. Typical configurations involve tens or hundreds of resources with complex relationships among them (see the Puppet Type Reference for a list of resource types), so Puppet provides ways of handling both the relationships and the profusion.

The canonical example for related resources is a service and its configuration file:

file { "/etc/ssh/sshd_config":
    source => "puppet://fileserver.domain.com/files/ssh/sshd_config"
}

service { ssh:
    ensure => running,
    subscribe => File["/etc/ssh/sshd_config"]
}

This tells Puppet how to retrieve the ssh configuration file and specifies that the ssh service should be running. It also configures the ssh service to subscribe to changes in its configuration file, where the response to a change is to restart the service. Hence, Puppet automatically restarts the service whenever it modifies the file.

Puppet also uses the relationships in the configuration to determine execution order. You can specify the service and then the file subscription, and Puppet still applies the file first, because Puppet guarantees that a resource’s dependencies always get applied before the resource is manipulated.
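As a minimal sketch of that ordering guarantee, the manifest below declares the service first and its dependencies afterward. The package name openssh-server is an assumption made for illustration (it varies by platform), and require is used here alongside subscribe to express a plain ordering dependency without a restart.

```puppet
# Declared first, applied last: both relationships point at resources
# that must be in place before the service is managed.
service { ssh:
    ensure    => running,
    require   => Package["openssh-server"],
    subscribe => File["/etc/ssh/sshd_config"]
}

# "openssh-server" is an assumed package name, for illustration only.
package { "openssh-server":
    ensure => installed
}

# The configuration file in turn depends on the package.
file { "/etc/ssh/sshd_config":
    source  => "puppet://fileserver.domain.com/files/ssh/sshd_config",
    require => Package["openssh-server"]
}
```

Whatever order the three resources appear in, the relationships imply that the package comes first, then the file, then the service.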

Of course, hundreds of resources in a single file wouldn’t work well, so Puppet provides resource classes for organization. You normally put all related resources into a single class, and then put each class in a separate file:

# services/ssh.pp
class ssh {
    # the file and service resources shown above go here
}

# site.pp, the main configuration file
import 'services/*'
include ssh

Puppet’s import statement supports globbing, as exercised in 'services/*', so you can add new services and Puppet automatically imports them. import reads the file, and include evaluates the class.

This works for simple cases, but you will normally have many classes, and not all of your hosts will include all classes. Instead, you would normally specify each node individually, and declare which classes each should include:

# site.pp
# Assume sudo.pp, ssh.pp, apache.pp,
# and mysql.pp in services/
import "services/*"

class base {
    include ssh, sudo
}

node webserver {
    include base, apache
}

node dbserver {
    include base, mysql
}

node default {
    warning "Using default node for $hostname"
    notify { "Using default node": loglevel => warning }
}

This configuration includes definitions for three nodes.

  • The first two, webserver and dbserver, both pick up the ssh and sudo classes by including the class base. But webserver and dbserver also get specific functional classes that differentiate each from the other: the former includes apache, while the latter includes mysql.

  • The default node matches any system that doesn’t have a specific configuration. default produces useful log messages instead of just failures.

There are two types of logging in that default node: the first, warning, is a server-side function to log the message on the server; the second, notify, looks like a resource and is sent to the client. Whether you’re looking at the client or server you’ll see the message.

Building a Puppet Network

On most platforms, the default location for your Puppet configuration is /etc/puppet/manifests, so install your Puppet code there.

Once installed, start the server, puppetmasterd. Figure One shows how. In this example, the server’s name is phage.

FIGURE ONE: How to start the Puppet server

luke@phage $ sudo puppetmasterd -v --manifest ~/bin/test2.pp
info: Starting server for Puppet version 0.22.4
info: Parsed manifest in 0.00 seconds
info: Creating a new certificate request for phage.reductivelabs.com
info: Creating a new SSL key at /etc/puppet/ssl/private_keys/phage.reductivelabs.com.pem
info: Signing certificate for CA server
info: Signing certificate for phage.reductivelabs.com
info: Listening on port 8140
notice: Starting Puppet server version 0.22.4

When puppetmasterd starts, it creates a certificate authority (CA) to sign a certificate for the server itself, and then listens for client connections. A Puppet client’s initial request asks the server to sign its certificate.

How signing is accomplished is completely up to you. You can generate the certificates and deploy them manually or at install time; you can sign the certificates manually when clients ask for them (each client reconnects every five minutes to ask for a signature); or you can automatically sign certificates based on host names or IP addresses. The easiest technique is to just sign certificates as they come in; you’re usually sitting at the keyboard bootstrapping the client anyway, so you can be pretty confident of the trust.

Figure Two shows what it looks like to start a new client. The machine name is fullerene.

FIGURE TWO: Starting a new Puppet client

luke@fullerene $ sudo puppetd --test
info: Creating a new certificate request for fullerene.reductivelabs.com
info: Creating a new SSL key at /etc/puppet/ssl/private_keys/fullerene.reductivelabs.com.pem
warning: peer certificate won't be verified in this SSL session
notice: No certificates; exiting

puppetd is the Puppet client daemon. The --test option is useful when running puppetd interactively, because it enables verbose output and sets a few other useful parameters. It also forces the daemon to exit immediately if it does not get its certificate signed, which is what happens here. The client’s certificate must be signed by the server’s certificate authority before the client is allowed to connect for any other services.

Notice that the server name isn’t specified on the command-line. Puppet defaults to the server name puppet. As mentioned at the outset, if you set up a CNAME for puppet, you shouldn’t need to configure your clients during initial setup.

Now, head to the server and sign, so the client on fullerene can connect:

luke@phage $ sudo puppetca --sign fullerene.reductivelabs.com
Signed fullerene.reductivelabs.com

puppetca is Puppet’s interface to its certificate authority. Use --list to get a list of waiting certificate requests.

Once the certificate is signed, you can run the client again and see that it’s getting its configuration:

luke@fullerene $ sudo puppetd --test
warning: peer certificate won't be verified in this SSL session
notice: Got signed certificate
notice: Ignoring cache
info: Caching configuration at /var/puppet/state/localconfig.yaml
notice: Starting configuration run
warning: Using default node
info: Sent transaction report in 0.52 seconds
notice: Finished configuration run in 1.08 seconds

Initially, the client starts with no certificate, but it gets the signed certificate right away and continues. It retrieves the configuration from the server (you can follow this process along in the server’s logs, too), caches it locally (in case the server’s down on the next run), and then applies it. Here, all the client does is emit the warning about using the default node.

Now that you’ve got the data flow, you can configure your node as appropriate; it’s just a question of getting the right classes to each node.

And yes, it really is that easy. You can have a single script that does very little, or just a few services defined and a default node that handles most cases. Of course, things can get a lot more complicated, but they don’t have to; you can start out simple, and as your needs become more sophisticated, your configuration can grow to match. You should have a functioning client/server set up in half an hour, though.

Puppet’s wiki has a lot of useful information to help you bring your system in line with accepted best practice quickly.

Unfortunately, system administration is not simple; you can’t just have a fixed list of classes and resources. Puppet provides a lot of mechanisms to handle dynamism, and its strengths really lie in allowing you to manage the complexity of your network. Let’s see some examples.

Per-Node Customization

You probably noticed the $hostname variable in the node definition file. A datum like this is called a client fact, because the tool facter provides it (and, well, because it is a fact). facter was written before Puppet; it’s just a simple tool for retrieving information about a host.

Every time a Puppet client connects to the server, it asks facter for everything it knows and then passes that information to the server. You, too, can ask facter for everything it knows, as shown in Figure Three.

FIGURE THREE: The facter tool provides a system profile

luke@phage $ facter
domain => reductivelabs.com
facterversion => 1.3.7
fqdn => phage.reductivelabs.com
hardwaremodel => i386
hostname => phage
ipaddress => 192.168.0.7
kernel => Darwin
kernelrelease => 8.9.1
macaddress => 00:16:cb:a4:37:fa
operatingsystem => Darwin
operatingsystemrelease => 8.9.1
puppetversion => 0.22.4
rubylib => /Users/luke/lib/ruby:/Users/luke/puppet/lib
rubysitedir => /opt/local/lib/ruby/site_ruby/1.8
rubyversion => 1.8.4

The output in Figure Three is significantly trimmed down, but it nonetheless shows quite a bit.

All of the facts from facter are set as variables during the processing of a configuration file, so they can all be used to make decisions about what classes to include or how those classes should be configured. You can also easily add new facts.

Puppet supports basic conditional structures like if/else and a case statement, along with a “selector,” something of a simplified case structure that is useful for selecting the appropriate value:

$rootgroup = $operatingsystem ? {
    FreeBSD => wheel,
    default => root
}

This sets $rootgroup to wheel on FreeBSD, but root otherwise. This is useful for handling heterogeneity: you don’t want to worry about this kind of detail all over the place, so use a basic selector like this at the top of your configuration and then just use the variable from then on.
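For comparison, here is a rough sketch of the same decision written with the case statement mentioned earlier, followed by one place the variable might be used. The /etc/motd file is a hypothetical example, not something the configuration above manages.

```puppet
# Same decision as the selector, written as a case statement.
case $operatingsystem {
    FreeBSD: { $rootgroup = wheel }
    default: { $rootgroup = root }
}

# Hypothetical use of the variable: a root-owned file whose group
# differs between platforms.
file { "/etc/motd":
    owner => root,
    group => $rootgroup,
    mode  => 644
}
```

The selector is the more compact choice when all you need is a value; a case statement is more natural when each branch has to do more than that, such as including different classes.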

Naming

How you name your resources in Puppet is critical, for two reasons.

  • Resource names usually map directly to the system. The name you give a user is the name used with useradd, and the name you give your package is the name used with apt-get, to use examples suitable for Debian.

  • The other reason is a bit more subtle. Puppet does what it can to make sure unrelated parts of your configuration do not overlap, and it uses the resource names to do this.

For instance, you can’t have two classes (say, an apache class and a mysql class) attempt to manage the mysql package, because Puppet wouldn’t know which class should win if there are differences. Puppet’s parser keeps a list of all defined resources and throws a parse error if one class attempts to specify details of a resource already specified by another class.

This often initially feels like a restriction, but it’s easily solved by abstracting the common parts of your configuration into a single class, which can then be required by the previously-conflicting classes. In this example, you might have a mysql class that handles installing the package, and then a mysqlserver class that includes the mysql class and goes further to start the service.
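A minimal sketch of that arrangement might look like the following. The package and service names (mysql, apache) are assumptions for illustration, and the apache class here is a hypothetical one that happens to need the mysql package too.

```puppet
# Shared piece: the only place the mysql package is declared.
class mysql {
    package { mysql: ensure => installed }
}

# The server class builds on the shared class and adds the service.
class mysqlserver {
    include mysql
    service { mysql:
        ensure  => running,
        require => Package[mysql]
    }
}

# A hypothetical apache class that also needs the package includes
# the shared class instead of declaring the package a second time.
class apache {
    include mysql
    package { apache: ensure => installed }
}
```

Because including a class more than once is harmless, a node can include both mysqlserver and apache without the package ever being specified twice.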

Naming matters to you and to the machines you’re managing, too, in that different operating systems don’t always call a resource by the same name. For instance, the ssh daemon is variously called ssh, sshd, or openssh. To support this, Puppet provides two kinds of names: a title, which is how you (the human) will normally refer to a resource, and a name, which is how the computer refers to the resource.

For instance, here’s a more portable version of the ssh configuration from above:

file { sshd_config:
    name => $operatingsystem ? {
        Darwin => "/etc/sshd_config",
        Solaris => "/opt/csw/etc/ssh/sshd_config",
        default => "/etc/ssh/sshd_config"
    },
    source => "puppet://fileserver.domain.com/files/ssh/sshd_config"
}

service { ssh:
    name => $operatingsystem ? {
        Darwin => "com.openssh.sshd",
        Solaris => openssh,
        Debian => ssh,
        default => sshd
    },
    ensure => running,
    subscribe => File[sshd_config]
}

This is clearly more work, but it’s unfortunately necessary because operating system vendors make no attempt to be consistent in, well, anything. Puppet uses both names, but it’s best to use the title when specifying relationships, since the title works on all platforms.

Once you have these cross-platform titles described in one place, you never have to think about them again. You can just use the title and know that your references are all cross-platform.

Providers

Some resource types, like files, are inherently cross-platform because they behave similarly on all Unix-like systems, but others, like packages, differ drastically from one platform to another. To handle this, Puppet has providers for these resource types, which handle a given implementation of that resource type.

Puppet has package providers that handle apt-get, yum, blastwave, and much more. It is usually intelligent about choosing an appropriate default for your platform, but you can override it to use any other functional provider. For instance, if you want to use aptitude instead of APT, it’s pretty easy:

package { ssh: ensure => present, provider => aptitude }

Templating

It’s best to use native Puppet types to do your work, because you’ll get more useful logging, among other things, yet sometimes you just need to generate a file. Of course, you can pull whole files down using the source parameter to file, but sometimes files need to vary somewhat. To enable this, Puppet supports templates with ERB, which ships with Ruby. Here’s a simple example template to create your resolv.conf file:

domain <%= domain %>

<% nameservers.each do |server| %>
nameserver <%= server %>
<% end %>

The variables domain and nameservers build resolv.conf. You’d use it like this:

# $domain is set by Facter
$nameservers = ["ns1.$domain", "ns2.$domain"]
file { "/etc/resolv.conf":
    content => template("resolv.erb")
}

Template support is handled with a simple function, template(). It uses the variables set inside Puppet to evaluate the template. You’d normally put the template into Puppet’s template directory, which defaults to /etc/puppet/templates, but you can also specify a full path.

Custom Types

There are times when you have higher-level resources that are composed of one or more resources, or when you need to model your resources somewhat differently to provide the right semantics. For instance, this definition creates a Subversion repository:

define svnrepo($path) {
    exec { "create-svn-$name":
        command => "/usr/bin/svnadmin create $path/$name",
        creates => "$path/$name"
    }
}

svnrepo { puppet: path => "/var/subversion" }

You can use the exec resource type to run shell commands to manage resources that Puppet doesn’t yet manage. In this example, it wouldn’t be much more work to have an exec for each subversion repository, but it would make the repositories harder to maintain and it wouldn’t be as obvious. Wrapping the exec in a definition allows you to be explicit about the what and why, not just the how.
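As a small sketch of the reuse this buys you, each additional repository becomes a single declaration once the svnrepo definition above is loaded. The repository names here are made up for illustration.

```puppet
# Each declaration expands to its own exec, named after the title,
# and the creates parameter keeps every one of them idempotent.
svnrepo { facter: path => "/var/subversion" }
svnrepo { website: path => "/var/subversion" }
```

If the svnadmin invocation ever needs to change, it changes in the definition alone rather than in every exec.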

This is a very simple example, but it demonstrates one of the fundamental units of reusability within Puppet. It helps avoid the copy/paste nastiness that occurs so often in automation tools. You can find many useful definitions in Puppet’s wiki. Consider sharing your own.

The Puppet Community

This exploration of some of Puppet’s more powerful features gives you an idea of what you can do. Even more impressive is what Puppet’s community actually does with the tool.

Because the whole goal of Puppet is to get all system administrators using better tools, community is critical, and Puppet has managed to attract a great community of all skill levels and motivations, from newbies, to developers just looking to make the problem go away, to harried educational system administrators, to veterans always looking to go home earlier. There’s great conversation on the Puppet mailing list; a very active IRC channel in #puppet on irc.freenode.net; and many examples in the wiki. The community is working on a CPAN-like equivalent for Puppet and enhancing the documentation.

Why Puppet?

With any number of both commercial and open source centralized system administration tools available, why choose Puppet?

I spent a couple of years heavily involved in the Cfengine community, contributing articles, documentation, and code, but its author is focused on research, rather than production. This resulted in enough problems with my customers (I was consulting at the time) that I found that I couldn’t build a business around it.

After some experimentation with other tools, including LCFG and Bcfg2, I saw that everyone was making most of the same mistakes that Cfengine made: the tools didn’t model the systems being managed as humans perceive them, which drastically limits the reusability of configurations and painfully increases the amount of work it takes to create them.

Humans think about resources: users, not /etc/passwd; packages, not dpkg. Yet most tools force you to think about those file formats or individual commands. I wanted to build a tool that directly exposed those resources in a way that gave me access to the details I care about, but allowed me to ignore all of the implementation bits that don’t matter to me. No, I actually don’t care whether it’s crontab -u luke -e or crontab -e luke, I just want to add a cron job. Really.

With a cross-platform library that exposes these resources directly, by calling the correct commands or parsing and generating the right files, any other tool can just reuse that library without having to worry about implementation details, like Puppet users are doing today.

I knew the library wouldn’t be enough, though, because people use tools, not libraries, so I built a simple language, heavily influenced by both Cfengine and Ruby, that allows you to specify these resources and not much more. You’ve already seen how this language is focused on further enabling platform heterogeneity, which clearly needs to be supported throughout the entire stack.

Puppet’s language and library are its core, in that they define the limits of its expressibility and functionality, respectively, but either one can be used without the other. I have already demonstrated using another language with Puppet’s library, for example, and Puppet has an XMLRPC interface that others are already using directly to do other work. You can stay within Puppet entirely and still use it in ways that you can’t easily use other tools.

For instance, I’ve written a simple tool, ralsh (for “Resource Abstraction Layer SHell”), that can query the state of local or remote resources:

luke@culain $ sudo ralsh --host phage user luke
user { 'luke':
    shell => '/bin/bash',
    uid => '501',
    gid => '501',
    ensure => 'present',
    password => '********',
    comment => 'Luke Kanies',
    groups => ['appserverusr','admin','appserveradm'],
    home => '/Users/luke'
}

This does a live query of the user luke on phage, and the same API supports writing. You could use it to copy users from one host to another, or diff individual files or package lists on one or more hosts. This wouldn’t be possible if I hadn’t developed each component of Puppet to stand on its own.

Of course, even the best tool isn’t enough in a vacuum. I’ve been doing everything I can to grow the community. I’ve given talks in five countries in the last six months (mostly at open source conferences and meetings); I’m building partnerships with companies and projects of all sizes; and I spend a lot of time just helping the community get the most out of the product. Hopefully, the Puppet community can take the entire industry the next step forward.

Moving Forward

It’s long past time to stop using ssh and a for loop to manage a network of systems; system administrators must adopt widely-used tools and refine and expand those tools to satisfy a wider set of users.

Toward that end, Puppet has been developed to centralize and automate server administration. Its primary feature is a cross-platform resource abstraction layer that exposes the operating system entities you really care about, while allowing you to ignore the implementation bits that just get in the way. Puppet also provides a language focused on allowing you to easily specify the configuration for your whole network. All of its components are stand-alone and reusable, but the collection as a whole provides a leap forward for sysadmins.
