The State of Open Source System Automation

The days of DIY system administration are rapidly coming to a close. Why? Because the open source tools available are just too good not to use. Presenting Bcfg2, Cfengine, Chef and Puppet.

This summer the USENIX 2010 conference in Boston hosted the first Configuration Management Summit on automating system administration using open source configuration management tools. The summit brought together developers, power users and new adopters.

Why Configuration Management?

Internet use is growing and new services are appearing hourly. The number of servers (both physical and virtual) is becoming uncountable. Automating system administration is a must to handle the deluge; otherwise swarms of sysadmins would be needed to manage all these systems.

Drivers for automating system administration:

  • In companies with multiple sysadmins working the old way, in interactive root sessions, sysadmins making changes at the same time can step on each other’s toes (and on the config!);
  • System administration is a relatively new profession, without a standard curriculum, so practitioners have different philosophies and practices. Going from organization to organization, it is a challenge for a new sysadmin to learn:
    • how the system is set up,
    • why it was set up that way,
    • how it needs to be set up to keep operating,
    • how to set it up that way again in case of disaster or normal growth.

Automating system administration addresses all the above and makes new things possible.

For example, a CM tool can detect and remedy a deviation from configuration policy faster than a human sysadmin can, or it can automatically instantiate, configure and bring online a new virtual server instance if an old one dies.
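
To make the idea of remedying a deviation from policy concrete, here is a minimal Python sketch. It is not taken from any of the four tools: the function names, the setting keys and the in-memory "node" dictionary are all assumptions made for illustration, and a real tool would edit files or restart services instead of updating a dictionary.

```python
from typing import Dict, List, Tuple


def find_drift(desired: Dict[str, str], actual: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (key, actual_value, desired_value) for every setting that deviates from policy."""
    drift = []
    for key, want in sorted(desired.items()):
        have = actual.get(key, "<missing>")
        if have != want:
            drift.append((key, have, want))
    return drift


def converge(desired: Dict[str, str], actual: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Repair only the deviating settings and return the list of changes made."""
    changes = find_drift(desired, actual)
    for key, _have, want in changes:
        # Stand-in for a real repair action (rewrite a file, restart a service).
        actual[key] = want
    return changes


# Hypothetical policy and node state, for illustration only.
policy = {"sshd.PermitRootLogin": "no", "ntp.service": "running"}
node = {"sshd.PermitRootLogin": "yes", "ntp.service": "running"}

first_run = converge(policy, node)   # repairs the one deviating setting
second_run = converge(policy, node)  # nothing left to do
```

Running the same policy twice changes nothing the second time, which is the property that lets an agent re-apply policy on a schedule without harm.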

There are over a dozen different CM tools actively used in production.

So many choices can bewilder a sysadmin searching for a CM tool.

The summit included representatives for 4 tools: Bcfg2 (pronounced “bee-config 2”), Cfengine, Chef and Puppet.

The summit had three parts: 4 presentations; a panel session; and a mini BarCamp with 6 presentations. The panel session was quite lively.

I will attempt to compare and contrast the 4 tools; however, using any robust configuration management tool, with discipline, is better than administering systems manually.

Four Tools

Bcfg2: Came out of Argonne National Lab. Lightweight on the node. Each server can easily handle 1000 nodes. Relies on centralization. Uses a complete model of each node’s configuration, both desired and current.

Strengths: Reporting system and debugging.

Weaknesses: Documentation. (New set of documentation is coming out now, but still weak in examples.) Sharing policies between sites is not easy; group names need to be standardized first.

Cfengine: Came out of Oslo University. Strong philosophy of allowing decentralization and potential local autonomy. Oriented toward consensus building as opposed to top-down policy dictation. Underlying philosophies are promise theory, convergence and self-healing. Also has a healthy paranoid streak and an impressive security record (only 3 serious vulnerabilities in 17 years).

Strengths: Highly multi-platform (it even runs on underwater unmanned vehicles!). Lightweight. Largest userbase – more companies using it than all the other tools combined! Able to continue operating under degraded conditions (network down, for example).

Weaknesses: It’s hard to get started because there is a lot to learn.

Chef: Has its origins in the Ruby-on-Rails world in the cloud. Grew out of dissatisfaction with Puppet’s non-deterministic ordering. Resilient (each node can run stand-alone if the server disappears). Sequence of execution is tightly ordered.

Strengths: Cloud integration (automating provisioning and configuration of new instances in one fell swoop). Multi-node orchestration (more below). Reusable policy cookbooks and the highest degree of recipe reuse among users of any of the four tools.

Weaknesses: Attributes have 9 different levels of precedence (role, node, etc.) and this can be daunting.
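
To show why layered precedence takes some getting used to, here is a small Python sketch of the general idea. It is deliberately not Chef's actual scheme: the three level names and their order are hypothetical, chosen only to illustrate how a higher level overrides a lower one and why it helps to be able to ask where a value came from.

```python
# Hypothetical precedence levels, lowest to highest. This is NOT Chef's real list.
PRECEDENCE = ["default", "role", "node"]


def resolve(attributes_by_level):
    """Merge attribute dictionaries so that higher-precedence levels win."""
    resolved = {}
    for level in PRECEDENCE:
        resolved.update(attributes_by_level.get(level, {}))
    return resolved


def origin(attributes_by_level, key):
    """Return the highest-precedence level that sets `key`, or None if unset."""
    for level in reversed(PRECEDENCE):
        if key in attributes_by_level.get(level, {}):
            return level
    return None


# Illustrative attribute values only.
levels = {
    "default": {"port": 80, "workers": 2},
    "role": {"workers": 8},
    "node": {"port": 8080},
}

effective = resolve(levels)
workers_set_by = origin(levels, "workers")
```

With only three levels it is easy to see which value wins; with nine, a helper like `origin` (or the tool's own reporting) becomes the practical way to answer that question.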

Puppet: Grew out of dissatisfaction with Cfengine 2. Centralized model; however, if the server is unreachable, node agents will still run, applying the cached configuration. Simple and human-readable DSL gives safety at the cost of flexibility. Determines and runs delta changes only.
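
The cached-configuration fallback described above can be sketched in a few lines of Python. This is an illustration of the pattern only, not Puppet's implementation: `load_config`, the JSON cache file and the two stand-in fetch functions are all assumptions.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_config(fetch: Callable[[], dict], cache_file: Path) -> dict:
    """Fetch configuration from the server; fall back to the cached copy if unreachable."""
    try:
        config = fetch()
    except ConnectionError:
        # Server unreachable: apply the last configuration that was fetched successfully.
        return json.loads(cache_file.read_text())
    # Server reachable: refresh the cache for the next outage.
    cache_file.write_text(json.dumps(config))
    return config


def server_up() -> dict:
    """Stand-in for a successful request to the central server."""
    return {"ntp": "running"}


def server_down() -> dict:
    """Stand-in for a request made while the server is unreachable."""
    raise ConnectionError("server unreachable")


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "config.json"
    fresh = load_config(server_up, cache)     # fetched and cached
    cached = load_config(server_down, cache)  # served from the cache
```

One design consequence: a node that has never reached the server has no cache to fall back on, so this sketch would raise an error on its first run during an outage.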

Strengths: Large community of users (over 2000 users on the Puppet mailing list).

Weaknesses: The Puppet server is currently a potential bottleneck (which is solved by going to multiple servers). Execution ordering can be non-deterministic. (But reports will always tell you what succeeded and what failed, and order can be mandated where it is required.)
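
Mandating order where it matters amounts to declaring dependencies between resources and letting the tool sort them. The Python sketch below illustrates that idea with the standard library's `graphlib` module (Python 3.9 or later). The resource names are hypothetical and this is not Puppet syntax or Puppet's own algorithm.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each resource maps to the set of resources that must be applied before it.
# Resource names are hypothetical, for illustration only.
requires = {
    "service:httpd": {"package:httpd", "file:/etc/httpd.conf"},
    "file:/etc/httpd.conf": {"package:httpd"},
    "package:httpd": set(),
}

# static_order() yields every resource after all of its prerequisites.
order = list(TopologicalSorter(requires).static_order())
```

Here the dependencies form a single chain, so only one order is valid: package, then file, then service. Resources with no declared relationship to each other may come out in any relative order, which is the non-determinism described above.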


Comments on "The State of Open Source System Automation"

redmumba

We run Bcfg2 pretty extensively at our offices, and it certainly has its pluses and minuses. However, one of the things that is a real thorn in the side is TGenshi, the Bcfg2 templating system. One of the great things about TGenshi is that it allows you to add logic to your file, so you can generate files from the Properties plugin, dynamically encrypt passwords, etc. Great feature, right?

Debugging is AWFUL. The errors TGenshi throws by default are largely generic; for example, if you have a 100-line Python file being run in the template, and an error occurs anywhere, you’ll just get a message saying “Could not generate this file.” No line number, no raising of the original Python exception, nothing. If you want to do any serious work, you’ll have to write your own wrapper to catch errors, or at least to report a line number for what failed.

Bcfg2’s strongest feature is keeping everything the same on every server, so I would consider combining it for day-to-day maintenance with maybe Puppet or cfengine for deployment.

Andrew

jblaine

We’ve been using cfengine 1+2 for 11 years.

vicente

We use cfengine2 with some logic of our own to control around 130 computers, and it is very nice and powerful once you get to understand it.

Now we are thinking of moving to Puppet. I’d like to post soon to tell you how it went.


    We just ran into the SCCM client performance issue. We had a script blow out the SCCM client to the domain, and the support team did this on Sunday at about 1am. Every Sunday at 1am since then our VMware farm grinds to a halt. CPU spikes, storage spikes. We found that SCCM was launching a dir /s inventory on all of its clients on the 7-day anniversary. About 200 VMs. Still not sure how we will fix this, but ideas would be appreciated.


