Become a Lazy SOB Administrator

Cluster Administration is possible and even easy if you focus on the basics.

Recently, I came across an article entitled Lazy Linux: 11 secrets for lazy cluster admins. Most cluster admins who read it will probably chuckle and nod at the author's insights. There is nothing like experience to help manage a Smoothly Operating Beowulf (SOB: what did you think it meant?).

One of the most often cited difficulties in this market is finding experienced cluster administrators. Indeed, this is considered a “hold back” to fielding a production cluster. It has always been my position that administering a cluster is different from, but not that different from, administering a typical Linux server.

Indeed, much of cluster administration is simply very good systems administration; that is, you need a solid understanding of certain aspects of server administration. Most cluster administrators bring some “carry over” from other areas of computing. Each of these areas has some level of educational infrastructure (manuals, mailing lists, freely available software, and even courses) that can be leveraged by those wanting to learn about clusters. The following is a list of topics that, in my experience, you need to master to become a Lazy SOB Administrator. Surprisingly, resources can be found to support almost all of these areas.

Message Passing Interface (MPI): This topic is often new to administrators, but because MPI was around before clusters hit the big time, there are numerous books and classes for learning it.

Compilers: Most cluster experts have a good understanding of compilers and building code. Knowing that a long stream of error messages may be due to a single missing library (which is easily fixed) keeps you from feeling overwhelmed when you try to build a new software package in your environment.
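As a small illustration of the compiler point, here is a minimal Python sketch (my own, not a standard tool) that scans a captured build log and pulls out the few lines that usually matter: GCC's “fatal error: …: No such file or directory” for a missing header and GNU ld's “cannot find -l…” for a missing library. The function name summarize_build_log and the script name in the usage comment are hypothetical. Other compilers and linkers word these messages differently, so the patterns would need adjusting for them.

```python
import re
import sys

# GCC reports a missing header as, e.g.:
#   foo.c:3:10: fatal error: gsl/gsl_math.h: No such file or directory
MISSING_HEADER = re.compile(r"fatal error: ([^\s:]+): No such file or directory")
# GNU ld reports a missing library as "cannot find -lfoo"
# (newer versions append ": No such file or directory").
MISSING_LIBRARY = re.compile(r"cannot find -l([\w.+-]+)")


def summarize_build_log(log):
    """Return the missing headers and libraries named in a build log.

    Duplicates are dropped while keeping the order of first appearance.
    """
    headers = []
    libraries = []
    for line in log.splitlines():
        header = MISSING_HEADER.search(line)
        if header and header.group(1) not in headers:
            headers.append(header.group(1))
        library = MISSING_LIBRARY.search(line)
        if library and library.group(1) not in libraries:
            libraries.append(library.group(1))
    return {"headers": headers, "libraries": libraries}


if __name__ == "__main__":
    # Hypothetical usage: make 2>&1 | tee build.log; python3 summarize_build.py build.log
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            print(summarize_build_log(f.read()))
```

Run against a failed build, a summary like this can shrink pages of errors down to a short list of development packages to install.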

Operating System Administration: Opportunities to learn about operating systems are plentiful. Three-inch-thick books are in good supply, as are certification classes and training. Scaling good administration practices across many nodes is addressed in the above-mentioned Lazy Linux: 11 secrets for lazy cluster admins.

Commodity Hardware: Most clusters use off-the-shelf hardware, and resources for understanding commodity hardware are also plentiful. Still, nothing works like having a motherboard or two on which to test ideas.

Schedulers: Resource scheduling has been around ever since people started sharing computers. There are resources to help you learn about schedulers, and, as with most things, a little hands-on time does wonders.

Networking: Networking is perhaps the toughest area in which to find good information, even in cluster courses. In most other settings, non-optimal network performance works quite well for browsing the web or transferring a file. Although much of Linux networking is plug-and-play, there is room for optimization when it comes to clusters. High-end interconnect networks have historically been even more obscure. Fortunately, the market seems to be converging on either 10 GigE or InfiniBand solutions, and many of the high-end network companies are moving in this direction as well.
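One way to see why clusters care about the network in a way that web browsing does not is the simple latency-bandwidth model, in which sending a message costs a fixed latency plus its size divided by the bandwidth. The sketch below assumes that model. The two network profiles are hypothetical round numbers chosen for illustration, not measurements of any real 10 GigE or InfiniBand product.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Latency-bandwidth model: fixed start-up latency plus size / bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def latency_share(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of the transfer time spent on latency rather than moving data."""
    return latency_s / transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s)


# Hypothetical network profiles (latency in seconds, bandwidth in bytes/s),
# chosen as round numbers for illustration only.
NETWORKS = {
    "commodity": (50e-6, 125e6),
    "low-latency": (2e-6, 1.25e9),
}

if __name__ == "__main__":
    for size in (64, 1_000_000):
        for name, (latency, bandwidth) in NETWORKS.items():
            t = transfer_time(size, latency, bandwidth)
            share = latency_share(size, latency, bandwidth)
            print(f"{size:>9} B  {name:<12} {t * 1e6:10.1f} us  latency {share:6.1%}")
```

With these assumed figures, latency accounts for about 99% of the time needed to send a 64-byte message over the "commodity" profile, but for well under 1% of a 1 MB transfer. A file copy barely notices latency. A parallel code that exchanges many small messages feels it on every exchange.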

Parallel Computing: This topic is perhaps the least understood of all. It has been studied for quite a while, and there is still no consensus on the best way to use multiple processors. For system administrators, parallel computing is often about removing bottlenecks that slow down program execution. These bottlenecks can involve one or more of the topic areas above. The multifaceted nature of these issues is why solving cluster problems requires good administrative practices; in my experience, you can't point and click your way through cluster optimizations.
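A standard way to see why a single bottleneck can dominate is Amdahl's law, which the column does not mention by name. Suppose a fraction s of a program's run time is serial and the rest divides perfectly across p processors. The speedup is then 1 / (s + (1 - s) / p). The minimal Python sketch below assumes that idealized model, which ignores communication costs. The 10% serial fraction in the demo is hypothetical.

```python
def amdahl_speedup(serial_fraction, processors):
    """Ideal speedup from Amdahl's law: 1 / (s + (1 - s) / p)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


if __name__ == "__main__":
    # A hypothetical program whose run time is 10% serial.
    s = 0.10
    for p in (1, 8, 64, 512, 4096):
        print(f"{p:>5} processors: speedup {amdahl_speedup(s, p):6.2f} (limit {1 / s:.0f})")
```

Because the speedup can never exceed 1/s, a program that is 10% serial cannot run more than ten times faster no matter how many nodes you add. Shrinking the serial bottleneck is often worth more than adding hardware.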

In summary, the bulk of the knowledge needed for good administration is “already out there.” Clusters use this knowledge in a very focused way, and experience is the best teacher. Second to that, an open infrastructure that facilitates open discussion, co-development, and open problem solving is perhaps the best teacher we have.

I could write more on this topic, as it also raises the education issue. But that will have to wait, because the SC08 bus is almost here. For those who missed last week's column, don't forget to come by the Beowulf Bash Party on Monday night. In addition, I just got word that the camera guy will be with me again this year. In case you did not see us, last year able camera man Vincent Hong followed me around the show floor while I interviewed participants and vendors alike. Amazingly, the “give Doug a mike and follow him with a camera” idea seemed to work. We are still working on getting some of this video posted, but expect to see a lot more video on the site real soon. They tell me I have a non-standard approach, but I generally get the point across. This year, if you see me, run, and don't forget to smile.

Comments on "Become a Lazy SOB Administrator"


COmplete waste of time/paper (sic); could understand this as casual conversation at a party, to get some attention … Also, why is “parallel computing” figuring in here? It’s not the business of the admin to write and optimize the programs. How about some resource allocation and control ? THAT would be admin business, if someone hogs the cluster !!

Again, what a waste of time and energy !


Larry Tobos



