Fun and Games

Can HPC be more fun? Maybe a little more intelligence and graphics can put a smile on your face.

Story time. Just this morning I go to the supermarket to pick up a few things. No need for a shopping cart, just a few items cradled in my arms. I get to the self-serve cash register and place the items on the little spot just before the scanner. The mechanical voice starts telling me to move my stuff. I’m thinking, where am I going to move my stuff? It has not crossed the magic scanner line. So I move it to the conveyor belt, which sets off another scolding by the blasted machine. Just as I am about to start talking back to this thing, a worker comes over, pushes some buttons, scans a secret bar code, and puts my stuff back at the beginning of the counter, right where I had it. Everything seems fine now.

I was, by the way, ready to yell at this stupid automatic cash register. (Brief sidebar: I tell my teenage daughter not to use the word stupid, particularly when referring to other people. I tell her to use a more politically correct word like “non-optimal” instead; if they really are stupid they won’t know what it means anyway and will not get insulted by the fact that you just insulted them.) Continuing with my travails: I almost yelled at this non-optimal device. I know it cannot hear me, but yelling at non-optimal devices does have its merits at times. It is frustrating because as I was leaving I thought, they can make these things easier to use. Imagine if you had a similar experience, but instead of repeating itself over and over, the blasted machine said, “There seems to be some type of problem. Instead of repeating myself endlessly, I’m going to signal my human supervisor to help. Want to hear a joke while we wait?” With a little bit more ingenuity it could almost be fun to use.

On my way to the store, I was also wondering what to write about this week. Now that my morning of non-optimal design is over, I thought, how could clusters and HPC be more fun and easier to use? My belief is that by making things more “intelligent” they can become more interesting and fun, just as the automatic cash register could. I thought about three areas: administration, end users, and programming. And, by the way, I’m open to your suggestions as well.

Let’s start with system administration. I have to go on record that I have on occasion wanted to take a baseball bat to a computer or two, and I am not the violent type. Usually, my sensibilities got the better of me and I just kicked the wall or something. In any case, I remember the first time I read about this hack. Pure genius. How many times had I wanted to just go shotgun those hard-to-kill processes on a cluster? I was hoping this technique would catch on and a whole series of administration tasks could be done via this type of interface. Of course, in this politically correct day and age you have to be careful about how you delete users.

Another thing I think would be interesting is a cluster monitor that uses AI (Artificial Intelligence). The idea would be to have a daemon running that would learn what the normal operation of a cluster looks like. It would then alert the admin if it detected something weird. Kind of like, “I noticed that program zeta is taking longer than usual”, or “I am noticing a change in the communication pattern of program alpha”, or “Just a moment. Just a moment. I’ve just picked up a fault in the AE-35 unit. It’s going to go 100% failure in 72 hours.” A full set of alarms and some Bayesian statistics could be easily added to make your assistant smarter still. Of course, audio output is often not the best, but text messages from your cluster would certainly be nice. Which leads me to the idea of texting (or emailing) your cluster. Would it not be nice to text your cluster and ask, “How is it going, any problems?” and have it reply, “Everything seems normal, there is one node that seems to be acting strangely, I’ll keep you informed.” Of course, having an application on your smart phone that provides cluster status at a glance might be fun as well. It would definitely give you an edge in the “I have more cool tech than you” department.
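The “learn what normal looks like” part of such a daemon could start out very small. The following is only a minimal sketch, not a real monitoring tool: it assumes the daemon is handed one runtime (in seconds) per finished job, and it swaps in a plain running mean and standard deviation for anything fancier like Bayesian statistics. The names `RuntimeBaseline` and `check_run`, the threshold of three standard deviations, and the five-sample warm-up are all made up for illustration.

```python
import math


class RuntimeBaseline:
    """Running mean/variance of one metric (Welford's online algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def stddev(self):
        if self.count < 2:
            return 0.0
        return math.sqrt(self.m2 / (self.count - 1))

    def is_unusual(self, value, threshold=3.0, min_samples=5):
        """True if value is more than `threshold` standard deviations from the mean."""
        if self.count < min_samples:
            return False  # not enough history yet to call anything weird
        sd = self.stddev()
        if sd == 0.0:
            return value != self.mean
        return abs(value - self.mean) / sd > threshold


def check_run(baselines, program, runtime_s):
    """Return an alert string if this run looks unusual, else None.

    The run is added to the program's history either way (an assumption;
    a real daemon might prefer to leave outliers out of the baseline).
    """
    baseline = baselines.setdefault(program, RuntimeBaseline())
    alert = None
    if baseline.is_unusual(runtime_s):
        alert = (f"I noticed that program {program} took {runtime_s:.0f} s; "
                 f"it usually takes about {baseline.mean:.0f} s.")
    baseline.update(runtime_s)
    return alert
```

Feed it a handful of ordinary runs of program zeta and it stays quiet; hand it one that takes nearly twice as long and it produces the kind of message that could be texted to the admin.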

With end users things can really get interesting. Why not have schedulers send messages to you like the following: “The last 6 times your program ran just fine, this time it stopped after 2 hours. I see you have 3 more jobs in the queue, do you want me to wait before I try these? Let me know. Bill the scheduler daemon”. Of course, data visualization is one of the very cool things about HPC, but the cool images always happen after the computation is done. Would it not be fun to watch your program progress as if it were a car, motorcycle, or airplane? The direction could be related to cumulative processor loads. One way to show this is to assume an ideal load of 100% is a straight line. The lower the cumulative processor/core load, the more the path turns. Throw in some scenery, even hills for disk I/O, and you have something fun to watch. A twisty-turny trip up and down hills indicates poor load and high I/O, while speeding down the salt flats indicates blazing speed. Subsequent runs, or runs on different hardware, can be compared by how much better or worse your program traveled to the destination. No need to mention what happens when your program crashes.
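The load-to-direction mapping described above can be sketched in a few lines. This is just one possible reading of the idea, with assumptions stated up front: load samples are fractions between 0.0 and 1.0, a load of 1.0 means no turn, the vehicle always turns the same way, and the maximum turn per step (`max_turn_deg`) is an arbitrary number picked for illustration. Hills for disk I/O are left out.

```python
import math


def trace_path(load_samples, max_turn_deg=30.0, step=1.0):
    """Turn cumulative load samples (0.0 to 1.0) into a list of (x, y) points.

    A load of 1.0 keeps the heading unchanged; lower loads turn the
    vehicle by up to max_turn_deg degrees per step.
    """
    x, y, heading = 0.0, 0.0, 0.0  # heading in radians; 0 means straight ahead (+x)
    points = [(x, y)]
    for load in load_samples:
        load = min(max(load, 0.0), 1.0)  # clamp bad samples into range
        heading += math.radians(max_turn_deg) * (1.0 - load)
        x += step * math.cos(heading)
        y += step * math.sin(heading)
        points.append((x, y))
    return points


def progress(points):
    """Distance covered along the ideal straight-line direction (the x axis)."""
    return points[-1][0]


ideal = trace_path([1.0] * 10)   # fully loaded cores: straight down the salt flats
sloppy = trace_path([0.5] * 10)  # half-idle cores: the path curls away
print(progress(ideal), progress(sloppy))
```

Comparing `progress` for two runs of the same length gives a crude version of “how much better or worse your program traveled to the destination.”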

Using the same traveling analogy, programmers can learn to write codes with fewer turns and hills. Picture a programming tool that, as you wrote your code, traced how your program travels as described above. Start introducing a bunch of communication, and you start turning away from your destination. Of course, data on the type of machine and number of cores needs to be loaded, but some estimates might allow a programmer to see what happens when too much communication starts slowing things down more than speeding them up. The vehicle path may even start to head backwards in this case. There are plenty more ideas out there, I am sure. Let your mind wander. Some may even think adding a little fun to HPC is a non-optimal pursuit. To them I say, “Ever hear of joke warfare? I didn’t think so. Stand back. There were three peanuts walking down the road. One was assaulted.”
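The point where communication starts to cost more than extra cores save can be shown with a toy estimate. The model below is purely an assumption for illustration (it does not come from any real machine): compute time divides evenly across cores, and every added core adds a fixed amount of communication time. The numbers `compute_s` and `comm_s_per_core` are made up.

```python
def estimated_runtime(cores, compute_s=1000.0, comm_s_per_core=2.0):
    """Toy model: compute time shrinks with cores, communication cost grows with them."""
    return compute_s / cores + comm_s_per_core * (cores - 1)


def best_core_count(max_cores, **model):
    """Core count (1..max_cores) with the lowest estimated runtime."""
    return min(range(1, max_cores + 1),
               key=lambda n: estimated_runtime(n, **model))


for n in (1, 8, 22, 64, 128):
    print(n, round(estimated_runtime(n), 1))
```

With these made-up numbers the estimate bottoms out at a couple dozen cores and climbs again after that, which is the spot where the vehicle in the analogy would start turning away from its destination.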

In the end, we still need to get work done and all the fun and games should not keep us from our appointed tasks. There is, however, no reason why we cannot have as much fun as those people on television who use computer-like devices to save the day. Get it, “a-salted.”

Comments on "Fun and Games"


My hrefs in the above seem to have failed, leaving blank spots.
These were


Ubiquitous Computing

and LavaPS a Lava Lamp Monitor


Doug makes a point in this article that I think is subtle but very important. The idea is that we need to make monitoring much, much better. In fact, I will go as far as to say that monitoring runs neck and neck with programming as the biggest problem facing HPC (clusters in particular).

Monitoring in this case is not just monitoring the nodes from a sysadmin perspective, but also monitoring users’ behavior and tendencies as well as the progress of computations, etc. However, having been a sysadmin, I have a tendency to look at that aspect first.

One of the first things we as a community could do is to develop a set of monitoring tools that are really more “push” than “pull”. The idea is that the compute nodes monitor themselves and when there is a substantive change in whatever they are monitoring, they push only that value to the master node or to a monitoring node. This cuts down on traffic, etc. It might increase OS jitter a bit since the nodes have to be self-monitoring to some extent, which probably means daemons waking up and doing something periodically. But I think overall it makes life a bit easier.

For example, why not put a simple USB stick in the front USB port of each node? Then the monitoring data is written to the stick fairly quickly. You could even stash /var/log/messages on the stick. If the node fails or if a job hangs, you could either pull the data from the stick over the network, or just go grab the stick, swap in a fresh stick, and then read the data on the stick for any clues (but all sysadmins know that if a node fails, it usually doesn’t leave a trail of breadcrumbs that are easy to follow). You could even dedicate a single core to do nothing but monitor the node (this might be more palatable if we increase the core count yet again, but don’t blink, that’s coming later this year from both AMD and Intel).
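The “push only on a substantive change” idea above could look something like the following. This is a minimal sketch under assumptions, not an existing tool: the class name `PushMonitor` is made up, “substantive” is taken to mean “moved by more than a fixed threshold since the last value pushed”, and the `send` callable stands in for whatever actually carries the value to the master or monitoring node.

```python
class PushMonitor:
    """Pushes a metric only when it moves by more than `threshold` since the last push."""

    def __init__(self, send, threshold):
        self.send = send            # callable(name, value); placeholder for the network push
        self.threshold = threshold
        self.last_sent = {}         # metric name -> last value pushed

    def observe(self, name, value):
        """Take a local sample; push it only on a substantive change.

        Returns True if the value was pushed, False if it was kept local.
        """
        previous = self.last_sent.get(name)
        if previous is None or abs(value - previous) > self.threshold:
            self.send(name, value)
            self.last_sent[name] = value
            return True
        return False


# Example: collect pushes in a list instead of sending them anywhere.
pushed = []
monitor = PushMonitor(send=lambda name, value: pushed.append((name, value)),
                      threshold=5.0)
for temp in [60.0, 61.0, 62.5, 70.0, 71.0]:
    monitor.observe("cpu_temp_c", temp)
print(pushed)
```

Of the five samples taken on the node, only the first reading and the jump to 70.0 leave it; the small wiggles in between never touch the network.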

There are other aspects to monitoring as well. What Doug describes, visualizing the results of a computation while it’s still running, has been done in the aerospace world for several years. There is a package called PV3 that allows you to send back solutions from a running job and examine them. This way you can tell if a job seems to be running well or not. You can also have the running job dump the current solution periodically and then you can post-process it to examine the current solution (yep, this requires a pretty decent IO subsystem as well as good post-processing computational capabilities and network). In the aerospace world they’ve adopted the phrase, “computational steering”.

Lots of cool things to be done in this area. It’s just not sexy enough to get funding from a gov. agency and most companies won’t fund it either. It’s not going to improve the speed of your applications to any noticeable degree. But it can improve productivity and reduce some of the administration challenges associated with clusters. A most definitely worthwhile effort in my opinion, but unlikely to be done by anyone.

Regardless – thanks Doug for the reminder that we need to think more creatively about how to approach clusters.

