
Over There Vs. Over Here

Matching the right solution with the right problem takes skill, flexibility, and a little luck.

It is funny how solutions develop. Very often, a “perceived problem” is identified and discussed, technical solutions are put together, and the problem is solved. Many times, however, the problem turns out not to be as big as you thought. Of course, along the way, some really nice technology may result, but it may not be the “killer solution” (i.e., the big success that everyone ends up using) you were expecting. Then there are other problems, often unforeseen, that find a solution from some other unforeseen area. Clusters “sort of” fall into this category. In fact, there were many who thought clusters were just some kind of DIY fad.

I was reminded of this scenario by a recent post to the Beowulf mailing list by Don Becker:

BProc is based around directed process migration — a more efficient technique than the common transparent process migration. You can do many cool things with process migration, but with experience we found that the costly parts weren’t really the valuable ones. What you really want is the guarantee that running a program *over there* returns the expected results — the same results as running it *here*. That means more than knowing the command line. You want the same executable, linked with exactly the same library versions in the same order, with the same environment and parameters.

In years past, process migration on clusters was of great interest because it brought a unified process space to the entire cluster (still a great feature, as implemented in Scyld’s BProc), but, as Don says, it is not what seems to be really important right now. I recommend you read the full post and jump over to our recent Interview with Don Becker. Don is a busy cluster kind of guy, and when he speaks up he usually has something interesting and important to share.

The “over there” vs. “here” problem is one of scale. With a small 32-node cluster it would seem like a non-issue. Bump that up by two or three orders of magnitude and you begin to see how this could be a problem. The less experienced might suggest loading the same thing on all the nodes. Sure, good plan at first, but if something were changed for any number of reasons, you could have a problem. Another question worth asking is how to verify that the execution environment is what you want.
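As a back-of-the-envelope illustration of what such a verification might look like (this is my own hypothetical sketch, not anything from BProc or Scyld), you could fingerprint the executable and every shared library it pulls in on each node, then compare the results across the cluster:

```python
#!/usr/bin/env python3
# Hypothetical sketch: fingerprint a program's execution environment on a node
# by hashing the executable and every shared library that ldd resolves for it.
# Comparing fingerprints across nodes flags "over there" vs. "here" drift.
import hashlib
import subprocess
import sys

def sha256(path):
    """Return the SHA-256 digest of a file."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def fingerprint(executable):
    """Hash the executable plus each shared library ldd reports for it."""
    files = [executable]
    out = subprocess.run(["ldd", executable], capture_output=True, text=True)
    for line in out.stdout.splitlines():
        # Lines look like: "libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x...)"
        if "=>" in line:
            target = line.split("=>")[1].split()[0]
            if target.startswith("/"):
                files.append(target)
    return {path: sha256(path) for path in sorted(set(files))}

if __name__ == "__main__":
    for path, digest in fingerprint(sys.argv[1]).items():
        print(digest, path)
```

Run it against the same binary on two nodes and diff the output; any difference in the digests means “over there” is no longer “here.”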

To be clear, Don states that you don’t need directed process migration to ensure consistency, but BProc can be used to achieve that goal and provide other nice things. Which leads me to another thought.

One of the questions I am often asked is “What is the virtualization play for HPC?” I usually reply that there are issues that need to be resolved before virtualization and HPC walk hand in hand, but that process migration in the form of checkpointing would be a great thing to have. Thinking about the “over there” vs. “here” problem in terms of virtualization, however, may just be the killer HPC/virtualization application that solves a big problem.

Imagine creating a tested, working image of your application, operating system, and file system and running it on a virtual HPC machine. The “over there” vs. “here” problem goes away because “over there” is “what is here.” Of course, we are talking about scale, and pushing a large number of images out to thousands of nodes is an issue. And notice I threw in the file system part. I believe that before HPC can be virtualized (or “clouded”), the I/O issue (both compute and file system) needs to be resolved. I suspect this will be through some form of I/O specification that travels with the job image. The specification will allow the cloud to run the application on the right hardware. The current cloud definition is rather loose when it comes to I/O (i.e., it will be there, we just can’t say exactly how fast or consistent it will be).
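To make the idea a little more concrete, here is a purely hypothetical sketch of what an I/O specification traveling with a job image might contain. The field names and numbers are my own invention, not part of any existing cloud or scheduler API:

```python
# Hypothetical sketch of an I/O specification that travels with a job image.
# It states the minimum guarantees a scheduler would have to match against
# available hardware before placing the job.
io_spec = {
    "interconnect": {
        "min_bandwidth_gbps": 20,   # e.g. DDR InfiniBand or better
        "max_latency_usec": 5,
    },
    "filesystem": {
        "min_read_mbps": 500,
        "min_write_mbps": 300,
    },
    "local_scratch_gb": 100,
}

def node_satisfies(spec, caps):
    """Return True if a node's measured capabilities meet the job's I/O spec."""
    net, fs = spec["interconnect"], spec["filesystem"]
    return (caps["bandwidth_gbps"] >= net["min_bandwidth_gbps"]
            and caps["latency_usec"] <= net["max_latency_usec"]
            and caps["fs_read_mbps"] >= fs["min_read_mbps"]
            and caps["fs_write_mbps"] >= fs["min_write_mbps"]
            and caps["scratch_gb"] >= spec["local_scratch_gb"])
```

The point is not the particular fields, but that the requirement travels with the image instead of being implied by whatever hardware happens to be “over there.”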

Speaking of cloud issues, I read two interesting articles recently. The first seems to be a possible solution to what I consider a thorny issue: cloud security. That is, as soon as my data leaves my walls, I do not have 100% control over it, and anything less than 100% means I cannot guarantee security. Of course you can encrypt it, but to operate on it in the cloud you need to decrypt it in the cloud, which means it is still naked data. That is, until now. Recently, an IBM researcher solved the problem of fully homomorphic encryption, which to you and me means the ability to compute on encrypted information without decrypting it (i.e., the data always remains encrypted, which means the result is always encrypted). Problem solved. Nice work. When do we see the demo?
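For a sense of what “computing on encrypted information without decrypting it” means, textbook RSA already has a partial homomorphic property: multiplying two ciphertexts gives the ciphertext of the product. Gentry’s fully homomorphic result goes far beyond this single operation, but the deliberately tiny (and completely insecure) sketch below gives the flavor:

```python
# Toy illustration of a homomorphic property using textbook RSA with tiny,
# hopelessly insecure numbers. Multiplying the ciphertexts yields the
# ciphertext of the product, so the product is computed without decrypting.
# Fully homomorphic encryption supports arbitrary computation, not just
# multiplication; this only conveys the basic idea.

p, q = 61, 53                 # toy primes (never use sizes like this)
n = p * q                     # modulus = 3233
phi = (p - 1) * (q - 1)       # 3120
e = 17                        # public exponent
d = pow(e, -1, phi)           # private exponent (modular inverse, Python 3.8+)

encrypt = lambda m: pow(m, e, n)
decrypt = lambda c: pow(c, d, n)

a, b = 7, 6
c_product = (encrypt(a) * encrypt(b)) % n   # only ciphertexts are multiplied
assert decrypt(c_product) == a * b          # decrypts to 42, computed "in the dark"
print(decrypt(c_product))
```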

The other issue I read about was the lack of entropy in the cloud (entropy is a measure of randomness). Basically, a virtualized instance does not have access to some of the physical sources used to build up its “entropy pool” and thus could become more predictable. Since randomness is key to security, this might make virtualized servers more vulnerable. Of course, there are ways to fix this; however, I thought first about HPC applications and how this could affect Monte Carlo results.
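On a Linux instance you can actually watch the pool: the kernel reports its available entropy through /proc, and checking it before seeding a simulation costs almost nothing. A minimal sketch, assuming a standard Linux /proc interface and Python 3:

```python
# Sketch: check the kernel's available entropy before seeding a simulation.
# /proc/sys/kernel/random/entropy_avail is a standard Linux interface; on a
# starved virtualized instance this number can sit uncomfortably low.
import os

ENTROPY_PATH = "/proc/sys/kernel/random/entropy_avail"

def entropy_bits():
    with open(ENTROPY_PATH) as f:
        return int(f.read().strip())

bits = entropy_bits()
print("available entropy: %d bits" % bits)
if bits < 256:
    print("warning: entropy pool looks thin; consider a hardware RNG or virtio-rng")

# Seed from the kernel's pool rather than, say, the wall clock, so that
# identical VM images do not all start their Monte Carlo streams in lockstep.
seed = int.from_bytes(os.urandom(8), "big")
print("seed:", seed)
```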

To sum things up, it seems the HPC problem space is evolving. I notice that I am talking about virtualization and the cloud much more than in the past, yet there is no big killer HPC service or application out there. One other thing I have noticed is that the more open the discussion, the more solutions seem to flow. I suppose that allows solutions to get from over there to over here, and vice versa.

