A grand experiment has begun. As protons collide, data flies around the globe, landing in a cluster near you.
As you read this, physicists at CERN should be circulating the first batch of protons in the Large Hadron Collider (LHC). For the non-scientist, this may seem rather mundane — isn’t that what most physicists do, smash things together and see what falls out? In any case, this is no ordinary smashing. If all goes well, and if the little proton bumper cars hit each other in just the right way, there should appear something called the Higgs Boson. Now that is something you probably have not thought about for a while. There is a lot riding on these experiments. Indeed, finding the Higgs could be one of the great achievements of modern science. If the Higgs does not show up at the LHC party, there is some rewriting and head scratching that needs to be done in the world of physics. A little background may help as well.
The whole LHC/Higgs Boson thing can get complicated, so I’ll try to give a version so easy a caveman can understand it. (Yo, there is also a video for those that need to bust a rhyme with their physics lessons.) If you smash stuff, it breaks into smaller pieces. If you keep breaking the pieces, you get to really small things called atoms. If you start to smash the parts of the atom, like the proton, you will get even smaller stuff flying out of the collision. This smaller stuff does not stick around very long, but it can be detected, or its presence can be inferred, by sorting through the collision track data. There is a special particle that, in theory, should show up in some of the collisions produced by the LHC. The reason these collisions are unique is that the LHC can produce proton beams reaching 7 TeV (tera electron volts). From the LHC page, “1 TeV is about the energy of motion of a flying mosquito. What makes the LHC so extraordinary is that it squeezes energy into a space about a million million times smaller than a mosquito.” In any case, the protons are moving VERY fast, and if the collisions are just right, the Higgs Boson may show up. The Higgs is special because it explains why things like you and me have mass. Not in terms of how much pizza you eat, but why things like photons don’t have mass, while things like protons (which are used to make pizza, by the way) do have mass. This experiment requires computing resources on a global scale. It will also create massive amounts of collision data. Enter the grid and clusters.
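For those who like to check the mosquito claim with numbers, here is a minimal sketch that converts TeV to joules and asks how fast a mosquito would need to fly to carry 1 TeV of kinetic energy. The mosquito mass of about 2 mg is my own rough assumption, not a figure from CERN, and the function names are just for illustration.

```python
# Back-of-the-envelope check of the "1 TeV is a flying mosquito" comparison.

EV_TO_JOULE = 1.602176634e-19      # joules per electron volt (SI definition)
TEV_TO_JOULE = 1e12 * EV_TO_JOULE  # joules per TeV

MOSQUITO_MASS_KG = 2.0e-6          # ASSUMPTION: a mosquito weighs roughly 2 mg


def tev_to_joules(energy_tev):
    """Convert an energy in TeV to joules."""
    return energy_tev * TEV_TO_JOULE


def speed_for_kinetic_energy(energy_j, mass_kg):
    """Solve KE = 1/2 * m * v**2 for v (non-relativistic, fine for insects)."""
    return (2.0 * energy_j / mass_kg) ** 0.5


one_tev_j = tev_to_joules(1.0)
beam_proton_j = tev_to_joules(7.0)   # the 7 TeV beam energy quoted in the text
mosquito_speed = speed_for_kinetic_energy(one_tev_j, MOSQUITO_MASS_KG)

print(f"1 TeV = {one_tev_j:.3e} J")
print(f"7 TeV = {beam_proton_j:.3e} J")
print(f"Mosquito speed carrying 1 TeV: {mosquito_speed:.2f} m/s")
```

With the assumed mass, the mosquito only needs to drift along at a fraction of a meter per second, so the comparison holds up. The energy is tiny by everyday standards; the trick is packing it into a single proton.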
When someone uses the term massive amounts of data, I usually think, “well, it is all relative.” When the LHC is in production, it will generate data at an expected rate of 27 TB of raw data per day, plus 10 TB of event summary data. That is a lot of data. This data will then be sent out, in almost real time, to the eleven Tier 1 academic institutions in Europe, Asia, and North America over dedicated 10 Gb/sec links. They will not all get copies of all the data; rather, each will be responsible for a different portion of the raw data. The Tier 1 sites will make “skims” of these events, where data are split into different sub-samples that are enriched to show certain kinds of events. Feeding off the Tier 1 sites will be over 150 Tier 2 academic sites. The actual hunt for the Higgs (and other new physics) will be at the Tier 2 sites. There is a very nice animation depicting the whole data flow process. It is really quite amazing.
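To put those numbers in perspective, the sketch below works out the sustained bandwidth needed to move the daily data volume and compares it with what a single 10 Gb/sec link can carry in a day. It assumes decimal terabytes, that the 10 TB of summary data is also a per-day figure, and that the load is split evenly across the eleven Tier 1 sites; none of these details are spelled out in the figures above, so treat the output as a rough estimate.

```python
# Rough data-rate arithmetic for the CERN to Tier 1 distribution.

TB = 1e12                  # bytes in a decimal terabyte (ASSUMPTION: decimal units)
SECONDS_PER_DAY = 86_400

RAW_TB_PER_DAY = 27.0      # raw data per day, from the text
SUMMARY_TB_PER_DAY = 10.0  # ASSUMPTION: summary data is also a per-day figure
LINK_GBPS = 10.0           # dedicated link speed, from the text
TIER1_SITES = 11           # number of Tier 1 sites, from the text


def sustained_gbps(tb_per_day):
    """Average bit rate (Gb/sec) needed to ship tb_per_day around the clock."""
    bits_per_day = tb_per_day * TB * 8
    return bits_per_day / SECONDS_PER_DAY / 1e9


def link_capacity_tb_per_day(gbps):
    """How many terabytes a link can carry in a day at full speed."""
    bytes_per_second = gbps * 1e9 / 8
    return bytes_per_second * SECONDS_PER_DAY / TB


total_tb = RAW_TB_PER_DAY + SUMMARY_TB_PER_DAY
total_gbps = sustained_gbps(total_tb)
per_site_gbps = total_gbps / TIER1_SITES   # ASSUMPTION: even split across sites
link_tb = link_capacity_tb_per_day(LINK_GBPS)

print(f"Total out of CERN: {total_tb:.0f} TB/day = {total_gbps:.2f} Gb/sec sustained")
print(f"Even share per Tier 1 site: {per_site_gbps:.2f} Gb/sec")
print(f"One {LINK_GBPS:.0f} Gb/sec link can carry {link_tb:.0f} TB/day")
```

Under these assumptions the whole daily output averages out to about 3.4 Gb/sec, and a single 10 Gb/sec link could carry roughly 108 TB in a day, so the dedicated links leave headroom for bursts and retransmits.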
The LHC data processing effort is without a doubt a worldwide computer. Using grid, storage, and cluster technology, a worldwide computer of the largest scale will jump to life when collisions at the smallest scale take place. There is a certain kind of irony in that kind of experiment. But that is not all. To help build the LHC, the LHC@home project was developed. A program called SixTrack was developed by Frank Schmidt. The program simulates particles accelerating through the 27 km (17 mile) LHC ring to find their orbit stability. The orbit stability data is used to determine if a particle in orbit goes off-course and runs into the tube wall. If such an event happened, the accelerator could be damaged. Not to mention some sore protons.
Interestingly, in particle physics circles, a cluster is usually called a compute farm. The name comes from how the compute nodes are used. Much of the computation runs as a single process and as such is usually “farmed” out by a master node. And there are a lot of jobs that will need to be run. In a way, this data parallelization starts at the source (CERN) and moves out to the Tier 1 and Tier 2 sites where much of the actual work gets done. It is part of the global machine design. Much of this huge endeavor is based on GNU/Linux, Globus, Condor, and a slew of other middleware packages. That “open thing” again: it just seems to make those world-changing, monumental scientific projects work a little better.
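The farm idea is easy to sketch in a few lines. In the toy example below, a master splits a list of events into chunks and farms each chunk out as an independent job that keeps only the events passing a cut, a bit like a small-scale skim. Everything here is hypothetical: the event records, the `energy` field, and the cut value are made up for illustration, and a local thread pool stands in for the compute nodes. A real farm would hand these jobs to a batch system such as Condor rather than to threads.

```python
# Toy "compute farm": a master farms out independent jobs and gathers results.
from concurrent.futures import ThreadPoolExecutor


def make_chunks(events, chunk_size):
    """Split the event list into fixed-size chunks, one chunk per job."""
    return [events[i:i + chunk_size] for i in range(0, len(events), chunk_size)]


def skim_job(chunk, min_energy):
    """One independent job: keep the events at or above the energy cut."""
    return [event for event in chunk if event["energy"] >= min_energy]


def farm_out(events, chunk_size, min_energy, workers=4):
    """Master node: hand each chunk to a worker, then merge the results."""
    chunks = make_chunks(events, chunk_size)
    # ASSUMPTION: a thread pool stands in for the farm's compute nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda chunk: skim_job(chunk, min_energy), chunks)
        # pool.map yields results in chunk order, so event order is preserved.
        return [event for part in results for event in part]


# Hypothetical events with a made-up "energy" value between 0 and 99.
events = [{"id": i, "energy": (i * 37) % 100} for i in range(1000)]
selected = farm_out(events, chunk_size=100, min_energy=90)

print(f"{len(selected)} of {len(events)} events kept")
```

Because no job needs to talk to any other job, adding more workers (or more sites) is mostly a matter of handing out more chunks, which is why this style of work spreads so well across a grid.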
Douglas Eadline is the Senior HPC Editor for Linux Magazine.