

May 2005

The Gig is Up: The Future of Commodity Processors

Smaller dies, larger caches, multiple cores, and even quantum mechanics promise to turbocharge tomorrow’s processors

Manufacturers of commodity processors are looking to increase throughput by many means other than increasing clock speeds. As chips get cheaper and faster, Linux programmers must change the way they think.

Joab Jackson

Expect no miracles from tomorrow’s computers. In many ways, computers of the near future, say, the next five years, will be very similar to what you have sitting on your desk right now. Teleportation won’t be a feature, nor will you be able to record, transmit, and play holographic images. In fact, the processors inside may not even be that much faster.
But make no mistake, great changes are afoot. It’s as if, after years of sudden spurts of speed gains, silicon-based commodity processors are now maturing, being designed with a newfound empathy for users. Tomorrow’s computing workhorses should better fit users’ needs, whether the need is playing a movie on a laptop or serving terabytes of interactive web pages to millions of web surfers.
Once, commodity processor makers gunned for the fastest chip. Now, they compete to build the best chip for the job.

Why the sea change? Partially, it’s due to the limits of the technology. For one, AMD and Intel are quickly coming up against the physical limits of how fast they can make their processors. Oh, they can make chips faster — Moore’s Law will remain true well into the next decade — but making chips faster also makes them hotter and more power hungry, unacceptably so. Faced with practical limits, manufacturers must look at other ways to boost performance than just revving clock speeds.
The user population is also changing, bringing new needs. Users of laptops, handhelds, cell phones, and countless other devices judge the performance of their devices not only by their sprightliness, but also by how long the batteries last. Since the central processor is the chief power consumer in such gizmos, the pressure is on to make chips both powerful and efficient.
How do these changes affect the Linux programmer and the Linux kernel development team? The good news is that, in many cases, not only are Linux developers ahead of the game in designing software for the new style of chips, but the new processors are also directing their development.

Split the Work

No doubt about it, 2005 is the year of multi-core processors, or processors with two or more cores on the same silicon slab. Both AMD and Intel will introduce dual-core models later this year, and both chip makers say that multi-core processors increase performance without a correspondingly linear increase in power consumption or, by extension, heat dissipation.
“Multi-core allows us to stay within better power envelopes. We can run two cores at a lower processor speed, yet give you an aggregate performance of a higher clock speed,” said Margaret Lewis, a member of AMD’s consumer marketing team. For instance, the latest AMD 64-bit Athlon chip, the 2.6 GHz AMD Athlon 64 FX-55, consumes about 104 watts. The first dual-core Athlon chip, made on a smaller 90 nm process, will run two 2.4 GHz cores at a maximum of 110 watts. The jump represents an 85 percent performance boost with a mere 6 percent increase in power consumption.
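A back-of-the-envelope check of those figures, treating aggregate clock speed as a rough proxy for throughput (an optimistic assumption for real workloads):

    \frac{2 \times 2.4\ \text{GHz}}{2.6\ \text{GHz}} \approx 1.85 \quad (\text{85 percent faster}), \qquad \frac{110\ \text{W}}{104\ \text{W}} \approx 1.06 \quad (\text{6 percent more power})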
The multi-core approach also makes more sense given that most people now run multiple applications, rather than one large resource hungry application, said Daniel Snyder, a spokesman for Intel. “You may have a messenger going, be working on a document, transcoding something in the background, and streaming some media at the same time,” Snyder said. While a single processor can juggle all these tasks, the approach is a bit like “lighting a stadium with one lightbulb,” Snyder said. Multi-core spreads the resources around.
To exploit a multi-core chip, however, application and system software developers have to write software that runs multiple process threads simultaneously — a bit of a challenge. “Writing multi-threaded isn’t the easiest thing in the world,” Lewis admitted.
According to William Weinberg, an open source architecture specialist for Open Source Development Labs (OSDL), Linux itself is well equipped to manage multiple processor threads. For instance, the Linux POSIX threads library (pthreads) allows programs to spawn parallel, or concurrent, threads. Most languages support multithreading as well, and Linux high-performance computing (HPC) clusters make wide use of the Message Passing Interface (MPI), a standard library of message-passing routines. “A lot of the code is moving towards MPI, and some MPI is becoming less scientific and more commercial,” Lewis said.
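As a minimal sketch of the pthreads model, consider the following C program, written for this article rather than drawn from any vendor’s code. It spawns two concurrent workers and waits for both to finish; on a dual-core chip, the kernel scheduler is free to place each thread on its own core.

    /* pthreads sketch: spawn two concurrent workers and join them.
     * Compile with: gcc -pthread workers.c -o workers */
    #include <pthread.h>
    #include <stdio.h>

    /* Hypothetical worker: in a real program, each thread would
     * chew on its own slice of the job. */
    static void *worker(void *arg)
    {
        int id = *(int *)arg;
        printf("worker %d running, possibly on its own core\n", id);
        return NULL;
    }

    int main(void)
    {
        pthread_t threads[2];
        int ids[2] = { 0, 1 };

        for (int i = 0; i < 2; i++)
            pthread_create(&threads[i], NULL, worker, &ids[i]);

        /* Wait for both workers; the kernel decides where they ran. */
        for (int i = 0; i < 2; i++)
            pthread_join(threads[i], NULL);

        return 0;
    }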
It should be noted that chip makers have been running multiple threads even on single processors for some time. Intel has offered its trademarked “Hyper-Threading” technology for several years. Hyperthreading is a “way of tricking the OS into seeing two CPUs,” Snyder said. For its high-end web servers, Sun Microsystems is working on an 8-core SPARC chip for 2006 that will be able to execute 32 threads at once.
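One quick way to see those “two CPUs” from user space is to ask how many logical processors the kernel reports online. The short C program below is a sketch using the common glibc sysconf() extension; on a single hyperthreaded chip, it reports two.

    /* Count the logical processors the kernel exposes. On a
     * hyperthreaded chip, one physical core appears as two. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        long n = sysconf(_SC_NPROCESSORS_ONLN);
        printf("logical processors online: %ld\n", n);
        return 0;
    }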
How far the chip makers will take multi-core remains to be seen. The downside of divvying up the work is that at some point you hit diminishing returns, according to Martin Reynolds, a research fellow for IT research company Gartner. “Parallelization is very difficult to achieve when you have multiple processors working on the same data set,” Reynolds said.
Nonetheless, Intel’s Snyder remains bullish on multi-core, noting Intel chips could eventually run dozens of cores. “You could have a core for every different process on your system,” Snyder said.
The multi-core approach is also favored by IBM. In January, IBM, along with Sony, introduced the basic architecture of their new Cell processor. The Cell will be built, ostensibly, for Sony’s PlayStation 3, though the team hinted it could be used for other multimedia devices. The processor consists of a PowerPC core, acting mainly as control logic, surrounded by eight digital signal processors, according to the PC enthusiast news site Ars Technica (http://www.arstechnica.com).
The Cell is actually part of a bigger trend in electronics: System-on-Chip architecture. Chip makers for embedded and consumer electronics have been thinking this way for a while; for proof, just open up your iPod. Instead of integrating multiple functions onto separate circuits or chips, microprocessor manufacturers are tying together specialized cores in a single small system, according to Louis Lome, an independent technology consultant and a former network technology research specialist for the Defense Department. The Defense Department, as well as many private companies, has been funding the development of a bus-based, packet-switched architecture called “Network on a Chip.”
In fact, the work of the Linux development community lends a hand in such initiatives.
“Technology is driven by new, small things that often take the most useful ideas from the big things, and eventually end up largely displacing them,” Linux creator Linus Torvalds told Linux Magazine Editor-in-Chief Martin Streicher (see “The Emperor Penguin” online at http://www.linuxmagazine.com/2005-01/). “So while I enjoy seeing Linux supercomputers, I don’t think they are all that relevant for the future, except perhaps as a way to learn about issues that even the small devices will eventually start hitting.”
Symmetric multiprocessing, for instance, was a technique originally used to balance large applications across the multiple processors of big servers, but it can now be used to divvy up communications and graphics responsibilities on cellular phones, Torvalds said.
Motorola, Samsung, and NEC have all built Linux-based cell phones or announced plans to do so. In many cases, Linux’s ability to juggle multiple threads across different cores is a major selling point.
Surprisingly, the Linux kernel team has had to do little to accommodate the mobile computing world, according to Weinberg. This is in no small part thanks to the work of a group of Linux developers from ARM. Think of ARM as the Intel of the embedded chip market. “ARM is incredibly aggressive in what goes on in the Linux enterprise space,” Weinberg said.

Grease the Skids

In five years, chips will seem to run faster, but only some of the increase will come from clock speed and multiple cores. Chip makers are now looking at other ways to hasten processing.
Making sure the chip has something to process every cycle is one way to tackle the problem. The faster chips became, the more they sat idle, their supporting mechanisms unable to keep the pipeline filled. Intel will double the size of the L2 cache of its 600-series Pentium 4 chips this year, to two megabytes, according to Snyder. Processors can fetch data far faster from an onboard cache than from random access memory.
The gradual move, now underway, to 64-bit computing also addresses the memory bottleneck. A 64-bit processor can address 18 exabytes of random access memory, far more than the four gigabytes addressable by 32-bit processors. This allows more of a running program to be held in immediate memory, where the processor can reach it quickly. Although 64-bit computing was first thought useful only for servers, AMD has been aggressively marketing 64-bit Athlon chips to the consumer market. In February, Intel added 64-bit “EM64T” extensions to some of its Pentium 4 chips, which are marketed to consumers.
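The arithmetic behind those address-space figures:

    2^{32}\ \text{bytes} = 4\ \text{gigabytes}, \qquad 2^{64}\ \text{bytes} \approx 1.8 \times 10^{19}\ \text{bytes} \approx 18\ \text{exabytes}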
But what is happening beyond the well-reported claims of 64-bit computing and ever-larger caches? One development is the hypervisor, a technology being pursued by most chip makers. A hypervisor is “an operating system for a chip,” Reynolds said.
A hypervisor sits between the operating system and the hardware itself, completely invisible to the end user. It offers the operating system finer control over the processor, optimizing it to execute functions such as video compression. Hypervisors can even be used to run multiple operating systems simultaneously, Reynolds said.
Gartner expects both Intel and AMD to start offering hypervisors within the next year or so. Transmeta already uses one to mimic the x86 instruction set on its chips, and IBM’s Power5-based iSeries servers run a hypervisor for load balancing.
In hardware, work is also underway to widen the pathways between the processor and working memory. Today’s interconnects are largely copper, which suffers from signal loss, bandwidth limitations, interference between lines, and other woes. The move to multi-core processors may only exacerbate these problems.
Agilent Technologies is investigating photonics, the use of light, to speed throughput. Photons are inherently better than electrons as data conveyors because photons don’t interact with one another, unlike electrons, whose natural force fields keep them apart, said Waguih Ishak, director of Agilent’s photonics and electronics research lab. Photons of one wavelength can travel alongside those of a different wavelength with no interference. This layering of signals, known as multiplexing, boosts throughput.
Agilent, in conjunction with IBM and with some funding from the Defense Department’s Defense Advanced Research Projects Agency (DARPA), is working on a high-speed optical interconnect that could be used in a bus to connect caches, random access memory, multiple multi-chip modules, and even multiple motherboards. The team has developed a 5-by-8 millimeter prototype package that carries 500 gigabits per second over a distance of several feet. (See Figure One.)
FIGURE ONE: The 250 Gbps parallel wavelength-division-multiplexed (PWDM) transmitter developed by a team at Agilent Laboratories for the DARPA-funded MAUI program. It represents the current state of the art in bandwidth per unit area and per unit of power consumed. The team expected to achieve 500 Gbps by the end of 2004 and is considering approaches to reaching 1 terabit per second in a similar footprint by the end of the MAUI program in 2006. Photo courtesy Agilent Technologies.

Intel is also doing some work in the area. Earlier this year, the company announced it had built a silicon-based laser that could modulate a light beam in such a way that it could encode information.

Be Cool

Despite the recent religion over multi-core chips, don’t rule out a renewed race for yet faster clock speeds, Reynolds predicted. After all, “a program is only as fast as its slowest thread,” he said. But to resume such a race, chip makers will have to somehow reduce the amount of heat their chips give off, which is, of course, directly proportional to the power consumed.
Much work is being done in power conservation. AMD is developing a number of power conservation features, under its PowerNow brand name, that can be controlled by the operating system: if a processor is running at only 20 percent of its capacity, the operating system can adjust its power draw to match that level of activity. Intel’s own power management technology, called SpeedStep, is already available in its mobile chips and will be offered for desktops in the near future. SpeedStep likewise slows the processor when it isn’t being used. Unlike earlier attempts along these lines, SpeedStep should be invisible to users, Snyder claimed: waking a sleeping processor should take “a few cycles,” rather than a few seconds.
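On Linux, such vendor features surface through the kernel’s cpufreq subsystem, whose sysfs files let the operating system, or a curious user, inspect and steer processor speed. The following C sketch assumes a 2.6-era kernel with cpufreq enabled and the usual sysfs layout:

    /* Read the current clock speed of the first processor from the
     * Linux cpufreq sysfs interface. Assumes cpufreq is enabled. */
    #include <stdio.h>

    int main(void)
    {
        const char *path =
            "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq";
        FILE *f = fopen(path, "r");
        if (!f) {
            perror("cpufreq not available");
            return 1;
        }

        long khz = 0;
        fscanf(f, "%ld", &khz);  /* The kernel reports kHz. */
        fclose(f);

        printf("cpu0 is currently running at %ld MHz\n", khz / 1000);
        return 0;
    }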
Beyond the careful clipping of power consumption, there are radical notions in power reduction. One is eliminating the clock cycle entirely. A growing number of companies are looking at the nascent field of clockless logic, also called asynchronous logic.
In a nutshell, the idea is simple: eliminate the clock signal as the timing mechanism that coordinates the flow of data within the processor. Instead, the gates themselves are delay-insensitive, and the registers communicate with one another via handshake protocols. Since “the circuit is data-driven, it runs at a data rate, not at a clock rate,” said David Lamb, president of clockless logic design firm Theseus Logic. “If they have nothing to do, circuits inherently go into a sleep mode.”
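To make the handshake idea concrete, here is a toy, single-threaded C simulation of a four-phase request/acknowledge exchange between two pipeline stages. It was invented for this article and bears no relation to Theseus’ actual NCL circuitry; it only illustrates how data can pace itself without a clock.

    /* Toy four-phase (return-to-zero) handshake between a sender
     * stage and a receiver stage: no clock paces the exchange. */
    #include <stdbool.h>
    #include <stdio.h>

    int main(void)
    {
        bool req = false, ack = false;
        int wire = 0;

        for (int token = 1; token <= 3; token++) {
            /* Sender: put data on the wire, then raise request. */
            wire = token * 10;
            req = true;
            printf("sender:   data=%d, req up\n", wire);

            /* Receiver: latch data only while request is up, then ack. */
            if (req) {
                printf("receiver: latched %d, ack up\n", wire);
                ack = true;
            }

            /* Sender: on ack, drop request (return to zero). */
            if (ack) {
                req = false;
                printf("sender:   req down\n");
            }

            /* Receiver: on request down, drop ack; both sides stay
             * idle until new data arrives. */
            if (!req) {
                ack = false;
                printf("receiver: ack down, stage idle\n\n");
            }
        }
        return 0;
    }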
Theseus claims that its own NCL08 eight-bit microcontroller uses about 40 percent less power than the Motorola Star08 controller executing the same task. The beauty of asynchronous logic is that, despite its unique architecture, it’d be invisible to programmers, Lamb claimed. Theseus’ NCL08 uses the same instruction set as Motorola’s HCS08 microcontroller.
ARM has taken notice of this radical technology, licensing self-timed circuitry designs from Handshake Solutions, a division of Philips Electronics. ARM plans to use the designs in chips for smart cards, consumer electronics, embedded automotive applications, and other equipment where power is at a premium.
Although primarily marketed for low-power usage, clockless logic designs could drift into commodity processors as well. Both Intel and Sun now incorporate aspects of asynchronous processing in their chips, Lamb said.

Shirking the Silicon Shackles

These days, the circuit lines on AMD’s and Intel’s chips are 90 nanometers wide, down from 0.13 microns just a few years ago. And those features will continue to shrink. By next year, Intel will start selling chips built on a 65-nanometer process: chips with over half a billion transistors, cut from 300-millimeter wafers. Intel researchers are also developing fabrication processes that will lead to 32- and 22-nanometer chips, which may make their appearance early next decade.
Inevitably, however, the manufacturing process must hit a wall, perhaps at the scale of individual atoms. When silicon is exhausted, what will come next?
Unfortunately, possible replacements are still sketchy at best. One approach is quantum computing, which draws from a century of research in quantum mechanics. While today’s computers make rather simple use of electrical charge (a charge on a transistor signals a “1,” while the absence of one signals a “0”), quantum computing uses single atoms to hold information in more complex ways. An atom can hold more than a single binary value because, under a phenomenon known as superposition, it can actually be in multiple states at once. Today’s three-bit register can hold one of eight numbers at a time, but a register of three quantum bits, or qubits, could simultaneously hold all eight numbers or any subset of them. Bundle a fair number of atoms into an actual working system and you could have a computer crunching exponentially more data.
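In the standard notation of quantum mechanics, a three-qubit register holds a weighted combination of all eight classical states at once:

    |\psi\rangle = \sum_{x=0}^{7} \alpha_x\,|x\rangle, \qquad \sum_{x=0}^{7} |\alpha_x|^2 = 1

where each amplitude \alpha_x is a complex number, and measuring the register yields the state |x\rangle with probability |\alpha_x|^2.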
Many feel, however, that quantum computing at any sort of mass-production scale is decades off, if it’s feasible at all. After years of sustained research, IBM announced in 2001 that it had made a 7-qubit register in a test tube, using a billion molecules. Quantum computation will need a Moore’s Law of its own to grow into usefulness.
Another far-off technology, optical processing, would use photons instead of electrons as the basic unit of information. A photonic logic gate would work by blocking or passing a photon or a light beam, signifying a binary value.
But as with quantum computing, even this approach’s most fervent believers admit that any working products, at least for mass production, are decades off. The very quality that makes photons so good for communications makes them woeful at executing digital logic, according to Agilent’s Ishak. Two electrons close to one another repel each other, Ishak said, and that interaction forms the basis of signal processing. Since photons do not interact with one another, no effective way yet exists to manipulate them in a similar manner.
“The bottom line is much effort was expended over the period of a generation on optical computing,” wrote Lome by email about photonic processing. “Many interesting physical experiments were performed. Quite a few prototype systems were demonstrated. However, the technology was not competitive for any commodity market.”
“Silicon will be with us for some time to come,” Ishak said.

Joab Jackson is an associate writer for Government Computer News. He can be reached at linux@joabj.com.
