Computer hobbyists and researchers take note: two U.S. scientists have created a step-by-step guide on how to build a supercomputer using multiple PlayStation 3 video-game consoles. The instructional guide, posted online this week, allows users with some programming knowledge to install a version of the open-source operating system Linux on the video consoles and connect a number of consoles into a computing cluster or grid. The two researchers say the guide could provide scientists with another, cheaper alternative to renting time on supercomputers to run their simulations.

University of Massachusetts Dartmouth physics professor Gaurav Khanna first built the cluster a year ago to run his simulations estimating the gravitational waves produced when two black holes merge. Frustrated with the cost of renting time on supercomputers, which he said can run as much as $5,000 for a 5,000-hour simulation, Khanna decided to set up his own computer cluster using PS3s, which offer both a powerful processor developed by Sony, IBM and Toshiba and an open platform that allows different system software to run on it. PlayStation 3 systems retail for about $400 Cdn. In the how-to guide, Khanna says the eight-console cluster is roughly comparable in speed to a 200-node IBM Blue Gene supercomputer. Khanna says his research now runs on a cluster of 16 PS3s. The fastest supercomputer in the world, IBM’s Roadrunner at Los Alamos National Laboratory, has 3,250 nodes and is capable of 1.105 petaflops, or 1.105 quadrillion floating point operations per second, about 100,000 times faster than a home computer.

Massachusetts Dartmouth computer scientist Chris Poulin, who co-wrote the instructional manual with Khanna, wouldn’t reveal the number of flops the system can achieve, but said anecdotally the cluster has allowed him to run simulations in hours that used to take days on a powerful server computer. Khanna’s not the first researcher to use PS3s to simulate the effects of a supercomputer. Stanford University’s Folding@home project allows people to help with research into how proteins self-assemble, or fold, by downloading software onto their home PS3s, creating a virtual supercomputer. Its research is currently targeting proteins relevant to diseases such as Alzheimer’s and Huntington’s disease. But the guide posted by Khanna and Poulin is the first that might allow someone to set up a supercomputer in their own home.

Poulin said, however, that two major issues might limit the practicality of a PS3 cluster supercomputer. The first issue is power. He said the video-game consoles use about 200 to 300 watts per unit, so finding a room that could power eight of the consoles might be a problem for hobbyists. “I think if you put four or more than four of the systems on one plug you’d probably blow a fuse,” Poulin told CBC News. The second issue is memory. The console has only 256 MB of RAM, far less than most personal computers available now. Poulin said that while the low memory wouldn’t be a problem for straightforward computations, running multiple simulations or programs could tax the system. As a result, simulations running on the cluster would have to be tailored to the cluster’s memory limitations. Poulin said he hopes the project will help open doors to more partnerships between industry and universities that will lead to better access to supercomputing power. “That’s ultimately the goal here,” he said. “We want to make things easier, no matter what kind of supercomputer you are using.”
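The power concern can be put in rough numbers. The sketch below is a minimal illustration only: the 200 to 300 watt range comes from Poulin's estimate, while the 120-volt, 15-amp household circuit and the 80 percent continuous-load margin are assumptions chosen for the example, not figures from the guide.

```python
# Rough circuit-load check for a console cluster.
# Per-console draw (200-300 W) is the estimate quoted in the article.
# The 120 V / 15 A circuit and the 80% margin are illustrative assumptions.

def max_consoles(watts_per_console, volts=120, amps=15, margin_percent=80):
    """Return how many consoles fit on one circuit within the margin."""
    usable_watts = volts * amps * margin_percent // 100
    return usable_watts // watts_per_console


def cluster_load(consoles, watts_per_console):
    """Total draw in watts for the given number of consoles."""
    return consoles * watts_per_console


if __name__ == "__main__":
    for watts in (200, 300):
        print(f"{watts} W each: {max_consoles(watts)} consoles per circuit; "
              f"8 consoles draw {cluster_load(8, watts)} W")
```

Under those assumed circuit figures, an eight-console cluster would need to be spread over at least two circuits.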

Gaurav Khanna
email : gkhanna [at] umassd [dot] edu

Lior Burko
email : burko [at] uah [dot] edu

This cluster of 336 PlayStation 3 video game consoles is the beginning of a cluster of more than 2,000 consoles the Air Force is purchasing to create a supercomputer called 500 TeraFLOPS Heterogeneous Cluster, which will be housed at the Air Force Research Laboratory’s Affiliated Resource Center in Rome, N.Y.

Air Force to link 2,000 PS3s to make a cheaper supercomputer
by Warren Peace / Stars and Stripes / January 28, 2010

Once thought to be just a part of home entertainment systems, Sony’s PlayStation 3 is proving itself to be more than just an online death-match machine. The console’s price-to-performance ratio inspired one Air Force research team to place an order for 1,700 of them to go with the 336 they already have. The brains behind the Air Force Research Laboratory in Rome, N.Y., are clustering the consoles, along with some off-the-shelf graphics processing units, to create a supercomputer nearly 100,000 times faster than high-end computer processors sold today. The research group was awarded a $2 million grant for the PlayStation 3 cluster. Key to the whole idea is the console’s Cell processor, which was designed to easily work in concert with other Cell processors to combine processing power and has been critically acclaimed for its number-crunching ability.

This lets the researchers leverage power toward running such applications as Back Projection Synthetic Aperture Radar image formation, high-definition video image processing, and Neuromorphic Computing, which mimics human nervous systems. “With Neuromorphic Computing, as an example, we will broadcast an image to all PS3s and ask if it matches an image it has in its hard drive,” said Dr. Richard Linderman, the senior scientist for Advanced Computing Architectures at the laboratory. Mimicking humans will help the machine recognize images for target recognition, said Mark Barnell, the high performance computing director for the laboratory’s information directorate. “Humans can routinely do these things, but a computer struggles to do it,” Barnell said. “In a general sense, we are interested in making it autonomous.” He added, however, “this is not the Holy Grail of supercomputers.”

Because the way the consoles connect online or to each other is relatively slow compared with regular supercomputing setups, the group is limited in what types of programs can be run efficiently on the PS3 supergroup they call the 500 TeraFLOPS Heterogeneous Cluster. Linderman said the entire system uses mostly off-the-shelf components, and will be a relatively cheap, green machine. In keeping with the off-the-shelf mentality, the Air Force is using metal shelves found at most department stores to house the PS3 cluster. They are also using Linux, which is a free, open-source operating system.

The system will use 300 to 320 kilowatts at full bore and about 10 percent to 30 percent of that in standby, when most supercomputers are using 5 megawatts, Linderman said. However, much of the time the cluster will only be running the nodes it needs and it will be turned off when not in use. “Supercomputers used to be unique with unique processors,” Linderman said. “By taking advantage of a growing market, the gaming market, we are bringing the price performance to just $2 to $3 per gigaFLOPS.” As a point of reference, 10 years ago the University of Kentucky claimed the record of the first unit capable of 1 billion floating point operations per second or gigaFLOPS, to cost less than $1,000. The cost per gigaFLOPS was $640.

They have been able to take advantage of about 60 percent of the PlayStations’ performance ability, Linderman said. Once complete, they are expecting to have a unit capable of 500 teraFLOPS or 500 trillion operations per second. The Air Force plans to have the 1,700 they recently ordered fully integrated into the system by June as part of the Department of Defense’s High Performance Computer Modernization Program. Being part of the DOD program opens the use of the computer to other government agencies and universities that are a part of the program. As for now, the system will be handling unclassified data, but that may change in the future, Linderman said.

This is not the first time PlayStation 3s have been networked together for their processing ability. The Folding@home project allows gamers to volunteer their PS3s and Internet connections while the owners are not using them. The project earned a Guinness World Record for achieving the first petaFLOPS, or quadrillion operations per second, by a distributed computing network in 2007. “I think [this] is just another step in a journey that has been going on for a while,” Linderman said of using consumer components for supercomputers. “But, this will be far and away the largest interactive high-performance computer.”


Astrophysicist Replaces Supercomputer with Eight PlayStation 3s
BY Bryan Gardiner  /  10.17.07

Gaurav Khanna’s eight PlayStation 3s aren’t running Heavenly Sword — they’re using Linux plus custom code to solve complex computations. Suffering from its exorbitant price point and a dearth of titles, Sony’s PlayStation 3 isn’t exactly the most popular gaming platform on the block. But while the console flounders in the commercial space, the PS3 may be finding a new calling in the realm of science and research. Right now, a cluster of eight interlinked PS3s is busy solving a celestial mystery involving gravitational waves and what happens when a super-massive black hole, about a million times the mass of our own sun, swallows up a star.

As the architect of this research, Dr. Gaurav Khanna is employing his so-called “gravity grid” of PS3s to help measure these theoretical gravity waves, ripples in space-time that travel at the speed of light, which Einstein’s Theory of Relativity predicted would emerge when such an event takes place. It turns out that the PS3 is ideal for doing precisely the kind of heavy computational lifting Khanna requires for his project, and the fact that it’s a relatively open platform makes programming scientific applications feasible. “The interest in the PS3 really was for two main reasons,” explains Khanna, an assistant professor at the University of Massachusetts, Dartmouth who specializes in computational astrophysics. “One of those is that Sony did this remarkable thing of making the PS3 an open platform, so you can in fact run Linux on it and it doesn’t control what you do.” He also says that the console’s Cell processor, co-developed by Sony, IBM and Toshiba, can deliver massive amounts of power, comparable even to that of a supercomputer, if you know how to optimize code and have a few extra consoles lying around that you can string together. “The PS3/Linux combination offers a very attractive cost-performance solution whether the PS3s are distributed (like Sony and Stanford’s Folding@home initiative) or clustered together (like Khanna’s),” says Sony’s senior development manager of research and development, Noam Rimon.

According to Rimon, the Cell processor was designed as a parallel processing device, so he’s not all that surprised the research community has embraced it. “It has a general purpose processor, as well as eight additional processing cores, each of which has two processing pipelines and can process multiple numbers, all at the same time,” Rimon says. This is precisely what Khanna needed. Prior to obtaining his PS3s, Khanna relied on grants from the National Science Foundation (NSF) to use various supercomputing sites spread across the United States. “Typically I’d use a couple hundred processors, going up to 500, to do these same types of things.” However, each of those supercomputer runs cost Khanna as much as $5,000 in grant money. Eight 60 GB PS3s would cost just $3,200, by contrast, but Khanna figured he would have a hard time convincing the NSF to give him a grant to buy game consoles, even if the overall price tag was lower. So after tweaking his code this past summer so that it could take advantage of the Cell’s unique architecture, Khanna set about petitioning Sony for some help in the form of free PS3s. “Once I was able to get to the point that I had this kind of performance from a single PS3, I think that’s when Sony started paying attention,” Khanna says of his optimized code.

Khanna says that his gravity grid has been up and running for a little over a month now and that, crudely speaking, his eight consoles are equal to about 200 of the supercomputing nodes he used to rely on. “Basically, it’s almost like a replacement,” he says. “I don’t have to use that supercomputer anymore, which is a good thing. For the same amount of money (well, I didn’t pay for it, but even if you look into the amount of funding that would go into buying something like eight PS3s), for the same amount of money I can do these runs indefinitely.” The point of the simulations Khanna and his team at UMass are running on the cluster is to see if gravitational waves, which have been postulated for almost 100 years but have never been observed, are strong enough that we could actually observe them one day. Indeed, with NASA and other agencies building some very big gravitational wave observatories with the sensitivity to be able to detect these waves, Khanna sees his work as complementary to such endeavors. Khanna expects to publish the results of his research in the next few months. So while PS3 owners continue to wait for a fuller range of PS3 titles and lower prices, at least they’ll have some reading material to pass the time.


Scientists Create Supercomputer from Sony Playstations
BY John Markoff  /  27 May 2003

NY Times/CNET News — As perhaps the clearest evidence yet of the power of sophisticated but inexpensive game consoles, the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign has assembled a supercomputer from an army of Sony PlayStation 2 devices. The resulting system, with components purchased at retail prices, cost a little more than $50,000. Researchers at the supercomputing center believe the system may be capable of a half trillion operations a second, well within the definition of supercomputer, although it may not rank among the world’s 500 fastest supercomputers.

Perhaps the most striking aspect of the project, which uses the open-source Linux operating system, is that the only hardware engineering involved was placing 70 of the individual game machines in a rack and plugging them together with a high-speed Hewlett-Packard network switch. The center’s scientists bought 100 machines but are holding 30 in reserve, possibly for high-resolution display application. “It took a lot of time because you have to cut all of these things out of the plastic packaging,” said Craig Steffen, a senior research scientist at the center, who is one of four scientists working part time on the project. The scientists are taking advantage of a standard component of the PS2 that was originally intended to move and transform pixels rapidly on a television screen to produce lifelike graphics.

That chip is not the PlayStation 2’s MIPS microprocessor, but rather a graphics co-processor known as the Emotion Engine. That custom-designed silicon chip is capable of producing up to 6.5 billion mathematical operations a second. The impressive performance of the game machine, which has been on the market for a few years, underscores a radical shift that has taken place in the computing world since the end of the Cold War in the late 1980s, according to the researchers. While the most advanced computing technologies have historically been developed first for large corporate users and military contractors, increasingly the fastest computers are being developed for the consumer market and for products meant to be placed under Christmas trees. “If you look at the economics of game platforms and the power of computing on toys, this is a long-term market trend and computing trend,” said Dan Reed, the supercomputing center’s director. “The economics are just amazing. This is going to drive the next big wave in high-performance computing.”

The scientists have their eyes on a variety of consumer hardware, he said. For example Nvidia, the maker of graphics cards for PCs, is now selling a high-performance graphics card capable of executing 51 billion mathematical operations per second. The pace of the consumer computing world is moving so quickly that the researchers are building the PlayStation 2-based supercomputer as an experiment to see how quickly they can take advantage of off-the-shelf, low-cost technologies. “I think we’d like to be able to transfer a lot of our experience to the next generation,” he said. Despite the computing promise of game consoles that sell for less than $200, the researchers acknowledged that the experiment was likely to be most useful for a group of relatively narrow scientific problems. They added that while the system was already doing scientific calculations, they cannot be certain about its ultimate computing potential until they write more carefully tuned software routines that can move data in and out of the custom processor quickly.

The limited memory of the Sony game console–32MB of memory–would also restrict the practical applications of the supercomputer, they said. But they noted that the computer was already running useful calculations on quantum chromodynamics, or QCD, simulations. QCD is a theory concerning the so-called strong interactions that bind elementary particles like quarks and gluons together to form hadrons, the constituents of nuclear matter. The ability to lower the cost of QCD simulation in itself would be significant, the researchers said, because such problems are the single largest consumer of computing resources on supercomputers at the Department of Energy and the National Energy Research Scientific Computing Center.

Still, several supercomputer experts said that the memory and computing bandwidth limitations of the PlayStation would prohibit broader applications of the machine. Gordon Bell, a Microsoft computer scientist and a veteran of the supercomputer world, said the PlayStation supercomputer might find its best application as a computer for the large digital display walls that are used by the Defense Department. Bell awards annual computing prizes that include a category for the best price/performance in high performance computing. “They should enter my contest,” he said. The supercomputing center’s scientists said they had chosen the PlayStation 2 because Sony sells a special Linux module that includes a high-speed network connection and a disk drive. By contrast, it is almost impossible for researchers to install the Linux system on Microsoft’s Xbox game console. Using a network of machines is not a new concept in the supercomputing world. Linux, which plays a major role in that world, has been used to assemble high-performance parallel computers built largely out of commodity hardware components. These machines are generally called Beowulf clusters.

PS3 supercomputer illustrates innovative IT cost savings
BY Bill Detwiler  /  March 4th, 2009

Back in 2007, Dr. Frank Mueller, an associate professor of computer science at North Carolina State University, created a supercomputing cluster of eight Sony PS3 systems. At the time, Mueller was quoted by NC State University’s Engineering News as saying, “Places like Google, the stock market, automotive design companies and scientists use clusters, but this is the first academic computer cluster built from PlayStation 3s.” Computer scientists at The University of Alabama in Huntsville and the University of Massachusetts, Dartmouth, have taken the clustering idea a step further and recently published research using simulations run on the Sony game systems. Dr. Gaurav Khanna, an assistant physics professor at UMass Dartmouth, and Dr. Lior Burko, an assistant physics professor at UAHuntsville, used a cluster of 16 PlayStation 3s, dubbed the PS3 Gravity Grid, to simulate a vibrating black hole and determine the speed at which it stops vibrating. Why use PS3s and not a traditional supercomputing platform, such as the National Science Foundation’s TeraGrid? Cost. In an article on the PS3 project, Burko was quoted as saying, “If we had rented computing time from a supercomputer center it would have cost us about $5,000 to run our simulation one time.” And, for their experiment, Khanna and Burko needed to run the simulation dozens of times. Considering a new 80GB PS3 retails for about $400, the 16 PS3s needed for Khanna’s cluster would cost around $6,400. For just over the cost of a single run, researchers were able to build a resource that they could use over and over again.
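The cost argument above comes down to simple arithmetic, sketched here in Python. The $400 console price, the 16-console cluster and the $5,000 rental figure are taken from the article; the function names are hypothetical, the 24-run count merely stands in for "dozens," and running costs such as electricity are ignored for simplicity.

```python
# Break-even sketch: buying a console cluster vs. renting supercomputer time.
# Prices come from the article; electricity and staff time are ignored.

def cluster_cost(consoles, price_each):
    """Up-front hardware cost of the cluster in dollars."""
    return consoles * price_each


def breakeven_runs(hardware_cost, cost_per_rented_run):
    """Smallest number of rented runs that costs at least as much as the hardware."""
    return -(-hardware_cost // cost_per_rented_run)  # ceiling division


if __name__ == "__main__":
    hardware = cluster_cost(16, 400)
    runs = 24  # assumed stand-in for "dozens" of runs
    print(f"Cluster hardware: ${hardware}")
    print(f"Rented runs needed to match that cost: {breakeven_runs(hardware, 5000)}")
    print(f"Renting {runs} runs: ${runs * 5000}")
```

On these figures the hardware pays for itself by the second simulation run.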

Frank Mueller
email : mueller [at] cs.ncsu [dot] edu

Why scientists love games consoles
BY Roger Highfield  /  17 Feb 2008

Leading scientists are turning to the extraordinary power of games consoles to do their sums and simulate everything from colliding black holes to the effects of drugs. Reprogram a PlayStation and it will perform feats that would be unthinkable on an ordinary PC because the kinds of calculations required to produce the realistic graphics now seen in sophisticated video games are similar to those used by chemists and physicists as they simulate the interactions between particles ranging from the molecular to the astronomical. Such simulations are usually carried out on a supercomputer, but time on these machines is expensive and in short supply. By comparison, games consoles are cheap and easily available, says New Scientist. “There is no doubt that the entertainment industry is helping to drive the direction of high performance computational science – exploiting the power available to the masses will lead to many research breakthroughs in the future,” comments Prof Peter Coveney of University College London, who uses supercomputing in chemistry.

Prof Gaurav Khanna at the University of Massachusetts has used an array of 16 PS3s to calculate what will happen when two black holes merge. According to Prof Khanna, the PS3 has unique features that make it suitable for scientific computations, namely, the Cell processor dubbed a “supercomputer-on-a-chip.” And it runs on Linux, “so it does not limit what you can do. A single high-precision simulation can sometimes cost more than 5,000 hours on the TeraGrid supercomputers. For the same cost, you can build your own supercomputer using PS3s. It works just as well, has no long wait times and can be used over and over again, indefinitely,” Prof Khanna says.

And Todd Martínez has persuaded the supercomputing centre at the University of Illinois, Urbana-Champaign, to buy eight computers each driven by two of the specialised chips that are at the heart of Sony’s PlayStation 3 console. Together with his student Benjamin Levine he is using them to simulate the interactions between the electrons in atoms, as part of work to see how proteins in the body dovetail with drug molecules. He was inspired while browsing through the technical specification of his son’s games console. “I noticed that the architecture looked a lot like high performance supercomputers I had seen before,” he says. “That’s when I thought about getting one for myself.”

An effort to interconnect tens of thousands of PS3s is under way with Folding@home, a project based at Stanford University to study the way proteins fold, which plays a key role in Alzheimer’s, Huntington’s Disease and Parkinson’s disease. With about 50,000 such machines, the organisers of this huge distributed computing effort hope to achieve performance on the petaflop scale. The Wii, made by Nintendo, has a motion tracking remote control unit that is cheaper than a comparable device built from scratch. The device recently emerged as a tool to help surgeons to improve their technique. Meanwhile, neurologist Thomas Davis at the Vanderbilt Medical Centre in Nashville, Tennessee, is using it to measure movement deficiencies in Parkinson’s patients to assess how well a patient can move when they take part in drug trials.

Folding@home Reaches Million PS3-User Milestone
BY Susan Arendt / February 4, 2008

Sony recently announced that more than one million PlayStation 3 owners are taking part in Folding@home, the distributed computing project run by Stanford University. The participation of PS3 owners in Folding@home allows the project “to address questions previously considered impossible to tackle computationally, with the goal of finding cures to some of the world’s most life-threatening diseases,” said project lead Vijay Pande. More than one million PS3 owners as registered participants breaks down to about two new registrants per minute, or about 3,000 new Folding@home members per day.

Folding@home’s mission is to try and better understand how proteins fold, and how misfolds are related to various diseases like cancer, Alzheimer’s and Parkinson’s. PS3s currently comprise about 74 percent of the entire computing power of Folding@home. When the project achieved a petaflop in September, it officially became the most powerful distributed computing network in the world, at least according to folks at Guinness World Records. A network of 10,000 PS3s can accomplish the same amount of Folding@home work as 100,000 PCs, making their computational ability an invaluable asset to the project.



Researchers have harnessed the powerful silicon chips used in the Xbox 360 console to solve scientific conundrums. Academics at the University of Warwick believe they are the first to use the processors as a cheap way to conduct “parallel processing”. Parallel computing is where a number of processors are run in tandem, allowing a system to rapidly crunch data. Researchers traditionally have to book time on a dedicated “cluster” system or splash out setting up a network of PCs.

Instead, the Warwick team harnessed a single Xbox 360 Graphical Processing Unit (GPU). The chip was able to perform parallel processing functions at a fraction of the cost of traditional systems. Dr Simon Scarle, a researcher on the team, built the system to help him model how electrical signals in the heart moved around damaged cardiac cells. Dr Scarle, who previously worked as a software engineer at Microsoft’s Rare studio, had first-hand experience of tapping into the power of GPU technology.

Speaking to BBC News, Dr Scarle said that the code controlling the chip was modified, so instead of working out graphical calculations, it could perform other ones instead. “You don’t quite get the full whammy of a cluster, but it’s close,” he said. “Instead of pumping out stunning graphics, it’s reworked; in the case of my research, rather than calculating the position of a structure and texture it’s now working out the different chemical levels in a cell.”

Real world computing
There has been cross-pollination between game consoles and real world computing in the past. Roadrunner, officially the world’s fastest supercomputer, uses the same processor technology as that found in Sony’s PlayStation 3. However, it is thought that this is the first time an Xbox has been used to perform parallel processing, albeit on a single chip.

Dr Scarle said that linking more than one Xbox together using the techniques would not be impossible. “It could be done, but you would have to go over the internet – through something like Xbox Live – rather than a standard method. However, without development tools, it wouldn’t be easy.” Xbox Live allows gamers to play against each other over the internet. “Sony have been into this [parallel processing] for some time, releasing development kits, and Folding@home comes as standard,” he added.

Folding@home is a project that harnesses the spare processing power of PCs, Macs, Linux systems and PlayStation 3s to help understand the cause of diseases. The network has more than 4.3 petaflops of computing power, the equivalent of more than 4,300 trillion calculations per second. Roadrunner, by comparison, can operate at just over one petaflop. The results of the University of Warwick research are published in the journal Computational Biology and Chemistry.

Simon Scarle
email : S.Scarle [at] [dot] uk

New Software Could Smooth Supercomputing Speed Bumps
BY Larry Greenemeier / 16 October 2009

Supercomputers have long been an indispensable, albeit expensive, tool for researchers who need to make sense of vast amounts of data. One way that researchers have begun to make high-speed computing more powerful and also more affordable is to build systems that split up workloads among fast, highly parallel graphics processing units (GPUs) and general-purpose central processing units (CPUs).

There is, however, a problem with building these co-processed computing hot rods: A common programming interface for the different GPU models has not been available. Even though the lion’s share of GPUs are made by Advanced Micro Devices, Inc. (AMD) and NVIDIA Corp., the differences between the two companies’ processors mean that programmers have had to write software to meet the requirements of the particular GPU used by their computers.

Now, this is changing as AMD, NVIDIA and their customers (primarily makers of computers and game systems) throw their support behind a standard way of writing software called the Open Computing Language (OpenCL), which works across both GPU brands. A longer-term goal behind OpenCL is to create a common programming interface that will even let software writers create applications that run on both GPUs and CPUs with few modifications, cutting the time and effort required to harness supercomputing power for scientific endeavors.

Researchers at Virginia Polytechnic Institute and State University (Virginia Tech) in Blacksburg, Va., are hoping that OpenCL can help them write software that can run on GPUs made either by AMD or NVIDIA. Using a computer equipped with both a CPU and an AMD GPU, the Virginia Tech researchers were able to compute and visualize biomolecular electrostatic surface potential 1,800 times faster (from 22.4 hours to less than a minute) than they could with a similar computer driven only by a CPU.
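The two figures in that comparison can be checked against each other. This short sketch uses only the numbers reported above (22.4 hours and a speedup of 1,800); the function name is hypothetical.

```python
# Consistency check: a 22.4-hour CPU run sped up 1,800 times.

def accelerated_seconds(cpu_hours, speedup):
    """Run time in seconds after applying the reported speedup."""
    return cpu_hours * 3600 / speedup


if __name__ == "__main__":
    seconds = accelerated_seconds(22.4, 1800)
    print(f"Accelerated run time: {seconds:.1f} seconds")
```

The result is about 44.8 seconds, which agrees with the "less than a minute" figure.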

The National Institutes of Health (NIH) has committed more than $1.3 million in funding from 2006 through 2011 for a project led by Alexey Onufriev, an associate professor in Virginia Tech’s departments of Computer Science and Physics, to represent water computationally, because water is key to modeling biological molecules. “When you model a molecule at the atomic level,” Onufriev says, “you need to know the impact that water will have on that model.”

This is the type of program that maps quite well to GPUs, says Wu Feng, director of Virginia Tech’s Synergy Laboratory and an associate professor in the school’s departments of Computer Science and Electrical & Computer Engineering. “These applications tend to be compute-intensive and regular in their computation,” he adds, “regular in the sense that you’re calculating electrostatic potential between pairs of points.”
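What "regular" means here can be illustrated with a toy version of a pairwise calculation. The sketch below is not the Virginia Tech program: it assumes a plain Coulomb sum with the physical constant set to 1, and every name in it is hypothetical. The point is only that each surface point runs the identical loop with no special cases, which is the pattern that spreads well across many GPU threads.

```python
import math

# Toy pairwise electrostatic potential (plain Coulomb sum, assumed model).
COULOMB_K = 1.0  # physical constant folded into the units for this sketch


def potential_at(point, charges):
    """Sum k * q / r over every (position, charge) pair for one point."""
    total = 0.0
    for position, q in charges:
        r = math.dist(point, position)  # Euclidean distance between the pair
        total += COULOMB_K * q / r
    return total


def surface_potential(points, charges):
    """Apply the same calculation to every point; each one is independent."""
    return [potential_at(p, charges) for p in points]


if __name__ == "__main__":
    charges = [((0.0, 0.0, 0.0), 1.0), ((2.0, 0.0, 0.0), -1.0)]
    points = [(1.0, 0.0, 0.0), (0.0, 3.0, 4.0)]
    print(surface_potential(points, charges))
```

Because no point depends on the result for any other point, the outer loop could in principle be handed out one point per thread.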

CPUs, however, are better suited than GPUs to computing tasks that require the computer to make a decision. For example, if a string of computing tasks were likened to a line of people waiting to enter a stadium, Feng says, the GPU would be very good at dividing up the people into multiple lines and taking their tickets as they enter—as long as everyone has the same type of ticket. If some people had special tickets that allowed them to go backstage or entitled them to some other privilege, it would greatly slow the GPU’s capabilities as the processor decided what to do with the nonconformists. “GPUs work well today when they are given a single instruction for a repetitive task,” he adds.

Feng and his team are adapting an electrostatic potential program for Onufriev’s lab so that it will work specifically on computers running GPUs made by AMD. Feng notes that as OpenCL is embraced more widely, he will be able to write programs that can communicate with any type of GPU supporting OpenCL, regardless of manufacturer, and eventually write code that provides instructions for both CPUs and GPUs. (Earlier this week, AMD made available the latest version of its software development tools that the company says allows programmers to use OpenCL to write applications that let GPUs operate in concert with CPUs.)

With this type of computing power and versatility, Onufriev says many limitations will be lifted regarding the types of research he can tackle. Another of his projects is studying how the nearly two meters of DNA in each cell is packed into the cell’s nucleus. “The way DNA is packed determines the genetic message,” he says. “No one knows exactly how this works. We’re hoping to get stacks of GPU machines where we can run simulations requiring massive computations that help us better understand DNA packing.” Such work would be aided greatly by systems that can make use of both GPUs and CPUs.


The Do-It-Yourself Supercomputer
By William W. Hargrove, Forrest M. Hoffman and Thomas Sterling

In the well-known stone soup fable, a wandering soldier stops at a poor village and says he will make soup by boiling a cauldron of water containing only a shiny stone. The townspeople are skeptical at first but soon bring small offerings: a head of cabbage, a bunch of carrots, a bit of beef. In the end, the cauldron is filled with enough hearty soup to feed everyone. The moral: cooperation can produce significant achievements, even from meager, seemingly insignificant contributions.

Researchers are now using a similar cooperative strategy to build supercomputers, the powerful machines that can perform billions of calculations in a second. Most conventional supercomputers employ parallel processing: they contain arrays of ultrafast microprocessors that work in tandem to solve complex problems such as forecasting the weather or simulating a nuclear explosion. Made by IBM, Cray and other computer vendors, the machines typically cost tens of millions of dollars–far too much for a research team with a modest budget. So over the past few years, scientists at national laboratories and universities have learned how to construct their own supercomputers by linking inexpensive PCs and writing software that allows these ordinary computers to tackle extraordinary problems.

In 1996 two of us (Hargrove and Hoffman) encountered such a problem in our work at Oak Ridge National Laboratory (ORNL) in Tennessee. We were trying to draw a national map of ecoregions, which are defined by environmental conditions: all areas with the same climate, landforms and soil characteristics fall into the same ecoregion. To create a high-resolution map of the continental U.S., we divided the country into 7.8 million square cells, each with an area of one square kilometer. For each cell we had to consider as many as 25 variables, ranging from average monthly precipitation to the nitrogen content of the soil. A single PC or workstation could not accomplish the task. We needed a parallel-processing supercomputer–and one that we could afford!

Our solution was to construct a computing cluster using obsolete PCs that ORNL would have otherwise discarded. Dubbed the Stone SouperComputer because it was built essentially at no cost, our cluster of PCs was powerful enough to produce ecoregion maps of unprecedented detail. Other research groups have devised even more capable clusters that rival the performance of the world’s best supercomputers at a mere fraction of their cost. This advantageous price-to-performance ratio has already attracted the attention of some corporations, which plan to use the clusters for such complex tasks as deciphering the human genome. In fact, the cluster concept promises to revolutionize the computing field by offering tremendous processing power to any research group, school or business that wants it.

Beowulf And Grendel
The notion of linking computers together is not new. In the 1950s and 1960s the U.S. Air Force established a network of vacuum-tube computers called SAGE to guard against a Soviet nuclear attack. In the mid-1980s Digital Equipment Corporation coined the term “cluster” when it integrated its mid-range VAX minicomputers into larger systems. Networks of workstations–generally less powerful than minicomputers but faster than PCs–soon became common at research institutions. By the early 1990s scientists began to consider building clusters of PCs, partly because their mass-produced microprocessors had become so inexpensive. What made the idea even more appealing was the falling cost of Ethernet, the dominant technology for connecting computers in local-area networks.

Advances in software also paved the way for PC clusters. In the 1980s Unix emerged as the dominant operating system for scientific and technical computing. Unfortunately, the operating systems for PCs lacked the power and flexibility of Unix. But in 1991 Finnish college student Linus Torvalds created Linux, a Unix-like operating system that ran on a PC. Torvalds made Linux available free of charge on the Internet, and soon hundreds of programmers began contributing improvements. Now wildly popular as an operating system for stand-alone computers, Linux is also ideal for clustered PCs.

The first PC cluster was born in 1994 at the NASA Goddard Space Flight Center. NASA had been searching for a cheaper way to solve the knotty computational problems typically encountered in earth and space science. The space agency needed a machine that could achieve one gigaflops–that is, perform a billion floating-point operations per second. (A floating-point operation is equivalent to a simple calculation such as addition or multiplication.) At the time, however, commercial supercomputers with that level of performance cost about $1 million, which was too expensive to be dedicated to a single group of researchers.

One of us (Sterling) decided to pursue the then radical concept of building a computing cluster from PCs. Sterling and his Goddard colleague Donald J. Becker connected 16 PCs, each containing an Intel 486 microprocessor, using Linux and a standard Ethernet network. For scientific applications, the PC cluster delivered sustained performance of 70 megaflops–that is, 70 million floating-point operations per second. Though modest by today’s standards, this speed was not much lower than that of some smaller commercial supercomputers available at the time. And the cluster was built for only $40,000, or about one tenth the price of a comparable commercial machine in 1994.

NASA researchers named their cluster Beowulf, after the lean, mean hero of medieval legend who defeated the giant monster Grendel by ripping off one of the creature’s arms. Since then, the name has been widely adopted to refer to any low-cost cluster constructed from commercially available PCs. In 1996 two successors to the original Beowulf cluster appeared: Hyglac (built by researchers at the California Institute of Technology and the Jet Propulsion Laboratory) and Loki (constructed at Los Alamos National Laboratory). Each cluster integrated 16 Intel Pentium Pro microprocessors and showed sustained performance of over one gigaflops at a cost of less than $50,000, thus satisfying NASA’s original goal.

The Beowulf approach seemed to be the perfect computational solution to our problem of mapping the ecoregions of the U.S. A single workstation could handle the data for only a few states at most, and we couldn’t assign different regions of the country to separate workstations–the environmental data for every section of the country had to be compared and processed simultaneously. In other words, we needed a parallel-processing system. So in 1996 we wrote a proposal to buy 64 new PCs containing Pentium II microprocessors and construct a Beowulf-class supercomputer. Alas, this idea sounded implausible to the reviewers at ORNL, who turned down our proposal.

Undeterred, we devised an alternative plan. We knew that obsolete PCs at the U.S. Department of Energy complex at Oak Ridge were frequently replaced with newer models. The old PCs were advertised on an internal Web site and auctioned off as surplus equipment. A quick check revealed hundreds of outdated computers waiting to be discarded this way. Perhaps we could build our Beowulf cluster from machines that we could collect and recycle free of charge. We commandeered a room at ORNL that had previously housed an ancient mainframe computer. Then we began collecting surplus PCs to create the Stone SouperComputer.

A Digital Chop Shop
The strategy behind parallel computing is “divide and conquer.” A parallel-processing system divides a complex problem into smaller component tasks. The tasks are then assigned to the system’s nodes–for example, the PCs in a Beowulf cluster–which tackle the components simultaneously. The efficiency of parallel processing depends largely on the nature of the problem. An important consideration is how often the nodes must communicate to coordinate their work and to share intermediate results. Some problems must be divided into myriad minuscule tasks; because these fine-grained problems require frequent internode communication, they are not well suited for parallel processing. Coarse-grained problems, in contrast, can be divided into relatively large chunks. These problems do not require much communication among the nodes and therefore can be solved very quickly by parallel-processing systems.
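A toy model makes the difference between fine-grained and coarse-grained problems concrete. The Python sketch below is an assumption-laden illustration, not a measurement of any real cluster: it supposes the work divides perfectly among the nodes and that each node spends a fixed time on every message it exchanges.

```python
def parallel_time(work_seconds, nodes, messages_per_node, seconds_per_message):
    """Estimated run time: the divided computation plus communication cost."""
    compute = work_seconds / nodes
    communicate = messages_per_node * seconds_per_message
    return compute + communicate

def speedup(work_seconds, nodes, messages_per_node, seconds_per_message):
    """How many times faster the cluster is than one node working alone."""
    return work_seconds / parallel_time(
        work_seconds, nodes, messages_per_node, seconds_per_message
    )

# Hypothetical numbers: 1,000 seconds of work on 10 nodes, 0.01 s per message.
coarse = speedup(1000, 10, 10, 0.01)       # a handful of messages per node
fine = speedup(1000, 10, 100_000, 0.01)    # constant chatter between nodes
```

With these made-up figures the coarse-grained job runs nearly 10 times faster than on a single machine, whereas the fine-grained job spends so long communicating that it is slower than not using the cluster at all.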

Anyone building a Beowulf cluster must make several decisions in designing the system. To connect the PCs, researchers can use either standard Ethernet networks or faster, specialized networks, such as Myrinet. Our lack of a budget dictated that we use Ethernet, which is free. We chose one PC to be the front-end node of the cluster and installed two Ethernet cards into the machine. One card was for communicating with outside users, and the other was for talking with the rest of the nodes, which would be linked in their own private network. The PCs coordinate their tasks by sending messages to one another. The two most popular message-passing libraries are message-passing interface (MPI) and parallel virtual machine (PVM), which are both available at no cost on the Internet. We use both systems in the Stone SouperComputer.

Many Beowulf clusters are homogeneous, with all the PCs containing identical components and microprocessors. This uniformity simplifies the management and use of the cluster but is not an absolute requirement. Our Stone SouperComputer would have a mix of processor types and speeds because we intended to use whatever surplus equipment we could find. We began with PCs containing Intel 486 processors but later added only Pentium-based machines with at least 32 megabytes of RAM and 200 megabytes of hard-disk storage.

It was rare that machines met our minimum criteria on arrival; usually we had to combine the best components from several PCs. We set up the digital equivalent of an automobile thief’s chop shop for converting surplus computers into nodes for our cluster. Whenever we opened a machine, we felt the same anticipation that a child feels when opening a birthday present: Would the computer have a big disk, lots of memory or (best of all) an upgraded motherboard donated to us by accident? Often all we found was a tired old veteran with a fan choked with dust.

Our room at Oak Ridge turned into a morgue filled with the picked-over carcasses of dead PCs. Once we opened a machine, we recorded its contents on a “toe tag” to facilitate the extraction of its parts later on. We developed favorite and least favorite brands, models and cases and became adept at thwarting passwords left by previous owners. On average, we had to collect and process about five PCs to make one good node.

As each new node joined the cluster, we loaded the Linux operating system onto the machine. We soon figured out how to eliminate the need to install a keyboard or monitor for each node. We created mobile “crash carts” that could be wheeled over and plugged into an ailing node to determine what was wrong with it. Eventually someone who wanted space in our room bought us shelves to consolidate our collection of hardware. The Stone SouperComputer ran its first code in early 1997, and by May 2001 it contained 133 nodes, including 75 PCs with Intel 486 microprocessors, 53 faster Pentium-based machines and five still faster Alpha workstations, made by Compaq.

Upgrades to the Stone SouperComputer are straightforward: we replace the slowest nodes first. Each node runs a simple speed test every hour as part of the cluster’s routine housekeeping tasks. The ranking of the nodes by speed helps us to fine-tune our cluster. Unlike commercial machines, the performance of the Stone SouperComputer continually improves, because we have an endless supply of free upgrades.

Parallel Problem Solving
Parallel programming requires skill and creativity and may be more challenging than assembling the hardware of a Beowulf system. The most common model for programming Beowulf clusters is a master-slave arrangement. In this model, one node acts as the master, directing the computations performed by one or more tiers of slave nodes. We run the same software on all the machines in the Stone SouperComputer, with separate sections of code devoted to the master and slave nodes. Each microprocessor in the cluster executes only the appropriate section. Programming errors can have dramatic effects, resulting in a digital train wreck as the crash of one node derails the others. Sorting through the wreckage to find the error can be difficult.
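The idea of one program with separate master and slave sections can be sketched in a few lines of Python. This is only a stand-in for a real message-passing library such as MPI or PVM: threads play the role of nodes, queues play the role of the network, and every name here (node_program, run_cluster, the mailboxes) is hypothetical.

```python
import queue
import threading

MASTER = 0  # by convention in this sketch, node 0 directs the others

def node_program(rank, size, mailboxes, results):
    """The same program runs on every node; the rank selects the section."""
    if rank == MASTER:
        # Master section: split the data, send one chunk to each worker,
        # then collect one partial sum back from each of them.
        data = list(range(1, 101))
        workers = size - 1
        for w in range(1, size):
            mailboxes[w].put(data[w - 1::workers])
        results["total"] = sum(mailboxes[MASTER].get() for _ in range(workers))
    else:
        # Slave (worker) section: wait for a chunk, process it, report back.
        chunk = mailboxes[rank].get()
        mailboxes[MASTER].put(sum(chunk))

def run_cluster(size):
    """Start `size` simulated nodes (at least 2) and return the master's result."""
    mailboxes = [queue.Queue() for _ in range(size)]
    results = {}
    threads = [
        threading.Thread(target=node_program, args=(r, size, mailboxes, results))
        for r in range(size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results["total"]
```

Calling run_cluster(4) starts one master and three workers and adds up the numbers 1 through 100. If a worker crashed before reporting back, the master would wait forever for its message, a small-scale version of the train wreck described above.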

Another challenge is balancing the processing workload among the cluster’s PCs. Because the Stone SouperComputer contains a variety of microprocessors with very different speeds, we cannot divide the workload evenly among the nodes: if we did so, the faster machines would sit idle for long periods as they waited for the slower machines to finish processing. Instead we developed a programming algorithm that allows the master node to send more data to the faster slave nodes as they complete their tasks. In this load-balancing arrangement, the faster PCs do most of the work, but the slower machines still contribute to the system’s performance.
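The benefit of handing out work on demand can be shown with a small simulation. The Python sketch below is not the algorithm we run on the cluster; it is a simplified model in which the master gives the next chunk of data to whichever node becomes free first, and the node speeds are invented for illustration.

```python
import heapq

def dynamic_schedule(node_speeds, num_chunks, chunk_work=1.0):
    """Simulate a master that gives the next chunk to the first free node.

    node_speeds: work units per second for each node (hypothetical values).
    Returns (finish_time, chunks_done_per_node).
    """
    # Heap of (time at which the node becomes free, node index).
    free_at = [(0.0, i) for i in range(len(node_speeds))]
    heapq.heapify(free_at)
    done = [0] * len(node_speeds)
    finish = 0.0
    for _ in range(num_chunks):
        t, i = heapq.heappop(free_at)
        t_done = t + chunk_work / node_speeds[i]
        done[i] += 1
        finish = max(finish, t_done)
        heapq.heappush(free_at, (t_done, i))
    return finish, done

def static_schedule(node_speeds, num_chunks, chunk_work=1.0):
    """Even split of the chunks: the slowest node sets the finish time."""
    per_node = num_chunks / len(node_speeds)
    return max(per_node * chunk_work / s for s in node_speeds)
```

For two nodes, one four times faster than the other, and 10 chunks of work, the on-demand scheme finishes in 2 seconds with the fast node handling eight chunks and the slow node two. An even split of five chunks each would take 5 seconds, with the fast machine idle for most of that time.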

Our first step in solving the ecoregion mapping problem was to organize the enormous amount of data–the 25 environmental characteristics of the 7.8 million cells of the continental U.S. We created a 25-dimensional data space in which each dimension represented one of the variables (average temperature, precipitation, soil characteristics and so on). Then we identified each cell with the appropriate point in the data space [see illustration A]. Two points close to each other in this data space have, by definition, similar characteristics and thus are classified in the same ecoregion. Geographic proximity is not a factor in this kind of classification; for example, if two mountaintops have very similar environments, their points in the data space are very close to each other, even if the mountaintops are actually thousands of miles apart.

Once we organized the data, we had to specify the number of ecoregions that would be shown on the national map. The cluster of PCs gives each ecoregion an initial “seed position” in the data space. For each of the 7.8 million data points, the system determines the closest seed position and assigns the point to the corresponding ecoregion. Then the cluster finds the centroid for each ecoregion–the average position of all the points assigned to the region. This centroid replaces the seed position as the defining point for the ecoregion. The cluster then repeats the procedure, reassigning the data points to ecoregions depending on their distances from the centroids. At the end of each iteration, new centroid positions are calculated for each ecoregion. The process continues until fewer than a specified number of data points change their ecoregion assignments. Then the classification is complete.
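The procedure just described is an iterative clustering loop of the kind often called k-means. A minimal single-machine sketch in Python follows. It is not our production code: the function names, the use of ordinary Euclidean distance and the rule that an ecoregion with no points keeps its previous position are assumptions made for the example.

```python
import math

def assign(points, centers):
    """For every point, return the index of the closest center."""
    return [
        min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
        for p in points
    ]

def centroids(points, labels, old_centers):
    """Average position of the points assigned to each region."""
    dims = len(points[0])
    sums = [[0.0] * dims for _ in old_centers]
    counts = [0] * len(old_centers)
    for p, k in zip(points, labels):
        counts[k] += 1
        for d in range(dims):
            sums[k][d] += p[d]
    new_centers = []
    for k, old in enumerate(old_centers):
        if counts[k] == 0:
            new_centers.append(tuple(old))  # assumption: empty region stays put
        else:
            new_centers.append(tuple(s / counts[k] for s in sums[k]))
    return new_centers

def classify(points, seeds, stop_below=1, max_iterations=100):
    """Repeat assign/centroid steps until fewer than `stop_below` points move."""
    centers = [tuple(s) for s in seeds]
    labels = assign(points, centers)
    for _ in range(max_iterations):
        centers = centroids(points, labels, centers)
        new_labels = assign(points, centers)
        changed = sum(a != b for a, b in zip(labels, new_labels))
        labels = new_labels
        if changed < stop_below:
            break
    return labels, centers
```

Here each point would be a tuple of 25 numbers, one per environmental variable, although the code works for any number of dimensions. In practice the variables would probably need to be put on comparable scales first so that no single one dominates the distance; that step is left out of the sketch.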

The mapping task is well suited for parallel processing because different nodes in the cluster can work independently on subsets of the 7.8 million data points. After each iteration the slave nodes send the results of their calculations to the master node, which averages the numbers from all the subsets to determine the new centroid positions for each ecoregion. The master node then sends this information back to the slave nodes for the next round of calculations. Parallel processing is also useful for selecting the best seed positions for the ecoregions at the very beginning of the procedure. We devised an algorithm that allows the nodes in the Stone SouperComputer to determine collectively the most widely dispersed data points, which are then chosen as the seed positions. If the cluster starts with well-dispersed seed positions, fewer iterations are needed to map the ecoregions.
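The division of labor between slave nodes and the master in each iteration can be sketched as two Python functions. This is a hedged illustration rather than our actual code: it assumes each node reports coordinate sums and point counts for its subset, so that the master can compute a correctly weighted average even when the subsets differ in size. The names partial_sums and combine are hypothetical.

```python
import math

def partial_sums(points, centers):
    """Slave step: per-region coordinate sums and counts for one subset."""
    dims = len(centers[0])
    sums = [[0.0] * dims for _ in centers]
    counts = [0] * len(centers)
    for p in points:
        # Assign the point to its closest center, then accumulate it.
        k = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        counts[k] += 1
        for d in range(dims):
            sums[k][d] += p[d]
    return sums, counts

def combine(partials, centers):
    """Master step: merge every node's partial result into new centroids."""
    dims = len(centers[0])
    new_centers = []
    for k, old in enumerate(centers):
        n = sum(counts[k] for _, counts in partials)
        if n == 0:
            new_centers.append(tuple(old))  # assumption: empty region stays put
            continue
        new_centers.append(
            tuple(sum(sums[k][d] for sums, _ in partials) / n for d in range(dims))
        )
    return new_centers
```

Only the small tables of sums and counts travel over the network, not the data points themselves, which is why the communication cost per iteration stays low compared with the computation each node performs.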

The result of all our work was a series of maps of the continental U.S. showing each ecoregion in a different color [see illustrations B and C]. We produced maps showing the country divided into as few as four ecoregions and as many as 5,000. The maps with fewer ecoregions divided the country into recognizable zones–for example, the Rocky Mountain states and the desert Southwest. In contrast, the maps with thousands of ecoregions are far more complex than any previous classification of the country’s environments. Because many plants and animals live in only one or two ecoregions, our maps may be useful to ecologists who study endangered species.

In our first maps the colors of the ecoregions were randomly assigned, but we later produced maps in which the colors of the ecoregions reflect the similarity of their respective environments. We statistically combined nine of the environmental variables into three composite characteristics, which we represented on the map with varying levels of red, green and blue. When the map is drawn this way, it shows gradations of color instead of sharp borders: the lush Southeast is mostly green, the cold Northeast is mainly blue, and the arid West is primarily red [see illustration D]. Moreover, the Stone SouperComputer was able to show how the ecoregions in the U.S. would shift if there were nationwide changes in environmental conditions as a result of global warming. Using two projected climate scenarios developed by other research groups, we compared the current ecoregion map with the maps predicted for the year 2099. According to these projections, by the end of this century the environment in Pittsburgh will be more like that of present-day Atlanta, and conditions in Minneapolis will resemble those in present-day St. Louis. [see Stone SouperComputer’s Global Warming Forecast]

The Future Of Clusters
The traditional measure of supercomputer performance is benchmark speed: how fast the system runs a standard program. As scientists, however, we prefer to focus on how well the system can handle practical applications. To evaluate the Stone SouperComputer, we fed the same ecoregion mapping problem to ORNL’s Intel Paragon supercomputer shortly before it was retired. At one time, this machine was the laboratory’s fastest, with a peak performance of 150 gigaflops. On a per-processor basis, the run time on the Paragon was essentially the same as that on the Stone SouperComputer. We have never officially clocked our cluster (we are loath to steal computing cycles from real work), but the system has a theoretical peak performance of about 1.2 gigaflops. Ingenuity in parallel algorithm design is more important than raw speed or capacity: in this young science, David and Goliath (or Beowulf and Grendel!) still compete on a level playing field.

The Beowulf trend has accelerated since we built the Stone SouperComputer. New clusters with exotic names–Grendel, Naegling, Megalon, Brahma, Avalon, Medusa and theHive, to mention just a few–have steadily raised the performance curve by delivering higher speeds at lower costs. As of last November, 28 clusters of PCs, workstations or servers were on the list of the world’s 500 fastest computers. The LosLobos cluster at the University of New Mexico has 512 Intel Pentium III processors and is the 80th-fastest system in the world, with a performance of 237 gigaflops. The Cplant cluster at Sandia National Laboratories has 580 Compaq Alpha processors and is ranked 84th. The National Science Foundation and the U.S. Department of Energy are planning to build even more advanced clusters that could operate in the teraflops range (one trillion floating-point operations per second), rivaling the speed of the fastest supercomputers on the planet.

Beowulf systems are also muscling their way into the corporate world. Major computer vendors are now selling clusters to businesses with large computational needs. IBM, for instance, is building a cluster of 1,250 servers for NuTec Sciences, a biotechnology firm that plans to use the system to identify disease-causing genes. An equally important trend is the development of networks of PCs that contribute their processing power to a collective task. An example is SETI@home, a project launched by researchers at the University of California at Berkeley who are analyzing deep-space radio signals for signs of intelligent life. SETI@home sends chunks of data over the Internet to more than three million PCs, which process the radio-signal data in their idle time. Some experts in the computer industry predict that researchers will eventually be able to tap into a “computational grid” that will work like a power grid: users will be able to obtain processing power just as easily as they now get electricity.

Above all, the Beowulf concept is an empowering force. It wrests high-level computing away from the privileged few and makes low-cost parallel-processing systems available to those with modest resources. Research groups, high schools, colleges or small businesses can build or buy their own Beowulf clusters, realizing the promise of a supercomputer in every basement. Should you decide to join the parallel-processing proletariat, please contact us through our Web site and tell us about your Beowulf-building experiences. We have found the Stone Soup to be hearty indeed.

Further Information:
Cluster Computing: Linux Taken to the Extreme. F. M. Hoffman and W. W. Hargrove in Linux Magazine, Vol. 1, No. 1, pages 56-59; Spring 1999.

Using Multivariate Clustering to Characterize Ecoregion Borders. W. W. Hargrove and F. M. Hoffman in Computing in Science & Engineering, Vol. 1, No. 4, pages 18-25; July/August 1999.

How to Build a Beowulf: A Guide to the Implementation and Application of PC Clusters. Edited by T. Sterling, J. Salmon, D. J. Becker and D. F. Savarese. MIT Press, 1999.
