GRID COMPUTING BEGAN AS A DATA-MANAGEMENT SOLUTION FOR CERN’S LARGE HADRON COLLIDER. NOW, IT STANDS TO REDEFINE COLLABORATIVE PROBLEM-SOLVING IN SCIENCE AND BEYOND
Charles Curran, a physicist who recently retired as the longtime storage consultant at CERN, remembers the old days of data access: when filling a request from a researcher was often a labor-intensive, daylong misadventure.
In the 1970s, information from CERN’s accelerators and experiments was stored on tapes held in a huge library in the IT department. At first, operators retrieved the tapes by hand and copied the data to disk for the researcher. Overworked operators fell asleep, went missing for hours at a time, invented tricks to make the machines run faster, and overloaded the conveyor belts, causing tapes to fall off and disappear. Tape-retrieval robots squared off against mice (in one documented case, the mouse was found months later, desiccated) or overheated when they couldn’t reach tapes, melting their wheels in frustration. A request to see a certain tape often took 24 hours to fill.
Now the wait is about two minutes, hardly enough time to get a cup of coffee.
Accessing and processing data is now faster, more flexible, more reliable, and cheaper. A researcher in Croatia can access and exchange data, in a variety of formats, with a colleague in Argentina almost immediately, 24 hours a day, seven days a week, without leaving her desk or going up against any rogue mice.
In the past decade, the public research community, the European Commission, and the governments of the US and other countries have invested heavily in a game-changing data infrastructure known as “grid computing.” A grid is a network for sharing computing power and data-storage capacity over the internet. It goes well beyond simple communication between computers, ultimately aiming to turn the global network of computers into one vast resource for running large-scale compute- and data-intensive applications. Grid computing is often compared to an electric power grid, in which the generators are distributed: just as consumers draw electricity without knowing which power station produced it, users of a computational grid can access computing power without regard to its source or location. A key element of grid computing is that it enables real-time collaboration between geographically dispersed communities in the form of virtual organizations.
In the next decade, we must invest even more heavily in such technology. Data is fundamental to science, and the science we do now requires ever-larger data sets. We need flexible, powerful computing systems to store, move, and process them.