The Tennessee Titan: Oak Ridge, Cray, NVIDIA Create New Open Science Supercomputer
Editor's Note: As predicted, Titan moved into the top spot on the Top500 list in November.
By Kevin Davies
October 29, 2012 | A new 20-petaflop supercomputer dubbed Titan is powering up at the Oak Ridge National Laboratory (ORNL) in Tennessee and is ready to tackle a host of scientific applications.
The new machine, which replaces Jaguar, ORNL’s previous supercomputer, features new technology from Cray as well as the new K20 GPU (graphics processing unit) from NVIDIA. Titan has a footprint about the size of a basketball court, and much of its operating time is open to any research team, subject to peer review.
The key motivation in developing Titan was to build a supercomputer that dwarfed Jaguar in terms of performance without dramatically increasing the cost to run the machine. At 2.3 petaflops, Jaguar was the top-ranked supercomputer when it was originally built, and consumed 7 megawatts of power—enough to power 7,000 homes.
But Jaguar relied on traditional CPU technology, which executives at NVIDIA would argue is no longer economically feasible. Titan features GPU accelerators in a 1:1 ratio with CPUs. The compute-intensive parts of an application run on the GPUs, while the serial parts stay on the CPU.
“When fully loaded, running the [Top 500 rankings] benchmark, the Jaguar used 7 megawatts (MW). Titan will use 9 MW. That’s a small 20-25% increase in power consumption for a 10x increase in compute power—more than 20 petaflops,” says ORNL director of science Jack Wells. About 90 percent of Titan’s peak performance is generated by the GPUs.
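The power and performance figures quoted above are rounded, so it can help to redo the arithmetic. The sketch below uses only numbers from this article (2.3 and 20 petaflops, 7 and 9 MW) and treats 20 petaflops as a lower bound, since Titan's peak is given as "more than 20." The function and variable names are illustrative only.

```python
# Figures quoted in the article. Titan's peak is a lower bound ("more than 20").
JAGUAR_PFLOPS = 2.3
TITAN_PFLOPS = 20.0
JAGUAR_MW = 7.0
TITAN_MW = 9.0


def percent_increase(old, new):
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


def pflops_per_mw(pflops, megawatts):
    """Peak petaflops delivered per megawatt of power drawn."""
    return pflops / megawatts


power_increase = percent_increase(JAGUAR_MW, TITAN_MW)
performance_ratio = TITAN_PFLOPS / JAGUAR_PFLOPS
efficiency_gain = (
    pflops_per_mw(TITAN_PFLOPS, TITAN_MW) / pflops_per_mw(JAGUAR_PFLOPS, JAGUAR_MW)
)

print(f"Power increase: {power_increase:.1f}%")                  # 28.6%
print(f"Performance ratio: {performance_ratio:.1f}x")            # 8.7x
print(f"Performance-per-megawatt gain: {efficiency_gain:.1f}x")  # 6.8x
```

On these inputs the power increase works out closer to 29 percent than to 20-25 percent, and the speedup is a little under 10x at exactly 20 petaflops. Either way, the performance delivered per megawatt rises by roughly a factor of seven.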
The announcement of yet another massive supercomputer is hardly news, concedes NVIDIA’s general manager of accelerated computing, Sumit Gupta, but it is a further sign, he argues, that simulation and computer modeling have become an integral part of modern research. “Along with experimentation and theory, we’ve added computing as a tool in the scientists’ toolkit. Supercomputing is the third pillar of scientific research.”
Gupta says NVIDIA’s GPU technology, which is supported by the thriving gaming industry, provides solutions for high-performance computing. “We’re doing millions of these computations every second,” he says. “The problems are very similar to HPC world, hence we can leverage GPU as an accelerator for HPC.”
Introducing Titan
Titan Specs |
Compute Nodes | 18,688 |
Login & I/O Nodes | 512 |
Memory per node | 32 GB + 6 GB |
Number of Opteron cores | 299,008 |
Number of NVIDIA K20 “Kepler” accelerators (2013) | 18,688 |
Total System Memory | 710 TB |
Total System Peak Performance | 20+ Petaflops |
Titan is a physical replacement for Jaguar and is housed in the same 200 cabinets used by its predecessor. The new machine features 18,688 Cray XK7 compute nodes (the XK7 supersedes the XK6), each with a 16-core CPU, for a total of 299,008 cores. The XK7 features a Gemini interconnect, which lets the nodes communicate with each other in a machine that has more than 18,000 of them.
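The system totals in the spec table follow directly from the per-node figures. The sketch below checks them, assuming that the "32 GB + 6 GB" per node refers to CPU memory plus GPU memory (the table does not say which is which) and that a terabyte is counted as 1,000 gigabytes.

```python
# Per-node figures from the spec table and the text above.
COMPUTE_NODES = 18_688
CORES_PER_CPU = 16
CPU_MEM_GB = 32  # assumption: the 32 GB is the CPU's memory
GPU_MEM_GB = 6   # assumption: the 6 GB is the GPU's memory

total_cores = COMPUTE_NODES * CORES_PER_CPU
total_mem_gb = COMPUTE_NODES * (CPU_MEM_GB + GPU_MEM_GB)
total_mem_tb = total_mem_gb / 1000  # decimal terabytes

print(f"Total Opteron cores: {total_cores:,}")         # 299,008
print(f"Total system memory: {total_mem_tb:.1f} TB")   # 710.1 TB
```

Both results agree with the table: 299,008 cores and about 710 TB of memory.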
Each node pairs one AMD Opteron CPU with one NVIDIA Tesla K20 GPU accelerator. Gupta says the K20 is easier to program with added C++ and Fortran compatibility. “We’ve added new features to reach more programmers and reach domains that couldn’t take advantage of parallel processing so far,” says Gupta.
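Because CPUs and GPUs are paired one to one, the statement that about 90 percent of peak performance comes from the GPUs implies a rough per-device split. The sketch below is a back-of-envelope estimate from this article's rounded figures only (20 petaflops as a lower bound, a 90 percent GPU share, 18,688 nodes); it is not a vendor specification for either chip.

```python
# Rounded figures from the article; real per-chip peaks may differ.
TITAN_PEAK_PFLOPS = 20.0  # lower bound, the article says "more than 20"
GPU_SHARE = 0.90          # "about 90 percent" of peak comes from GPUs
NODES = 18_688            # one CPU and one GPU per node
TFLOPS_PER_PFLOP = 1000.0

gpu_total_tflops = TITAN_PEAK_PFLOPS * GPU_SHARE * TFLOPS_PER_PFLOP
cpu_total_tflops = TITAN_PEAK_PFLOPS * (1 - GPU_SHARE) * TFLOPS_PER_PFLOP

# With a 1:1 pairing, dividing by the node count gives a per-device estimate.
per_gpu_tflops = gpu_total_tflops / NODES
per_cpu_tflops = cpu_total_tflops / NODES

print(f"Estimated peak per GPU: {per_gpu_tflops:.2f} teraflops")
print(f"Estimated peak per CPU: {per_cpu_tflops:.2f} teraflops")
print(f"GPU-to-CPU ratio per node: {per_gpu_tflops / per_cpu_tflops:.0f}x")
```

On these assumptions each GPU contributes a little under one teraflop of peak, about nine times the contribution of the CPU beside it.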
While there have been several impressive HPC systems featuring GPUs, including the formerly top-ranked Chinese supercomputer and about 10 percent of the systems on the latest Top 500 list, Wells says Titan is the first system of this scale, at more than 20 petaflops. “It is the fastest open science supercomputer,” he says. Titan’s actual ranking will emerge in mid-November, when the next Top 500 list is released. The current top-ranked machine is Sequoia at Lawrence Livermore National Laboratory, but that is a classified machine used by the US government.
Open Season
What makes Titan particularly significant is that any scientist can apply for time on the machine, says Wells. Selection is by peer review; applications are open to academia and industry and are not confined to groups that receive Department of Energy (DoE) funding or have a US presence.
Time on Titan is allocated in one of three ways. Sixty percent is awarded through an annual review called the INCITE program, which has approved six initial applications. The DoE allocates another 30 percent, and the remaining 10 percent is a discretionary allocation by ORNL staff.
One of the first six projects approved under the INCITE program is a biofuels project, which will run a molecular dynamics program called LAMMPS to develop second-generation biofuels, or as Wells puts it, “get gas from grass.”
Wells also highlights some discretionary projects that have been awarded time in 2013, including one from Temple University chemist Michael Klein to study drug transport through the skin into the body (in collaboration with Procter & Gamble).
Two other projects were awarded to a husband-and-wife team at the University of Illinois: physicist Klaus Schulten, who is simulating molecular components of a human cell as part of a long-term project to simulate cell function, and chemist Zan Luthey-Schulten, who is simulating ribosome biogenesis and other cellular processes.
“Another project in our discretionary resource related to drug discovery scored well, but not well enough to win time [via INCITE],” says Wells. “But this is a project involving the University of Tennessee and ORNL, so we’ll give them some discretionary time.” Jerome Baudry is seeking to modify the AutoDock program to do advanced screening of drugs against proteins. “The goal is to help pharma companies to fail early,” says Wells.
Each of these projects is allocated time for up to 12 months, and allocations can be renewed. While there have been some proposals in the field of genomics, Wells says, this has not been a dominant theme so far. “We had one submission on metagenomics, but at the deadline they weren’t ready,” he says.
The next opportunity to apply for the DoE time allotment is February 2013. However, Wells says potential users can apply for computing time at any time by sending a short application through the discretionary access program to the Oak Ridge Leadership Computing Facility: http://www.olcf.ornl.gov/support/getting-started/