NVIDIA Unveils New Flagship GPU Processor

By Kevin Davies

May 18, 2011 | NVIDIA has released the latest version of its flagship GPU (graphics processing unit), the Tesla M2090. Company executives claim it is the fastest processor for high-performance computing (HPC) on the market, delivering a 20-30% increase in application performance compared to its predecessor, the M2070.

According to Sumit Gupta, NVIDIA’s Tesla product line manager, “life sciences is our #1 vertical” in terms of widespread adoption and the number of users. Applications range from molecular dynamics to genome sequence analysis, with at least one next-generation sequencing company using GPUs in its instruments.

The Tesla M2090 GPU is equipped with 512 CUDA parallel processing cores, delivering 665 gigaflops of peak double-precision performance and providing application acceleration up to 10x compared to a CPU alone.

At the same time, HP is announcing a new server, the HP ProLiant SL390 G7 4U. The SL390 family is built for hybrid computing environments that combine GPUs and CPUs, and the new model incorporates up to eight Tesla M2090 GPUs in a 4U chassis. With eight GPUs to two CPUs, this server has the highest GPU-to-CPU ratio currently available, says Gupta. (Just a few years ago, no server could take even one GPU.) The ideal configuration would be one CPU core per GPU. “We’re not there yet with this server, but getting closer,” says Gupta. (For the record, Gupta notes that Dell has an extension box that can take up to 16 GPUs, but this is not a single server; it has to connect to another machine.)

Gupta says most customers work with OEMs – NVIDIA doesn’t sell direct, but helps move applications to a GPU. “We’re trying to learn from users about the science they’re trying to do with GPUs,” says Gupta. Besides HP and Dell, NVIDIA also works with SGI, Supermicro, IBM, Tyan and others.

“HP is a very high-volume OEM. They only build systems like this when they believe there’s a very wide market for them,” says Gupta. While OEMs typically determine pricing, Gupta says it is possible to buy a GPU server with 4 GPUs for less than $10,000. “It’s essentially in the $5-10,000 range to buy a server fully equipped,” says Gupta.

Goes to 11

The benefits of GPUs can be found both in enhanced performance and accessibility. Mark Berger, NVIDIA’s specialist in life and material sciences, recently joined the company after working in drug discovery at Cytokinetics. “I see huge momentum in GPUs. There’s a real wind in our back with a lot of people in academia and software development in national labs and software companies working on GPU versions,” says Berger.

To showcase the performance of the M2090, Gupta cites work using the popular AMBER 11 molecular dynamics software. “Using 4 GPUs, you can now simulate 69 nanoseconds [of molecular dynamics] per day,” says Gupta. Previously, this kind of simulation would require access to a supercomputer in a national laboratory, such as KRAKEN, the 192-quad-core CPU supercomputer at the Oak Ridge National Laboratory, which held the previous simulation record at 46 ns/day.

“This is the fastest result ever reported,” says Ross Walker, a researcher at the San Diego Supercomputer Center who did the AMBER benchmarking. “AMBER users from a university department can now accelerate their scientific work as if they had a supercomputer in their own lab.” Other life sciences customers include Boston Scientific (magnetic resonance imaging), Max Planck Institute (3D electron cryo-microscopy), Massachusetts General Hospital (imaging), and OpenEye.

Gupta adds: “It democratizes access to this software to every researcher around the world. You don’t have to write a grant proposal to get access to a supercomputer.” Similar analyses and results are being obtained by David E. Shaw and colleagues, but Gupta points out that their work is performed on a custom supercomputer, Anton.

There are several bioinformatics applications already running on GPUs, including BLAST, Hidden Markov Models, and MATLAB. “Users can get real performance and quite easily port their applications to the GPU,” says Gupta. “The toughest task is that most applications are written with a sequential mind frame – CPUs are inherently sequential. Users have to rethink some of the applications to take advantage of the GPU acceleration and parallel processor.”

A key question facing potential users is whether they have to modify the entire application. “The answer is no,” says Gupta. “When I open a photograph on my hard disk, this is a fairly sequential task, suitable for a CPU. Once a photo is open, you might want to do red-eye reduction, autofocus, etc. Those tasks modify each pixel mathematically. That’s extremely amenable to GPU. That’s the only part of Picasa you’d have to port to a GPU. Now take sequence search software. Reading the database, opening the sequences can continue to run on the CPU. But the search gets accelerated by GPUs.”
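The split Gupta describes can be illustrated with a minimal sketch. The code below is plain Python and runs entirely on the CPU; it is not taken from Picasa or from any NVIDIA library, and the function names and the simple brightening operation are hypothetical stand-ins chosen for illustration. It only marks which stage would stay sequential and which stage, because every pixel is handled independently, would be the candidate for a GPU port.

```python
# Hypothetical sketch: all names and the brightening operation are illustrative
# assumptions. Nothing here runs on a GPU; the comments mark which stage could.

def load_image(rows):
    """Sequential stage (stays on the CPU): stands in for reading a photo from disk."""
    return [list(row) for row in rows]

def brighten(pixel, offset=40):
    """Per-pixel math: the result depends only on this one pixel."""
    return tuple(min(255, channel + offset) for channel in pixel)

def apply_per_pixel(image, op):
    """Data-parallel stage: every call to op is independent of the others,
    so this loop is the part that could be rewritten as a GPU kernel,
    for example with one GPU thread per pixel."""
    return [[op(pixel) for pixel in row] for row in image]

# A tiny 1x2 "photo" of (red, green, blue) values in the range 0-255.
image = load_image([[(100, 150, 200), (250, 10, 0)]])
result = apply_per_pixel(image, brighten)
```

Only `apply_per_pixel` would need to change in a GPU port; `load_image` and the rest of the program would stay as they are, which is the sense in which the whole application does not have to be rewritten.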