Dagdigian’s Trends in IT Highlight Bio-IT World Expo

May 3, 2012

By Kevin Davies  

May 4, 2012 | BOSTON—“Filter my words accordingly; I’m not a pundit or visionary,” began Chris Dagdigian in a superfluous introduction to his annual “trends in the trenches” review at the Bio-IT World Expo last week. (Dagdigian’s slides can be found here.)  

Dagdigian is a founding member of the BioTeam, a consultancy specializing in IT infrastructure and high-performance computing for life sciences organizations. Most of his colleagues are currently working on next-generation sequencing (NGS) pipelines and data management issues; two are engaged in IT infrastructure and facility projects—including the New York Genome Center—while one is occupied with Amazon Web Services (AWS) cloud projects.   

Dagdigian said 2012 feels a lot like 2011 (see “Big Scary Graphs @bioitworld”), and noted that NGS is still causing a lot of pain in data handling. Companies are still spending, and there was no sign of pharma contracting in Boston (though that may differ elsewhere). The situation in government circles is of more concern, however, as the stimulus and biodefense funding wind down.

For the past 12 months, Dagdigian said he had not been involved in any new datacenter projects; most infrastructure projects involve refurbishing and improving existing set-ups, refreshing the electrical/cooling systems, or switching to blades. An interesting trend was the move to “colo” space (co-located data centers), perhaps because of the cost of real estate or permit problems in some regions.

Last year, Dagdigian saw examples of serious power density problems that caused friction between facility and research staff. But those issues are absent so far in 2012. “People are better able to integrate beefy research compute nodes,” he said, describing machines with 24-48 cores and “a boatload of RAM.” Such set-ups were satisfactory for many NGS applications and computational biology predictions.

The average cluster is about 2,000 CPU cores. “Everyone has scale-out NAS [network-attached storage]—I think that’s won the storage battle in our space,” he said.   

Storage  

Petabyte-capable storage is trivial to acquire in 2012, said Dagdigian. But the cost of acquiring data is falling faster than the rate at which industry is increasing drive capacity. “That will cause catastrophic problems this year,” Dagdigian predicted. “The [science] is changing faster than we can refresh datacenters and research IT infrastructure.”  

Not everything is worth backing up, but Dagdigian said, “People still think ‘Keep everything online, forever’ is a viable demand to be making of IT staff. Something will break soon!”  

Dagdigian was excited about new NGS compression techniques, such as CRAM. “We need order-of-magnitude changes in compression,” he said. “Be glad you are not Broad/Sanger/BGI/NCBI.”  

“We’re back to needing storage tiers. I need high performance and storage,” he said, noting that he’s increasingly using Tier 2 storage in various installations, shifting to multiple vendors and mixing high- and low-end products.  

On the disruptive front, Dagdigian noted that storage offerings from vendors such as DDN, Panasas, Isilon and BlueArc all run Unix on standard architectures. “Your storage will be able to run apps,” he said. And while there were caveats, he noted the ability to build 135 terabytes of raw storage on a Backblaze storage pod for just $12,000.  
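For a sense of scale, the Backblaze figure can be turned into a cost per terabyte. This is only a rough sketch: it uses the two numbers quoted above (135 terabytes raw, $12,000), counts raw capacity rather than usable capacity, and ignores redundancy, power, hosting and staff time, which are presumably among the caveats Dagdigian had in mind. The petabyte target is a hypothetical chosen for illustration.

```python
import math

# Figures quoted in the talk
POD_RAW_TERABYTES = 135
POD_COST_DOLLARS = 12_000

# Hypothetical target, chosen only for illustration
TARGET_RAW_TERABYTES = 1_000  # one petabyte, decimal units

# Hardware cost per raw terabyte
cost_per_tb = POD_COST_DOLLARS / POD_RAW_TERABYTES

# Whole pods needed to reach the target, and what they would cost
pods_for_petabyte = math.ceil(TARGET_RAW_TERABYTES / POD_RAW_TERABYTES)
petabyte_cost = pods_for_petabyte * POD_COST_DOLLARS

print(f"Cost per raw TB: ${cost_per_tb:.2f}")
print(f"Pods for 1 PB raw: {pods_for_petabyte}")
print(f"Hardware cost for 1 PB raw: ${petabyte_cost:,}")
```

On these assumptions the hardware works out to roughly $89 per raw terabyte, and eight pods (about $96,000) would cover a raw petabyte.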

Clouds   

When it comes to cloud computing, “I’m not an Amazon shill,” Dagdigian insisted, but “Amazon AWS has more building blocks. It’s an infinitely cool tool… It’s difficult to see anyone capturing the cloud crown from Amazon,” he said.

In March, Dagdigian said, he was able to sustain a transfer of 700 megabits of NGS data per second for seven hours into Amazon using a 1GbE Internet connection and Aspera’s high-speed data transfer software. At that rate, “I can handle a genome core facility or 60 genomes/day,” he said.
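The arithmetic behind that claim can be checked with a short calculation. The sketch below assumes the sustained rate was 700 megabits per second, since a 1GbE link tops out at 1,000 megabits (125 megabytes) per second, and it uses decimal units throughout. The per-genome figure is simply what the quoted 60 genomes/day would imply at that rate; it is not a number given in the talk.

```python
# Assumption: the sustained rate is 700 megabits/sec (not megabytes/sec)
RATE_MEGABITS_PER_SEC = 700
LINK_MEGABITS_PER_SEC = 1_000  # nominal 1GbE line rate
DURATION_HOURS = 7             # length of the run described in the talk
GENOMES_PER_DAY = 60           # throughput quoted in the talk

SECONDS_PER_HOUR = 3_600
MEGABYTES_PER_TERABYTE = 1_000_000

# Convert bits to bytes and see how much of the link was in use
megabytes_per_sec = RATE_MEGABITS_PER_SEC / 8
link_utilization = RATE_MEGABITS_PER_SEC / LINK_MEGABITS_PER_SEC

# Data moved during the seven-hour run, and over a full day at the same rate
terabytes_in_run = (
    megabytes_per_sec * DURATION_HOURS * SECONDS_PER_HOUR / MEGABYTES_PER_TERABYTE
)
terabytes_per_day = megabytes_per_sec * 24 * SECONDS_PER_HOUR / MEGABYTES_PER_TERABYTE

# Data budget per genome implied by 60 genomes/day (derived, not quoted)
gigabytes_per_genome = terabytes_per_day * 1_000 / GENOMES_PER_DAY

print(f"{megabytes_per_sec:.1f} MB/s, {link_utilization:.0%} of line rate")
print(f"{terabytes_in_run:.2f} TB in {DURATION_HOURS} hours")
print(f"{terabytes_per_day:.2f} TB/day, about {gigabytes_per_genome:.0f} GB per genome")
```

Under these assumptions the run moved about 2.2 TB at 87.5 MB/s, and a full day at that rate would carry about 7.6 TB, or roughly 126 GB for each of 60 genomes.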

Dagdigian was still the recipient of “endless ‘BS’ being shoveled at me from private cloud vendors.” He put it this way:

“No APIs? Not a cloud… No self service? Not a cloud… If you just install VMware and excrete a press release, that’s not a cloud… If you have to email a human? Not a cloud. Only a 50% failure rate? That’s a stupid cloud… Block storage and virtual servers? (Barely) a cloud.”

In short, Dagdigian said he remains cynical about private clouds. At the very least, “I would not deploy a private cloud solution … that does not have Amazon API compatibility,” he said.

MIT StarCluster is an amazing tool for managing clusters of virtual machines on Amazon EC2, Dagdigian said. “It’s going to be a MapReduce world—get used to it… There’s little need to roll your own Hadoop solution in 2012.”  

As for coming trends, Dagdigian said he was bullish about the Siri voice control demonstration BioTeam conducted with BT Global Services (see “Hello, Siri…”), anticipating a variety of interesting experiments. He predicted growing popularity for Opscode’s Chef (winner of a 2012 Best of Show award) and pNFS, as well as smart storage systems from Drobo and DataDirect.

Editor’s note: Chris Dagdigian will be a keynote speaker at Bio-IT World Asia, Singapore, June 6-8, 2012.