Garvan Institute Uses Panasas System for X Ten
By Bio-IT World Staff
September 21, 2015 | The Garvan Institute of Medical Research in Sydney was one of the first three groups to sign up for Illumina’s HiSeq X Ten sequencing platform in 2014, a system that produced up to 5TB of data per day. But before the X Ten, Garvan already needed more storage capacity.
“We needed a parallel file system to support the compute on our cluster and the Cancer Institute in New South Wales made a call for equipment grant applications,” Dr. Warren Kaplan, chief of informatics at Garvan, told Bio-IT World. “The equipment grant would only fund equipment and not dedicated personnel, and while we were keen to get a Lustre file system, we were advised that to support Lustre we would need a dedicated Lustre administrator, something we did not have nor could we fund off the grant.”
To enable round-the-clock computing with as little downtime as possible, Kaplan opted for the Panasas PanFS parallel file system.
Garvan bought five Panasas ActiveStor network-attached storage (NAS) appliances and migrated a team of 80 researchers to the Panasas storage. The team later installed one more ActiveStor storage system, bringing total Panasas storage to 400TB.
“It really is an appliance,” said Geoffrey Noer, Panasas’ vice president of product management. “Once it’s racked it’s under ten minutes to get up and running.” Kaplan agreed that the installation proceeded “very smoothly and quickly.” “In fact the system was ready for testing on the same day as installation,” he said.
Choosing a parallel file system has increased productivity, Kaplan says. “Most of our compute is on a single node and we make very little use of RDMA [remote direct memory access] or MPI [message passing interface],” said Kaplan. “With the arrival of the Panasas, our model of compute shifted away from our previous practice where we would copy files off the centralized NAS to the local storage on the node and compute off that, to a model whereby we could now compute directly off the centralized Panasas file system.”
Six months after the Panasas system was installed, Garvan bought an Illumina X Ten system, and the Panasas storage has continued to serve the institute well, particularly in terms of storage efficiency.
In Illumina’s recommended workflow, sequencer data moves back and forth between central storage and local storage on the compute nodes, explained Noer. “We have a protocol called DirectFlow, which is a much higher performance means of accessing storage than legacy protocols that other storage systems use. The advantage with DirectFlow is that when you are analyzing data from a Linux cluster, every single one of those compute nodes can access storage exactly where it lives directly instead of having to ask a node to talk to another node to get the data to send back,” he said. “With DirectFlow the right node is always asked for the data that’s sitting on that node. That’s possible because of what we’ve done with the protocol.”
Efficient access to storage is very important, Noer continued. “When you’re trying to get every ounce of performance out of the storage component, having the right protocol to access those components makes a big difference in the performance of the solution.”
Kaplan has seen an improvement with the Panasas architecture. “Thanks to Panasas’ exceptional performance, our sequencing data stays in the central repository throughout the analysis,” he said. “This streamlined workflow saves time and bandwidth, enabling us to deliver results quickly to researchers around the world.” The new Illumina sequencers and the high-performance platform with Panasas storage have increased Garvan’s sequencing capacity to 50 genomes per day on average, a fiftyfold improvement.
Kaplan has also been pleased by the Panasas system’s damage control. When the institute’s cooling system failed, the Panasas system automatically shut down, avoiding damage that could corrupt data and reduce IT and researcher productivity. “Panasas lives up to its promise of terrific performance with negligible maintenance and administration time,” said Kaplan.
Garvan has an ambitious goal: nothing less than transforming the practice of medicine through genome sequencing. The institute envisions itself as an enabler that can rapidly prototype and evaluate specific analyses. Once verified, those analyses will be made available to downstream research institutions as well as businesses working to commercialize genomics technology. “Fundamental to our work is maintaining an extraordinary infrastructure that makes it all possible,” said Kaplan, “and Panasas is a key part of that.”