Aspera Speeds Data in Amazon Cloud
By Bio-IT World Staff
March 15, 2012 | SAN DIEGO—The roster of Aspera’s approximately 1,400 clients reads like the Fortune 500 list: Disney. Netflix. NBC. NBA. Pixar. Bank of America. Coca-Cola. Add to that some fairly impressive names in the life sciences, including the Broad Institute, Dana-Farber Cancer Institute, and the Mayo Clinic.
These organizations use Aspera’s proprietary software to speed the transfer of large volumes of data, a process otherwise throttled by latency and packet loss. “We’ve solved the fundamental problem of moving big data over public and private networks,” Aspera’s Daniel Kumi, director of sales and business development, told an audience at CHI’s XGen Congress last week.
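The latency and packet-loss penalty Kumi alludes to can be made concrete with the well-known Mathis et al. approximation for a single TCP flow (an illustration added here, not a figure from the talk; the constant factor of roughly 1 is omitted):

```python
import math

def tcp_throughput_mbps(mss_bytes=1460, rtt_s=0.1, loss=1e-4):
    """Approximate upper bound on one TCP flow's throughput, in Mbps.

    Simplified Mathis et al. model: throughput <= (MSS / RTT) / sqrt(loss).
    Parameter values here are illustrative, not measurements from the article.
    """
    return (mss_bytes * 8 / rtt_s) / math.sqrt(loss) / 1e6

# A long-haul link with 100 ms round-trip time and 0.01% loss tops out
# near 11.7 Mbps per flow, no matter how fat the underlying pipe is.
print(round(tcp_throughput_mbps(), 1))
```

The bound shrinks as round-trip time or loss grows, which is why stock TCP transfers over thousands of miles leave most of a high-capacity link idle.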
Aspera’s fasp solution already addresses the first major bottleneck, the WAN (wide area network) itself, by improving utilization of the pipe. But another key bottleneck is the “last foot”—the datacenter itself. For cloud computing, that translates to the short distance between the Amazon EC2 server and the S3 object store.
Direct-to-S3
Aspera’s newest innovation—Aspera On-Demand Direct-to-S3—works with Amazon Web Services (AWS) and is designed to enable users to take full advantage of cloud computing with big data.
Data in the Amazon cloud is stored in distributed object stores, said Kumi. This offers simple, scalable storage of data blocks (or objects), but the downside is that the block sizes are small (typically 64 or 128 megabytes), meaning large files must be ‘chunked.’
“A 1 terabyte file must be chunked into very small pieces and reassembled,” said Kumi—thousands of chunks in all. The good news is that commercial tools exist to manage that complexity and handle reassembly for storage. But these tools rely on HTTP, which Kumi said hampers effective data throughput.
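The chunking step Kumi describes can be sketched in a few lines. This is a hypothetical illustration of splitting a file into fixed-size parts for an object store, not Aspera’s implementation; the function name and the 128 MB part size are assumptions drawn from the block sizes cited above:

```python
CHUNK_SIZE = 128 * 1024 * 1024  # 128 MB, one of the block sizes cited above

def chunk_file(path, chunk_size=CHUNK_SIZE):
    """Yield (index, bytes) parts of the file at `path`, in order."""
    with open(path, "rb") as f:
        index = 0
        while True:
            part = f.read(chunk_size)
            if not part:
                return
            yield index, part
            index += 1
```

A 1-terabyte file splits into thousands of such parts, and each part’s index must be tracked so the file can be reassembled in order on the far side.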
Aspera On-Demand is a paid Amazon Machine Image (AMI), available on 64-bit Linux. It provides an interface to an Amazon object store while maintaining maximum bandwidth, said Kumi. “The experience of moving data will be seamless and transparent to the end user,” he said.
The transfer of large volumes of data to and from the cloud typically requires shipping a hard drive or using FTP, with all the delays and interruptions that entails. Depending on bandwidth capacity, Aspera says its on-demand functionality allows users to move data to and from AWS at transfer throughputs of around 350 megabits per second (Mbps) over thousands of miles.
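A quick back-of-envelope check puts the quoted figure in perspective. This is pure arithmetic, not an Aspera API:

```python
def transfer_hours(size_tb, mbps):
    """Hours to move `size_tb` decimal terabytes at `mbps` megabits/second."""
    bits = size_tb * 1e12 * 8          # terabytes to bits
    return bits / (mbps * 1e6) / 3600  # seconds to hours

# 1 TB at a sustained 350 Mbps is roughly 6.3 hours, versus days for
# drive shipment or a loss-throttled FTP session.
print(round(transfer_hours(1, 350), 1))
```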
Kumi presented a case study involving the 1000 Genomes Project, in which Aspera software has helped synchronize data between the National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute in the UK. Kumi said Aspera had afforded a 50x performance advantage. Any user can license the Aspera software, and NCBI also deploys a version that supports a browser plug-in.
“[NCBI] has become a big hub for the distribution of genome data,” said Kumi. The institute has jumped to a 10-gigabit license, distributing content to some 200 locations at very high speed, averaging 1 petabyte of data per month. “They are probably one of the largest users of our software,” he said.
Another customer is the Fred Hutchinson Cancer Research Center in Seattle, which needed a simple way for researchers to share data with their collaborators. Aspera’s faspex server lets recipients download files over the fasp protocol at top speed, with security and tracking capabilities.
Aspera now has a Microsoft Outlook plug-in for transmitting files. “It’s like sending any attachment with no size limitations,” said Kumi. The feature offers the usual speed, security, and tracking capabilities, along with a Dropbox-like option for sharing with collaborators.