Pistoia Alliance Announces NGS Data Compression Algorithm Competition

October 25, 2011

By Bio-IT World Staff 

October 25, 2011 | The Pistoia Alliance, a precompetitive alliance of more than 50 life science companies, vendors, publishers, and academic groups, has announced the launch of a competition to find the best algorithm for compressing next-generation sequencing (NGS) data. 

The winner of the Pistoia Alliance Sequence Squeeze Competition stands to win a $15,000 prize. Joining Pistoia Alliance president Nick Lynch among the judges will be Guy Coates, information systems lead at the Wellcome Trust Sanger Institute, and Yingrui Li, deputy operation officer of the BGI. A fourth judge will be announced shortly.  

The competition aims to encourage a diverse group of scientists -- including bioinformaticians, mathematicians, physicists, and computer scientists -- to address a thorny problem in the management of NGS data. The Alliance notes that with sequencing costs falling faster than the improvements in data storage and processing predicted by Moore's Law, current NGS instruments can generate more data in a day than any one machine could have produced during the whole of 2005.

Labs rely on compression to store the data from sequencing runs, which include sequencing reads and their associated quality scores. Yet compression technologies are themselves faltering under the data volumes produced by NGS.

“There is a very real need today for novel methods of compressing sequence reads and their quality scores in a way that preserves 100% of the information while achieving much-improved linear -- or, even better, non-linear -- compression ratios,” said Lynch. “We believe in championing this type of grassroots, from-the-trenches innovation.”

The competition, which closes next March, requires entrants to devise and implement a computer algorithm for compressing (and decompressing) NGS data stored in the FASTQ format. Entries must be fully open source so that the entire scientific community can benefit from the winning algorithm. Full details of the competition rules and entry process are at www.sequencesqueeze.org.
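For readers unfamiliar with the task, the short Python sketch below illustrates what a lossless round trip on FASTQ data looks like and how a compression ratio can be measured. It is only an illustration under stated assumptions: the two reads are invented, general-purpose gzip stands in as a baseline compressor, and the `round_trip` helper is hypothetical. None of it reflects the competition's actual test data or judging procedure.

```python
import gzip

# Invented FASTQ records, repeated to give the compressor something to work on.
# Each record is four lines: identifier, bases, separator, per-base qualities.
FASTQ = (
    "@read1\n"
    "GATTACAGATTACAGATTACA\n"
    "+\n"
    "IIIIIIIIIIHHHHHGGGFFF\n"
    "@read2\n"
    "GATTACAGATTACAGATTACC\n"
    "+\n"
    "IIIIIIIIIIHHHHHGGGFFE\n"
) * 500


def round_trip(text):
    """Compress and decompress `text`; return (is_lossless, compression_ratio)."""
    raw = text.encode("ascii")
    packed = gzip.compress(raw)          # generic baseline, not a FASTQ-aware method
    restored = gzip.decompress(packed)
    # Lossless means the restored bytes match the original exactly,
    # reads and quality scores included.
    return restored == raw, len(raw) / len(packed)


lossless, ratio = round_trip(FASTQ)
print(f"lossless: {lossless}, compression ratio: {ratio:.1f}x")
```

Highly repetitive toy data like this compresses far better than real sequencing output would, so the printed ratio says nothing about real-world performance. A FASTQ-aware entry would presumably try to beat a generic baseline of this kind by exploiting the structure of the reads and quality scores.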

The competition will be administered by Eagle Genomics, a British bioinformatics services and software company.