Broad Institute To Release Genome Analysis Toolkit 4 As Open Source Resource To Accelerate Research
By Bio-IT World Staff
May 24, 2017 | The Broad Institute of MIT and Harvard will release version 4 of the industry-leading Genome Analysis Toolkit under an open source software license. The software package, designated GATK4, contains new tools and a rebuilt architecture. It is currently available as an alpha preview on the Broad Institute’s GATK website, with a beta release expected in mid-June. Broad engineers announced the upgrade, as well as the decision to release the tool as an open source product, at the Bio-IT World Conference & Expo today.
The new version is built on a redesigned architecture that streamlines individual tools and supports performance-enhancing technologies such as Apache Spark. The framework improves parallelization, takes advantage of cloud deployment, and makes analyzing vast amounts of genomic data easier, faster, and more efficient.
“We wanted to remove traditional barriers of scale while offering the same high level of data quality our users expect,” wrote Eric Banks, Senior Director of Data Sciences and Data Engineering at Broad and a creator of the original GATK software package. “Thanks to the rapid adoption of cloud computing, researchers can finally do away with many of the infrastructure-related complications that have hampered progress, especially at smaller institutions and startups.”
Fully open source software
GATK4 will be released as a fully open source product, thanks in part to a collaboration between the Broad Institute and Intel Corporation to advance high-performance analytics so that researchers can study massive amounts of genomic data from diverse sources worldwide.
At the Intel-Broad Center for Genomic Data Engineering, software engineers and researchers have spent the last several months building, optimizing, and widely sharing new tools and infrastructure to help scientists integrate and process genomic data. GATK4 has benefited from this collaboration, which has helped engineers optimize hardware and software best practices for genome analytics. The goal is to make it possible to combine and use research data sets that reside on private, public, and hybrid clouds.