Intel and Broad Commit to Five-Year Genomic Data Engineering Partnership
By Joe Stanganelli
November 22, 2016 CAMBRIDGE, Mass.—On Thursday, Intel and the Broad Institute came together to announce a joint endeavor that aims to enable research organizations and their IT departments to keep up with exponentially increasing amounts of genomics data.
The two entities are establishing the Intel-Broad Center for Genomic Data Engineering, a five-year collaboration project that seeks to achieve manifold, lofty goals in scaling up genomics research.
Funded with a $25 million investment, the Intel-Broad Center aims, among other things, to improve the Broad Institute's Genome Analysis Toolkit (GATK) best practices, develop "secure multiparty computation" so that organizations across different jurisdictions with conflicting data-privacy rules and conventions can connect and share their siloed research data, and generally scale up genomics research and analysis at far faster rates.
"The size of genomic datasets doubles about every eight months—and, as it does, the challenge of acquiring, processing, storing, and analyzing this information increases as well," Eric Banks, a Senior Director of the Broad Institute's Data Sciences and Data Engineering Group, said in a public statement. "Working with Intel, we plan to build out solutions that can work across different infrastructures to facilitate efficient processing of these growing datasets, and then make these tools openly available for researchers worldwide."
"Our work on this infrastructure is analogous to a superhighway to connect genomic databases in a secure and federated way," explained Anthony Philippakis, Chief Data Officer of the Broad Institute, in a YouTube announcement video that went live on Thursday. "We hope to create a model for other industries to break down barriers and speed up research and discovery that relies on complex and distributed datasets."
Banks also put forth a superhighway analogy to describe the project; indeed, for both collaborators, the Intel-Broad Center is a two-way street.
Intel Inside Genomics
The two genomic powerhouses appear to complement each other well; the Broad Institute is a not-for-profit research organization that has helped define genetic research and diagnostics today, whereas Intel is a commercial hardware and high-performance computing (HPC) giant that has long been prioritizing—and eagerly scrabbling for footholds within—the genomics space.
"Why we choose genomics—it['s] about helping people," said Ketan Paranjape, Intel's General Manager of Life Sciences and Analytics, in a June 2014 interview, pointing to a then-recent collaboration with Dell that purportedly reduced the time to sequence the RNA in neuroblastoma patients from one week to four hours. "Of course, we are looking for [vertical] adjacencies in HPC, like oil and gas, financial services, etc., down the road."
Turning to how and why Intel decided to get involved in the genomics space with on-premises HPC solutions, Paranjape explained that the company's interest in genomics was a natural, incidental extension of the HPC industry's attempts to meet ever-increasing Big-Data demands.
"We personally think that genomics is just a start," said Paranjape, indicating that three factors—the sheer volume of Big Data involved in genomics, the number of software programs running simultaneously on HPCs that need to be optimized, and the amount of fragmentation in the market—make it the most challenging vertical with which to work. "We want to establish genomics as a strong competency… Let's solve the bigger problems first."
Intel has made admirable strides in those efforts over the past two years, having launched its Collaborative Cancer Cloud last year despite having been rejected on its proposal for an NCI Cloud Pilot grant. Over the past year, Intel's CCC has picked up three major partners. The company has also partnered with numerous companies in bringing specialty on-premises HPC appliances to the genomics realm—and has recently undergone a realignment in its business strategy so as to make these turnkey appliances more accessible to clinical researchers and bioinformaticians.
Artificially Intelligent Genomic Analytics
Now, Intel seems to believe it is on the verge of solving the "bigger problems" Paranjape spoke of with an equally big technological solution: artificial intelligence (AI). In the company's joint press release with the Broad Institute, Intel noted that the Intel-Broad Center will apply AI-powered data analytics to enhance both Broad's own biomedical research resources and the state of precision medicine as a whole.
"We each bring to the collaboration our unique expertise and capabilities," Diane Bryant, Executive Vice President and General Manager of Intel's Data Center Group, said in a public statement on the day of announcement. "At Intel, through the use of artificial intelligence, we are confident we can solve the massive data challenges facing the industry."
The Broad Institute, for its part, apparently agrees.
"When you think about some of the analytics challenges with genomics information, [they have] similarities with many other areas that have revolutionized the technology ecosystem, ranging from the ability to automatically recognize faces to self-driving cars," Philippakis told Bio-IT World in an email. "[W]hat they have in common is learning from examples, and using machine learning and artificial intelligence to accomplish this goal. When you look at the next decade or so [for] both genomics [and] healthcare at large, we think it’s very likely that we are going to see that the great breakthroughs are going to come from applying artificial intelligence to problems in healthcare."
A Next-Level Collaboration
As for the partnership itself, the two genomic bedfellows are definitely not strangers to each other. Intel and the Broad Institute have been working together on various initiatives for some time. Intel contributes to Cromwell, a Broad-developed workflow engine; Intel's Genomics Kernel Library (GKL) has been integrated with GATK; and the Broad Institute has integrated both Intel's hardware and its genomic-variant data management system, GenomicsDB, into its production pipeline for joint genotyping.
Moreover, at this year's Bio-IT World Conference and Expo, the two organizations co-announced another joint project to develop "new tools… so large genomic workflows can run at cloud scale," as well as continued endeavors "that address the size, speed, security, and scalability challenges associated with large-scale genomic-sequencing data and analytics."
"Broad and Intel have already done amazing work together," said Philippakis in his video announcement of the Intel-Broad Center. "For example, in a recent project, we reduced the time it took to genotype a 20,000-sample cohort to less than 1/10th of the time it took previously. Our continued work together has the potential to help researchers everywhere drive new insights that can ultimately improve human health."
To be sure, according to both organizations, the Intel-Broad Center will be "building upon an existing collaboration."
"We've been working with Intel engineers for some time now, and we've all been enjoying it so much, we decided to commit to the relationship big time," beamed Geraldine Van Der Auwera, Group Leader of Data Science and Data Engineering at the Broad Institute. "[W]e are taking our collaboration with Intel to the next level."