Seven Bridges Introduces Open Source Cancer Genomics Workflow

By John Otrompke

August 20, 2014 | Seven Bridges, a Cambridge company that builds tools to help scientists process genomic data, has created a cloud-based workflow for analyzing cancer genomes. The workflow was presented at the Biology of Genomes conference at Cold Spring Harbor in May.

When scientists are running a cancer genome analysis, “a typical database query will return millions of genetic variants. This is too much for human processing and needs to be prioritized,” said Lu Zhang, PhD, director of R&D cancer informatics at Seven Bridges. “The main advantage of this pipeline is that it uses algorithms to review large amounts of information that currently has to be manually reviewed by experts before they can arrive at hypotheses. We believe this is one of the critical problems, and our team is dedicated to building a great application around it.”

“The workflow includes databases with a catalogue of genetic tumor variants, derived from whole genome sequencing and exome sequencing, from the 1000 Genomes project, and other information. After processing, it outputs the following information: disease-causing mutations, gene function, pharmacogenomics, and a prognosis for treatability,” explained Zhang.
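The prioritization Zhang describes can be pictured with a small sketch. The Python below is only an illustration under stated assumptions, not Seven Bridges' pipeline: the `Variant` record, the `driver_score` field, the `prioritize` function, and the sample data are all hypothetical. It drops variants already present in a population catalogue (standing in for a resource such as 1000 Genomes) and ranks what remains by a precomputed driver-likelihood score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record; field names are illustrative only."""
    gene: str
    chrom: str
    pos: int
    ref: str
    alt: str
    driver_score: float  # assumed 0..1 score from some driver-prediction tool


def prioritize(variants, population_keys, top_n=10):
    """Drop variants seen in a population catalogue, then rank the rest.

    population_keys is a set of (chrom, pos, ref, alt) tuples standing in
    for a catalogue of common germline variation.
    """
    candidates = [
        v for v in variants
        if (v.chrom, v.pos, v.ref, v.alt) not in population_keys
    ]
    # Highest predicted driver likelihood first.
    candidates.sort(key=lambda v: v.driver_score, reverse=True)
    return candidates[:top_n]


if __name__ == "__main__":
    # Made-up example data, not real annotations.
    calls = [
        Variant("GENE_A", "1", 1000, "C", "T", 0.95),
        Variant("GENE_B", "2", 2000, "A", "G", 0.40),
        Variant("GENE_C", "3", 3000, "G", "A", 0.90),
    ]
    common = {("2", 2000, "A", "G")}
    for v in prioritize(calls, common, top_n=2):
        print(v.gene, v.driver_score)
```

A real workflow would start from millions of calls and draw on several annotation sources, but the shape is the same: filter out what is already known to be common, then rank the remainder so that experts review a short list rather than the full query result.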

Seven Bridges began working on the workflow about six months ago. It derives its evaluations from peer-reviewed oncology publications, many of which were developed in consultation with oncologists, while the evaluations of the tissue samples are produced by the software itself. “The pipeline has 10 main components, each one with at least one peer-reviewed publication to back it up,” said Zhang. “The open source algorithms we use are mainly for somatic variant calling and as a predictive tool for driver mutations.”

The algorithms are currently open source, but the underlying cloud-based operating system, which runs on the Amazon Web Services cloud, is proprietary to Seven Bridges. The company has patents pending on the technology, according to Zhang.

The system was previously tested using data from The Cancer Genome Atlas (TCGA). “We showed that our pipeline, on this gold-standard dataset, was able to rapidly and automatically detect disease-causing mutations,” said Zhang, who received a PhD in bioinformatics in 2012, with a focus on genomics and lipidomics. While TCGA is a great resource for a retrospective study, some suggest that a much better validation of the workflow would be a prospective study with a dedicated patient cohort.

Seven Bridges intends to expand its data collection, but for now it aggregates publicly available resources. The pipeline is a general oncology tool for use in all cancer types, including breast, colon, kidney, and lung cancer, as well as leukemia. “We are considering some proteomics data for the knowledge annotation,” said Zhang, but currently the database has none.

“Our pipeline is focused on cancer genomic data processing for individual patients' genome analysis,” said Zhang. The cloud-based workflow itself is open source, but, she added, “We might add proprietary algorithms in the future, and see if that improves the quality of results.”