Lessons Learned From The Cancer Genomics Cloud
By Allison Proffitt
May 17, 2017 | There was a changing of the guard several months ago at Seven Bridges. Brandi Davis-Dusenbery, Ph.D., was promoted to CEO. Co-founder Deniz Kural, Ph.D., assumed the role of Executive Chairman of the Board of Directors, and the company named new Chief Strategy, Financial, Communications, and Product Officers, and a new Executive Vice President of business development and strategy.
Bio-IT World caught up with newly-minted Chief Communications Officer Andrew Gruen (formerly director of marketing) a few weeks after the announcements. Gruen gave an update on the Seven Bridges Cancer Genomics Cloud and some hints at how lessons learned in the public arena are powering Seven Bridges’ private business.
In late 2014, Seven Bridges was one of three groups to win a contract from the National Cancer Institute for the NCI Cancer Genomics Cloud Pilots. In February of 2016, the company announced that its CGC was live; last May the CGC was recognized as a 2016 Bio-IT World Best of Show award winner.
A little over a year after its launch, the CGC has more than 1,500 users, Gruen reported, and over the past year those users have completed about 97 years’ worth of computation on the platform. There are nearly 250 public tools created for the CGC, and 13PB of linked data.
The linked data metric provides a way of assessing cost and time savings, Gruen said. There’s about 1.4PB of controlled access TCGA data available. Instead of those data being copied, moved to, and hosted on a local cluster, they’ve been accessed within the CGC many times over. “We’re looking at a magnification effect of almost, like, ten times the available data copied into projects. It’s a very significant cost savings, and it speaks to the very value proposition of doing something like CGC,” he said. “You are enabling access to all these folks and doing so in a more economical way than your previous research paradigms.”
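The "almost ten times" figure can be roughly checked from the two numbers quoted above. This is a minimal sketch, assuming the 13PB of linked data and the 1.4PB of controlled-access TCGA data are directly comparable; the article does not say how Gruen arrived at his figure, and the linked total may include other datasets. The function and variable names are illustrative only.

```python
# Figures as reported in the article. Treating them as directly
# comparable is an assumption made for illustration.
LINKED_DATA_PB = 13.0      # data linked into CGC projects
TCGA_CONTROLLED_PB = 1.4   # controlled-access TCGA data hosted once


def magnification(linked_pb: float, source_pb: float) -> float:
    """Ratio of data linked into projects to data stored a single time."""
    if source_pb <= 0:
        raise ValueError("source_pb must be positive")
    return linked_pb / source_pb


ratio = magnification(LINKED_DATA_PB, TCGA_CONTROLLED_PB)
# Storage that would have been needed had every linked file been a copy.
avoided_copies_pb = LINKED_DATA_PB - TCGA_CONTROLLED_PB

print(f"magnification: {ratio:.1f}x")
print(f"storage not duplicated: {avoided_copies_pb:.1f} PB")
```

Under these assumptions the ratio comes out a little above nine, which is in line with the "almost ten times" in the quote.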
The vision of the NIH and Harold Varmus, when the cancer genomics cloud projects were launched, was to bring questions to the data, said Gruen, and the projects on the CGC seem to be doing that. Projects have an average of 7-8 collaborators across different institutions. “It’s much easier to bring eight people in and work together when it’s basically sending someone a link by email, than shipping them a bunch of hard drives,” he pointed out.
He also reported that Seven Bridges is seeing new kinds of projects being done on the CGC. For instance, more than 5,000 applications have been added to the CGC by users. “Potentially, that’s a little over 5,000 different types of analyses that were made possible by uploading tens of megabytes instead of downloading a couple of petabytes.”
Seven Bridges usually shows off the CGC’s visual interface, but Gruen says users who are doing analysis at scale are more often using the API. In the first quarter of 2017, Gruen said, the CGC saw 1.9 million API calls. He counts that as evidence of the different types of users who are accessing the portal. “We see researchers all over the world actually doing what the NCI hoped they would do, and helping us learn how to do it better as a result of all of these pilot projects.”
In October 2016, all three NCI cloud pilots were renewed, with a mandate to add datasets, including the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) program and the Cancer Genome Characterization Initiative (CGCI), and to incorporate medical imaging and proteomic data.
The variety of data the company is now working with is the impetus for Seven Bridges’ informal name change. The company no longer uses “Genomics” in most public materials. “We know and recognize that there’s a lot more to biomedical data analysis than just genomics. Genomics is great, we do a lot of genomics. Dare I say we do mostly genomics. But we understand and we already see how that’s broadening.”
Beyond the CGC
Gruen declined to share how much of Seven Bridges’ energy and attention are dedicated to the CGC, but he did concede that the project has been a valuable learning experience for the company. “The CGC remains an opportunity for us… to learn at some of the largest scale available in public datasets. It remains something we pay a lot of attention to, and we really think that we are excelling at,” he said.
For instance, the work Seven Bridges has done on its data browser (“and repeatedly revamped,” Gruen said) has paid dividends. The data browser was developed as part of the CGC so users could more easily check metadata within the TCGA dataset and find out if the files they needed were there. Gruen said the process of building and refining the data browser has changed some of the company’s thinking about how users browse data, and how to serve different user communities.
“If you are an advanced bioinformatician you can write SPARQL queries and get exactly what you want instantly. But if you are a clinician or a lab-based researcher, you can also go in and visually design a query without having to know anything about RDF or semantic languages and triplestores that gets you directly to an answer.”
TCGA is a great dataset for that type of training because it’s so vast, Gruen said. And by working to better serve the users of the CGC, Seven Bridges has developed a philosophy of data browsing that he says gets “baked” into the functionality of a whole host of products.
“Our goal isn’t to make data available; it’s to make data useful,” he said.
That’s the mission behind a product the company launched to early access users in December of last year. Seven Bridges Sonar is a platform for previously-processed data that users can use to ask questions of genotypic and phenotypic metadata “to begin to go from big question to an actionable, testable hypothesis,” Gruen said. As researchers begin to accumulate genomes, they’ll need a tool to access past work and put it into context. “It’s allowing an organization to have all of their data and make use of all of this data that was expensive to generate. It was really useful in one capacity, but now it’s useful in a broader set of capacities.”
It’s a unifying need across both government and large pharma clients, Gruen said. “‘How can we get the data to talk to one another?’ The more we can do that, the more we are delivering on our value of accelerating discovery.”
Spinning a Web
Seven Bridges has been applying its lessons learned on connecting data to the CAVATICA project as well, a partnership between the Cancer Moonshot, the Children’s Brain Tumor Tissue Consortium, and the Pacific Pediatric Neuro-Oncology Consortium. The two consortia are biobanking samples at the Children’s Hospital of Philadelphia (which is a member of both), have collocated sequencing resources, and are sharing the data via an analysis platform launched in October 2016. The project hired Seven Bridges to build a “data analysis and sharing platform designed to accelerate discovery in a scalable, cloud-based compute environment where data, results, and workflows are shared among the world's research community,” according to the CAVATICA website.
“There’s not a huge amount of government funding for the production of pediatric data, so the more they’re able to combine data, the better statistical power they get,” Gruen said, calling CAVATICA a “spectacular” resource.
The data on the CAVATICA platform are varied: more than 111,000 files from 53 datasets (TCGA accounts for 33 of those), spanning whole exome sequences, whole genome sequences, RNA-Seq, genotyping arrays, miRNA-Seq, and methylation arrays. The platform includes files on 41 different diagnoses including schizophrenia, pediatric cancers, and rare diseases.
“We’ve really learned a lot about harmonizing metadata, so we’re able to help them connect a bunch of different datasets in a way that enhances compliance. You can’t just break the rules and give your collaborators a USB drive. You’ve got to actually get access and all that. We’ve learned a lot about how to make data directly useful,” Gruen said.
CAVATICA is a finalist in this year’s Bio-IT World Best of Show Awards, and is a contender for the People’s Choice Award.