St. Jude Cloud Announces New App-Driven User Interface, Functionality
By Allison Proffitt
April 28, 2020 | St. Jude has released a new brand and user interface for the St. Jude Cloud, an initiative the research hospital launched two years ago in collaboration with Microsoft and DNAnexus.
In 2018, the St. Jude Cloud team wanted to create not only a portal to a valuable pediatric oncology dataset but also a platform where researchers could “do real work.” They filled the St. Jude Cloud with validated data analysis pipelines and interactive visualization tools to make discoveries from large datasets easier.
But since then, the vision has significantly expanded, explains Clay McLeod, who has been managing the St. Jude Cloud development team and is now Director of Product Development and Engineering at St. Jude. “What we’re trying to set up here is a much more scalable framework for sharing for all of St. Jude data, not just the genomics data.”
The first iteration of St. Jude Cloud “was geared toward just the genomics data itself and transcriptomics data,” McLeod says. “But what we’re seeing is, there’s much more of an appetite for sharing at St. Jude, not just within the genomics groups.” For example, he says, there are xenograft models of rare tumor types, histology data, and drug interaction data that can now be explored.
App Architecture
The St. Jude Cloud is now organized into apps, replacing the previous structure of Visualizations, Tools, and Data. Each app gets its own identity within the St. Jude Cloud ecosystem and is designed to address a particular use case. So far there are three apps: the Genomics Platform, PeCan, and the Visualization Community. More apps are in progress, McLeod says.
“These are all applications that are within an ecosystem and are built together and know about each other,” he explains, comparing the ecosystem to Office 365 or G Suite by Google. “You go to each application depending on the particular use case that application is trying to enable. When it makes sense, those applications have integrations between one another.”
The Genomics Platform includes all that was originally part of St. Jude Cloud: all of the omics assays and cancer genomic data. Within the St. Jude Cloud Genomics Platform, researchers can request access to the world's largest repository of pediatric cancer-related genomics data, for analysis in the cloud or for download; upload their own data alongside St. Jude's for cross-cohort analysis in the cloud; and use any of St. Jude Cloud's end-to-end computational workflows, written by experts in the respective areas.
McLeod describes PeCan as the St. Jude reference dataset. “This is our opinionated way of how we want users to explore the pediatric cancer variation landscape,” he explains. PeCan includes variant pages that are quite similar to ClinVar but pediatric focused, he says. The St. Jude Tumor Board updates this knowledge base in real time with literature relevant to a variant, population frequency data, and more.
The Visualization Community is the newest St. Jude Cloud app, and one McLeod is quite excited about. “Researchers are really quite underserved in terms of the tools at their disposal to interrogate data. For instance, if you are doing analysis on a 3,000-sample cohort and you want to look at all the variants and do some analysis of those, many people still use really big Excel spreadsheets,” he says. Spreadsheets are the wrong medium for data exploration, he insists. “It’s not enjoyable; people don’t enjoy their lives or their research while they’re using them. You’d be surprised at how much that hinders discovery.”
St. Jude has invested a lot of effort into creating visualization tools that instead enhance discovery and understanding of the data, McLeod says. The Visualization Community lets researchers view community-created visualizations that use ProteinPaint, GenomePaint, and SJCharts, then create their own visualizations by uploading their own data. Those visualizations can be included in papers or shared with collaborators.
The library now includes 22 visualizations, most of which originally accompanied papers published by St. Jude (the papers are linked), McLeod says. Researchers fully own their own data spaces; St. Jude cannot see any of the data they upload, except for what they expose so the visualization can be rendered.
“The trend that we’re trying to bring about here at St. Jude, and that we hope disseminates in the community, is individuals will go more of this route and make visualizations interactive as they publish papers. This is really how you can unlock the maximum amount of research that goes into a paper. It’s not just a static figure on a page,” McLeod explains.
Partners, Programs, and ROI
The St. Jude Cloud originally launched as a partnership with Microsoft and DNAnexus, and those relationships are still strong and ongoing, McLeod says. He alludes to further investments to come. “The partnership continues to grow, especially with Microsoft.”
St. Jude Cloud is free for academic researchers to use, though it has not yet been opened broadly to commercial companies. Microsoft graciously sponsors over a petabyte of genomics data in Azure, and neither St. Jude nor other researchers are paying for that copy of the dataset, McLeod says. The St. Jude Cloud Genomics Platform is still built on top of DNAnexus, which McLeod calls “the best in the business” at security.
When outside researchers upload their own data, they pay for their storage. For the most part, when they run analyses, they pay for that compute. “Nobody is super thrilled that we have to pass on costs there, but there’s not a great scalable model for us to remediate that,” McLeod says.
PeCan PIE, the Pathogenicity Information Exchange, is a free tool. And a new analysis tool is coming, McLeod hints, that will also be free to the community.
Hundreds of researchers have accessed the genomics data and tens of thousands of users access PeCan each year, McLeod says. “Every time somebody comes to St. Jude Cloud and tries to use it—particularly when I sit down and talk with individuals about it—we learn how to share data better,” he adds.
“We not only do that for ourselves, but we talk with others in the community. The pediatric cancer community is relatively small… what we’re all trying to build is a future where these data can flow and can really bring about the next big discovery that can push the needle forward on one of these rare subtypes of cancer.”