GenomePaint: Interactive Visualization of the Genome
By Allison Proffitt
February 9, 2021 | The computational biology team at St. Jude has published the latest on GenomePaint, one of the visualization features in the St. Jude Cloud platform, in the January issue of Cancer Cell.
GenomePaint is a web-based, interactive visualization platform for whole-genome, whole-exome, transcriptome, epigenome and 3D-genome architecture of tumor samples. Its use, outlined in the Cancer Cell paper (10.1016/j.ccell.2020.12.011), captures the inter-relatedness between DNA variations and RNA expression, supporting in-depth exploration of both individual cancer genomes and full cohorts.
The development of GenomePaint began in 2017, when a research team led by Jinghui Zhang, chair of the St. Jude Department of Computational Biology, was compiling pan-pediatric cancer genomic landscapes for the NCI TARGET project. (Clay McLeod mentioned the tool nearly a year ago while giving Bio-IT World updates on St. Jude Cloud.) The new publication reflects enhanced functionality for discovery, Zhang told Bio-IT World, “especially how the epigenome data, including Hi-C, ChIA-PET data (the routine data), can be incorporated with whole genome sequencing data that could lead to new insights on the regulatory roles of non-coding regions we have not recognized in the past.”
Current visualization tools focus on variants that alter the function or dosage (i.e., copy number) of protein-coding genes, the team writes in the paper. GenomePaint, by contrast, offers a suite of features designed for direct inspection of aberrant transcription that may be caused by regulatory non-coding variants or splice variants, and for in-depth exploration of the genomic alterations of individual cancer genomes via integration of multi-omics data.
Users move between cohorts and single-sample genomes with GenomePaint’s three interactive and interconnected views: Cohort View, Matrix View, and Sample View. Cohort View combines mutations from all samples over a genetic region, allowing users to discover subtype-specific mutation patterns. Matrix View is gene-centric, allowing a user to assemble multiple genes or loci in order to compare their mutational patterns across a set of tumors and to associate those patterns with clinical outcome. Finally, Sample View is tumor-centric, showing detailed omics data for the selected tumor, centered on the variant of interest. Sample View can lay out a cancer genome based on its pattern of rearrangement, and it supports the exploration of long-range interactions between a candidate non-coding regulatory variant and its target gene. Both Sample View and Cohort View support a dynamic range of genome visualization: users can explore at base-pair resolution and zoom out at various scales up to the entire chromosome. For the St. Jude team, in-depth exploration of single samples is particularly important because rare cancer subtypes account for more than half of pediatric cancers.
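To make the difference between the cohort-level and gene-centric views concrete, here is a minimal Python sketch. It is not GenomePaint's code or data model: the record fields, the placeholder gene and subtype names, and the functions `cohort_view` and `matrix_view` are all assumptions invented for illustration. The first function pools mutations from every sample that fall within a region and groups them by subtype; the second builds a gene-by-sample grid.

```python
from collections import defaultdict

# Hypothetical mutation records; every value here is made up for illustration.
MUTATIONS = [
    {"sample": "S1", "subtype": "subtype_X", "gene": "GENE_A", "chrom": "chr1", "pos": 100},
    {"sample": "S2", "subtype": "subtype_X", "gene": "GENE_A", "chrom": "chr1", "pos": 150},
    {"sample": "S3", "subtype": "subtype_Y", "gene": "GENE_A", "chrom": "chr1", "pos": 900},
    {"sample": "S3", "subtype": "subtype_Y", "gene": "GENE_B", "chrom": "chr2", "pos": 50},
    {"sample": "S1", "subtype": "subtype_X", "gene": "GENE_B", "chrom": "chr2", "pos": 60},
]


def cohort_view(mutations, chrom, start, end):
    """Pool mutations from all samples inside a region, grouped by subtype."""
    by_subtype = defaultdict(list)
    for m in mutations:
        if m["chrom"] == chrom and start <= m["pos"] <= end:
            by_subtype[m["subtype"]].append((m["sample"], m["pos"]))
    return dict(by_subtype)


def matrix_view(mutations, genes):
    """Build a gene-by-sample grid: True where the sample is mutated in the gene."""
    samples = sorted({m["sample"] for m in mutations})
    hits = {(m["gene"], m["sample"]) for m in mutations}
    return {g: {s: (g, s) in hits for s in samples} for g in genes}


if __name__ == "__main__":
    print(cohort_view(MUTATIONS, "chr1", 50, 200))
    print(matrix_view(MUTATIONS, ["GENE_A", "GENE_B"]))
```

In this toy data, the region query surfaces only `subtype_X` samples, which is the kind of subtype-specific pattern a pooled view is meant to expose, while the grid shows at a glance which tumors carry mutations in both genes.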
Currently, GenomePaint hosts somatic alterations from 3,854 pediatric cancers representing 16 histotypes. “For the GenomePaint server in St. Jude, we have the pediatric cancer dataset,” Xin Zhou, also of St. Jude Computational Biology and first author on the paper, explained. “We do not have adult cancer dataset yet, but we are exploring this potential by working with the NCI Genomic Data Commons.” For now, a user can view adult cancer data by first downloading it from the GDC and then visualizing it as a custom track on GenomePaint.
The team will continue to update genomics data content and present release notes for datasets generated from published research studies as well as from St. Jude’s prospective clinical sequencing program, which performs three-platform sequencing (whole genome, whole exome, and RNA) for every eligible childhood cancer patient.
Users can bring their own datasets to the tool as well. “The strength of GenomePaint is to allow people to pull [together] a variety of data. Or even within the same dataset we can perform multiple analyses and pull these results together,” Zhang explained. Users can either bring their datasets in through DNAnexus or host them directly on Amazon Web Services, Zhou said. “They can have their files publicly exposed, then be able to link those files to the GenomePaint server in St. Jude to visualize their own data,” he said.
Uses and Surprises
In the Cancer Cell paper, Zhou and Zhang presented two novel discoveries using GenomePaint: disruption of the CREBBP RING domain as a novel driver mutation, and MYC activation caused by duplication of the NOTCH1 MYC enhancer, both in pediatric leukemia.
The second discovery surprised even the St. Jude team, Zhang said.
“The MYC enhancer duplication event that we presented in the manuscript coupled with the FISH [fluorescence in situ hybridization] analysis, it actually revealed something people never thought about before: what the role of enhancer RNA [is] in the gene regulation process,” she said. GenomePaint revealed an enhancer RNA cloud. “People hadn’t recognized enhancer RNA actually can formulate a blob of cloud like that,” she said.
The finding prompted a lot of speculation even within St. Jude, Zhang said. Many researchers have wondered if this could be a general mechanism by which enhancer RNA contributes to the formation of chromatin architecture. The question is under very active investigation.
“Before GenomePaint, we would not consider looking into the details of the enhancer RNA. Most people believed—including me at the time—enhancer RNA is just an artifact of the expression you expected from the enhancer activity. It’s a byproduct. That’s what we believed. Because GenomePaint can showcase the phased, haplotype-specific data, this led us to look into—initially just to verify what we observed—[this] blob of enhancer RNA popping up, and actually also closely interacting with the MYC DNA is really something that we did not recognize before.”
Next Steps: Integration, Global Comparisons
The development pipeline for GenomePaint is full. Zhou’s team is working to automate the process of building more sophisticated visualizations with user data and to integrate GenomePaint more closely with the St. Jude Cloud visualization community.
“One of the most attractive features… [of GenomePaint] is the rich features of different graphs it supports,” Zhang said. “It’s not just the standard BED format or bigWig format, but there’s also the 3-D genome, gene expression rank, RNA splice junctions, allele-specific expression—all these different graphs. To do the standard displays is very easy, but to do the more sophisticated ones, right now people have to ask Xin’s team to do the work for them in converting data.”
Zhou is also working on automated cohort-building. “Dynamically subsetting a large cohort into finer sets and then having some way to interrogate their DNA alterations or transcription differences between those sets—that’s an incredibly useful thing to do for cancer genomics and genomics in general,” he said.
Currently, GenomePaint users can look at one locus at a time, while St. Jude Cloud can generate global views of all available tumor transcriptomes in the form of t-SNE plots. Zhou now plans to link the two.
“GenomePaint already manages all the very detailed, diverse types of molecular datasets for the same cohort of RNAseq tumors in St. Jude Cloud,” he said. “We’re working on the next step, which is to allow the user… to select [a] certain group of tumors and then be able to find out the most recurrently mutated genes in a group or even compare two groups,” identifying commonalities between them.
“We’re setting up the infrastructure by extending GenomePaint to achieve this very important goal,” he said.
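The group-level analysis Zhou describes, selecting a set of tumors, finding the genes most recurrently mutated within it, and comparing two groups, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the planned GenomePaint feature: the sample-to-gene table, the placeholder gene names, the `min_samples` threshold, and the functions `recurrent_genes` and `compare_groups` are all hypothetical.

```python
from collections import Counter

# Hypothetical table: tumor ID -> set of genes mutated in that tumor.
SAMPLE_GENES = {
    "T1": {"GENE_A", "GENE_B"},
    "T2": {"GENE_A"},
    "T3": {"GENE_A", "GENE_C"},
    "T4": {"GENE_A", "GENE_C"},
    "T5": {"GENE_B", "GENE_C"},
}


def recurrent_genes(sample_genes, group, min_samples=2):
    """Count how many tumors in the group carry each gene; keep recurrent ones."""
    counts = Counter(g for s in group for g in sample_genes.get(s, ()))
    return {g: n for g, n in counts.items() if n >= min_samples}


def compare_groups(sample_genes, group_a, group_b, min_samples=2):
    """Split recurrently mutated genes into shared and group-specific sets."""
    a = recurrent_genes(sample_genes, group_a, min_samples)
    b = recurrent_genes(sample_genes, group_b, min_samples)
    return {
        "shared": sorted(a.keys() & b.keys()),
        "only_a": sorted(a.keys() - b.keys()),
        "only_b": sorted(b.keys() - a.keys()),
    }


if __name__ == "__main__":
    group_a = ["T1", "T2", "T3"]
    group_b = ["T3", "T4", "T5"]
    print(recurrent_genes(SAMPLE_GENES, group_a))
    print(compare_groups(SAMPLE_GENES, group_a, group_b))
```

Counting tumors rather than individual mutations keeps a single heavily mutated sample from dominating the ranking. A real analysis would also need to account for gene length, background mutation rate, and group size, all of which this sketch ignores.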