Washington University Rolls Out Open Source Epigenomics Browser
By John Otrompke
September 8, 2014 | The Roadmap Epigenomics Project, initiated in 2008, is now beginning to yield new tools in the field of epigenetics, including a browser intended as a complement to the popular genome browser tool at the University of California, Santa Cruz. At the Biology of Genomes conference at Cold Spring Harbor this May, a workshop entitled “The Complete Epigenome Browser: Visual Integration of Distributed Epigenomics Resources” gave attendees the opportunity to try out a faster and more user-friendly method of performing epigenomics analysis, according to Ting Wang, PhD, who hosted the presentation. A new paper describing the browser is already under review, explained Wang, who is an assistant professor in the department of genetics at Washington University School of Medicine in St. Louis.
The workshop on the browser complemented the completion of Phase I of Roadmap, an NIH-funded initiative to create a reference set of cell types demonstrating how the genetics of any given cell change in the context of environment. While the work of the Human Genome Project has spurred significant scientific advances, scientists also need epigenomic data to understand the relevance of genes. “We need to have one epigenomic map per cell type, showing the effects of variations such as DNA methylation, histone modification, and open chromatin structure, for example, especially for non-coding genes,” said Wang.
The first phase of the Roadmap project focused on compiling a dataset for a healthy human genome, and was intended to kick-start research into specific disease states at different institutions. In conjunction with this phase, Wang and his colleagues set out to create an open source browser for epigenomic data, now called The WashU EpiGenome Browser. The project has been funded through Roadmap for about $250,000 per year for several years.
An epigenome browser adds capabilities to a typical genome browser, allowing researchers to browse the epigenomics data that has been gathered by the Roadmap project and the related ENCODE project. “The genome is just the DNA sequence which is in every cell of the human body, but the epigenome shows why genetics are still different from one cell to another, because there’s lots of tissue and cell type specificity,” said Wang. For example, the Roadmap and ENCODE projects have gathered data on DNA methylation, histone modification, and open chromatin structure. The WashU Browser aims to make all that data conveniently accessible and searchable by researchers.
“As a consortium we have already generated a large amount of epigenomics data across multiple human tissues and cell types,” said Wang, noting that the total number of datasets is already more than 10,000. This presented a design challenge for the browser, he added, because “if you put all the data in the database you can quickly crash the system. So we decided to try this system where data is distributed. We designed this data hub cluster as a hierarchical data management system so that browsing is centralized. When you browse the data, the structure allows us to pull data from different resources, as if all the data were in your laptop.”
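The idea Wang describes, a central hierarchy that only points at data held elsewhere, can be illustrated with a small Python sketch. This is not the WashU browser's code or its data hub format; the names `Hub`, `Track`, `make_remote` and `browse` are hypothetical, the signal values are invented for illustration, and the "remote" sources are simulated with in-memory dictionaries rather than real servers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Tuple

Interval = Tuple[int, int, float]  # (start, end, signal value)


@dataclass
class Track:
    name: str
    # Stand-in for a remote range request: returns only the intervals
    # that overlap the requested region, not the whole dataset.
    fetch: Callable[[str, int, int], List[Interval]]


@dataclass
class Hub:
    """A node in the hierarchy; it stores pointers to tracks, not data."""
    name: str
    tracks: List[Track] = field(default_factory=list)
    children: List["Hub"] = field(default_factory=list)

    def iter_tracks(self, prefix: str = "") -> Iterator[Tuple[str, Track]]:
        path = f"{prefix}/{self.name}" if prefix else self.name
        for track in self.tracks:
            yield f"{path}/{track.name}", track
        for child in self.children:
            yield from child.iter_tracks(path)


def make_remote(data: Dict[str, List[Interval]]) -> Callable[[str, int, int], List[Interval]]:
    """Simulate a remote data source (assumption: an in-memory dict)."""
    def fetch(chrom: str, start: int, end: int) -> List[Interval]:
        return [(s, e, v) for (s, e, v) in data.get(chrom, [])
                if s < end and e > start]
    return fetch


def browse(root: Hub, chrom: str, start: int, end: int) -> Dict[str, List[Interval]]:
    """Centralized browsing: ask every registered source for one region."""
    return {path: track.fetch(chrom, start, end)
            for path, track in root.iter_tracks()}


# Two made-up sources that could live at different institutions.
liver = make_remote({"chr1": [(100, 200, 0.8), (5000, 5100, 0.1)]})
brain = make_remote({"chr1": [(150, 300, 12.0)], "chr2": [(10, 20, 3.0)]})

root = Hub("consortium", children=[
    Hub("liver", tracks=[Track("methylation", liver)]),
    Hub("brain", tracks=[Track("H3K4me3", brain)]),
])

view = browse(root, "chr1", 120, 400)
for path, intervals in view.items():
    print(path, intervals)
```

In this sketch the central hierarchy holds only names and fetch functions, so adding a dataset means registering one more pointer rather than loading more data into a single database.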
While many browsers for exploring epigenomic data have been created over the years, most have disappeared, and the browser at the University of California, Santa Cruz, where Wang did his postdoctoral training, remains “the best and most recognized browser.”
“It’s great, but it doesn’t really grow well with this explosion of data,” said Wang. “We’re much, much faster than UCSC’s browser. The more data you pull, the bigger the difference.”
“I don’t want to say we’re a replacement for UCSC’s browser,” he added. “I want to say we’re a complement. You can go to the UC browser and look at a genetic variant, and then you can come to our browser and try to define the tissue specificity of that variant. There are also specific functions we provide that UC does not. For example, in our browser you can look at many arbitrary regions together, next to each other. But if you are looking at five regions distributed across the genome, if you are using the UC browser, you need to open up five windows.”
A recent Nature paper coauthored by Wang demonstrated another unique feature of the WashU Browser. Unlike ordinary genome browsers, this tool can visually represent chromatin interactions across long stretches of the genome. Wang says that information on chromatin interactions will be the first type of data made available as part of this project.
Wang and colleagues are still adding to the functionality of the browser, which is not yet cloud-based. “We haven’t put all the data on the cloud, because we can’t afford it, but we did put our browser software on the cloud, so any user can make a clone of our browser. We are considering working on advanced visual techniques, and we will also engage with specific disease areas.”
One area Wang’s team plans to work on is addiction genetics, which will also be a major focus of later phases of the Roadmap project.