GenePattern Notebook Environment Needs Only A Web Browser To Run Analyses
By Allison Proffitt
August 31, 2017 | Researchers at the University of California San Diego School of Medicine and the Broad Institute of MIT and Harvard have launched a new GenePattern Notebook environment, which combines the dynamic capabilities of an electronic analysis notebook with the ease of use of a point-and-click interface to hundreds of genomic tools.
“The idea is to take the functionality of Jupyter Notebook, which makes it very easy to create complete research narratives—electronic versions, let’s say, of papers that include the text, graphics, multimedia, and combine them with the actual analysis—and extend that to include analyses that can be run by non-programming users,” explained Michael Reich, assistant director of bioinformatics at the University of California, San Diego.
The goal was to “create a tool that makes it easy for researchers to create and disseminate work that is 1) accessible, 2) reproducible, and 3) follows the ‘Beyond the PDF’ vision to make research an active, living document,” Reich told Bio-IT World.
GenePattern Notebook is an integration of the popular Jupyter Notebook environment with the GenePattern platform for integrative genomics. GenePattern has been available as a plugin for Jupyter Notebooks for some time, but the online repository now lets users access the functionality using only a web browser. Users have access to a growing collection of methods for analyzing gene expression, proteomic, SNP and copy number, flow cytometric, network and other data, as well as visualization and data processing tasks.
The GenePattern Notebook environment is hosted on the Amazon cloud and can be accessed from any web browser. “You can log [into your account] from any browser anywhere and see your notebooks and work with them,” Reich explained. As in Jupyter Notebook, users build notebooks one cell at a time, adding text, images, or mathematical formulas, for example. GenePattern analyses can be added in the same way. “That gives you point-and-click access to hundreds of genomic analyses without the need to write code,” he said.
“Within a notebook you have the option to connect and to use the analyses on any available GenePattern server,” Reich said. “You are presented with a list of possible servers that you might want to connect to. We provide a default one—that’s the one most people will want to use—but you have access within a single notebook to a number of different servers… and all of the analyses available on any of them.” Some groups may choose to run a private GenePattern server; the Broad Institute, for instance, has one.
Using such a notebook keeps all parts of a research project together. “You’re building up your publication as you go along, in one place,” Reich said. GenePattern Notebook also allows users to share notebooks, letting other researchers follow the progress of the research and emulate it.
For now, notebooks are shared much like Word templates, Reich explained: researchers can copy shared notebooks into their own accounts and edit or revise them there. Adding commentary or collaborating on a shared notebook is not yet possible, but Reich assured me that it tops the “soon to add” functionality list. “We will be making it easy for researchers to collaborate on notebooks in real time, similar to the way that you can do that now on a Google doc. We’ll also be adding community features that allow the public discussion and annotation of notebooks.” This sort of public discussion capability is important for bringing peer review to the public, he believes.
Right now, notebooks themselves do not track provenance, although every GenePattern analysis maintains its own analytical provenance. Reich said the team is bringing those capabilities to the notebook level. “Things along the lines of attribution and notebook-level provenance will be enabled.”
Reich hopes that GenePattern Notebook will attract not only users who don’t code but will also serve as a unifying platform where researchers can create and share notebooks and experienced computational biologists can disseminate methods.
“We added functionality very specifically for people who already code,” he said. Python users can add variables and flow them into and out of GenePattern analyses. “This is also an excellent vehicle for disseminating new analysis methods, because the notebook format allows you to explain, in any desired degree of detail, all of the analytical considerations behind a method… you might be disseminating,” he added.
“It’s a tool for open science, for reproducible research, and it has benefits to offer any researcher working in genomics today,” Reich said.
GenePattern Notebook is freely available online at www.genepattern-notebook.org, requiring only a Web browser, or may be installed locally as a Docker image or Python package. The work is funded by grants from the National Institute of General Medical Sciences and the National Cancer Institute. The environment’s capabilities are described in a paper published last week in Cell Systems.