PrecisionFDA Consistency Challenge Will Benchmark the Basic Software Tools of Genetic Research
By Aaron Krol
March 4, 2016 | The FDA is hosting a “Consistency Challenge,” through its online precisionFDA platform, to test the accuracy and reproducibility of some of the most basic software tools used in genetic research. The challenge was announced by John Holdren, senior advisor on science and technology to President Obama, at a summit held at the White House on February 25 to discuss the national Precision Medicine Initiative.
The precisionFDA platform was built as a test bed for the analytical tools that process raw DNA sequence data into useful information. Launched in December, the platform includes gold standard “truth sets” of genomic data that users can try to replicate; compute environments for running analytical pipelines on large volumes of sequence data; and a venue to share workflows and results with the broader scientific community. While the platform is managed by the FDA, its development was contracted out to the private company DNAnexus. (See “PrecisionFDA to Test Accuracy of Genomic Analysis Tools.”)
The Consistency Challenge will be the first initiative within precisionFDA to formally test some of the most popular pipelines in genomics. The challenge focuses on mapping and variant calling: the process of taking short DNA reads from a sequencer, aligning them to a reference map of the human genome, and identifying areas where the new data differs from the reference sequence.
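To make that process concrete, here is a minimal sketch of such a pipeline, driven from Python. The file names are placeholders, the command-line flags vary by tool version, and a production pipeline would add steps such as read-group tagging and duplicate marking.

```python
# Illustrative sketch of a minimal mapping + variant-calling pipeline,
# expressed as shell commands launched from Python. File names are
# placeholders; exact flags depend on tool versions, and real pipelines
# add read-group tags, duplicate marking, and quality recalibration.
import subprocess

def run(cmd):
    print("+", cmd)
    subprocess.run(cmd, shell=True, check=True)

# 1. Map short reads to the human reference genome with BWA-MEM.
run("bwa mem ref.fa reads_1.fastq reads_2.fastq > aligned.sam")

# 2. Sort and index the alignments by genomic coordinate.
run("samtools sort -o aligned.sorted.bam aligned.sam")
run("samtools index aligned.sorted.bam")

# 3. Call SNPs and indels against the reference (GATK 3-era syntax).
run("java -jar GenomeAnalysisTK.jar -T HaplotypeCaller "
    "-R ref.fa -I aligned.sorted.bam -o variants.vcf")
```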
This is the first and most fundamental type of analysis done in almost all human genetic studies. The end result is a Variant Call Format (VCF) file, which contains all the genetic variants found in an individual’s genome. (In practice, technical limitations mean that these files usually include only the smallest DNA changes, called SNPs and indels, and not larger restructurings of the genome.)
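For readers unfamiliar with the format, the short Python sketch below parses a few invented VCF-style records and classifies each variant as a SNP or an indel; the records are made up for illustration and are not data from the challenge.

```python
# Classify invented VCF-style records as SNPs or indels.
# A record's REF and ALT columns hold the reference allele and the
# variant allele; equal single-letter lengths indicate a SNP.

vcf_lines = [
    "chr1\t10177\t.\tA\tAC\t100\tPASS\t.",   # insertion (indel)
    "chr1\t13116\t.\tT\tG\t99\tPASS\t.",     # substitution (SNP)
    "chr1\t13118\t.\tAG\tA\t95\tPASS\t.",    # deletion (indel)
]

for line in vcf_lines:
    chrom, pos, _, ref, alt, *_ = line.split("\t")
    kind = "SNP" if len(ref) == 1 and len(alt) == 1 else "indel"
    print(f"{chrom}:{pos} {ref}->{alt} ({kind})")
```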
The standard tools for mapping and variant calling, including BWA, Bowtie, and the Genome Analysis Toolkit (GATK), are among the most widely used pieces of software in the field.
Yet it has been difficult to test which of these tools are most effective. Many are open source and exist in a number of different versions. Computational biologists also mix and match tools to create their own pipelines, so inconsistent results may point to problems with an alignment tool, a variant caller, or both together. There are also many more lesser-known tools that accomplish the same tasks.
To measure the performance of different pipelines, the Consistency Challenge will offer all comers an identical batch of raw sequencing data to process. This data comes from sequencing a well-characterized human cell line, called NA12878. Results will be compared against a gold standard VCF file provided by the Genome in a Bottle project, which has been exhaustively studying NA12878 for just these kinds of quality control purposes.
A Multi-Pronged Approach
Unlike the more advanced software that tries to tease out the effects of genetic variation on health and biology, the pipelines that produce VCF files have very clear outputs that can be measured fairly objectively. “This is not a very complex challenge,” says George Asimenos, Director of Strategic Projects at DNAnexus. “It works with data that people are familiar with, and the goal is to increase the participation in precisionFDA, and for people to get to know the platform.”
As basic as this initiative is, though, it’s been a challenge to put together.
Comparing and scoring VCF files is not a trivial task, because the ways scientists describe deviations from the “normal” human genome sequence are inherently ambiguous. For example, some variants are small deletions, where a single DNA letter is missing from the reference sequence. But if a deletion occurs in a string of identical letters (say, a sequence like GAAAAT), there is no right answer to the question, “which A is missing?”
That means that two VCF files, both correctly describing the exact same variant, might call a deletion at two different locations on the reference genome. A naïve software program could easily make the mistake of thinking one or both of those files are inaccurate.
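A toy Python example, with invented coordinates, makes the point: the two records below place a one-base deletion at different spots in the GAAAAT run, yet applying either to the reference yields the same sequence, so they describe the same variant.

```python
# Two VCF-style records describing a one-base deletion in GAAAAT.
# Though their positions differ, applying each to the reference
# yields the same alternate sequence, so they are the same variant.

REF_SEQ = "GAAAAT"  # invented reference context

def apply_variant(seq, pos, ref, alt):
    """Replace `ref` at 1-based `pos` with `alt` and return the result."""
    i = pos - 1
    assert seq[i:i + len(ref)] == ref, "REF allele must match the reference"
    return seq[:i] + alt + seq[i + len(ref):]

call_a = (1, "GA", "G")   # deletion called right after the G
call_b = (4, "AA", "A")   # same deletion called deeper in the A run

hap_a = apply_variant(REF_SEQ, *call_a)
hap_b = apply_variant(REF_SEQ, *call_b)
print(hap_a, hap_b, hap_a == hap_b)  # GAAAT GAAAT True
```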
“Early on, we decided that the best approach would be for the community to tell us what are the best practices for comparing VCFs, rather than the FDA imposing a particular methodology,” says Asimenos. With guidance from groups like Genome in a Bottle and the Global Alliance for Genomics and Health (GA4GH), the FDA chose a tool called vcfeval, designed by the company Real Time Genomics. The tool is published and open source, making it a transparent option for scoring pipelines.
Vcfeval has a context-aware method of reading VCF files, which lets it spot when two variants are synonyms. “It requires the whole sequence of the reference genome in order to do the comparison,” says Asimenos. “Practically, it seems to catch a lot of these corner cases.” Once it’s settled how many variants actually differ between a submitted VCF and the gold standard file provided by Genome in a Bottle, the program will also score pipelines on their accuracy. It will consider true positive variant calls, false positives, false negatives, and more nuanced statistical measures like a pipeline’s positive predictive value.
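Those accuracy figures reduce to a few standard definitions. The sketch below computes them from invented counts; real numbers would come from vcfeval’s own report.

```python
# Standard accuracy metrics over variant calls, computed here from
# invented counts for illustration.

tp = 3_450_000  # calls that match the Genome in a Bottle truth set
fp = 25_000     # calls absent from the truth set (false positives)
fn = 40_000     # truth-set variants the pipeline missed

precision = tp / (tp + fp)  # a.k.a. positive predictive value
recall = tp / (tp + fn)     # a.k.a. sensitivity
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.4f} recall={recall:.4f} F1={f1:.4f}")
```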
Yet the Consistency Challenge is not only about a single, absolute accuracy measure. “The FDA was very smart to include a reproducibility piece in the challenge too,” Asimenos says. Participants will be asked to run their pipelines a total of three times, on two different sets of sequence data from NA12878. While both sets use the same sequencing technology (the Illumina HiSeq X Ten system), the data was gathered at two different facilities.
By comparing pipelines’ performance across these two different datasets, the FDA hopes to offer a better picture of how experimental variables, like who is running a sequencer and in what conditions, might affect the end results. The challenge will also measure what happens when users take one of these datasets and run their pipelines on it twice, producing two VCF files. Because these applications do a great deal of parallel processing, no two runs are exactly identical, and a big concern for the genetics community is how much that affects the overall analysis.
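One simple way to quantify that run-to-run consistency, sketched below as an illustration rather than the challenge’s official method, is to reduce each VCF to a set of variant keys and measure the overlap between two runs. The file names are hypothetical, and the comparison assumes both files have been normalized so that equivalent calls match.

```python
# Sketch: concordance between two runs of the same pipeline on the
# same input. Each call is reduced to a (chrom, pos, ref, alt) key;
# assumes both files were normalized so equivalent calls match.

def load_calls(vcf_path):
    calls = set()
    with open(vcf_path) as f:
        for line in f:
            if line.startswith("#"):
                continue  # skip header lines
            chrom, pos, _, ref, alt = line.split("\t")[:5]
            calls.add((chrom, pos, ref, alt))
    return calls

run1 = load_calls("run1.vcf")  # hypothetical file names
run2 = load_calls("run2.vcf")

shared = run1 & run2
concordance = len(shared) / len(run1 | run2)  # Jaccard index
print(f"shared={len(shared)} concordance={concordance:.4f}")
```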
The integrity of genetic research rests on the ability to accurately call variants. By testing pipelines’ performance at replicating experiments, at working with data from different research facilities, and against an absolute standard, precisionFDA is taking a multi-pronged approach to judging that ability.
“A Collective Effort”
This is useful information for the FDA to have for its own purposes. Increasingly, the agency is being asked to study and approve genetic tests for use in diagnosing and treating patients, and to do that effectively, the FDA needs to know if the analytical tools that underpin those tests are precise and replicable.
Yet the Consistency Challenge is not just for the FDA’s internal use. Like an earlier web platform created by the agency, openFDA, precisionFDA is meant to share knowledge broadly and benefit researchers around the world.
While the precisionFDA cloud can be used privately, the Consistency Challenge is public―even the FDA can only see and evaluate pipelines that users choose to make publicly visible. Participants can upload VCF files they’ve generated in their own compute environments, along with descriptions of the tools in their pipelines; or they can run their entire workflows in the precisionFDA cloud, and publish all the software they used.
Either way, the research community as a whole will get to see how different pipelines stack up against each other. This should not only offer geneticists a better idea of which of the popular open source tools are most effective, but also help to surface lesser-known tools that may have their own advantages. “I hope that this will be a good opportunity for people who are developing new approaches and don’t necessarily have a medium to showcase those approaches,” Asimenos says.
This kind of thorough evaluation of foundational tools in genetics is badly needed, and a large number of stakeholders have stepped up to help with the effort. Some of these are public or non-profit groups, like Genome in a Bottle and GA4GH. But others are for-profit companies, like DNAnexus, Real Time Genomics, and Human Longevity, Inc. (which provided one of the NA12878 sequence datasets), or large academic centers like the Garvan Institute, which also provided NA12878 sequence.
Asimenos lists half a dozen others who have contributed in one way or another to the precisionFDA project, including giants in the field like Illumina, 23andMe, and the Broad Institute. “At the end of the day, this is a collective effort, and I’m very grateful that all these people have contributed their data and have joined the website,” he says.
Wide participation will also be needed to make the most of the Consistency Challenge. The competition will be open through April 25 to anyone who wishes to enter, and will be the first of several precisionFDA challenges meant to evaluate the software tools used to make sense of the human genome.