Genome in a Bottle Uncapped
By Aaron Krol
May 21, 2015 | Creators of DNA technologies now have an important new tool for understanding how well their instruments and tests are performing: a DNA sample designated as a “reference material” by the National Institute of Standards and Technology (NIST). The DNA, originally donated by a Utah woman in 1980 for an unrelated study, is thought to represent the best-characterized diploid human genome in the world, thanks to the efforts of the Genome in a Bottle Consortium, which was sponsored by multiple federal agencies in 2012 with the aim of producing near-perfect genome sequences for public use.
“We realized at the time that there was an interesting problem in next-generation sequencing, in that there is no single genome that has a reference sequence,” says Elizabeth Mansfield, Director of Personalized Medicine at the FDA, which participates in and helped finance the consortium. “We wanted to actually create a physical sample for which we knew a great deal of the truth sequence.”
That “truth sequence” has been painstakingly compiled over a period of years by the Genome in a Bottle Consortium, which repeatedly sequenced DNA from a single cell line, known as NA12878, using a wide variety of technologies and computational pipelines. Almost 80% of this cell line’s genome is now known with high confidence, including a very comprehensive set of SNPs and indels, two types of small genetic variants.
Last week, NIST, whose reference materials are used in fields as diverse as food safety and nuclear physics to measure properties against a defined standard, announced that a batch of this DNA will serve as the first genomic reference material. The DNA is available for purchase by laboratories and technology providers who want to compare their sequencing results to those of Genome in a Bottle. These comparisons will test the overall accuracy of users’ pipelines, and help uncover any biases that could be corrected.
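To give a rough sense of what such a comparison involves, the sketch below scores a pipeline's variant calls against a truth set, counting only positions inside regions known with high confidence. It is a minimal illustration, not the consortium's method: the `Variant` type, the `benchmark` function, the interval layout, and the toy data are all hypothetical. It also assumes variants match only when chromosome, position, and alleles are identical, whereas real comparisons usually have to reconcile different representations of the same indel.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """A small variant (SNP or indel); a hypothetical, simplified record."""
    chrom: str
    pos: int   # 1-based position
    ref: str
    alt: str


def in_regions(variant, regions):
    """True if the variant's position lies in a high-confidence interval.

    `regions` maps chromosome -> list of (start, end) pairs, both inclusive.
    """
    return any(start <= variant.pos <= end
               for start, end in regions.get(variant.chrom, []))


def benchmark(calls, truth, regions):
    """Score calls against a truth set, restricted to high-confidence regions."""
    calls_hc = {v for v in calls if in_regions(v, regions)}
    truth_hc = {v for v in truth if in_regions(v, regions)}
    tp = len(calls_hc & truth_hc)   # called and in the truth set
    fp = len(calls_hc - truth_hc)   # called but not in the truth set
    fn = len(truth_hc - calls_hc)   # in the truth set but missed
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Toy data, invented for illustration only.
truth = [
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 250, "CT", "C"),
    Variant("chr2", 50, "G", "T"),
]
calls = [
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 300, "T", "C"),    # not in the truth set
    Variant("chr2", 50, "G", "T"),
    Variant("chr2", 900, "A", "AT"),   # outside high-confidence regions
]
regions = {"chr1": [(1, 400)], "chr2": [(1, 500)]}

result = benchmark(calls, truth, regions)
print(result)
```

Restricting the score to high-confidence regions matters because a call outside them cannot fairly be counted as right or wrong; the truth there is simply not established.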
“We’re hopeful that it’s a tool that raises the bar for everybody, and gives people something quantitative that they can use to objectively evaluate metrics,” says Marc Salit, who leads the Multiplexed Biomolecular Science Group at NIST.
A Recognized Standard
Clinical laboratories and bioinformatics groups have already used Genome in a Bottle’s variant set, which was published last year, to validate their workflows. The NA12878 cell line is commercially available and has become a standard in the genomics community, partly because of its history of use in important early projects like HapMap, and partly because the donor’s large family allows for comparisons with the genomes of close relatives.
However, the DNA used as a reference material has better quality controls than ordinary batches of NA12878 cells. To create its reference samples, NIST ordered a single large batch of NA12878’s DNA, amounting to around 8,000 10-microgram vials, a process intended to control for new mutations that might spontaneously arise in the cell line.
“Since [the reference samples] all came from the same batch, and because we’ve also sequenced multiple vials, we believe that these are very likely to contain the same DNA,” says Justin Zook, a NIST researcher who has played a leading role in the Genome in a Bottle project.
The release of the first NIST reference DNA is a major milestone for Genome in a Bottle, and could affect the way new genomic technologies are designed, validated, and even regulated. Currently, with few recognized standards to measure the accuracy of genomic testing, companies submitting new instruments and tests to the FDA have to coordinate with the agency on a case-by-case plan to validate their products. As clinical genomics becomes a standard part of American medical practice, both providers and regulators would benefit from clearer expectations for submitting this technology to the FDA.
The need for clear standards is heightened by the FDA’s plans to become more involved in the oversight of clinical laboratories, which perform a majority of genomic testing in the U.S. and operate in a different, and typically more lenient, regulatory environment than mass-market diagnostics companies.
“Right now we’re looking at developing a new regulatory pathway for next-gen sequencing as part of the President’s Precision Medicine Initiative,” says Mansfield, adding that “it’s possible we would write guidance to make [the NIST reference DNA] a special control.” The reference material could be especially useful when looking at sequencing instruments or tests that interrogate multiple genes, as opposed to highly targeted tests, because it can be used to validate sequence across distant areas of the genome.
In fact, the well-characterized NA12878 genome has already played an important role in FDA regulation of next-generation sequencing. Only one sequencing instrument, the MiSeqDx manufactured by Illumina, is currently cleared for clinical use, and part of the validation process for that device included sequencing NA12878, although the high-quality Genome in a Bottle sequence was not available at the time for comparison.
New Horizons
Using the reference DNA is voluntary, and is likely to remain that way for the foreseeable future. “We’re very interested to see these reference materials play a role in the optimization of commercial sequencing technologies,” says Salit, but he adds that it’s not in NIST’s best interest to become part of a standard pipeline for FDA submissions. “We don’t want to be on a treadmill or a conveyor belt where we’re constantly supplying the world with reference material if there’s a better way.”
Meanwhile, Genome in a Bottle has plenty of work still ahead of it. Continuing research on this genome will focus on large structural variants, which are much harder to characterize than SNPs and indels. “We just released an initial version of structural variant calls that we believe are very likely to be accurate, but they’re not comprehensive,” says Zook. At the Global Alliance for Genomics and Health, a standards-setting organization, Zook also chairs a “benchmarking” task team, through which he is helping to create a central web portal that lets users compare their sequences of NA12878 with the Genome in a Bottle consensus and receive a uniform measurement of their performance.
Work is also continuing on additional reference samples. Four new genomes, provided by Han Chinese and Ashkenazi Jewish donors who have participated in Harvard University’s Personal Genome Project, will be characterized to a level of quality similar to that of the current NIST reference. The hope is that making more reference DNA available will prevent labs and providers from inadvertently biasing their methods to work specifically with a single reference genome.
“It’s definitely a concern that putting out one genome would be insufficient,” says Salit.
Even one thoroughly validated human genome, however, is a big step up for creators of genomics tools, who until now have had to gauge their accuracy through proxy measures such as consensus with other tools or internally consistent results. This has made it difficult to account for systematic biases that might repeatedly give the wrong answers.
“One of the things we hoped when we started out was that this wouldn’t just be valuable for our direct regulatory purposes, for looking at submissions, but that this could accelerate development of new instruments and informatics tools that can be designed from the ground up with a known material in hand,” says Mansfield.
That should offer everyone a little more assurance that DNA tests are able to make the right variant calls in real-world conditions, something that will be an urgent concern for the FDA as genomics starts to change the way patients are diagnosed and treated.