With Revolocity, Complete Genomics Eyes New Markets for DNA Sequencing
By Aaron Krol
October 23, 2015 | This June, Complete Genomics of Mountain View, Calif., revealed its first commercial DNA sequencer at a genetics conference in Glasgow. The Revolocity system, a massive, room-filling contraption equipped with robotic arms to shuffle DNA samples between its constituent pieces, is built to sequence 10,000 whole human genomes a year or 95,000 whole exomes. On sheer volume, it competes with Illumina’s HiSeq X series, the factory-scale sequencers unveiled last year as the answer to the industry’s long-standing call for a “$1,000 genome.”
In practice, Revolocity is not going to compete with Illumina. The big research centers that sequence genomes at these levels are already comfortable with Illumina technology. They’ve been using it for years, they’ve developed a large ecosystem of software tools to work with its raw data, and they’ve trained a generation of laboratory technicians to handle the instruments. While Complete Genomics’ proprietary method of reading DNA is one of the oldest proven approaches in the field (the company published its first human genome in 2009), it does not have the same long history of global use as Illumina’s.
This was always going to be a problem for the reinvented Complete Genomics. When the company was purchased by Chinese genomics giant BGI in 2013, its only business was in sequencing-as-a-service, producing human genomes in house and on demand. That’s a business BGI was not interested in inheriting, says Complete Genomics co-founder and CEO Cliff Reid.
“They were already a services organization,” he says. “We’re three hundred people and they’re five thousand, so their services organization far outstripped anything we could do. What they needed from us was a technology vendor.”
Reid’s company could sequence human genomes at a competitive price and quality, but with Illumina data entrenched as the de facto standard, competitive wasn’t going to be good enough to win over the major genomics centers. So when Complete Genomics built a sequencer to read whole human genomes at a massive scale, they pitched it to a new group of customers: large healthcare providers.
“Those organizations were not in the market just a few years ago,” says Reid. “They aren’t genomics experts today. They certainly aren’t hardware and software and biochemistry and fluidics and workflow experts. They really are looking to the vendor community to provide turnkey solutions.”
The idea is that medical centers don’t want to play around with raw data, but would like to have reliable access to their patients’ DNA code. Revolocity is built for that purpose alone. Many of its features would be deeply unappealing to basic researchers: like Complete Genomics’ former service offerings, it works only with human DNA and sequences nothing smaller than a whole exome. Sample preparation is built in and inflexible, so you can’t play with new methods and chemistries for niche experiments. You never see the raw data; the output of a Revolocity run is a VCF file, the standard format for lists of genetic variants.
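For readers who have not opened a VCF file, here is a minimal sketch of reading one data line. It relies only on the eight fixed, tab-separated columns the VCF format defines (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The function name and the example record are made up for illustration and do not come from a Revolocity run.

```python
def parse_vcf_line(line):
    """Split one VCF data line into its eight fixed, tab-separated columns."""
    fields = line.rstrip("\n").split("\t")[:8]
    chrom, pos, variant_id, ref, alt, qual, filt, info = fields
    return {
        "chrom": chrom,
        "pos": int(pos),        # 1-based position on the chromosome
        "id": variant_id,
        "ref": ref,             # reference allele
        "alt": alt.split(","),  # one or more alternate alleles
        "qual": qual,
        "filter": filt,
        "info": info,
    }


# Hypothetical record: an A-to-G substitution at chr1:12345.
example = "chr1\t12345\t.\tA\tG\t50\tPASS\t."
record = parse_vcf_line(example)
print(record["chrom"], record["pos"], record["ref"], record["alt"])
```

A real file also carries header lines beginning with "#" and, usually, per-sample genotype columns after INFO, both of which this sketch ignores.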
“Think of this as a box that produces variants,” says Reid. “It’s a different kind of architecture for a large-scale sequencing system. You can put blood in one end and get variants out the other.”
“Sequencing Them Today”
So far, Complete Genomics has found three takers. Radboud University Medical Center, in the Netherlands, and Mater Health Services of Australia both signed on at launch; a third customer, the UK Epilepsy Society, announced its purchase at a large genetics conference in Baltimore earlier this month. When they receive their Revolocity systems in the first half of 2016, these customers will begin rapidly sequencing patients in their systems, building up a huge volume of in-house data. Most of that data is likely to go unused in the short term, but it could become increasingly informative as patients go through a lifetime of care and new discoveries about genetic health are made.
“Outside the U.S., the world is dominated by single-payer systems,” says Reid. “Patients don’t move from one insurer to another, so the idea of sequencing them today, and then using that data for the next few decades in addressing healthcare issues for those patients, makes a lot of sense.”
It may make sense intuitively, but it’s an unproven model of care. Only a minority of people carry genetic variants that are likely to influence their healthcare today, mainly those with rare genetic diseases or mutations that strongly dispose them to serious conditions like breast cancer. For this information, and the hope that more of the genome will become medically relevant in the future, Revolocity customers will be paying $12 million for the sequencing system, plus another $1,000 per genome in consumables, and a bit more in instrument and labor costs.
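To put those figures in perspective, here is a rough sketch of the hardware-plus-consumables cost per genome. It assumes the system runs at its full advertised capacity of 10,000 genomes a year. The service lifetimes are hypothetical, and the instrument and labor costs are left out because they are not given.

```python
def cost_per_genome(system_price, consumables_per_genome,
                    genomes_per_year, years_in_service):
    """Amortized cost per genome, ignoring labor, upkeep, and financing."""
    total_genomes = genomes_per_year * years_in_service
    return system_price / total_genomes + consumables_per_genome


# From the article: $12 million system, $1,000 consumables, 10,000 genomes/year.
# The one-, three-, and five-year lifetimes are assumptions for illustration.
for years in (1, 3, 5):
    print(years, cost_per_genome(12_000_000, 1_000, 10_000, years))
```

Under these assumptions the purchase price adds $1,200 per genome if spread over one year of full use, and $240 if spread over five.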
Complete Genomics can’t promise the kind of huge progress in precision medicine that would make this cost-benefit analysis a no-brainer, but it’s working hard to be the go-to supplier for any clinical organization willing to take the chance. Part of that is simplifying the workflow, accomplished by robotics that handle all the sample prep, and a single user interface for managing sequencing runs, monitoring those runs for quality, and collecting the data afterward. “We never needed to have a beautiful user interface when we were running our own system internally, because we could train people to run really ugly software,” says Reid. “The interface is a lot prettier now.”
The company will also be providing customers with full-time staff engineers for maintenance and support of the system, a necessity with a complex suite of instruments like Revolocity that replaces almost every hands-on process with automation.
But Reid also promises his technology will produce higher quality data than Illumina’s, or that of other competitors like Ion Torrent that are aiming for large-scale adoption in the clinic. “In a clinical setting, false positives are really painful,” he says. “We’ve always done higher quality genomes.”
Data Quality
It’s hard to evaluate that claim so far. None of Complete Genomics’ customers have Revolocity up and running, so we won’t see user data for at least a few months. But the company has shared a head-to-head comparison against an Illumina HiSeq X battery, running a whole genome sequence of the NA12878 human cell line, considered the gold standard for measuring the accuracy of sequencing pipelines.
Based on this comparison, Complete Genomics claims marginally better accuracy than Illumina on capturing single nucleotide polymorphisms (SNPs), or one-letter substitutions in DNA. SNPs are the best-studied form of genetic variation, and represent a large number of mutations known to be clinically significant ― but realistically, improving on Illumina’s accuracy with SNPs is not that much of a selling point, since Illumina is already very, very good at catching these variants.
More interesting are the figures on knottier sorts of variation, including small DNA insertions and deletions; copy number variants (CNVs), where a short stretch of DNA is repeated a variable number of times; and more complex small structural variants. For each of these cases, Complete Genomics reported fewer false positives on Revolocity than on an Illumina pipeline, although its platform was also a little less sensitive, missing some genuine variants that Illumina picked up. (It’s also worth noting that this was only one possible pipeline for analyzing Illumina data, of many that have been developed.)
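The trade-off described here, fewer false positives at the price of some missed variants, is usually expressed as precision versus sensitivity. The sketch below shows how the two numbers are computed against a truth set such as NA12878. The variant calls are invented for illustration and are not taken from the Complete Genomics comparison.

```python
def precision_and_sensitivity(called, truth):
    """Return (precision, sensitivity) for a set of calls against a truth set."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    sensitivity = true_positives / len(truth) if truth else 0.0
    return precision, sensitivity


# Hypothetical truth set of four variants, keyed by (chromosome, position, type).
truth = {("chr1", 100, "del"), ("chr1", 250, "ins"),
         ("chr2", 40, "del"), ("chr2", 900, "dup")}

# A cautious caller reports only two variants, both real.
cautious = {("chr1", 100, "del"), ("chr2", 40, "del")}

# A permissive caller finds all four, plus two that are not real.
permissive = truth | {("chr3", 7, "ins"), ("chr3", 80, "del")}

print(precision_and_sensitivity(cautious, truth))
print(precision_and_sensitivity(permissive, truth))
```

In this toy case the cautious caller makes no false calls but finds only half the real variants, while the permissive caller finds them all but is wrong about a third of what it reports.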
For data scientists, these results might be a little perplexing. Complete Genomics reads DNA in shorter fragments than Illumina, and normally the best way to catch these kinds of additions, deletions, and rearrangements in the genome is to read long stretches of DNA that span the entire variant. When Revolocity was announced, its unusually short reads were one of the biggest sources of concern for observers.
But Reid says this is more than compensated for by Complete Genomics’ software, which is embedded directly in Revolocity and was developed over many years of working exclusively with the human genome. First, he says, Revolocity has an exceptionally high depth of coverage, meaning that it works with more redundant reads over each region of the genome than its competitors. That has allowed the company to build accurate counting functions into its software ― essentially, estimating the length of a CNV based on the number of times the same read appears in the data.
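The general idea of counting reads to size a copy number variant can be sketched as follows. This is a textbook read-depth estimate, not Complete Genomics’ actual software: it assumes a region’s depth scales with its copy number, and that a normal diploid region sits at the genome-wide baseline. The function name and all depth values are hypothetical.

```python
from statistics import mean


def estimate_copy_number(region_depths, baseline_depth, ploidy=2):
    """Estimate copy number from how a region's depth compares to baseline.

    Assumes depth is proportional to copy number, so a region at 1.5x the
    baseline depth of a diploid genome holds about three copies.
    """
    return round(ploidy * mean(region_depths) / baseline_depth)


baseline = 50  # hypothetical genome-wide average depth of coverage

print(estimate_copy_number([49, 51, 50], baseline))  # ordinary region
print(estimate_copy_number([74, 76, 75], baseline))  # extra copy
print(estimate_copy_number([24, 26, 25], baseline))  # one copy lost
```

The higher the baseline depth, the less a random fluctuation in read counts can be mistaken for a gained or lost copy, which is why deep coverage matters for this kind of estimate.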
Second, Revolocity has a fundamentally different relationship to reference data than standard analytics tools. Pipelines for Illumina and Ion Torrent data typically find variants by mapping reads to a reference ― essentially, matching reads to a pre-existing genome, and then spotting the differences. Complete Genomics uses an alternative approach it calls “local de novo,” where the reference is only used to match reads to a general area of the genome, in “buckets” of a couple thousand bases.
“The reference almost by definition is biased against structural variants,” says Reid. “It kind of presumes they don’t exist. So once we’ve sorted the reads by the reference, we throw the reference out.” Reads in the small, manageable “buckets” are then overlapped in a chain that spans their entire region, a small-scale version of the computationally demanding de novo sequencing that does the same thing across the whole genome. Local de novo sequencing is simple enough to do on a large scale, even with a huge number of very small reads, but doesn’t bias Revolocity toward agreeing with the reference. (Like all short-read sequencers, however, Revolocity is unable to capture very large structural variation.)
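A toy version of this two-step idea might look like the sketch below. It assumes each read arrives with an approximate mapped position, uses that position only to drop the read into a bucket, and then stitches each bucket together by greedy overlap merging. Greedy merging is a simple textbook stand-in for an assembler, not Complete Genomics’ method, and every name and sequence here is invented for illustration.

```python
from collections import defaultdict


def bucket_reads(mapped_reads, bucket_size=2000):
    """Group (position, sequence) pairs by coarse reference position."""
    buckets = defaultdict(list)
    for position, sequence in mapped_reads:
        buckets[position // bucket_size].append(sequence)
    return buckets


def overlap(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def assemble(reads, min_overlap=3):
    """Greedily merge the best-overlapping pair until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap(left, right, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left to join
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]
    return contigs


# Hypothetical reads: three overlap within one bucket, one falls in the next.
reads = [(2000, "ATGCGTAC"), (2004, "GTACCTGA"),
         (2008, "CTGAAGT"), (4100, "TTTTGGGG")]
buckets = bucket_reads(reads)
# From here on the reference plays no part: each bucket is assembled
# from its own reads alone.
assembled = {key: assemble(seqs) for key, seqs in buckets.items()}
print(assembled)
```

Because the reference is consulted only during bucketing, an insertion or deletion inside a bucket shows up in the assembled sequence rather than being forced to match the reference.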
Clinical organizations might well find this appealing. As more attention is paid to structural variants, many are likely to prove pathogenic or otherwise clinically significant; it’s possible that these variants have a greater overall impact on health than SNPs, since they tend to cause larger changes to the structure of proteins. So far, however, structural variation is not well explored ― like many other aspects of clinical sequencing, recording these variants is largely a gamble on future knowledge.
Optimistically, Complete Genomics has identified a little more than 200 organizations worldwide that are potential customers for Revolocity, all covering very large, stable patient populations. Though a latecomer to the market as a products company, Complete Genomics may be coming in at the beginning of a shift toward high-scale clinical sequencing, and is trying to set itself apart as the only company fully catering to that market.
“We think that shift is underway right now,” says Reid. “More and more clinical organizations are interested in sequencing their patients, both to advance their healthcare, and also to continue to conduct clinical research.”
Meanwhile, in China, BGI recently released a lower-throughput sequencer based on Complete Genomics technology, for both clinical and more traditional research customers. Globally, genomics is a large and growing enterprise, and the markets that seem well-defined today may be only a small slice of the opportunity for DNA sequencing in the future.