Stanford: ‘Clinical Grade’ Long-Read Genome Sequencing Is Here
By Deborah Borfitz
April 20, 2022 | Long-read DNA sequencing can be done at ever-increasing speeds and with significantly fewer errors than even a year or two ago, at last bringing the technology to the clinic where it can shorten the wait time for a diagnosis from weeks to hours. In fact, scientists at Stanford have used the technique in a “lighthearted competition” with their counterparts utilizing Illumina machines at Rady Children’s Hospital that has earned both teams honors from Guinness World Records, according to Euan Ashley, professor of medicine, genetics and biomedical data science at Stanford.
In the most recent record-breaking feat, the Stanford squad upped the ante by sequencing a patient’s genome in a snappy five hours and two minutes on a PromethION platform from Oxford Nanopore, an achievement ratified by the Genome in a Bottle Consortium of the National Institute of Standards and Technology, says Ashley. Only months earlier, the Rady team bested its own speed-to-diagnosis record with a time of 14 hours and 30 minutes.
Guinness World Records, which has given its accolades for everything from the largest inflatable aqua park to the speediest assembly of Mr. Potato Head, no longer recognizes fastest genomic diagnosis—Stanford’s original starting-line goal—because it didn’t feel “qualified to judge” what is and isn’t a diagnosis, Ashley says. “In the end, strictly speaking, we set a new world record for the fastest DNA sequencing technique… but obviously we’re focused on how fast we’re able to make these diagnoses and the impact it has on patients.”
The sequencing Olympics was unknowingly sparked by Stephen Kingsmore, M.D., president and CEO of Rady Children's Institute for Genomic Medicine at Rady Children's, who worked with Illumina back in 2012 to build a pipeline that cut the diagnostic wait time to 50 hours, says Ashley. In the years that followed, Kingsmore incrementally ratcheted that down to new record-setting speeds using Illumina technology.
Yet in most real-world healthcare settings today, genome sequencing reports still take eight to 12 weeks to appear, he adds. A “rapid” turnaround only reduces the wait to around five to seven days.
Major Players
It has been nearly five years since Stanford researchers first used long-read genome sequencing to diagnose a patient’s rare genetic condition. At the time, they were using the technology of Pacific Biosciences, a pioneer in the field, he says.
Earlier, short-read sequencing with an Illumina machine was the standard approach to diagnosing a suspected genetic disease. But despite the falling price tag, the traditional technique doesn’t always capture the entirety of the genome, potentially omitting variations that point to a diagnosis.
In contrast, long-read sequencing preserves long stretches of DNA composed of tens of thousands of base pairs, uncovering previously unseen variants while providing similar accuracy and more detail for scientists scouring the sequence, Ashley says. Structural variants (mutations that occur over a large chunk of the genome) are therefore easier to detect.
Two challenges—higher cost and higher error rate—are no longer barriers to widespread transition to the long-read sequencing approach, Ashley says. The Stanford team has carefully tracked the accuracy of short- and long-read clinical sequencing over the past decade and has been critical of some platforms claiming to offer “clinical-grade sequencing.” But the time has now come for “clinical application in the real world… the speed of innovation is amazing,” he says. “Every six months the error rate [improves].”
The summary F1 performance score of the PromethION long-read sequencing machine from Oxford Nanopore Technologies is 88% for indels (insertions and deletions) and nearly 100% for SNPs (single nucleotide changes), he reports. Variant calling using deep neural networks (Google’s DeepVariant) on a virtual machine equipped with GPUs from NVIDIA gives long reads an accuracy that’s on par with short-read technology.
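For readers unfamiliar with the metric, an F1 score is the harmonic mean of precision (the share of called variants that are real) and recall (the share of real variants that are called). The sketch below shows the standard formula; the precision and recall inputs are hypothetical values chosen only for illustration, since the article reports the summary F1 figures and not their components.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both between 0 and 1)."""
    if precision + recall == 0:
        return 0.0  # avoid dividing by zero when nothing is called correctly
    return 2 * precision * recall / (precision + recall)


# Hypothetical inputs, not figures from the Stanford work.
example_precision = 0.90
example_recall = 0.86
print(f"F1 = {f1_score(example_precision, example_recall):.2f}")  # F1 = 0.88
```

Because it is a harmonic mean, F1 is pulled toward the weaker of the two inputs, so a high score requires both few false calls and few missed variants.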
The nanopore is a small protein with a hole through which a DNA strand passes. As the individual nucleotides pass through the pore the electrical current across it changes and that signal, a squiggly line that is referred to as a squigglegram, can be translated by a neural network into the As, Ts, Gs, and Cs of the DNA code, Ashley explains.
PromethION is the “first truly bedside” application of long-read sequencing technology due to its speed-to-diagnosis, says Ashley. Stanford scientists now plan to offer long-read sequencing with sub-10-hour turnaround to patients in intensive care units at Stanford Hospital and Lucile Packard Children’s Hospital Stanford. When deployed for undiagnosed patients in critical care settings, the ability to quickly identify a disease can spell the difference between life and death.
The machine contains up to 48 flow cells and Stanford uses all of them simultaneously to sequence a person’s genome, he says. If the condition is diagnosable, an answer can be produced well within a typical 12-hour nursing shift which, in many clinical situations, would be fast enough.
The decision on a final diagnosis is made on a Zoom call, usually in the evening and sometimes quite late, involving the bedside clinician, genetic counselor, and bioinformatician, notes Ashley. The CPU- and GPU-heavy work happening on the backend is indispensable, as is piping all the data to the cloud for analysis.
Speed + Accuracy
A paper describing ultrarapid nanopore genome sequencing in critical care at Stanford appeared earlier this year in The New England Journal of Medicine (DOI: 10.1056/NEJMc2112090). Lead author is postdoctoral scholar John Gorzynski, D.V.M., Ph.D., an ultra-endurance runner who personally took charge of speeding up the diagnostic process by having researchers sprint samples by foot from the bedside to the lab, says Ashley.
For the published study, the Stanford team showcased their mega-sequencing approach redefining “rapid” for genetic diagnostics—the fastest diagnosis was made in just over seven hours—without sacrificing accuracy. In less than six months, the team enrolled and sequenced the genomes of 12 patients, five of whom could be diagnosed in the time it takes to round out a day at the office.
The first diagnosis took the longest, 19 hours, mostly because all the genomic data caused a script to crash, Ashley says. The next two were made in 12 and 14 hours, respectively, and thereafter the team established a seven- to eight-hour average.
Importantly, the diagnostic rate was roughly 42%, or about 12 percentage points higher than the average rate for diagnosing mystery diseases, he adds.
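The arithmetic behind that rate can be checked from the study figures quoted above (five diagnoses among 12 sequenced patients). The baseline computed here is only what the "12 percentage points" comparison implies; it is not a number stated in the article.

```python
# Figures quoted in the article.
diagnosed = 5
enrolled = 12

diagnostic_rate = 100 * diagnosed / enrolled
print(f"Diagnostic rate: {diagnostic_rate:.1f}%")  # Diagnostic rate: 41.7%

# Implied comparison rate: roughly 42% minus the quoted 12 percentage points.
implied_baseline = round(diagnostic_rate) - 12
print(f"Implied average rate: about {implied_baseline}%")  # about 30%
```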
Technical details of how the Stanford team implemented nanopore genome sequencing, including distributed cloud-based bioinformatics and a custom variant-prioritization approach, were recently published online in Nature Biotechnology (DOI: 10.1038/s41587-022-01221-5), reports Ashley.
Technical Challenges
To speed time to diagnosis with PromethION, the Stanford team first had to figure out how to process the data faster and rethink and revamp its data pipelines and storage systems. It was a task that fell to graduate electrical engineering student Sneha Goenka, whose idea was to “parallelize” across multiple compute towers.
Goenka found a way to funnel the data straight to a cloud-based storage system where computational power could be amplified enough to sift through the data in real time. Algorithms from Google and the University of California Santa Cruz then scanned the incoming genetic code for variants that might cause disease, and, in the final step, the scientists prioritized the patient’s gene variants according to their likelihood of causing disease.
The technical work involved reducing file size and improving read/write performance with “VBZ compression,” an algorithm developed by Oxford Nanopore for handling raw data in fast5-formatted files. The cron and rsync command tools were used for periodic uploads, and the file size was optimized for the tradeoff between number of parallel loads and latency overhead for each network connection for a new file, says Ashley.
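The file-size tradeoff can be pictured with a toy model. Everything in the sketch below is an assumption made for illustration: the function name, the cost formula, and all the numbers are invented, and none of it is taken from the Stanford pipeline. It assumes each upload pays a fixed connection latency, that a limited number of uploads run at once, and that every file is treated as full-sized.

```python
import math


def estimated_upload_seconds(total_mb, file_mb, max_parallel,
                             latency_s, mb_per_s_per_connection):
    """Toy estimate: files go up in waves of at most max_parallel uploads."""
    n_files = math.ceil(total_mb / file_mb)
    waves = math.ceil(n_files / max_parallel)
    seconds_per_file = latency_s + file_mb / mb_per_s_per_connection
    return waves * seconds_per_file


# Hypothetical setup: 1,000 MB of data, 8 parallel uploads,
# 2 s of overhead per new connection, 10 MB/s per connection.
for size_mb in (1, 125, 1000):
    t = estimated_upload_seconds(1000, size_mb, 8, 2, 10)
    print(f"{size_mb:>5} MB files: {t:.1f} s")
```

In this made-up setup, very small files spend most of their time on connection overhead, one huge file cannot use the parallel connections, and a mid-sized file does best, which is the shape of the tradeoff described above.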
In short, Goenka worked out the puzzle of how to adaptively divide the data and feed just the right amount of it to the right scripts, Ashley says. “It’s like making a jigsaw and putting it back together.”
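The jigsaw idea, splitting the data, processing the pieces in parallel, and reassembling the results in order, can be sketched in a few lines. This is a generic illustration and not the Stanford code: the function names are hypothetical, the split is a simple even division where the team's was adaptive, threads on one machine stand in for separate cloud machines, and the per-chunk "work" is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Divide items into n_chunks contiguous, nearly equal pieces."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Placeholder for real work such as base calling or alignment."""
    return [read.upper() for read in chunk]


def process_in_parallel(items, n_workers):
    """Process chunks concurrently, then stitch results back in order."""
    chunks = split_into_chunks(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(process_chunk, chunks))  # map keeps input order
    return [item for part in results for item in part]


reads = ["acgt", "ttga", "ggcc", "a", "c"]
print(process_in_parallel(reads, n_workers=2))
```

Because `map` returns results in the order the chunks were submitted, the pieces go back together in their original sequence even if one worker finishes before another.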
Results of a 2020 Truth Challenge of the Genome in a Bottle Consortium exemplify just how well the Nanopore machine can “scream out data” and, in real time, pipe it to the cloud for analysis, he says. The sample-to-diagnosis timeline broke down as follows: 4.2 hours for the wet lab work, 6.3 hours for the dry lab piece (base calling, alignment, and variant calling), and 46 minutes for the curation period when associated genetic variants were identified.
The fastest runtime that ended in a positive diagnosis was seven hours and 18 minutes. By way of comparison, the Illumina protocol involved 90-minute sample prep, 11 hours for sequencing and base calling, 55 minutes for variant calling, and 10 minutes for variant prioritization.
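Adding up the stage times quoted above gives a rough end-to-end figure for each protocol. The sketch assumes the stages run back to back with no overlap or idle time, which the article does not state; the variable and function names are illustrative.

```python
# Stage durations in minutes, from the figures quoted in the article.
# Assumption: stages run strictly one after another, with no overlap.
nanopore_stages = {
    "wet lab": 252,            # 4.2 hours
    "dry lab": 378,            # 6.3 hours
    "curation": 46,
}
illumina_stages = {
    "sample prep": 90,
    "sequencing and base calling": 660,   # 11 hours
    "variant calling": 55,
    "variant prioritization": 10,
}


def total_hours_minutes(stages):
    """Return the summed duration as an (hours, minutes) pair."""
    return divmod(sum(stages.values()), 60)


print(total_hours_minutes(nanopore_stages))  # (11, 16)
print(total_hours_minutes(illumina_stages))  # (13, 35)
```

Under that assumption the nanopore breakdown sums to about 11 hours 16 minutes and the Illumina protocol to about 13 hours 35 minutes, while the fastest individual nanopore case came in well below the summed breakdown.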
In a case study of a teenager presenting with cardiogenic shock at Stanford, long-read sequencing succeeded in identifying a likely pathogenic variant in TNNT2 to support a diagnosis of dilated cardiomyopathy with a genetic cause, as reported recently in Circulation: Genomic and Precision Medicine (DOI: 10.1161/CIRCGEN.121.003591). The patient was urgently listed for transplant and received a new heart 21 days later.
In 2019, the Stanford team published a perspectives piece on the various available sequencing approaches, noting that at the time even the most efficient sample preparation process required several hours to complete (Clinical Chemistry, DOI: 10.1373/clinchem.2018.293506). Only a year earlier, in one of the first studies of its kind, rapid genome sequencing (in this instance, with Illumina technology) was found to reduce inpatient costs of caring for hospitalized infants in the NICU/PICU with a suspected genetic disorder by between $800,000 and $2 million.
Broadening Access
Ashley says the research team now wants to “spread the love” of long-read sequencing beyond Stanford. To that end, the software tools they have developed are all open source and available for use elsewhere. “The fact that the computation happens in the cloud means that if you have a computer connected to the internet anywhere in the world you can make use of those scripts.”
Oxford Nanopore Technologies is but one player in a broader field, of course, but a standout among the three major sequencing companies in that it has chosen to make the cost of the machine negligible, or even free if enough flow cells are purchased, he continues. That could potentially “democratize” access to genome sequencing.
As has been recently discussed in social media circles, the biggest limiting factor currently is patients’ physical distance from a sequencing machine, says Ashley. In a rural area, the most time-consuming step might well be the shipping of samples rather than the sequencing or analysis work.
Enabling truly universal access to real-time genomic surveillance will require sequencing machines at the bedside. The PromethION machine is “not hard to use, so potentially that part could happen even in a small hospital as long as the data [it generates] can make it to the cloud.”
Achieving the speed-to-diagnosis from long-read sequencing available to critical care patients at Stanford will require using all 48 independent flow cells, he notes. The important question for would-be adopters to consider is how fast tests need to be completed, and at what point added speed becomes redundant.
“If it’s five minutes, is that really better than five hours?” he offers as an example. “I think in critical care minutes matter so we should make it as fast as we can, and at the bedside we don’t worry about being too fast.”