For Rare Disease, What’s Missed by Genome Sequencing Still Matters

Contributed Commentary by Sissel Juul, PhD, Oxford Nanopore Technologies

December 8, 2023 | For almost two decades, members of the life science community bemoaned something that most outsiders never realized: that the “finished” human genome assembly had never actually been finished. Now, thanks to remarkable efforts from the Telomere-to-Telomere (T2T) consortium, we finally have a complete human reference genome. It includes the 8% of the genome that had been impossible to sequence accurately despite many attempts to do so over the years.

What we have learned from the T2T assembly—and from many efforts in the past 20 years to resolve genomic regions that proved impenetrable to conventional sequencing technologies—is that these missing elements matter. Intractable regions encompass clinically relevant elements such as segmental duplications that appear to be associated with neurological disorders, repeat expansions underlying conditions such as fragile X syndrome and Huntington’s disease, and highly homologous regions that inform a person’s ability to metabolize certain medications.

Finishing the human genome assembly required the use of orthogonal tools, from long-read sequencing technologies and optical maps to short-read sequencing and Strand-seq. Even before the T2T assembly was produced, long-read sequencing had given scientists access to regions of the genome that could not be resolved accurately with typical next-generation sequencing platforms, which produce reads of just a few hundred bases. For insertions, deletions, inversions, duplications, and other structural variants that often span a kilobase or more, the only way to get an accurate sequence was to generate reads long enough to cover the variant and its flanking sequence.

Early explorations of long-read sequencing revealed that the regions missed by short-read sequencing tools often contained clinically relevant information associated with rare diseases. These diseases may go undiagnosed and, without fully resolved sequence data, can even be undiagnosable. In recent years, though, scientists have made stunning progress in identifying the genetic causes of rare diseases with long-read data, increasing the diagnostic yield compared with earlier approaches such as whole exome sequencing or even whole genome sequencing based solely on short-read data.

In one particularly dramatic example, scientists at Stanford University aimed not just to identify variants associated with rare disease, but to do so in a clinically relevant time frame. They developed an ultra-rapid approach and tested it on two clinical research samples. This workflow homed in on the candidate variant for one case in less than eight hours, demonstrating the potential for future clinical application of this approach to diagnose rare diseases far more quickly than the usual years-long diagnostic odyssey. The authors reported, “We show that this framework provides accurate variant calls and efficient prioritization, and accelerates diagnostic clinical genome sequencing twofold compared with previous approaches.”

A study from researchers at the University of Washington deployed a targeted form of long-read sequencing with the goal of improving on the diagnostic rates that conventional sequencing tools achieve for rare diseases. The study included 40 individuals, 10 of whom had no complete molecular diagnosis based on prior testing. The long-read approach identified all of the clinically relevant elements found in past analyses and also delivered higher resolution that, in one case, led to an adjustment in clinical management. Among the 10 individuals without a clear diagnosis, long-read data supplied molecular answers for six cases and highlighted variants of unknown significance in two others.

It would be impossible to list every study that has yielded exciting new results for rare diseases based on long reads and rich sequencing data, but here are a few more for context: the genotyping and methylation profiling of short tandem repeats associated with neurological and neuromuscular diseases; the identification of novel repeat motifs linked to cerebellar ataxia, neuropathy and vestibular areflexia syndrome; and the discovery of cryptic structural variants (in this case, balanced chromosomal rearrangements) that appear to cause congenital aniridia, a genetic eye disorder.

While these results have been amazing, they also give the life science community a new mandate. Now that we know just how important these tough-to-access genomic regions can be, it should no longer be considered acceptable to rely solely on short-read data for any study related to rare disease. Supplementing short-read data with rich long-read or long-range information, either from the same or different sequencing platforms, will be essential going forward. Whether you choose to add data from long-read sequencing, genome or optical mapping, or other methods, it is easy enough to measure success: are you able to generate comprehensive sequence information for all genomic regions of interest, regardless of their sequence content?

When that threshold can be cleared, it will be possible to provide answers to scientific questions about rare disease that have eluded our grasp for decades, revealing new biology to transform human health. In the long run, the real winners will be patients who can quickly get a clear diagnosis that allows them to find the right physician for a precision treatment plan and possibly even join a rare disease community to help navigate their condition. For their sake, we cannot settle for less in our molecular analysis studies.

Sissel Juul, PhD, serves as vice president of emerging and commercial applications at Oxford Nanopore Technologies. She can be reached at Sissel.Juul@nanoporetech.com.