The False Dichotomy of Short Reads vs. Long Reads

February 28, 2025

Contributed Commentary by Rosemary Sinclair Dokos, Oxford Nanopore Technologies 

The rate of evolution of DNA sequencing technologies since the introduction of the Sanger method in the 1970s has been astounding. It’s just over 20 years since the publication of the first draft human genome, and today technologies can deliver that same information for as little as $100. Short-read sequencing, introduced by Solexa in 2006 and scaled by Illumina, created the market we see today, unlocking the scale and cost that enabled a phenomenal expansion of our understanding of genomics.

As our understanding of genetic causality rapidly grew, mysteries remained. More than 10 years ago, long-read sequencers began to fill the gaps left by short reads, characterizing the missing structural variants, epigenetic profiles, and other elements that were invisible to earlier technologies. Both groups of sequencing platforms have been essential in deepening our understanding of genomics.

This long-versus-short divide has been baked into our mindset for more than a decade, but we are beginning to recognize that it is a false dichotomy. Researchers shouldn’t have to choose between short and long reads before they begin an experiment. It simply doesn’t serve the science.  

DNA molecules come in many different sizes, from the tiny fragments of circulating tumor DNA to megabase-long bacterial genomes to gigabase-scale human genomes. It might seem practical to assign experiments to different sequencing platforms based on the length of the DNA; however, those assumptions rest on imperfect knowledge.

Consider the area of circulating DNA, which has been characterized for applications such as cancer detection and prenatal testing. For many years, scientists have assumed that these short circulating pieces of DNA are typically 200 bases long, making them ideally suited to a short-read-only approach.

When scientists applied a long-read sequencer to circulating DNA, they found that the lengths are not limited to 200 bases. There is actually a very long tail, out to about 3,000 bases, with peaks approximately every 160 bases that reflect the wrapping of DNA around nucleosomes. Several groups now examine fragment lengths in circulating DNA as a potential biomarker, in a space now known as “fragmentomics.” Subsequent investigations have revealed that the fragment length distribution of circulating DNA differs between healthy samples and cancer samples, and that combining that information with methylation data, which can be used to tissue-type each read, could lead to an entirely new biological profile of cancer and health.
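To make the fragmentomics idea concrete, the following is a minimal sketch, assuming a hypothetical aligned long-read cfDNA file (“cfdna.bam”) and the pysam and NumPy libraries, of how one might histogram fragment lengths and check for the roughly 160-base nucleosome periodicity described above. It is an illustration, not any group’s published pipeline.

```python
# Minimal fragmentomics sketch: histogram cfDNA read lengths and look
# for ~160-base periodicity. "cfdna.bam" is a hypothetical input file.
import numpy as np
import pysam

lengths = []
with pysam.AlignmentFile("cfdna.bam", "rb") as bam:
    for read in bam:
        if read.is_unmapped or read.is_secondary or read.is_supplementary:
            continue
        # With long reads, one read spans the whole cfDNA fragment,
        # so read length is a direct proxy for fragment length.
        lengths.append(read.query_length)

# Histogram from 0 to 3,000 bases in 10-base bins (the "long tail").
counts, _ = np.histogram(lengths, bins=300, range=(0, 3000))

# Autocorrelation of the histogram: a strong peak near lag 16
# (16 bins x 10 bases = 160 bases) signals nucleosome periodicity.
centered = counts - counts.mean()
ac = np.correlate(centered, centered, mode="full")[len(counts) - 1:]
lag = np.argmax(ac[5:101]) + 5  # search spacings of 50 to 1,000 bases
print(f"Dominant peak spacing: ~{lag * 10} bases")
```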

Using a long-read sequencer on what conventional wisdom told us was a short-read application revealed novel biology and the possibility of a different type of biomarker that could have enormous impact on the rapidly evolving field of oncology. What other areas of biology might need to be revisited in this way?

As a community, we are seeing many examples of users re-evaluating applications that have long been considered a fit for short-read data only. In transcriptomics today, scientists combine two different sequencing platforms to get the information they need, deploying short reads for counting and long reads for full isoform discovery (a division of labor illustrated in the sketch below). The same is true in large population studies, where extremely high-quality reference genomes are created by combining three platforms (ultra-long reads, long reads, and scaffolding techniques) on a small number of genomes. Cohorts of challenging samples are then sequenced on the more thorough (but, in some cases, more expensive) long-read platforms, and finally cheaper short-read genomes or exomes are used to generate data for tens of thousands of additional genomes.
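As a toy illustration of that counting-versus-isoform split, consider the sketch below. The gene names and exon chains are hypothetical, and this is not the output of any particular tool: short reads mapped within a gene support counting, while a full-length long read exposes the exact exon chain, i.e., the isoform.

```python
# Toy contrast between the two read types in transcriptomics.
# All gene names and exon chains here are hypothetical.
from collections import Counter

# Short reads: each maps within a gene, so expression can be counted,
# but the parent isoform is usually ambiguous.
short_read_hits = ["GENE_A", "GENE_A", "GENE_B", "GENE_A", "GENE_B"]
print(Counter(short_read_hits))  # Counter({'GENE_A': 3, 'GENE_B': 2})

# Long reads: each spans a full transcript, so the exon chain is visible
# and distinct isoforms of the same gene separate directly.
long_reads = [
    ("GENE_A", (1, 2, 3, 4)),  # isoform containing all four exons
    ("GENE_A", (1, 3, 4)),     # exon-2-skipping isoform, invisible to counts alone
    ("GENE_A", (1, 2, 3, 4)),
]
for (gene, exons), n in Counter(long_reads).items():
    print(f"{gene} exons {exons}: {n} full-length reads")
```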

Sequencing today feels like computing did in the 2000s, when you plugged a digital camera into a computer hooked up to a modem just to e-mail a photo. These technologies are stitched together to advance our knowledge, but they still require scientists to implement and maintain at least two different sequencing platforms, adding to overall laboratory costs and forcing researchers to divide their experiments into short and long reads. What should emerge is a single sequencing platform that can generate reads of any length. Users could choose low-pass sequencing or deeper coverage to maximize cost-effectiveness, but they would not have to compromise on length, accuracy, or quality to get the results they need.

If we could rewind the clock to the first human genome project and ask scientists about the ideal sequencing platform, I don’t think anyone would vote for a method that chops DNA into tiny pieces only to re-assemble it. However, that is what we had, and the ingenuity of people made it work and deliver a groundbreaking moment. Twenty years on, our sequencing platform technologies have progressed enough that scientists should be able to demand a single platform that generates short reads, long reads, and anything in between. The ingenuity of our community will continue to deliver the next set of amazing discoveries as we unveil more mysteries of our genomes.

Rosemary Sinclair Dokos serves as Chief Product and Marketing Officer at Oxford Nanopore Technologies. She can be reached at rosemary.dokos@nanoporetech.com.