BioNano Teases Out the Genome's Structural Quirks
February 16, 2015 | The University City neighborhood of northern San Diego is ground zero for the gene sequencing revolution, home to the headquarters of Illumina, a near-monopoly power in reading DNA. Since 2007, Illumina’s sequencing-by-synthesis machines, which now produce over 90% of all DNA sequence data, have steadily improved scientists’ ability to zoom in to individual letters of the DNA code, reading the genome at incredible speeds and at single-base resolution.
Just five minutes’ drive up the road from Illumina headquarters, a smaller company called BioNano is trying to zoom back out.
“Illumina is continuing to launch amazing capabilities and new next-generation instrumentation, which increases the daily throughput of their systems for analyzing whole genomes,” says Erik Holmlin, BioNano’s President and CEO. “We’ve tracked that as well. In the last year, we announced a major development milestone, which was the ability to collect enough human data on a single chip in one day to assemble a whole human genome.” That’s a comparable rate to Illumina’s highest-throughput HiSeq line of instruments.
The difference is that BioNano doesn’t “sequence” DNA: its platform, called the Irys, doesn’t provide any information on the DNA letters that make up a genome. Instead, it’s all about “assembly,” sorting out exactly where different parts of the genome fit into the chromosomes in which DNA is organized. While genomes from two individuals of the same species will share virtually all the same large structural components, those pieces can be rearranged, duplicated, stretched or compressed, and sometimes deleted altogether. This is big-picture information that sequencers — especially the fast, accurate “short-read” sequencers produced by Illumina, which split the genome into many millions of tiny, manageable pieces — do a very poor job of capturing.
Changes to the arrangement of long DNA sequences, called structural variants or SVs, are a bit of a mystery to geneticists. Most of the genetic variants known to affect human health are very small: either single-nucleotide polymorphisms (“SNPs”), in which a single DNA letter has been changed, or short “indels,” in which one or a few letters have been inserted or deleted. These variants include the famous BRCA mutations, which can dramatically increase a woman’s risk for breast and ovarian cancer, and many mutations responsible for rare diseases like sickle-cell anemia and hereditary forms of blindness.
Yet the reason modern medical genetics has focused on these kinds of mutations isn’t that SNPs and indels are inherently more important than SVs. We’re just better at detecting them.
“Structural variation is a completely untapped area of genome biology,” says Holmlin, who points out that known SVs include many of the most important mutations in cancer, as well as the most common cause of hemophilia, often a poster child for genetic disorders. “If hematology and oncology is any example, genome biology is much more about structural variation than it is about single nucleotide polymorphisms.”
The SVs that are so central to these fields were discovered not by sequencing, but with older, more labor-intensive lab methods to look at whole chromosomes. BioNano’s aim is to give the same boost to SV detection that next-generation sequencers have afforded for small variants, letting scientists detect them rapidly, en masse and across the entire genome, without having to make any prior guesses about what they’re looking for.
“Irys is the only commercially available system that allows you to do this in large complex genomes,” says Holmlin. “There’s going to be a lot of different clinically interesting structural variations that will be discovered as a result.”
Strung Out
Tests for large SVs typically involve attaching fluorescent probes to key DNA sequences on a chromosome. For instance, if you wanted to test for duplications in a certain region of the genome, you might design a probe that binds with a signature DNA sequence in that region. Measuring the fluorescent signal will then tell you how often your sequence is repeated.
The Irys works on a similar principle, but without having to tailor the fluorescent probes to particular SVs. First, the genome is split into very large pieces: with a human sample, BioNano averages over 300,000 DNA base pairs per fragment. Then those pieces are tagged with a probe that binds to a seven-base DNA sequence, GCTCTTC, specially chosen to provide information about as much of the genome as possible.
“We picked [that seven-base signature] because it just turns out that it’s distributed throughout almost every single genome, very consistently,” says Holmlin. “These labels occur on average every 10,000 base pairs, and that’s enough to see most structural events.” A DNA sample therefore enters the Irys studded with fluorescent labels at intervals of roughly 10,000 bases. With one near-universal probe, the Irys can use the same reagents for almost any experiment, with a few exceptions if users want a closer look at areas of the genome where that seven-base sequence is unusually scarce.
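To make the labeling step concrete, here is a minimal Python sketch that finds where the GCTCTTC signature would fall on a stretch of sequence. It is an illustration, not BioNano's software: the function names are invented for this example, and it assumes a label can land wherever the motif appears on either strand, which the article does not spell out. Under the further assumption of a random sequence with equal base frequencies, the back-of-the-envelope spacing comes out near 8,000 bases, the same order as the 10,000-base average Holmlin cites for real genomes.

```python
MOTIF = "GCTCTTC"  # the seven-base signature named in the article


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def label_positions(seq, motif=MOTIF):
    """Return 0-based start positions where the motif occurs on either strand.

    Assumption (not from the article): a site on the opposite strand,
    which reads as the reverse complement here, is labeled as well.
    """
    targets = {motif, reverse_complement(motif)}
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] in targets]


def expected_spacing(motif_length=7):
    """Rough average spacing in bases between sites in a uniform random sequence.

    A given k-mer starts at any position with probability 1 / 4**k; counting
    both strands doubles the site density for a non-palindromic motif.
    """
    return 4 ** motif_length / 2


toy = "AAGCTCTTCAAGAAGAGCTT"  # toy sequence with one site on each strand
print(label_positions(toy))
print(expected_spacing())
```

On a real genome the spacing will vary from region to region, which is why some areas are, as the article notes, unusually scarce in labels.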
While other techniques simply check the presence or absence of a label, or measure the intensity of the fluorescence, the Irys preserves the precise arrangement of its probes. It does this by forcing DNA molecules through extremely narrow chambers, which BioNano calls NanoChannels. The chambers are so thin that a DNA molecule can only pass through when stretched out into a straight string, at which point optical sensors can record the position of each fluorescent label. After Irys collects these optical maps for millions of overlapping fragments, BioNano’s software overlays them to produce a single assembly.
“You get basically a barcode of the entire genome,” says Holmlin. “And that barcode changes under structural variation. If the barcode expands, that indicates there’s been an insertion. If it contracts, that indicates there’s been a deletion.”
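Holmlin's barcode logic can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the company's actual algorithm: it supposes the labels in a sample map have already been matched one-to-one with labels in a reference map, and it uses a hypothetical 1,000-base threshold to separate real size changes from measurement noise. The function and parameter names are invented for the example.

```python
def call_size_changes(ref_labels, sample_labels, min_change=1000):
    """Compare matched label positions (in bases) between two maps.

    Returns a list of (left_label, right_label, kind, size) tuples, where
    kind is "insertion" if the sample interval is longer than the reference
    interval and "deletion" if it is shorter.

    Assumptions (hypothetical, for illustration only): labels are already
    paired by index, and min_change is an arbitrary noise threshold.
    """
    if len(ref_labels) != len(sample_labels):
        raise ValueError("maps must contain the same number of matched labels")

    calls = []
    for i in range(1, len(ref_labels)):
        ref_gap = ref_labels[i] - ref_labels[i - 1]
        sample_gap = sample_labels[i] - sample_labels[i - 1]
        delta = sample_gap - ref_gap
        if delta >= min_change:
            calls.append((i - 1, i, "insertion", delta))
        elif delta <= -min_change:
            calls.append((i - 1, i, "deletion", -delta))
    return calls


reference = [0, 10000, 20000, 30000]
sample = [0, 10000, 25000, 32000]
for call in call_size_changes(reference, sample):
    print(call)
```

In this toy case the second interval has stretched by 5,000 bases and the third has shrunk by 3,000, so the sketch reports one insertion and one deletion. Real data would also need to handle missing or spurious labels, which this example ignores.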
Typically, a BioNano assembly would be combined with sequencing to produce a full picture of an individual genome. That means a priority for BioNano has been to catch up with the pace of next-generation sequencers. It’s especially a concern because many of the company’s first customers have been big genome centers running huge banks of instruments to pump out human genomes on a daily basis.
In December, the journal GigaScience published a paper written in collaboration with scientists at one of those centers, BGI-Shenzhen (formerly the Beijing Genomics Institute), demonstrating an Irys assembly of a whole human genome for the first time. BGI chose to assemble the YH genome, from a cell line donated by a Han Chinese individual who was the first Asian to undergo whole genome sequencing. Since then, YH has become one of the world’s most thoroughly characterized genomes through multiple sequencing studies, but large SVs (on the order of 1,000 bases or longer) have mostly escaped detection.
The BGI analysis using Irys found 666 large structural variants in YH’s genome, not counting apparent SVs that overlapped with gaps in the human reference genome (which would be difficult to characterize because there is no well-known sequence to compare against). More than 600 of these SVs could later be confirmed, either by cross-referencing databases of previously described SVs, by looking for subtle errors in the way a short-read sequencer covered the same regions, or both. While this study can say little about the medical relevance of the SVs detected — except that hundreds of them affected active, protein-coding genes — it does help illustrate how common these variants are, and how much is missed in genomics by passing over them.
Irys on the Ground
While the BGI study presents the first published whole human genome assembly with Irys, BioNano has done more than twenty of these assemblies internally. “We celebrate that of course, because it’s a major achievement, but it’s absolutely required to be relevant in genomics right now,” says Holmlin. “The field is just going toward whole human genome analyses across the board.”
And with the volume of data now available to geneticists, there has also come a growing appetite for better quality data. BioNano isn’t the only company exploring new ways of getting at structural variation. Pacific Biosciences, a competitor to Illumina, has been carving out a niche as the sequencing company that can provide reliable data across structurally complex regions. It does this using long-read technology, sequencing DNA in fragments as much as 100 times longer than an Illumina sequencer can handle. Oxford Nanopore is another sequencing company that plans to compete on read length, although it does not yet have a commercial product. Other companies, like Illumina-owned Moleculo and the independent 10X Genomics, try to bridge the gap by computationally stitching together short Illumina reads into synthetic long reads. (See “10X Genomics Announces a High-Throughput Platform for Synthetic Long Reads.”) All are betting that SVs will soon be considered essential information for most DNA sequencing applications.
Holmlin says that the growth of these alternative technologies is good news for BioNano. Regarding PacBio’s sequencers, currently the most widely used of these long-read systems, he says that “they’re able to see a lot of structural variations that are not identified by short-read sequencing, but the sensitivity starts to trail off right around that 5,000- to 10,000-base pair range.” By contrast, the Irys, working with fragments more than ten times longer than even those a PacBio sequencer reads, can pick up repetitive elements that might repeat a 2,000-base pair sequence fifty times in a row.
“PacBio data actually integrates more seamlessly with our data than short reads,” he says, “because it is already a pretty high-quality data type. So we look to all these emerging longer-read technologies as being great for us.” Holmlin adds that there has already been at least one human genome assembly that combined PacBio sequencing with a BioNano assembly, although that analysis has not been published to date.
So far, BioNano has only placed a little over 30 Irys instruments in the field. At $300,000 an instrument, the current Irys model is a comparable investment to a high-end sequencer, meaning potential customers have to weigh the structural information the Irys can provide against a significant boost to their overall sequencing power. (Running the device is a little cheaper than sequencing on a per-genome basis.) For those large genome centers and mid-size labs that have adopted BioNano technology, however, the instruments are being turned to some interesting purposes.
At the Genome Institute of Washington University in St. Louis, one of the key participants in the original Human Genome Project, the Irys is currently being used to improve the human reference genome itself, the gold standard against which new sequencing projects are compared. While the human reference is the single highest-quality assembly in existence, it’s not immune to problems with structural variants, and the scientists who maintain it are using a host of new technologies to try to capture these complex events. (For more on that project, see “The Hunt for a New Human Reference Genome.”) At the other end of the scale, Kansas State University’s smaller Bioinformatics Center is now using the Irys in support of the i5k Project, an effort to assemble whole genomes of 5,000 insect species, including disease vectors, agricultural pests, and beneficial species like honeybees.
And of course, BioNano has plans for noteworthy Irys projects itself. Holmlin says that one priority for his company is to demonstrate that the Irys can fill in information on telomeres and centromeres, the massive, highly repetitive regions at the tips and centers of chromosomes, respectively. “We have an internal project trying to put together the most comprehensive map of an entire genome, inclusive of all those regions, and that’s a project that’s progressing well,” he says.
Holmlin also looks forward to a time when the Irys, or an instrument like it, could be used in the clinic to help with disease diagnostics, just as next-generation sequencers are beginning to be applied in rare disease and cancer. At the moment, our medical knowledge of SVs is sharply limited, but Holmlin hopes that translational discoveries will soon take off in the same way clinically relevant SNPs were rapidly discovered when next-generation sequencing got off the ground. In the meantime, BioNano could get a head start by recreating existing cytogenetic tests for large mutations in cancer, hemophilia, and a handful of other indications.
“One of our top objectives this year is to demonstrate feasibility of a clinical cytogenetic assay that could be commercialized in the future,” he says. “That’s really where the total scope of Irys’ utility goes.”