PacBio's Juliet Minor Variant Software
By Bio-IT World Staff
May 8, 2017 | At PacBio’s SMRT scientific symposium and developers meeting, held last week at the Leiden University Medical Center in the Netherlands, Lance Hepler of PacBio announced three apps coming in the next SMRT Analysis release: minor variants, structural variation, and multiplexed microbe WGS.
Juliet is the new unified minor variant software pipeline, Hepler said. The pipeline is extensible to new disease areas and organisms, and it takes a targeted amplicon approach with reference-guided, one-click analysis. Juliet can already handle de novo codon variants. Insertion and deletion variants are currently ignored; support will be added in a future version.
The challenge, Hepler said, is to reliably distinguish 1% variants from sequencing noise. Juliet can do that, he said, with 6000 CCS reads. The false-positive and false-negative rates he reported are below 1% and 0.001% (10^-5), respectively. At lower coverage, minor variants in the 5%-10% range can be determined. Juliet can distinguish between minor variants and PCR heteroduplexes. Official release of the Juliet pipeline will be in SMRT Link 5.0, though it’s available as-is on GitHub now.
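To put the coverage figures in perspective, the sketch below works out how many reads would be expected to support a variant at a given frequency, and how much that count would fluctuate from sampling alone. It is a plain binomial calculation, not Juliet's method; the function name `expected_support` and the 1,000-read "lower coverage" case are assumptions chosen for illustration.

```python
import math

def expected_support(n_reads, variant_freq):
    """Mean and standard deviation of the number of reads carrying a variant,
    assuming each read independently carries it with probability variant_freq
    (a simple binomial model, not Juliet's actual statistics)."""
    mean = n_reads * variant_freq
    sd = math.sqrt(n_reads * variant_freq * (1.0 - variant_freq))
    return mean, sd

# 6000 reads is the figure from the talk; 1000 reads is a hypothetical
# lower-coverage case used only for comparison.
for n_reads, freq in [(6000, 0.01), (1000, 0.01), (1000, 0.05)]:
    mean, sd = expected_support(n_reads, freq)
    print(f"{n_reads} reads, {freq:.0%} variant: "
          f"{mean:.1f} +/- {sd:.1f} supporting reads")
```

At 6000 reads, a 1% variant is backed by about 60 reads, so sampling noise is small relative to the signal. At 1,000 reads the same variant is backed by only about 10 reads, which is in line with the report that lower coverage suits variants in the 5%-10% range.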
Hepler, of course, touted the strength of long reads, especially for revealing structural variations. He quoted a 2011 Genome Biology article that theorized that structural variants are the major limitation to better diagnostics from exome sequencing, and that linked structural variations to diseases, traits, and evolutionary processes (DOI: 10.1186/gb-2011-12-9-128).
PacBio discovers 20,000 structural variants in a human genome, compared to 4,000 revealed by Illumina sequencing, Hepler reported, combining several published findings from 2015 and 2016 in Nature, Nature Communications, and Genome Research (DOIs: 10.1038/nature20098; 10.1101/gr.214007.116; 10.1038/ncomms12065; and 10.1038/nature15394).
In fact, Hepler claimed, most structural variants in the human population remain undiscovered today. Larger sample sizes will yield higher rates of structural variant discovery, he said, but that doesn’t require deep sequencing of every individual. Volume, not depth of sequencing, will reveal more of the variants present at lower frequency in the population.
Hepler presented models claiming that nearly 100% of variants in the population at 5%, 1% and 0.5% frequencies could be detected by sequencing 1,000 humans at 50-fold coverage, given that reads are long enough to identify all variants. The model didn’t suffer much at just 5-fold coverage; only variants of 0.5% frequency fell to less than 100% sensitivity.
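The talk's model itself was not described in detail, but the general argument can be illustrated with a back-of-the-envelope calculation. The sketch below is an assumption-laden toy model, not Hepler's: it treats the 1,000 individuals as 2,000 independent haplotypes, assumes the reads covering a variant haplotype follow a Poisson distribution with mean equal to half the coverage, and calls a variant "detected" if at least `min_reads` reads support it in any one carrier. The function names and the `min_reads=2` threshold are hypothetical.

```python
import math

def p_detect_in_carrier(coverage, min_reads=2):
    """Probability that a heterozygous carrier yields at least min_reads
    reads from the variant haplotype (Poisson with mean coverage / 2)."""
    lam = coverage / 2.0
    p_too_few = sum(math.exp(-lam) * lam**k / math.factorial(k)
                    for k in range(min_reads))
    return 1.0 - p_too_few

def p_detect_in_cohort(freq, n_individuals, coverage, min_reads=2):
    """Probability that a variant at population frequency freq is detected
    in at least one of 2 * n_individuals independently sampled haplotypes."""
    p_per_haplotype = freq * p_detect_in_carrier(coverage, min_reads)
    return 1.0 - (1.0 - p_per_haplotype) ** (2 * n_individuals)

# Frequencies, cohort size, and coverages are the figures quoted in the talk.
for coverage in (50, 5):
    for freq in (0.05, 0.01, 0.005):
        p = p_detect_in_cohort(freq, 1000, coverage)
        print(f"{coverage}-fold, {freq:.1%} variant: P(detected) = {p:.5f}")
```

Under these assumptions the toy model behaves the same way qualitatively: detection stays close to certain at both depths, and the rarest variants are the first to lose sensitivity when coverage drops, because cohort size compensates for the reads missed in any single carrier.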
For microbial whole-genome sequencing, Hepler called PacBio the gold standard, and reiterated that multiplexing bacterial samples is the key to optimizing throughput, efficiency, and cost on the Sequel System.
PacBio has released an end-to-end workflow for bacterial multiplexing compatible with both the PacBio RSII and Sequel systems. Hepler recommended one SMRT cell for two microbes on the PacBio RSII system, and 12-16 microbes per SMRT cell on the Sequel System.