Pacific Biosciences Releases Highest-Quality, Most Contiguous Individual Human Genome Assembly To Date
By Bio-IT World Staff
October 9, 2018 | Pacific Biosciences announced it has produced the most contiguous diploid human genome assembly of a single individual to date, representing the nearly complete DNA sequence from all 46 chromosomes inherited from both parents.
The sample used was derived from a Puerto Rican female who was a donor to previous population genetics studies such as the 1000 Genomes Project. The new assembly adds to a growing list of high-quality, population-specific human genome reference assemblies generated using PacBio long-read, Single Molecule, Real-Time (SMRT) Sequencing.
The new publicly available assembly (PacBio HG00733) has the fewest gaps of any human genome assembly, with more than half of the genome contained in gapless sequence at least 27 Mb long. The primary contig assembly is 2.89 Gb long and consists of 865 contigs that were assembled from PacBio data generated on the company’s Sequel System.
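The "more than half of the genome in gapless sequence at least 27 Mb long" figure reads like the statistic usually called contig N50: the contig length at which the longest contigs, added up in descending order, first cover half of the total assembly length. The announcement does not name the statistic or say how it was calculated, so the Python sketch below is only an illustration of that general definition. The function name `n50` and the toy contig lengths are hypothetical and are not PacBio data.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (in bp).

    N50 is the length L such that contigs of length >= L together
    cover at least half of the summed length of all contigs.
    """
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")


# Toy example with made-up lengths (not the HG00733 assembly).
toy_contigs = [10, 8, 5, 3, 2]
print(n50(toy_contigs))
```

Applied to the contig lengths of a real assembly, the same calculation would give the length threshold that the article quotes in megabases.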
Using the FALCON-Unzip assembler, maternal and paternal haplotypes were resolved over more than 80% of the genome. Maternal and paternal haplotype blocks were then further phased using Hi-C technology and the FALCON-Phase method developed in collaboration with Phase Genomics. The genome was then de novo scaffolded using Phase Genomics’ Proximo Hi-C platform, resulting in the first chromosome-scale diploid assembly of a single individual accomplished with only two technologies. More specific details about the assembly are included on the PacBio blog.
“This level of human genome resolution was not possible until now and is uniquely enabled by PacBio sequencing technology,” Michael Hunkapiller, Chairman and CEO of Pacific Biosciences, said in a press release. “Previous sequencing methods were unable to separate the sequence from the 23 chromosome pairs, resulting in human genome assemblies that were half the size and composited in a haphazard manner from DNA sequences inherited from the maternal genome and paternal genome, respectively. It is now possible to resolve haplotype sequences from each parent, resulting in the most complete view of a diploid human genome and the full range of an individual’s unique genetic diversity.”
“The Genome Reference Consortium has been using data generated with a number of new technologies including PacBio sequencing over the past 5 years as part of our efforts to improve the human genome reference assembly. This new achievement is a prime example of the level of quality that can now be achieved on a single human genome,” said Valerie Schneider, member of the Genome Reference Consortium, in an official statement. “We look forward to the availability of this and other highly contiguous, phased assemblies for the opportunities they offer in better understanding and representing human genomic diversity.”
The current version of the human reference genome assembly released by the Genome Reference Consortium (GRCh38) represents the chromosomes of the human genome as a mosaic haploid sequence that was derived from sequencing the DNA of more than 50 individuals and combining the data. By contrast, the PacBio HG00733 diploid genome assembly separately resolves the maternal and paternal chromosome sequences and includes diversity specific to the Puerto Rican population, which is itself rich in ethnic diversity.
SMRT Sequencing has demonstrated that any individual diploid human genome contains more than 20,000 unique structural variants (defined as ≥50 bp in length) and another ~400,000 insertion or deletion variants (ranging in length from 1 bp to 49 bp). Importantly, more than 80% of these variants are not currently accessible using short-read whole genome sequencing methods due to coverage bias, ambiguity in read mapping and inability to span large variants. In contrast, sensitive detection of these larger variants in human genome studies has been widely demonstrated using PacBio long-read sequencing. More than 40 global initiatives are currently underway to apply these de novo assembly methods to individuals representing multiple ethnic populations, thereby extending the diversity of available human reference genomes.
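The size cutoffs quoted above (structural variants at ≥50 bp, smaller insertions and deletions at 1 bp to 49 bp) amount to a simple length rule. The Python sketch below applies that rule to a variant length. The names `SV_MIN_LENGTH_BP` and `classify_variant` are hypothetical, chosen for illustration, and are not taken from any PacBio tool.

```python
# Threshold from the article: structural variants are >= 50 bp.
SV_MIN_LENGTH_BP = 50


def classify_variant(length_bp):
    """Label an insertion/deletion by size, using the article's cutoffs."""
    if length_bp < 1:
        raise ValueError("variant length must be at least 1 bp")
    if length_bp >= SV_MIN_LENGTH_BP:
        return "structural variant"
    return "indel"  # 1 bp to 49 bp


# Made-up example lengths, in bp.
for length in (1, 49, 50, 3000):
    print(length, classify_variant(length))
```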