PacBio Releases Computational Tool for Parsing Pseudogenes
By Bio-IT World Staff
March 24, 2023 | PacBio last week announced a new informatics method that genotypes gene paralogs and pseudogenes with high accuracy. The computational tool, named Paraphase, enables variant calling, copy number analysis, and phasing by identifying the full gene sequence of each haplotype for all genes and pseudogenes in the same gene family. Many medically relevant genes fall within segmental duplications and thus have highly similar gene family members or pseudogenes. This sequence similarity often leads to error-prone read alignment and variant calling.
“Through the use of Paraphase, we are able to identify the full sequence of each copy of a gene and, importantly, identify the number of functional and non-functional copies of a gene,” said Mike Eberle, Vice President of Computational Biology at PacBio, in a press release. “This will allow researchers to conduct more accurate carrier analyses and provide a framework for studying the underlying genetics of these complex genomic regions. We believe that applying this method to larger, diverse, population data will ultimately enable researchers to better understand medically important problems, such as silent carriers for spinal muscular atrophy.”
Paraphase has been used on several medically relevant genes with highly similar paralogs or pseudogenes, including CYP21A2 (21-hydroxylase-deficient congenital adrenal hyperplasia), TNXB (Ehlers-Danlos syndrome), STRC (hereditary hearing loss and deafness), and SMN1 and SMN2 (spinal muscular atrophy). SMN1 is >99.9% similar in sequence to its paralog, SMN2, and both genes have variable copy numbers across populations. Mutations in SMN1 cause spinal muscular atrophy (SMA), a leading cause of early infant death.
High-throughput, detailed sequence analysis of complete genes is challenging with existing technologies, and identifying silent carriers (individuals with two copies of SMN1 on one chromosome and zero copies on the other, who account for 27% of carriers in African populations) is impossible without pedigree information. In a recent peer-reviewed publication, Paraphase was able to detect these pathogenic variants for SMA. The study also identified major SMN1 and SMN2 sequence haplogroups and characterized their co-segregation through pedigree-based analyses. In addition, the authors identified a pair of haplotypes that can serve as a genetic marker for alleles carrying two copies of SMN1 in African populations, demonstrating the potential of haplotype-based screening of silent carriers.
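To make the silent-carrier problem concrete, the short Python sketch below shows why a total SMN1 copy count alone cannot separate a non-carrier from a silent carrier. It is a toy illustration only, not Paraphase's method or interface: the function names and the representation of a genotype as a pair of per-chromosome copy counts are assumptions made for this example.

```python
from typing import Tuple

# Hypothetical representation: SMN1 copy count on each of the two chromosomes.
Genotype = Tuple[int, int]


def total_copy_number(genotype: Genotype) -> int:
    """Total SMN1 copies, i.e. what a copy-number-only assay reports."""
    return genotype[0] + genotype[1]


def is_carrier(genotype: Genotype) -> bool:
    """A carrier has exactly one chromosome with zero SMN1 copies."""
    first, second = genotype
    return (first == 0) != (second == 0)


def is_silent_carrier(genotype: Genotype) -> bool:
    """A silent carrier looks normal by total count (2 or more copies)
    but still has one chromosome with no SMN1 copy, e.g. (2, 0)."""
    return is_carrier(genotype) and total_copy_number(genotype) >= 2


non_carrier: Genotype = (1, 1)      # one copy on each chromosome
silent_carrier: Genotype = (2, 0)   # both copies on the same chromosome

# Both genotypes give the same total, so the total alone is ambiguous.
same_total = total_copy_number(non_carrier) == total_copy_number(silent_carrier)
```

In this toy model, the two genotypes are indistinguishable by total count and differ only in how the copies are distributed across chromosomes, which is the information that pedigree data or a haplotype-based marker would have to supply.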
Paraphase is being extended into a genome-wide generalized paralog caller as more highly homologous genes are included. Paraphase works on whole-genome sequencing and hybrid capture-based enrichment data. It can also be adapted to work with amplicon sequencing data when the full regions of interest are captured or amplified.