Standardized Interpretation Pipelines Will Be Essential For Genomic Medicine
Contributed Commentary By Anika Joecker
September 29, 2017 | Next-generation sequencing continues to offer new hope for scientists and clinicians focused on everything from rare disease to cancer. Success stories so far include solving decades-old medical mysteries, ending diagnostic odysseys, and helping select more effective treatments for many types of cancer. Still, the clinical application of genomics could use some crucial improvements, perhaps none more urgent than standardized, automated tools for genome interpretation.
Too often, variant interpretation involves individual clinical geneticists scouring the literature for reports of every unknown variant detected in a patient's exome or genome sequence. This approach relies on a large dose of human judgment and is fraught with problems as basic as limited access to content. It is no surprise, then, that benchmarking studies (doi: 10.1038/gim.2017.14) have found that this process lacks reproducibility, with different labs offering a range of different interpretations for the same set of genomic variants. In the worst case, this could lead to several completely different treatment plans being proposed for the same patient.
Biology is entering the big data realm, and it is time to step away from entirely manual methods for clinical genome analysis. Standardized, automated pipelines built on high-quality, curated data can improve the reliability of variant interpretation and accelerate the process to help patients. Automating much of the variant interpretation process saves time and allows clinical geneticists to focus their expertise on the most important part of the job: the final decision on a treatment plan.
The Variant Interpretation Process
Typically, variant interpretation in the clinical lab is a time-intensive and tedious process. After DNA sequencing of the patient or tumor sample, geneticists must review a filtered list of detected variants and classify them for clinical utility. For each variant, experts usually search public and private databases such as HGMD, COSMIC, and ClinVar; they may also run queries in Google or PubMed to find papers citing the variant. When available, automated functional prediction tools and machine learning methods can help predict the effect a variant will have on protein function or cellular processes. In parallel, geneticists consult population databases to determine whether the variant is common in the general population. In some cases, they may trace a specific variant through a patient's family tree to help determine how likely it is to be causative of disease.
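Much of this repetitive querying can be scripted. As a minimal sketch, assuming the public NCBI E-utilities endpoint and a purely illustrative variant (the HGVS expression below is a placeholder, not drawn from any real case), a ClinVar lookup might look something like this:

```python
# A minimal sketch of scripting one of these database lookups via the public
# NCBI E-utilities endpoint for ClinVar. The variant is illustrative only.
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def clinvar_ids(hgvs_term):
    """Return ClinVar record IDs matching an HGVS expression."""
    resp = requests.get(
        f"{EUTILS}/esearch.fcgi",
        params={"db": "clinvar", "term": hgvs_term, "retmode": "json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["esearchresult"]["idlist"]

def clinvar_summary(record_id):
    """Fetch the ClinVar summary record for one matching ID."""
    resp = requests.get(
        f"{EUTILS}/esummary.fcgi",
        params={"db": "clinvar", "id": record_id, "retmode": "json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["result"][record_id]

if __name__ == "__main__":
    # Placeholder HGVS expression, for illustration only.
    for rec_id in clinvar_ids("NM_000546.6(TP53):c.743G>A"):
        print(rec_id, clinvar_summary(rec_id).get("title"))
```

The same pattern extends to PubMed and the other E-utilities databases, which is exactly the kind of repetitive searching that lends itself to automation.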
In the end, these experts aim to classify each variant using all available information. Classification is typically based on guidelines issued by professional bodies such as the ACMG and AMP, although in some cases the rule set is defined by the clinical lab itself.
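One reason such guidelines lend themselves to automation is that their evidence-combining rules can be encoded so that the same evidence always yields the same class. The following is a simplified, hypothetical sketch based on the published ACMG/AMP combining rules (Richards et al., 2015); a production system would cover many more rule combinations and edge cases.

```python
# A simplified sketch of the ACMG/AMP evidence-combining rules; not a
# complete implementation of the guideline.
from collections import Counter

def classify(evidence):
    """Combine ACMG/AMP evidence codes (e.g. 'PVS1', 'PM2', 'BP4') into a class."""
    counts = Counter(code.rstrip("0123456789") for code in evidence)
    pvs, ps, pm, pp = counts["PVS"], counts["PS"], counts["PM"], counts["PP"]
    ba, bs, bp = counts["BA"], counts["BS"], counts["BP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm == 1 and pp == 1) or pp >= 2))
        or ps >= 2
        or (ps == 1 and (pm >= 3 or (pm == 2 and pp >= 2) or (pm == 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps == 1 and pm >= 1)
        or (ps == 1 and pp >= 2)
        or pm >= 3
        or (pm == 2 and pp >= 2)
        or (pm == 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "Uncertain significance"  # conflicting evidence
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely benign"
    return "Uncertain significance"

print(classify(["PVS1", "PS3", "PM2"]))  # Pathogenic
print(classify(["PM2", "PP3"]))          # Uncertain significance
```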
Because the literature search is still largely executed by hand, this entire process can easily take anywhere from several hours to several weeks.
While this approach worked reasonably well in the era of single-gene tests, the rapid shift toward exome and whole genome sequencing has put tremendous pressure on a process that was never designed to scale. For one thing, manually searching all relevant information for hundreds or thousands of variants takes far too long, and no one can know and remember every pathogenic variant across many diseases, let alone keep up to date with new findings. For another, the flood of publications citing all those variants inevitably includes paywalled content that analysts may not be able to access. There is also the challenge of thoroughness: in some cases, missing just one relevant paper could result in an inaccurate variant classification. Overall, the current, largely manual approach lacks standardization, making reproducibility a real challenge.
Much of this analysis and interpretation workflow can be automated, drawing on the same breadth of content for a more robust and reproducible analysis process. When done well, these tools combine expert-curated data, a highly structured, comprehensive, and constantly updated database, and standardized variant classification guidelines to support the clinical geneticist in making an evidence-based interpretation. Further, it is critical that automated tools for variant interpretation link each automatic classification to its supporting evidence, such as the underlying publications. This gives physicians the opportunity to evaluate each variant classification, collect all the evidence in an automatically generated clinical report, and ultimately determine the best treatment plan.
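As a rough sketch of what linking to supporting evidence can mean in practice, an automated pipeline might emit a record like the following for each classified variant. The field names and example values here are assumptions made for illustration, not any particular vendor's schema.

```python
# An illustrative, hypothetical record format tying each variant
# classification to its supporting evidence.
from dataclasses import dataclass, field

@dataclass
class ClassifiedVariant:
    gene: str
    hgvs: str                       # variant in HGVS notation
    classification: str             # class assigned by the pipeline
    evidence_codes: list            # e.g. ACMG/AMP codes that triggered
    pmids: list = field(default_factory=list)         # supporting publications
    database_ids: list = field(default_factory=list)  # e.g. ClinVar or COSMIC accessions

def report_line(v):
    """Render one evidence-linked line of an automatically generated report."""
    refs = ", ".join(f"PMID:{p}" for p in v.pmids) or "no linked publications"
    return (f"{v.gene} {v.hgvs}: {v.classification} "
            f"[{'+'.join(v.evidence_codes)}] ({refs})")

# Placeholder values, for illustration only.
example = ClassifiedVariant(
    gene="GENE1",
    hgvs="c.100A>T",
    classification="Likely pathogenic",
    evidence_codes=["PS3", "PM2", "PP3"],
    pmids=["00000000"],
    database_ids=["ClinVar:VCV_PLACEHOLDER"],
)
print(report_line(example))
```

Keeping the evidence attached to each call, rather than only the final label, is what allows a physician to audit the classification before it reaches the report.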
Case Studies of Automated Pipelines for Improved Interpretation
A couple of recent examples illustrate how automated pipelines improve genome interpretation.
At the Center for Genomic Medicine at Rigshospitalet in Copenhagen, clinical scientists and physicians are using an automated and standardized pipeline for a precision medicine effort, seeking to match patients with metastatic or refractory cancers to better-suited therapies or to clinical trials. In a project including more than 700 patients for whom standard therapies had failed, automated and standardized genome interpretation led to the detection of driver mutations in 95% of all cases. In 40% of cases, an actionable variant could be identified (here, 'actionable' means the variant is pathogenic and disease-relevant, with a drug and/or clinical trial matching the patient's genomic profile). In over half of those cases, the patient's treatment plan was modified to enable enrollment in a clinical trial offering therapies that either targeted the patient's specific genomic profile or were less toxic and more effective than the standard of care.
Separately, at the Centre for Translational Omics in University College London’s Institute of Child Health, scientists implemented an automated workflow to identify variants in exome or whole genome data that underlie rare diseases in children. Already, the center has helped hundreds of families, and has found answers to dozens of previously unsolved cases. In one recently published case (doi: 10.1038/ejhg.2015.121), the team analyzed two children from Bangladesh and found that their complex phenotype — long thought to be representative of a single unknown syndrome — was actually the product of two different health issues. One was quickly associated with a mutation in PRX, a gene linked to peripheral neuropathy diseases, while the other was caused by environmental factors.
On the Horizon
As genomic data become more common in the clinic, it is imperative that clinical scientists and physicians use automated and standardized tools that employ up-to-date, manually curated literature content. These tools should also support open communication and sharing of information among labs, so they can compare results and more readily determine the best treatment for a patient. Optimizing and standardizing the analysis and interpretation of genomic data for patient care means reducing the opportunity for human error, scaling for higher sample volumes, and ensuring that interpretation is robust and reproducible because it rests on a foundation of trustworthy scientific content.
Anika Joecker, PhD, is Director of Clinical Partnering Bioinformatics at QIAGEN, where she collaborates with clinical labs and Key Opinion Leaders on clinical NGS data analysis and interpretation. She can be reached at anika.joecker@qiagen.com.