New Framework For Developing Polygenic Risk Scores
By Allison Proffitt
March 25, 2021 | Last week a diverse research team published a 22-item framework in Nature that identifies the minimal polygenic risk score-related information that scientists should include in their published studies of polygenic risk scores.
Polygenic risk scores (PRSs) are increasingly used to communicate an individual’s complex inherited risk for diseases like Type 2 diabetes, coronary heart disease and breast cancer. These scores may be aggregated from genome-wide association studies and delivered to individuals by clinicians in a healthcare setting or by direct-to-consumer companies like 23andMe, Color, MyHeritage, and others.
But the application and reporting of these scores vary widely, hindering translation into clinical care, writes a team of international researchers in the Nature paper (DOI: 10.1038/s41586-021-03243-6).
“Now with so many scores being published and new methods being developed, we’re at the point in time where it would be really helpful to look under the hood of each of those scores that are published to understand the various components and important elements that went into constructing those scores,” Erin Ramos, deputy director, Division of Genomic Medicine at the National Human Genome Research Institute and an author on the paper, told Bio-IT World. “That’s important for a couple of reasons, most importantly for us to be able to replicate those scores and ensure that other groups can reproduce the findings of those scores. It’s also important for us to be able to compare the scores that are done for similar diseases—comparing apples to apples.”
In a collaboration between the Clinical Genome Resource (ClinGen) Complex Disease Working Group and the Polygenic Score (PGS) Catalog, the authors present new Polygenic Risk Score Reporting Standards (PRS-RS) to better document PRSs.
From Risk Scores to Clinical Care
While some inherited diseases can be traced to errors in a single gene, most diseases are more complex, with risk conferred by multiple gene variants as well as environmental risk factors such as diet, stress, and smoking. PRSs are increasingly used to communicate an individual’s complex inherited risk for disease. Typically, information across many variants is combined by a weighted sum of allele counts, in which the weights reflect the relative magnitude of association between the variant and disease. These weighted sums can include millions of variants and are frequently referred to as polygenic risk scores. The authors argue that mature PRS models with potential clinical utility are available for only a few diseases, including coronary heart disease and breast cancer.
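To make the weighted-sum idea concrete, here is a minimal Python sketch. It is not taken from the Nature paper: the function name, the three-variant example, and the weight values are all hypothetical, and the sketch assumes hard-called genotypes in which each variant contributes 0, 1, or 2 copies of the risk allele.

```python
from typing import Sequence


def polygenic_risk_score(allele_counts: Sequence[int],
                         weights: Sequence[float]) -> float:
    """Return the weighted sum of risk-allele counts for one individual.

    Assumption for this sketch: each count is 0, 1, or 2, and each weight
    is a per-variant effect estimate (for example, a log odds ratio).
    """
    if len(allele_counts) != len(weights):
        raise ValueError("allele_counts and weights must have the same length")
    for count in allele_counts:
        if count not in (0, 1, 2):
            raise ValueError(f"allele count must be 0, 1, or 2, got {count}")
    return sum(count * weight for count, weight in zip(allele_counts, weights))


# Hypothetical individual genotyped at three variants (illustrative values only).
example_counts = [0, 1, 2]
example_weights = [0.5, 0.25, 0.125]
example_score = polygenic_risk_score(example_counts, example_weights)
print(example_score)
```

A real score applies the same arithmetic across thousands or millions of variants; which variants are included and how their weights are tuned is exactly the kind of detail the reporting standards ask authors to document.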
This is not the first time risk prediction standards for PRSs have been published. The Genetic Risk Prediction Studies (GRIPS) Statement was published in 2011. The newly published framework is meant to update the GRIPS statement, the authors write, and more precisely outline what should be reported for a PRS study to be deemed rigorous, reproducible, and ultimately translatable.
This 22-item PRS-RS is an “expanded and updated set of reporting standards for PRSs that addresses current research environments with advanced methodological developments to inform clinically meaningful reporting on the development and validation of PRSs in the literature,” the authors write, “with an emphasis on reproducibility and transparency throughout the development process.”
“We want the community to consider using this framework,” Ramos said. “They’ll address these 22 important elements in their publications, and also the paper encourages them to submit their scores to our collaborator, the [Polygenic Score] PGS Catalog. And then in that way researchers around the world can take a look at their scores.”
This level of transparency is essential for validation of PRSs and understanding how to apply them in the clinic, agrees Robb Rowley, program director of the Electronic Medical Records and Genomics (eMERGE) Network within NHGRI. “If you develop a polygenic risk score, and let’s say you use UK Biobank to develop it, but you don’t make it apparent that you use UK Biobank as part of your model development, [another group] could just be going back to the same dataset and validating,” he explained.
The authors grouped the 22 items into six key components organized along the developmental pipeline for PRSs, they write. By starting at the outset of that pipeline, the authors hope to encourage documentation of PRSs from the earliest stages of a study.
- Background: The study type, risk model, and predicted outcome should be documented. “As the PRS-RS is focused on clinical validity and implementation, authors must outline the study and appropriate outcomes to understand what risk is measured, what the purpose of measuring risk would be and why this purpose may be of clinical relevance,” the authors write.
- Study population and data: The study population should be well characterized, including how participants were recruited, their demographics and clinical characteristics, and the genetic and non-genetic data of interest. The authors also highlight the importance of careful handling of ancestry data. “There are often inconsistent definitions and levels of detail associated with ancestry, and the transferability of genetic findings between different racial and ethnic groups can be limited,” they write. They recommend that detailed genetic ancestry, and how it was determined, be included.
- Risk model development and application: “There are several commonly used methods to select variants that constitute the PRS and fine-tune their weights,” the authors write. All of the details about the risk model should be described, including how it was developed and iterated upon and the time scale to which it applies.
- Risk model evaluation: All findings from the evaluation of the model should be described including risk score effect size, sensitivity, specificity, positive predictive value and negative predictive value. “Any differences in variable definitions or performance discrepancies between the training and validation sets should be described,” the authors write.
- Limitations and clinical implications: “By explicitly describing the ‘risk model interpretation’ and outlining potential ‘limitations’ to the ‘generalizability’ of their model, authors will empower readers and the wider community to better understand the risk score and its relative merits,” the authors write. Authors should fully characterize the intended uses of their PRSs and justify the new model against existing alternatives.
- Data transparency and availability: “Information sufficient to calculate the PRS and the risk model(s) on external samples should be made freely available,” the authors write. They recommend that PRSs be deposited in the PGS Catalog “to facilitate reproducibility and comparative benchmarking. By providing these criteria in a structured format that builds on existing standards and ontologies, the use of this framework in publishing PRSs will facilitate translation into clinical care and progress towards defining best practice,” they write.
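The evaluation measures named in the risk model evaluation item (sensitivity, specificity, positive predictive value, and negative predictive value) all come from comparing predicted and observed disease status. The Python sketch below shows one way they might be computed when a PRS is turned into a high-risk/low-risk call at a cutoff. The function name, the cutoff of 0.5, and the nine example individuals are hypothetical and are not drawn from the paper.

```python
from typing import Dict, Sequence


def _ratio(numerator: int, denominator: int) -> float:
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(scores: Sequence[float],
                           has_disease: Sequence[bool],
                           threshold: float) -> Dict[str, float]:
    """Compute four evaluation measures for a score dichotomized at a cutoff.

    Assumption for this sketch: a score at or above the threshold counts
    as a "high risk" prediction.
    """
    if len(scores) != len(has_disease):
        raise ValueError("scores and has_disease must have the same length")
    tp = fp = tn = fn = 0
    for score, diseased in zip(scores, has_disease):
        predicted_high = score >= threshold
        if predicted_high and diseased:
            tp += 1  # true positive
        elif predicted_high and not diseased:
            fp += 1  # false positive
        elif not predicted_high and diseased:
            fn += 1  # false negative
        else:
            tn += 1  # true negative
    return {
        "sensitivity": _ratio(tp, tp + fn),  # cases correctly flagged
        "specificity": _ratio(tn, tn + fp),  # non-cases correctly cleared
        "ppv": _ratio(tp, tp + fp),          # flagged individuals who are cases
        "npv": _ratio(tn, tn + fn),          # cleared individuals who are non-cases
    }


# Hypothetical validation set: four cases followed by five non-cases.
example_scores = [0.9, 0.8, 0.6, 0.2, 0.7, 0.55, 0.4, 0.3, 0.1]
example_labels = [True, True, True, True, False, False, False, False, False]
metrics = classification_metrics(example_scores, example_labels, threshold=0.5)
print(metrics)
```

Because the predictive values depend on how common the disease is in the evaluated sample, the same score can look quite different across cohorts, which is one reason the standards ask authors to describe the study population alongside the performance figures.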
The team doesn’t expect this framework to be exhaustive or final. They expect supplementary frameworks to follow and encourage researchers to develop further best practices. They also highlight how new technologies, such as deep learning, may change the development and application of risk scores.
Still, this is an important step, Ramos and Rowley say. “This is a great step to starting to get some clinical validity and start touching on clinical utility studies and really understand how these impact someone’s health,” Rowley said.