Race And Ancestry: Distinguishing The Two For Genetic Studies
By Allison Proffitt
December 11, 2017 | In November, a team of researchers from the Keck School of Medicine of the University of Southern California (USC) Department of Translational Genomics looked at 127 patients of African descent and 591 patients of European descent, all with multiple myeloma, from the Multiple Myeloma Research Foundation’s ten-year CoMMpass study. The overall goal of the study was to elucidate differences in molecular alterations in multiple myeloma as a function of self-reported race and genetic ancestry, the paper authors wrote.
The findings were published in PLOS Genetics (doi: 10.1371/journal.pgen.1007087), and Bio-IT World covered the study.
During our conversation about the study, I asked first author Zarko Manojlovic to tell me a little bit about how race was genetically defined for the study. “That is actually a very great question,” Manojlovic said, laughing. “This is a perfect sort of over-a-couple-of-glasses-of-wine-or-beer kind of question.”
It’s also a very complicated question that quickly wades into social issues. “Basically, where do we draw the line between what you self-identify yourself as, and what your genetic ancestry says?” Manojlovic said.
Your self-identity often dictates your lifestyle, which then exposes you to social determinants of health common to the group with which you identify. “For instance,” Manojlovic explained, “if you are in a Southern state, African-American community you’re going to be eating a lot of soul food. We know that’s going to promote certain disease risks associated with that.”
But what if self-identity does not align with genetic markers found through whole-exome sequencing? Manojlovic said discrepancies between racial self-identity and genetic ancestry raise questions for healthcare providers and researchers alike. “Ok, they self-reported themselves as such. Are they really self-reporting themselves? Or is this a CLIA issue of data entry? If someone is keyed in as African American, but is 99% Caucasian ancestry, that becomes a very, very tricky situation!”
Researchers often define genetic race groups using the International Genome Sample Resource (IGSR) codes, which represent 26 populations within the 1000 Genomes dataset. ASW represents Americans of African Ancestry in the Southwestern United States; it is the only IGSR code for a population of African descent within the United States.
“If you look at 1000 Genomes (which in itself is obviously not perfect, but it’s the best we have), we see a cutoff of about 24%-27% of Caucasian admixture found in the ASW population,” Manojlovic explained. “But that’s very much drawn from a very specific geographic region. You kind of have to take it with a grain of salt at this point.”
When they began analyzing data from the CoMMpass study, Manojlovic and his colleagues chose to use only patients whose genetic ancestry matched their self-reported race. To cluster patients, they extracted 4,761 ancestry-informative marker (AIM) SNP genotypes from germline whole-exome sequencing of the CoMMpass IA9 data and ran a population-stratification principal component analysis. They also used Structure, a software package developed by Jonathan Pritchard of Stanford that uses multi-locus genotype data to investigate population structure, to determine the percent ancestry of each CoMMpass case.
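For readers unfamiliar with population-stratification PCA, the sketch below shows the general idea on a genotype matrix. It assumes genotypes are coded as alternate-allele counts (0, 1, 2), with one row per patient and one column per ancestry-informative SNP. The function name `ancestry_pcs` and the synthetic two-population demo are hypothetical; this is a generic textbook PCA, not the USC team's pipeline, and it does not reproduce Structure's model-based ancestry estimates.

```python
import numpy as np


def ancestry_pcs(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Return each sample's scores on the top principal components.

    genotypes: (n_samples, n_snps) array of alternate-allele counts (0, 1, 2),
    e.g. one column per ancestry-informative SNP (assumed coding).
    """
    g = np.asarray(genotypes, dtype=float)
    # Estimate each SNP's allele frequency from the samples themselves.
    p = g.mean(axis=0) / 2.0
    # Drop monomorphic SNPs: they carry no ancestry signal and have zero variance.
    keep = (p > 0.0) & (p < 1.0)
    g, p = g[:, keep], p[keep]
    # Center each SNP and scale by the genotype standard deviation expected
    # under Hardy-Weinberg equilibrium, sqrt(2p(1-p)).
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # Thin SVD: columns of U scaled by the singular values are the PC scores.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


if __name__ == "__main__":
    # Synthetic demo (hypothetical data): two populations whose allele
    # frequencies differ by about 0.4 at each SNP.
    rng = np.random.default_rng(0)
    n_snps = 500
    freq_a = rng.uniform(0.05, 0.95, n_snps)
    freq_b = np.clip(freq_a + rng.choice([-0.4, 0.4], n_snps), 0.02, 0.98)
    geno = np.vstack([
        rng.binomial(2, freq_a, size=(40, n_snps)),
        rng.binomial(2, freq_b, size=(40, n_snps)),
    ])
    pcs = ancestry_pcs(geno)
    # The two populations should separate along PC1.
    print("PC1 mean, population A:", pcs[:40, 0].mean())
    print("PC1 mean, population B:", pcs[40:, 0].mean())
```

PCA only places patients relative to one another; turning that into a percentage of African or European ancestry for each patient is what a model-based tool such as Structure adds.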
Their approach revealed three mismatches: two patients who self-reported as Caucasian but had greater than 55% African ancestry, and one patient who self-reported as African American but had 99.9% European ancestry.
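The screening step can be sketched as a simple rule over per-patient ancestry proportions, such as those Structure reports. The sketch assumes each patient has a self-reported label and a dictionary of estimated ancestry fractions. The `Patient` type, the label-to-component mapping, the 0.5 cutoff, and the example values are all hypothetical and are not the criteria the USC team used.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical mapping from a self-reported race label to the ancestry
# component expected to dominate for that group.
EXPECTED_COMPONENT = {
    "African American": "African",
    "Caucasian": "European",
}


@dataclass
class Patient:
    patient_id: str  # hypothetical identifier
    self_reported: str  # self-reported race label
    ancestry: dict[str, float]  # estimated ancestry fractions, summing to ~1.0


def is_discordant(patient: Patient, threshold: float = 0.5) -> bool:
    """Return True when the ancestry expected from self-report falls below threshold.

    The default 0.5 cutoff is a hypothetical choice for illustration only.
    """
    expected = EXPECTED_COMPONENT[patient.self_reported]
    return patient.ancestry.get(expected, 0.0) < threshold


def concordant_only(patients: list[Patient], threshold: float = 0.5) -> list[Patient]:
    """Keep patients whose genetic ancestry agrees with their self-reported race."""
    return [p for p in patients if not is_discordant(p, threshold)]


if __name__ == "__main__":
    # Illustrative values only; not data from CoMMpass.
    cohort = [
        Patient("P1", "Caucasian", {"European": 0.44, "African": 0.56}),
        Patient("P2", "African American", {"European": 0.999, "African": 0.001}),
        Patient("P3", "African American", {"European": 0.25, "African": 0.75}),
    ]
    for p in cohort:
        print(p.patient_id, "flagged" if is_discordant(p) else "concordant")
```

A majority-style cutoff like this leaves admixed patients concordant: someone who self-reports as African American with roughly the 24%-27% European admixture seen in ASW would not be flagged. A flag only marks a case for review, since the rule by itself cannot distinguish a data entry error from genuine self-identification.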
There are many reasons why a person’s self-reported race might not align with their genetic race. Manojlovic recounts cases from his own clinical experience in which patients purposefully self-reported as Caucasian because “there’s sort of an underlying belief that they might get better treatment if they do so.” The third case, a self-reported African American with 99.9% European ancestry, raises warning flags of a possible data entry error, Manojlovic said, but previously published studies have documented some self-identified African Americans with up to 99% European ancestry.
In a 2015 Human Genomics paper (doi: 10.1186/s40246-014-0023-x), Mersha and Abebe write: “In African American or Latino populations, self-reported ancestry may not be as accurate as direct assessment of individual genomic information in predicting treatment outcomes.”
Initially, the paper was based on the self-reported race groups, but the researchers then chose to exclude the three patients whose self-identity did not match their genetic race. “Two samples in 120 could definitely inflate the study,” Manojlovic said. “We had to be very careful because we didn’t want those samples to influence our outcome: is this a data error or is this true self-identification?”
Still, he analyzed the dataset both with and without those patients to see whether including them made a difference. “I redid the analysis with them in the study and without them in the study… and there were no statistical differences. They didn’t inflate the study to the point to where we purposefully took them out to make a point.”
In the CoMMpass dataset of 721 patients, only three self-reported a race different from the one their genetics suggested. But Manojlovic sees opportunities for follow-up studies with larger but similar cohorts. Does genetic ancestry have a greater influence on health than self-reported race? Can data entry errors be ruled out? These are big questions worthy of further study.
Manojlovic and his colleagues found that race influences outcomes for multiple myeloma patients. African Americans are three times more likely than Caucasians of the same age and gender to be diagnosed with this type of cancer, and twice as likely to die from it. The researchers identified new cancer gene mutations that are significantly more prevalent in African Americans, linking these genes to multiple myeloma for the first time.
“We in the cancer genomics community have a responsibility to ensure that our studies represent true population diversity so we can understand the role of ancestry and biology in health outcomes,” said John D. Carpten, chair of the Department of Translational Genomics and senior author of the paper, in a statement when the paper was published. “The new candidate myeloma genes we identified in the African American population may have been overlooked because of the lack of diversity in previous genomic efforts. There are clearly molecular differences between African American and Caucasian multiple myeloma cases, and it will be critical to pursue these observations to better improve clinical management of the disease for all patients.”