Allele Frequency Community to Share Data on Population-Wide Incidence of Genetic Variants
By Bio-IT World Staff
February 24, 2015 | This morning, diagnostics and laboratory supply company QIAGEN announced the creation of the Allele Frequency Community, a network of academic centers and commercial groups that will share data on the genetic variants they’ve uncovered during human sequencing studies. QIAGEN will store the information and provide a bioinformatics infrastructure to access the pooled data.
In contrast to public variant databases like ClinVar, the Allele Frequency Community’s aim is not to manually interpret the clinical significance of variants. Instead, the goal is to offer members ready information on the incidence of variants across the general population. This information is often used as a first-pass filter to help decide whether an unknown variant might be responsible for a hereditary disorder: variants that are relatively common are unlikely to be disease-causing and can be eliminated from consideration. However, because different geographic and ethnic populations may have very different mixes of common alleles, these filters work best when a wide array of ethnic groups are represented in allele databases. By combining data from all members, the Allele Frequency Community hopes to overcome the limits of any single participant’s study populations.
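To make the filtering logic concrete, here is a minimal sketch of a frequency-based first-pass filter. It is not QIAGEN’s implementation. The names (`Variant`, `filter_common_variants`), the per-population frequency table, and the 1% cutoff are all hypothetical choices for illustration. Real cutoffs depend on the disease’s prevalence and mode of inheritance. The filter uses the highest frequency seen in any population, which shows why broad ethnic representation matters: a variant that looks rare in one group can be common in another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A single variant site (hypothetical minimal representation)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def max_population_af(freqs_by_pop):
    """Highest allele frequency across all populations with data; 0.0 if none."""
    return max(freqs_by_pop.values(), default=0.0)


def filter_common_variants(candidates, af_table, max_af=0.01):
    """Keep candidates whose highest per-population frequency is <= max_af.

    af_table maps Variant -> {population_name: allele_frequency}.
    Variants missing from the table are kept, because absence from the
    reference data is no evidence that they are common. The 1% default
    cutoff is an illustrative assumption, not a recommended value.
    """
    kept = []
    for variant in candidates:
        if max_population_af(af_table.get(variant, {})) <= max_af:
            kept.append(variant)
    return kept
```

Take a variant seen at 0.2% in one population and 15% in another. It is dropped as common. If the table held data only for the first population, the same variant would have survived the filter and stayed on the list of suspects.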
Thirteen founding members of the Allele Frequency Community have together contributed data from over 70,000 human exomes and whole genomes to the project, with individuals from more than 100 countries represented. Data is deposited in the form of variant call files (VCFs) and scrubbed of metadata that could be used to identify either individuals or the organizations contributing data. The community platform then annotates those VCFs with data on the prevalence of each allele across the entire shared data set.
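The article does not describe how the platform computes prevalence. One common approach pools allele counts from every contributor’s VCF. For each site, it sums the alternate-allele count (AC) and the number of called alleles (AN) from the standard `GT` genotype field, then takes AF = AC / AN. The sketch below illustrates that idea in plain Python under simplifying assumptions. It handles only biallelic sites and skips multiallelic records. It counts a site only in files that list it. Many VCFs omit homozygous-reference sites, so this can overstate frequencies unless contributors supply reference calls. The function names and the `POOLED_AF` INFO key are hypothetical.

```python
import re
from collections import defaultdict


def count_alleles(vcf_text):
    """Return {(chrom, pos, ref, alt): [alt_count, called_alleles]} for one VCF.

    Assumes biallelic records; multiallelic records (ALT containing a comma)
    are skipped. Missing calls ('.') do not count toward called alleles.
    """
    counts = {}
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/meta lines
        fields = line.split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        if "," in alt:
            continue  # multiallelic site: out of scope for this sketch
        gt_index = fields[8].split(":").index("GT")
        ac = an = 0
        for sample in fields[9:]:
            genotype = sample.split(":")[gt_index]
            for allele in re.split(r"[/|]", genotype):  # unphased or phased
                if allele == ".":
                    continue
                an += 1
                if allele != "0":
                    ac += 1
        counts[(chrom, int(pos), ref, alt)] = [ac, an]
    return counts


def pool_frequencies(vcf_texts):
    """Sum AC and AN across contributors and return pooled AF per site."""
    totals = defaultdict(lambda: [0, 0])
    for text in vcf_texts:
        for site, (ac, an) in count_alleles(text).items():
            totals[site][0] += ac
            totals[site][1] += an
    return {site: ac / an for site, (ac, an) in totals.items() if an > 0}


def annotate_line(line, pooled_af, key="POOLED_AF"):
    """Append a pooled-frequency tag (hypothetical key) to a record's INFO column."""
    fields = line.split("\t")
    site = (fields[0], int(fields[1]), fields[3], fields[4])
    if site in pooled_af:
        tag = f"{key}={pooled_af[site]:.4g}"
        fields[7] = tag if fields[7] == "." else fields[7] + ";" + tag
    return "\t".join(fields)
```

Suppose one contributor reports genotypes 0/1 and 0/0 at a site (1 alternate allele out of 4) and another reports 1|1 and a missing call (2 out of 2). The pooled frequency is then 3/6 = 0.5. Pooling counts rather than averaging each contributor’s frequency weights every file by how many alleles it actually called.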
While this service is free to all participants, members are required to share files before they can access the combined data set.
“The concept was that we would basically provide the forum and infrastructure to support that data sharing in a way that would be safe and compliant, and anonymized,” says Laura Furmanski, Senior Vice President of QIAGEN’s Bioinformatics Business Area. She adds that the idea for the Allele Frequency Community was first floated in conversations with customers last fall, and quickly gave rise to a firm architecture as more and more organizations volunteered to participate.
“It really got a lot of traction with the different founding members quite quickly,” Furmanski told Bio-IT World. “It created the spark to make this happen in a rapid timeframe.”
The thirteen founding members are an eclectic mix of organizations with various roles in exploring population genomics. Some group leaders, like Heidi Rehm, Director of the Laboratory for Molecular Medicine at Partners HealthCare, have been on the front lines of the genomic data sharing movement; others, like Eric Schadt, Director of the Icahn Institute for Genomics and Multiscale Biology at Mount Sinai, are better known for translational research into ways that genome-wide information can be used to inform patient care. One member that stands out is Laboratory Corporation of America (LabCorp), one of the country’s largest commercial providers of genetic testing, an industry with a spotty record of making its data available to others. LabCorp’s participation provides some assurance that data gathered during genetic testing of patients will help to inform future tests across the ecosystem of providers.
Similarly, QIAGEN is not requiring members to use its in-house informatics platforms to share their data. “This is a freely accessible community, so you don’t have to be a QIAGEN customer in order to participate,” Furmanski points out. However, QIAGEN is integrating the Allele Frequency Community data set into its own products, including Ingenuity Variant Analysis and CLC Cancer Research Workbench.
Those products joined QIAGEN’s portfolio with the acquisitions of Ingenuity Systems and CLC bio, respectively, in 2013 — moves that gave QIAGEN crucial informatics support for its forthcoming GeneReader instrument, a clinically-oriented next-generation sequencer in active development. However, the company has also lent its new bioinformatics muscle to projects in broad support of genomic science. In addition to the Allele Frequency Community, QIAGEN is using the Ingenuity Variant Analysis platform to host the Empowered Genome Community, a network of people who have been personally sequenced to collaborate on interpreting their own genetic variants, as Bio-IT World covered in “An Online Community for the World’s Whole Genome Owners.” (Nathan Pearson, the former Principal Genome Scientist at Ingenuity Systems who spearheaded the Empowered Genome Community project, has also pledged support to the Allele Frequency Community in his new capacity as Senior Director of Scientific Engagement and Public Outreach at the New York Genome Center.)
“We see a tremendous amount of value for our customers in creating these types of community forums, for sharing data and ultimately delivering insights from a research and from a clinical perspective,” says Furmanski. While the data collected by the Allele Frequency Community can be used to improve any pipeline for variant interpretation, QIAGEN is leveraging access to this data set in new platforms for clinical use, as the diagnostics company revamps its product line for the next-generation sequencing era. Those platforms include Ingenuity Clinical, a decision support system launched earlier this month for the interpretation of somatic cancer variants, which QIAGEN will continue to build out over the course of 2015 with capabilities in hereditary cancer and rare disease. Internally, Furmanski says, QIAGEN has already reduced false positive calls of potentially pathogenic variants by over 40% using the Allele Frequency Community data, helping Ingenuity users to more quickly narrow down lists of suspect variants.
As the Allele Frequency Community goes public, it will seek new member organizations who can contribute their own VCFs to the data sharing initiative, focusing particularly on new geographic regions or members with large data sets covering underrepresented ethnic groups. (Of the current members, eleven organizations are based in the U.S., one in the Netherlands, and one in Canada, although their combined data covers many more regions.) In the longer term, the Community plans to explore new data management strategies to better mine clinically relevant information from its pooled data set.
Laboratory directors can sign onto the Allele Frequency Community, and begin sharing and annotating their VCFs, at allelefrequencycommunity.org.