An Online Community for the World's Whole Genome Owners
By Aaron Krol
June 4, 2014 | There were lots of reasons to attend the fifth annual GET (Genomes, Environments, Traits) event in Boston this April: to hear George Church speak about reviving the woolly mammoth; get invited to the world’s largest family reunion in Queens, NY; discuss the genetics of sports; or just to donate facial mites to the North Carolina Museum of Natural Sciences.
Nathan Pearson, however, had come to bag some genomes.
As principal genome scientist at QIAGEN (which he joined through last year’s acquisition of his former employer, Ingenuity Systems), Pearson had taken it upon himself to bring together every fully sequenced person he could find in a single online community, where members could act as both data donors and citizen scientists. This Empowered Genome Community, or EGC, would contribute to large genetic research projects that could not be imagined without a steady diet of whole genomes.
“The seeds of it got started when I went to the Understand Your Genome conference in 2012,” Pearson tells Bio-IT World. That event was sequencing giant Illumina’s first attempt to open its services directly to the public, offering attendees ownership of their whole genomes, which were sequenced in advance and loaded on an iPad app.
“I was talking with folks there about the insights they were getting back,” Pearson continues. In those conversations, he ran into a familiar barrier. “They go into their genome data, and they may have high hopes to learn something about their health outlook long-term. And they quickly find that there typically is very little to say on that front yet… Those folks came to realize over time that their data were more useful scientifically than they were personally.”
As both a fully sequenced individual himself and an employee of Ingenuity Systems, whose flagship Variant Analysis platform helps researchers assign functional meaning to genetic variation, Pearson was very familiar with the limits of genomic knowledge today. Barring a devastating single-mutation disorder, we simply don’t know much about the contributions our genes make to our overall health – a little about drug responses, maybe some tentative hints about a fractionally raised risk for heart disease. “That’s one of the first lessons people learn” after being sequenced, says Pearson, and many react by looking for ways to contribute to research so that future sequencees will find their genomes more informative.
QIAGEN has something unique to offer these individuals: the Variant Analysis platform, which is already built to round up genetic variants, provide detailed information about the genes they reside in and what they might mean for function, and compare them across as many genomes as one can collect. Free access to Variant Analysis, along with the chance to connect with like-minded peers, is the major draw of the Empowered Genome Community.
It was the chance to pitch the EGC to a large audience of whole genome owners that brought Pearson to this year’s GET event. (For an overview of the conference, see “Genetic Pioneers GET a Taste of the Science’s Imaginative Future.”) This annual two-day meeting is an offshoot of George Church’s Personal Genome Project (PGP) – the same project that provided Pearson with his own whole genome – and almost everyone in attendance had already been sequenced through that effort.
Pearson set up a lab with computers where PGP participants could pull up their genomes and start digging into them with Variant Analysis. He promoted the drive by giving out carob pods, a symbol of a project that takes a long time to grow but could provide incredible fruit to the next generation.
“We had folks coming two hours early and staying two hours late,” Pearson says. By the end of the day, he had registered 46 new members – more than doubling the community’s population.
A Switchboard, a DIY Lab, a Genome Marketplace
The premise of the EGC is not to be an alternative data collection effort to the PGP or Understand Your Genome, which are already doing a fine job providing a growing number of people with whole genome sequences. In particular, Pearson has high praise for the PGP’s system for collecting phenotypic data on its members, which should help researchers draw connections between genetic variants and physical traits or health outcomes. “Madeleine Price Ball [the PGP Director of Research] has been amazing,” he says. “She’s put so much savvy foresight into how to gather genome data, and phenotype data, well enough to make it useful for research,” including a consent protocol purpose-built for genomic studies.
“We encourage everybody who joins the community, regardless of how they had their genomes sequenced, to join the PGP,” he adds. “It’s a ready-to-go effort.”
Instead, the EGC was envisioned as a meeting place where people who already have valuable data can connect with the research projects that need it. Members may consent to make their genomes publicly visible, or share them on a case-by-case basis. An EGC member interested in a particular trait, or a specific genetic variant, would be able to track down other members with the same genetic quirks and start building a study from the ground up. Alternatively, members might connect with established researchers who are already looking into their traits of interest, but haven’t found enough subjects for a statistically powerful study.
Either way, the idea is to overcome the massive time and cost commitment of sequencing new cohorts of individuals over and over again. A large enough set of pre-sequenced research volunteers, who can be contacted to gather new data and who, with some basic experience in Variant Analysis, can even lend a hand as citizen scientists, could dramatically lower the barrier to entry for studies of complex, polygenic traits, which may be incrementally affected by dozens or hundreds of variants scattered across the genome.
The trick, of course, is reaching that tipping point. Last month, Pearson left QIAGEN to join the New York Genome Center, leaving behind some well-placed connections and an engaged membership, but little in the way of infrastructure at the EGC. The task of building the community out will now fall to Kate Wendelsdorf, a QIAGEN field application scientist who will spearhead the EGC project and serve as the community’s spokesperson.
Wendelsdorf was just recruited from the NIH Laboratory of Systems Biology this January – “I’m what they call fresh off the bench,” she says – but she’s been a long-time admirer of the PGP. “When I came on board and started talking to Nathan,” she tells Bio-IT World, “it turned out that the EGC is trying to take it a step further. PGP makes the data available, but what we’re trying to do now at the EGC is make the data usable.”
One of Wendelsdorf’s first priorities is to overhaul the communication systems in the EGC. Right now, she acts as a manual go-between for herding genomes to the right projects. “I contact researchers all the time in my field application work, and consistently they need controls, or they need different genomes, so I put them in touch with the PGP genomes that they can use,” she says. “I’m also the contact person for individual sequencees who have uploaded their data and want to get in touch with researchers.” That might be manageable now, with only around 80 members, but it’s not scalable, and it doesn’t leave much room for the independent growth of new projects that the EGC one day hopes to enable.
One idea for solving this bottleneck is to build a communication feature right into Variant Analysis. “We have this product in hand that’s user-friendly, that allows sharing, that allows one to bring all sorts of sophisticated tools to get information from this coalmine of data,” says Wendelsdorf – so why not also make it a hub for finding relevant human genomes?
The concept is that genomes would be tagged with contact information, with consent from the donors – either a direct line to the sequencee, or more likely, a researcher who is using that genome in her own work. Variant Analysis users could then search for genomes by variant, and immediately have a chance to reach out to other users with access to whole genomes that carry the same variant.
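The article does not describe how such a search would be built inside Variant Analysis, so the following is only a minimal sketch of the idea, under stated assumptions: each genome is a hypothetical `GenomeRecord` holding a set of variants written as (chromosome, position, ref, alt) tuples, a contact is stored only when the donor has consented, and a hypothetical `VariantIndex` maps each variant to the genomes that carry it. None of these names come from QIAGEN's product, and the email addresses are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical variant representation: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


@dataclass
class GenomeRecord:
    """Hypothetical record for one shared genome (not a real Variant Analysis type)."""
    genome_id: str
    variants: FrozenSet[Variant]
    # None means the donor has not consented to be contacted about this genome.
    contact: Optional[str] = None


class VariantIndex:
    """Hypothetical inverted index: variant -> genomes that carry it."""

    def __init__(self, records: List[GenomeRecord]) -> None:
        self._by_variant = defaultdict(list)
        for record in records:
            for variant in record.variants:
                self._by_variant[variant].append(record)

    def contacts_for(self, variant: Variant) -> List[str]:
        # .get() avoids adding empty entries to the defaultdict on a miss;
        # genomes without a consented contact are silently skipped.
        return [r.contact for r in self._by_variant.get(variant, []) if r.contact is not None]


# Usage with made-up data: two genomes share a variant, but only one donor consented.
records = [
    GenomeRecord("g1", frozenset({("chr1", 12345, "A", "G")}), contact="researcher@example.org"),
    GenomeRecord("g2", frozenset({("chr1", 12345, "A", "G")})),
    GenomeRecord("g3", frozenset({("chr2", 500, "C", "T")}), contact="donor3@example.org"),
]
index = VariantIndex(records)
print(index.contacts_for(("chr1", 12345, "A", "G")))  # ['researcher@example.org']
```

Keeping consent as a property of each record, rather than filtering at display time, means a genome whose donor has not opted in never yields a contact, even though it still counts as a carrier in the index.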
This tool would be available to EGC members, but also to commercial users of Variant Analysis. “For the individual member, it allows them to shortlist scientists who might be interested in working with them,” says Wendelsdorf. “From the researcher’s point of view, they may have a preliminary study of a few cases, a list of interesting variants, and then they want to pull other genomes from this universe of data.”
Wendelsdorf will also oversee the first research project in the EGC: a study of myopia, which was chosen as a complex condition that is well described in the PGP phenotype surveys. There aren’t nearly enough members in the community yet to expect any big results, but the project is a chance to familiarize early members with the tools in Variant Analysis and best practices for variant hunting. QIAGEN is supplying members with webinars, introductory tutorials, and sometimes one-on-one instruction. The company is also thinking about building new training tools that are more geared to non-geneticists.
The hope is that this myopia study will be the last one QIAGEN runs from the top down. With a more experienced membership, and the input of professional geneticists who connect to the community through Variant Analysis, the EGC should quickly start generating ideas on its own. “Ideally,” says Wendelsdorf, “I’d like to see all [research projects] grow organically from EGC members.”
Taking the Long View
The biggest challenge for the community will just be attracting enough members to satisfy the huge research demand for whole genomes. The vast majority of whole genome sequencing today is done for specific projects, and the sequences are never given back to their donors – making it almost impossible to get consent for their reuse in later studies. Candidates for the EGC, who actually control their own data, are a much rarer breed.
QIAGEN’s goal is to have a thousand members by the end of the year, which Wendelsdorf acknowledges is probably more than the total number of people who own their whole genomes today. That means outreach to people who have already been sequenced by the PGP or Understand Your Genome, while a promising start, won’t be enough. Wendelsdorf is looking for new places to attract members, targeting people who want to participate in science whether or not they already own their genome data. She mentions community DIY labs, tech conferences, and museums as possible places to promote the Empowered Genome Community brand.
With luck, opportunities for interested parties to go out and get sequenced will keep growing. “I personally could see it exploding, and a lot of people starting to get their sequences done,” says Wendelsdorf. “It keeps getting cheaper and cheaper, and more and more companies are making glossier and glossier ads to do it.”
Even one thousand whole genomes won’t be enough to answer the knottiest genetic questions that have become the new frontier of personalized health. It would, however, be a powerful demonstration of the EGC’s potential, and a magnet for researchers looking for new ways to bump up their study populations. It would also provide the sort of broad member base that will help QIAGEN fine-tune the tools available to the EGC to meet the community’s needs.
“The EGC has very little software infrastructure to it,” says Wendelsdorf. “We want to be pretty hands-off at this stage, and wary of overfitting software features to what we think the community is going to want, and how we think the community is going to use it.”
It’s likely to be quite some time before the EGC can produce major research results. Some experts, like Broad Institute Director Eric Lander, predict that the genetic basis of common chronic diseases won’t become obvious until tens of thousands of whole genomes can be looked at together, with exhaustive phenotype data on the donors. But with studies of genetic health so splintered, Pearson and Wendelsdorf agree that the groundwork needs to be laid now to bring resources together, or the huge volume of data being collected in sequencing labs around the world will never reach its full potential.
To Pearson, it’s about fulfilling the promise that today’s genetic pioneers hoped to find in their own sequences: a meaningful picture of what their genomes mean to them. “It’s going to take a generation-long effort to gather this big data from all of us, to make that single genome really reliably health-informative in the next generation,” he says. “But if we don’t start now, it will be two generations, or three.”