Welcome to DNA.Land: Trait Prediction, A Breast Cancer Research Collaboration, And a Facebook Group
By Allison Proffitt
October 21, 2016 | “We’ve already entered the era of ubiquitous genetic information,” says Yaniv Erlich, a professor of computer science at Columbia and the New York Genome Center. “We have millions of people that have been tested with one of the direct-to-consumer companies such as 23andMe, AncestryDNA, or Family Tree DNA.”
“DNA.Land is an experiment to crowdsource these genomes from individuals,” Erlich says. “We believe this is the future of genomic sciences.”
Along with Joe Pickrell (who is also working on the Seeq sequencing app) and Assaf Gordon, Erlich has built an online community where people who have DNA data from 23andMe, Family Tree DNA, or Ancestry.com (so far) can upload their data, explore it, and contribute it to research.
And it’s catching on. A ticker on the DNA.Land website counts almost 32,000 genomes deposited into DNA.Land so far, and a Facebook group boasts more than 4,000 likes and several user posts each month. DNA.Land uses the Facebook group as a testing ground for new initiatives; a call for survey testers garnered volunteers within an hour. A DNA.Land user group meeting held in New York at the New York Genome Center had more than 100 attendees.
“Lots of people have some DNA data—either array data or whole genome data,” Erlich explains. “Instead of telling people, ‘Please upload your data to a website and we’ll use it for science,’ we thought, maybe we can actually give something back to people, we can reciprocate… if we reciprocate, this will actually drive and motivate people to contribute their data.”
The trick for Erlich and his team is to find the right carrot.
Last week DNA.Land rolled out trait prediction, weighing in on users’ likely coffee consumption and educational attainment under “Wellness Traits” and eye color and height under “Physical Traits.” In order to “unlock” the trait predictions, users must take phenotype surveys.
“If you want to see a trait, you must take a survey,” Erlich said. “One of the main parts of the project is to keep them involved, to convert them from human subject to research participant. They’re all part of this project.”
But lest this start sounding too much like a clinical trial, DNA.Land is staying well away from health traits, Erlich said, focusing only on “wellness” traits. “We’re not trying to diagnose diseases, and of course we tell participants that this is just for education and self-care. There is no medical value in what we do here.”
On the DNA.Land blog, Richard Aufrichtig, DNA.Land’s User Engagement Coordinator, and Jie Yuan, a Computer Science PhD student at Columbia University, wrote, “Trait Prediction is our attempt to predict real-world characteristics based on genomic data. This feature harnesses the results of studies that have discovered associations between single nucleotide polymorphisms (SNPs), areas of the genome that vary between individuals, and biological traits believed to have a genetic component. The trait prediction menu page shows the available traits and your predictions.”
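For readers curious how SNP associations can turn into a single prediction, here is a minimal sketch of one common approach, an additive score that sums a per-copy effect for each SNP a person carries. This is not DNA.Land's published method, which the blog post does not spell out. The SNP names, effect alleles, effect sizes, and function names below are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple


def count_effect_alleles(genotype: str, effect_allele: str) -> int:
    """Count copies (0, 1, or 2) of the effect allele in a genotype such as 'AG'."""
    return genotype.count(effect_allele)


def additive_score(
    genotypes: Dict[str, str],
    effects: Dict[str, Tuple[str, float]],
) -> float:
    """Sum effect_size * allele_count over SNPs present in both inputs."""
    score = 0.0
    for snp_id, (effect_allele, effect_size) in effects.items():
        genotype = genotypes.get(snp_id)
        if genotype is None:
            continue  # SNP missing from this person's data; skip it
        score += effect_size * count_effect_alleles(genotype, effect_allele)
    return score


# Hypothetical effect table: SNP id -> (effect allele, effect per copy, in cm).
# These values are invented for the example, not taken from any study.
HEIGHT_EFFECTS = {
    "snp_a": ("A", 0.5),
    "snp_b": ("T", -0.25),
    "snp_c": ("G", 0.75),
}

# Hypothetical genotypes for one person, as they might appear in an array file.
person = {"snp_a": "AA", "snp_b": "CT", "snp_c": "CC"}

# Offset from an assumed population average, in cm.
offset_cm = additive_score(person, HEIGHT_EFFECTS)
print(offset_cm)
```

In a sketch like this, a handful of SNPs with large or poorly estimated effects can pull the total noticeably in one direction, which is one reason predictions of this kind can miss for an individual.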
The Community Responds
Perhaps unsurprisingly, the comments on the DNA.Land Facebook announcement of the trait predictions were full of reports of wrong predictions: people who were told they were likely prolific coffee drinkers hated the smell of the stuff, and a woman predicted to be of less than average height stands 6 feet tall.
Some people pointed out the errors almost gleefully. “Way off again!” said one user. “Chuckle... so far you only have one correct out of four,” said another. Other commenters, though, chastised the critics. “In the consent you have to agree to before getting your results, it says quite clearly that this is experimental and is not an exact science,” said user Debbie Bloor. “So why are you all acting as if the sky has fallen in?”
And other users enjoyed the process even in its inexact state, digging into their findings and looking for the why behind them. “It predicted me to be quite a bit shorter (10 cm too short) than my actual height, but I did make an interesting observation. Many of the SNPs that dragged the prediction in the wrong direction (too short) had the lowest "Frequency (European)" percentages... 2.49%, 6.16%, 5.47%, etc.,” observed Jessica Koury.
Commenting as the official DNA.Land Facebook page, the team took the errors in stride: “We need exactly people like you to improve the prediction in the future. We just reflect the results of the latest scientific knowledge on this traits!”
Breast Cancer Partnership
While trait prediction steers clear of health issues, DNA.Land has entered into a partnership with the National Breast Cancer Coalition to build a data repository for research a bit more health-focused than caffeine consumption.
“If you want to [look at] the genetics of breast cancer, it’s really hard to get these datasets from previous genome wide association studies. It’s nearly impossible… due to privacy concerns and other political reasons these datasets aren’t shared,” Erlich said.
So DNA.Land is letting the National Breast Cancer Coalition crowdsource data from its members. In particular, NBCC wants to look for genetic markers for breast cancer recurrence.
For DNA.Land members, participating is wholly optional. Users will start by contributing their genomes along with an NBCC survey to report phenotype. Users can consent to share their genotypic and phenotypic data anonymously with the NBCC, which can in turn share the aggregated data with breast cancer researchers vetted by the Coalition. Or DNA.Land users can consent to share both their data and their email address so that breast cancer researchers can contact them.
“This will be the first time that we will take the DNA.Land data… and we will share it with the rest of the world through the National Breast Cancer Coalition, but in a way that respects the choices of all participants,” Erlich said.
The Next Frontier
Erlich, of course, has some interesting plans for how social media and health can and will link. What else would you expect from a researcher who has broken into a bank and tracked down Craig Venter’s ex-wives via his genome?
“What is the future of medicine, really?” he asked. Genetics is one aspect, of course, but Erlich is particularly interested in how our digital lives intersect with our physical health. “There are several types of language research that show that you can take these digital interactions and find medically relevant phenotypes in there.”
He cites a June 2016 paper published in the Journal of Oncology Practice from Microsoft and Columbia researchers that used web searches to diagnose pancreatic cancer (doi: 10.1200/JOP.2015.010504), and a PNAS paper from 2013 that used Facebook likes to predict personality (doi: 10.1073/pnas.1218772110).
“This could be the future of medicine, and a way to really predict your health,” Erlich said.
For it to work, of course, Erlich stressed that any digital observation must be done with the express consent of the individual. But apps and programs could scan Facebook or your mobile phone search history to make predictions and observations that could align with health. If you share your genomic data with the application, predictions could be even better.
This isn’t far off. Erlich said that a simple Facebook app could read your public Facebook posts, and look for connections—data Facebook already scans and uses to deliver advertisements, he points out.
The first step is the carrot.
“First we’re going to do something fun with the data for you for your curiosity,” Erlich said. “What we’re exploring right now, we’re going to use IBM Watson for text-to-personality prediction. We’re going to look at your posts on Facebook and predict your personality. That’s just a fun report for you.”
The next step gathers a bit of research for Erlich’s team: “And then we’ll use this data to do some research and to see if we can see any correlation between your genetic components and the phenotype that we can devise from your Facebook information.”