FAIR Data At The Right Time: What Researchers Are Saying About Data Stewardship
By Allison Proffitt
April 23, 2018 | There’s a movement afoot to make data more findable, accessible, interoperable, and reusable. The concept of FAIR data principles dates back to March 2016, when a group of authors, in a comment published in Nature, outlined “a concise and measurable set of principles that we refer to as the FAIR Data Principles.” The FAIR principles differed from other open data initiatives that focused on the human scholar, the authors said, instead putting “specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals.”
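To make that machine emphasis concrete: one common way to publish machine-findable dataset descriptions is structured metadata using the schema.org vocabulary. The sketch below is purely illustrative, not drawn from the Nature comment; the identifier, dataset name, and URLs are hypothetical placeholders.

```python
import json

# Hypothetical example of a machine-readable dataset description.
# A persistent identifier, an explicit license, and shared vocabulary
# terms (here, schema.org) are what let software, not just people,
# find and reuse the data.
dataset_record = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "@id": "https://doi.org/10.1234/example-doi",  # hypothetical persistent identifier
    "name": "Example variant-annotation dataset",  # hypothetical dataset
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["genomics", "variant annotation"],
    "distribution": {
        "@type": "DataDownload",
        "encodingFormat": "text/csv",
        "contentUrl": "https://example.org/data/variants.csv",  # hypothetical URL
    },
}

# Serialize to JSON so a crawler or catalog could index the record.
print(json.dumps(dataset_record, indent=2))
```

A record like this can be embedded in a web page or deposited alongside the data itself, so that catalogs and search tools can index the dataset without human intervention.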
Last year at the Bio-IT World Conference & Expo, we hosted our first FAIR data hackathon. Three teams completed the challenge: the first-place team applied FAIR principles to ClinVar; the second-place team worked with the Foundation Medicine dataset; and the third-place team worked to “FAIRify” a personal dataset from 23andMe.
At this May’s event, Bio-IT World is hosting a 2018 hackathon with planned datasets including mouse sequencing and tumor profiling, among others. We’ve also planned a FAIR Data for Genomic Applications track featuring talks by publishing executives at Elsevier and Nature Genetics, along with detailed use cases and tools from AstraZeneca, Dell, Mount Sinai, and more.
Data Stewardship Landscape
Good data management shouldn’t be a goal in itself, the authors of the March 2016 Nature paper argued. Instead, data stewardship is merely the conduit for discovery, innovation, and data use. And yet this type of data stewardship is still lacking at many organizations, for a variety of reasons.
“There’s a spectrum of difficult issues that we’ve never made systematic,” said Erik Schultes, FAIR Data Scientific Projects Lead at Dutch Techcentre for Life Sciences, in his lecture opening the Hackathon last May. “But we’re reaching a point in time where we need to start doing that… We’re starting to feel the effects of data overload and we’re having a hard time reusing data.”
But how much of a hard time? In a survey about data stewardship, Bio-IT World found that while data standards and sharing are widely considered important, most people still struggle with both.
We surveyed 122 researchers, most from pharma, biotech, or academic labs, though hospitals, government labs, CROs, and manufacturers were also represented. 76% of the respondents said that data re-use is either very important or required within their organization, and yet 62% said that integrating internal data is a challenge, while 68% said integrating external data is a challenge. Data use requirements are only somewhat understood at 44% of organizations, according to the respondents, with equally small shares (6% and 7%) either not understanding requirements at all or understanding them very well.
Nearly two-thirds of the respondents said that using public standards to structure their organization’s data is either very important or required. We listed twenty different vocabularies and ontologies for respondents to choose from, and still 21% reported using a different model.
When rating how easily their organizations can get to clinical data or metadata, most respondents answered neutral at best, with many calling the data hard or very hard to get. Metadata are slightly more difficult to access than data. When data are shared, half of the respondents reported sharing them by email or through Dropbox, Box, or SharePoint. Another nearly 20% reported using a project database or content management system to share data.
We didn’t limit our survey questions to open data, and we didn’t ask about FAIR principles specifically, but it’s telling that users report frustration even finding data within their own organizations and understanding data use requirements. It may be time for a data revolution.