The Imperative For Open Data Access And Sharing: A Progress Report
Contributed Commentary By Frank W Rockhold and Asba (AT) Tasneem
February 21, 2019 | Clinical data sharing creates new value, with benefits to science and society. These include the ability to verify research findings, combine data and insights from multiple studies, inform future research, reduce the cost of answering questions, and lower the risk to patients of interventional studies. Successful data sharing initiatives will need to assure high quality data, find the right balance between data generators and users, and optimize costs and data accessibility. Remaining obstacles to open data access include the need to consider patient privacy, the lack of consistent use of data standards, the need to retain credit for data generators, and the need to make data easier to access through improved data science. There has been a great deal of activity, which is a positive step, yet current approaches are somewhat fragmented in terms of process and openness.
The range of options and openness is displayed in Figure 1.
Standardized approaches will be needed to ensure that true transparency is exhibited in the broad access and utility of data, patient data privacy is protected, and funding is secured for data sharing technologies. To make true transparency a reality, these systems should converge on a process offering the maximum openness with assurances to maintain participant privacy and data utility. This will allow the greater value of data sharing to be realized. A fair criticism of the current approaches is the relatively low volume of peer-reviewed papers appearing from secondary analyses. Broader and easier access should increase that flow.
Progress Toward Standardized Approaches
Among the recent advances in creating standardized approaches are the launch of the Vivli platform, the Credit for Data Sharing initiative, the Supporting Open Access to clinical trials data for Researchers (SOAR) initiative, the mandate for data to be shared from all trials receiving National Institutes of Health funding, and European Medicines Agency (EMA) efforts to anonymize clinical trial data (Figure 2).
The Vivli platform is designed to bridge the fragmentation in the current data-sharing ecosystem and provide data archiving and hosting capacity (Figure 2). As of January 22, 2019, the platform included 3,200+ clinical trials from 16 organizations, with data from 100 countries and 1.5 million trial participants. The platform provides a secure research environment that allows researchers to analyze and aggregate data, share data across a number of existing repositories and platforms, and bring their own data sets and statistical software to use on an individual secure research environment within the Vivli platform.
Importantly, the Vivli data use agreement ensures that the privacy rights of clinical trial participants are respected while advancing the goal of scientifically valid secondary analyses and balancing the interests of data contributors. Vivli is implementing advanced informatics technologies to meet FAIR (Findable, Accessible, Interoperable, Reusable) principles. The platform governance involves transparently displaying the criteria for access, any exceptions to those criteria, and how studies may be accessed.
The Credit for Data Sharing initiative, launched recently by the Association of American Medical Colleges (AAMC), seeks to describe and implement a system that appropriately credits researchers for sharing data. This initiative aims to reward good data management and curation; build on and leverage current data citation efforts; be an end-to-end solution that is understandable and accessible to all stakeholders; permit tracking of data reuse, applicable to each contributor; be an anticipated and routine part of journal submission and publication; be recognized for academic advancement; and permit tracking of data reuse for funders for impact assessment.
The SOAR initiative involves two major efforts. One is a partnership to advance academic data sharing between the Duke Clinical Research Institute (DCRI) and Bristol-Myers Squibb. Some 37 data requests were fulfilled in the first nine months of 2018. This successful case study of data stewardship and data governance aims to facilitate open sharing of clinical research data with responsible researchers, verification of reported results, and pursuit of interesting secondary uses of data. The second, and more distinctive, aspect of SOAR makes Duke data available for sharing, including the Duke Cardiac Catheterization Research Dataset (DukeCath), with more than 150,000 catheterization procedures in more than 80,000 unique patients, and the Duke Cardiac Catheterization Educational Dataset (DukeCathR). SOAR plays a key role in developing a data-sharing plan and providing the required IT infrastructure. This was one of the first examples of open data access in the academic community.
EMA efforts to anonymize clinical trial data include individual patient level data and real world data in the context of registries and individual cohort studies. EMA has formed an advisory group to establish clear guidelines for patient data de-identification while retaining data utility.
The agency also has a dedicated portal providing online access to anonymized clinical data submitted by pharmaceutical companies to support marketing applications for human medicines, offering access to 6,650 documents as of September 7, 2018. EMA claims to be the first regulatory authority worldwide to provide such broad access to clinical data.
| Time period | Initiatives launched |
| --- | --- |
| 2007-12 | Virtual International Stroke Trials Archive (VISTA); Infectious Diseases Data Observatory (IDDO); Immune Tolerance Network TrialShare (cosponsored by the National Institute of Allergy and Infectious Diseases and the Juvenile Diabetes Research Foundation) |
| 2013-15 | Yale Open Data Access (YODA); ClinicalStudyDataRequest (CSDR); Project Datasphere (focused on cancer); SOAR™ Initiative (Supporting Open Access to clinical trials data for Researchers) |
| 2016-17 | National Institutes of Health mandate for data to be shared from all trials with NIH funding; Credit for Data Sharing initiative; European Medicines Agency technical anonymization group |
| 2018 | Vivli platform for data sharing |
| 2019 | International Committee of Medical Journal Editors (ICMJE) began requiring a data sharing statement as a condition for publication |
Clinical Trial Participants’ Views On Data Sharing
A recent survey provides interesting insights into clinical trial participants’ views on data sharing, concluding that few have strong concerns about the risks involved. The survey involved 771 current and recent participants in a diverse sample of clinical trials at three academic medical centers in the United States. Fewer than 8% of respondents felt that the potential negative consequences of data sharing outweighed the benefits. A total of 93% were very or somewhat likely to allow their data to be shared with university scientists, and 82% were very or somewhat likely to share with scientists from for-profit companies.
Funding For Data Sharing Technologies
Looking ahead, funding institutions should be more careful about funding disparate data standards efforts and data silos. Funding for data sharing remains a bottleneck, especially for academic organizations and after a trial has ended. Combining datasets, as in the planned Collaboration for Alzheimer’s Prevention effort, will increase insights into outcomes and support informed decision-making without adding risk for patients. Investigators are increasingly exploring machine learning, data mining, and predictive analytics approaches to derive insights from combined datasets.
Despite some very real challenges on this data sharing journey, multiple promising data access platforms are in place. Strengths include development of a governance framework and agile processes for data sharing, providing a consistent and unbiased expert review of proposals. Vivli also has promise in making datasets findable and available through a central platform. The next step will be to populate the data search engine with valuable and reusable data, containing detailed information on datasets, such as metadata, a data dictionary, and supporting documents. In future, increasing collaboration between all clinical trial stakeholders will lead to further progress in the data access journey, by managing and overcoming the remaining challenges. Ultimately this will provide broad access to high quality data, with an appropriate balance between patient privacy, data utility, and attribution of credit to both data originators and secondary users.
Frank W Rockhold, PhD, (frank.rockhold@duke.edu) is Professor of Biostatistics and Bioinformatics at the Duke Clinical Research Institute, Duke University School of Medicine, Durham, NC. He serves as senior advisor to Vivli and is a member of the European Medicines Agency technical anonymization group.
Asba (AT) Tasneem, PhD, (asba.tasneem@duke.edu) is Informatics Project Leader, Technology and Data Solutions, at the Duke Clinical Research Institute, Duke University School of Medicine, Durham, NC.