UK Biobank Contracts With DNAnexus, AWS to Build Data Analysis Platform
By Allison Proffitt
August 26, 2020 | UK Biobank, a long-term study of health data from 500,000 UK volunteers, announced today that it is developing its own cloud-based Data Analysis Platform to make its extensive medical data available to a broader range of researchers around the world. Funding for the Data Analysis Platform has come from Wellcome, and it will be developed over the next three years by DNAnexus in collaboration with Amazon Web Services (AWS).
UK Biobank stores blood, saliva, and urine samples from 500,000 people; those samples are analyzed in various ways and the data are shared. With grants and through pharma and industry partners, the UK Biobank has assayed biomarkers in the samples, genotyped them all, done some exome and now whole genome sequencing, and conducted imaging on some subsets. All of these data are linked to health outcomes data, and all are made available to researchers worldwide.
Thanks to those partnerships and new sequencing projects, the UK Biobank database is expected to grow to 15 petabytes over the next five years, and the current data sharing model—data provided to approved researchers for download—is no longer feasible. It was time for a new approach.
“All of the data were getting so large, our current model of sending data to researchers so they can work on it is becoming limiting,” Professor Sir Rory Collins, UK Biobank Principal Investigator, told Bio-IT World. “The data are just so big; relatively few researchers have the scale of data storage or compute.”
By developing its own cloud-based Data Analysis Platform, UK Biobank can make the data more easily and more cost-effectively accessible to approved researchers around the world. Researchers will be able to analyze the data on the platform, enabling quicker processing times and reduced storage needs at much lower cost.
UK Biobank launched a tendering process to solicit proposals. Among many strong proposals, the joint DNAnexus and Amazon Web Services entry won the three-year contract. DNAnexus will build the platform; AWS will provide the computing power. Data will be stored in the AWS UK Region.
The Data Analysis Platform will undergo development and testing during the remainder of 2020 and the first half of 2021, with launch to all researchers planned by summer 2021. AWS has pledged $1.5 million in research credits over three years to support access for approved researchers from low- and middle-income countries and early career researchers. “It should be a very cost-effective way of allowing many more researchers around the world to use the data, particularly with this Amazon subsidy,” Collins said.
Data in the Big Bank
DNAnexus isn’t new to the UK Biobank. In 2019, DNAnexus worked with Regeneron to build a cohort browser to view the exome sequencing data Regeneron was generating along with other UK Biobank data as part of their own research. Bio-IT World recognized the effort with an Honorable Mention in our 2019 Innovative Practices competition.
But this new platform will be built specifically for the UK Biobank, Collins explained. “It’s going to be more extensive in terms of scope because it’ll be dealing not just with exome sequence data but with whole genome sequence data, and also the platform is being designed so it can be used by lots of different researchers around the world.” Access to these data will remain, at all times, subject to the UK Biobank’s own non-discriminatory and transparent access procedures and protocols.
The proposed platform also won’t be only for analyzing UK Biobank data. Collins says that the platform will be designed to make it easy for researchers to bring their own data into the analysis environment and compare it with UK Biobank data.
He also hopes that the work UK Biobank does designing and refining the platform will be transferable to other groups. “There are any number of large cohorts being created around the world,” Collins notes. “In working with DNAnexus, one of the things we want to do is to develop a platform that other cohorts in other parts of the world might also find useful for making the data available not only for their own analyses, but for other people to analyze their data.”
“DNAnexus is proud to be working with UK Biobank and AWS, two established leaders in this fast-paced industry, to deliver a novel Data Analysis Platform that democratizes data access, enhances productivity, and furthers understanding of this rich dataset,” said Richard Daly, CEO at DNAnexus, in the public announcement of the project.
“Over the past 11 years, DNAnexus has supported the diverse scientific aims of researchers worldwide, accelerating digital transformation by simplifying complex data analysis, clinical data management, and insights at scale. We enthusiastically support the foundational UK Biobank project as it breaks new ground in the advancement of disease research through the integration of deep healthcare data with genomics and advanced tools.”