Allotrope Foundation: A Framework For Knowledge Management
By Benjamin Ross
August 25, 2017 | 2017 Best Practices Awards | The Allotrope Foundation, an international consortium of pharmaceutical, biopharmaceutical, and other scientific research-intensive industries, is rolling out their Allotrope Framework to enable cross-platform data transfer; facilitate the finding, accessing, and sharing of data; and enable increased automation in laboratory data flow with a reduced need for error-prone manual input. The Framework has been available to members and partners of the Foundation since 2015, and the first phase of its commercial release occurred this July.
The Allotrope Framework has been under development since 2014 with the help of Osthus, the Foundation’s technology partner, after members of the Foundation dug into the existing data standards, architecture, and related taxonomies and ontologies initiatives.
“At that point we were invested in understanding which of [the existing data standards, architecture, etc.] could be brought to bear in a way that would be useful for the problems that we wanted to solve,” Dana Vanderwall, Director of Biology & Preclinical IT at Bristol-Myers Squibb and Allotrope Vice-Chairperson, told Bio-IT World. “Even during the course of that [assessment] we were developing software and building a number of functional proof-of-concept applications that were deployed at a number of member companies, which enabled the kind of testing and evaluation of the different standards that allowed us to understand what would meet our industry's requirements.” Ultimately the output of that was to bring those standards together and introduce new concepts.
The result was the Allotrope Data Format (ADF), a platform- and instrument-independent federation of standards that can store datasets of nearly unlimited size and complexity in a single file. The measurements of an experiment, including time series and hyper-dimensional data, are organized as one or more n-dimensional arrays.
According to the Allotrope Foundation, the ADF is built on the well-established HDF5 file specification for storage of data in a binary format, within which acquired data are stored in one or more Data Cubes. Metadata are stored in the Data Description layer using the Resource Description Framework (RDF) Data Model for process, material, instrument and result details, as well as the metadata describing the Data Cube and Data Package layers, all based on semantic web and linked data concepts. The Data Package accommodates any ancillary files such as native instrument formats, images, PDFs, videos, and more.
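The three layers described above can be pictured with a small sketch. This is only an illustration under stated assumptions: it uses plain Python objects rather than HDF5 or an RDF store, and every name in it (`AdfLikeFile`, `DataCube`, the `ex:` and `cube:` prefixes) is hypothetical and not part of the Allotrope API or the actual ADF layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical illustration only; not the Allotrope API or the real ADF layout.
Triple = Tuple[str, str, str]  # (subject, predicate, object), in the spirit of RDF


@dataclass
class DataCube:
    """Acquired measurements; nested lists stand in for an n-dimensional array."""
    name: str
    dimensions: List[str]
    values: List[List[float]]

    def shape(self) -> Tuple[int, int]:
        rows = len(self.values)
        cols = len(self.values[0]) if rows else 0
        return (rows, cols)


@dataclass
class AdfLikeFile:
    """One container holding the three layers named in the article."""
    data_cubes: Dict[str, DataCube] = field(default_factory=dict)
    data_description: Set[Triple] = field(default_factory=set)
    data_package: Dict[str, bytes] = field(default_factory=dict)

    def add_cube(self, cube: DataCube) -> None:
        # Store the measurements and record metadata about the cube as triples.
        self.data_cubes[cube.name] = cube
        subject = f"cube:{cube.name}"
        self.data_description.add((subject, "rdf:type", "ex:DataCube"))
        for dim in cube.dimensions:
            self.data_description.add((subject, "ex:hasDimension", dim))

    def describe(self, subject: str) -> List[Triple]:
        # Return every metadata triple about one subject, in a stable order.
        return sorted(t for t in self.data_description if t[0] == subject)
```

In this sketch, adding a cube writes both the measurements and the triples that describe them, and an ancillary file such as a PDF would simply be stored under its name in `data_package`.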
The ADF serves as the core of the Allotrope Framework, but it is only one of the Framework’s three main components. To provide a controlled vocabulary for the metadata packaged in the Data Description layer, the Allotrope Foundation has also created the Allotrope Ontology.
Publicly available knowledge sources are often modeled in very different ways for different purposes, and can therefore be challenging to relate to one another or integrate into a coherent whole. The Allotrope Ontology provides base-level taxonomies and formal ontologies that supply a controlled vocabulary with unambiguous meaning, Vanderwall explained. The taxonomies and ontologies are modeled into five domains: material, equipment, process, result, and properties.
The third component of the Allotrope Framework is a collection of data models using SHACL (Shapes Constraint Language), a recently released W3C standard. These models define and constrain how the semantic content is packaged in the Data Description layer of the ADF.
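To give a feel for what constraining semantic content means, here is a minimal sketch that checks a minimum-occurrence rule, loosely modeled on the idea behind SHACL's `sh:targetClass` and `sh:minCount`. It is not SHACL syntax, not a SHACL engine, and not an Allotrope data model; the function name `validate_min_counts` and the `ex:` terms are assumptions made up for the example.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def validate_min_counts(
    triples: Set[Triple],
    target_class: str,
    min_counts: Dict[str, int],
) -> List[str]:
    """Report every node of target_class that has too few values for a property."""
    violations: List[str] = []
    # Focus nodes are the subjects declared to be instances of the target class.
    focus_nodes = sorted(
        s for s, p, o in triples if p == "rdf:type" and o == target_class
    )
    for node in focus_nodes:
        for prop, minimum in sorted(min_counts.items()):
            count = sum(1 for s, p, _ in triples if s == node and p == prop)
            if count < minimum:
                violations.append(
                    f"{node}: {prop} occurs {count} time(s), "
                    f"expected at least {minimum}"
                )
    return violations
```

Under this sketch, a rule such as "every `ex:Measurement` must name at least one `ex:hasInstrument`" would flag any measurement record whose metadata omits the instrument, which is the kind of consistency check a shared data model makes possible.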
This holistic set of capabilities is what has ultimately led to the Framework’s success and its level of recognition, which includes being awarded the 2017 Bio-IT World Best Practices Award for Knowledge Management in Boston this past May.
Collaboration has been crucial in the development of the Allotrope Framework, which draws on feedback and contributions from the member companies as well as vendors in the Allotrope Partner Network. After all, the Allotrope Foundation was founded in 2012 with the idea that the companies that rely on producing and analyzing data could work together with the commercial entities that provide the products and services to support that fundamental need. The Foundation enables the integration of its Framework in commercial products through a partner program it calls the Allotrope Partner Network. The program is the venue through which instrument and software vendors and service providers can join, as well as academic and non-profit organizations looking to use the Allotrope Framework, Vanderwall said. “Potentially anyone providing capabilities in data acquisition, analysis, or management can join the partner network.”
Vanderwall also sees the application of the Framework expanding. “The initial focus for Allotrope was in the analytical chemistry domain,” he said. “But some of the members [of Allotrope] and the way they’re using [the Framework] are already stepping out into other types of data that we acquire in R&D. We can easily see it getting much more broadly used in the biologics area, for example.”
The Allotrope Foundation hopes its Framework will allow researchers to rethink the way they acquire and manage scientific data. “We’re excited about the fact that with clean, more homogenous data, and consistently controlled metadata, we’re [able] to do some really interesting data science and really… enable the application [of] machine learning and artificial intelligence algorithms with a substantial volume of clean data,” Vanderwall said. “I think we’re really setting ourselves up for the next generation of analytics.”