Pfizer's Model For The Intelligent Data Framework
By Allison Proffitt
April 25, 2019 | Vijay Bulusu opened his plenary session at the Bio-IT World Conference & Expo last week by asking the audience whether—if given $1 million to spend—they'd buy a machine learning platform or improve the quality of their data.
Nearly everyone voted for improving data quality.
Bulusu was pleased. Life sciences doesn't really have a big data problem, he explained. It has a "lots-of-small-data problem," and data should be our focus.
Bulusu is head of data and digital innovation for PharmSci Worldwide Research & Development at Pfizer. He sees first-hand the challenges of unearthing data in big pharma. Typically data moves from research to development to manufacturing, he explained, but there is no feedback loop running the other way. Some problems are easy to solve within a company vertical, he said, but others require data across verticals or studies, and those are much tougher.
Pfizer wanted to fix that and feed learning back into the drug development process across verticals, so the company built an Intelligent Data Framework (IDF). The mission, Bulusu said, is to create a robust, foundational information architecture and deliver access to internal and external information in an intuitive, self-service manner that allows efficient knowledge sharing.
The stakes are high, he insisted. Regulatory inspectors have stated that "quality is of concern" if data cannot be found within minutes. And getting the most from predictive analytics depends on rapid access to high-quality, comprehensive data.
To build this flexible data ecosystem, users (both data creators and data analyzers) needed to understand the systems where data may reside (websites, repositories, databases); how to log in to and navigate those systems; how data are identified in each system and how to search them accurately; and finally, the level of trust or accuracy of the data in each system.
Pfizer began assembling the pieces of the IDF puzzle. One of the keys to making sense of lots of small data is data standards, vocabularies, and ontologies. Standards are the "fundamental bit that helped build our ecosystem," Bulusu said. "We spent months discussing the difference between 'lot' and 'batch'."
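As an illustration of what shared vocabularies buy, consider a minimal sketch of term normalization against a controlled vocabulary. The field names and mappings below are invented for this example, not Pfizer's actual ontology.

```python
# Illustrative only: a toy controlled vocabulary that maps synonymous
# field names from different source systems onto one canonical term,
# so records can be merged without guessing whether "lot" means "batch".
CANONICAL_TERMS = {
    "lot": "batch_id",
    "lot_no": "batch_id",
    "batch": "batch_id",
    "batch_number": "batch_id",
    "cmpd": "compound_id",
    "compound": "compound_id",
}

def normalize_record(record: dict) -> dict:
    """Rename known synonyms to their canonical field names."""
    return {CANONICAL_TERMS.get(key.lower(), key.lower()): value
            for key, value in record.items()}

# Two systems describing the same batch in different vocabularies
# collapse to the same canonical shape.
print(normalize_record({"Lot_No": "L-0042", "cmpd": "PF-123"}))
print(normalize_record({"batch": "L-0042", "compound": "PF-123"}))
# Both print: {'batch_id': 'L-0042', 'compound_id': 'PF-123'}
```

Months of discussion about 'lot' versus 'batch' are, in effect, the work of agreeing on the right-hand side of a table like this one.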
Internally, the goal was to level the playing field between information consumers and information producers and guarantee common understanding and language between the groups. Externally, the group used the Allotrope Framework and FAIR data guiding principles to set standards.
IDF Foundational Services ensures consistent use of key identifiers between systems, linking target names, compound IDs, and drug substance lot IDs.
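In spirit, such a service is a cross-reference registry: one row ties together the identifiers that different verticals use for the same asset, so any system can join on whichever ID it holds. A hedged sketch of that general pattern follows; all names and IDs are invented, and this is not Pfizer's implementation.

```python
# Hypothetical identifier cross-reference: each registry row links the
# keys that research, development, and manufacturing use for one asset.
from dataclasses import dataclass

@dataclass(frozen=True)
class AssetKeys:
    target_name: str       # research-side identifier (invented)
    compound_id: str       # development-side identifier (invented)
    substance_lot_id: str  # manufacturing-side identifier (invented)

REGISTRY = [
    AssetKeys("KINASE-X", "PF-0123", "DS-LOT-0042"),
]

def resolve(**query: str) -> list[AssetKeys]:
    """Return registry rows matching every supplied identifier."""
    return [row for row in REGISTRY
            if all(getattr(row, field) == value
                   for field, value in query.items())]

# A manufacturing system that holds only a lot ID can recover the
# research-side target name through the shared registry.
print(resolve(substance_lot_id="DS-LOT-0042"))
```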
Once data were organized, data access needed to be web-based and scalable, and to allow mining beyond the initial question. The team wanted a "bird's eye way to view our data that provides users the ability to navigate through disparate data sources," Bulusu said.
The solution was a linked data system called IDF Konnect. "Without having to create a huge warehouse, we created a mind map," he explained, one that could be shared with others and edited. The map represented how data from different research areas were connected and helped provide easy access to information from those systems.
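Conceptually, such a map is a graph over data sources rather than a copy of the data itself: nodes are systems, and edges record which shared identifier connects them. A minimal sketch of that pattern, with source names and link fields invented for illustration:

```python
# Toy linked-data map: navigating the graph tells a user where related
# data live, without building a central warehouse.
LINKS = {
    "assay_results_db": [("compound_registry", "compound_id")],
    "compound_registry": [("assay_results_db", "compound_id"),
                          ("batch_records", "batch_id")],
    "batch_records": [("compound_registry", "batch_id")],
}

def reachable_from(source: str) -> set[str]:
    """All data sources a user can reach from a starting system."""
    seen, stack = set(), [source]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(neighbor for neighbor, _ in LINKS.get(node, []))
    return seen

print(reachable_from("assay_results_db"))
# {'assay_results_db', 'compound_registry', 'batch_records'}
```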
Pfizer hasn't yet included all types of data in the system. Raw data from lab instruments, for example, aren't yet included. Simulated data are not included. The sneakernet is still the mode of access for these datasets, Bulusu said.
The IDF Scientific Data Cloud sits at the intersection of data capture, use, prediction, and reuse—linking data, equipment, Foundational Services, and Konnect. "It's a GxP compliant storage & analytics system for lab and manufacturing data that leverages big data and cloud computing technologies," Bulusu explained.
The IDF Scientific Data Cloud was inspired by Apple and iTunes. "You don't save files on your phone, name them, and remember that file name forever," Bulusu said. Pfizer wanted the IDF Scientific Data Cloud to work as flexibly, pulling and analyzing data directly from a cloud repository.
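The analogy boils down to retrieving data by metadata query rather than by remembered file name. The sketch below shows that pattern in miniature; the store, fields, and values are hypothetical, not the actual Scientific Data Cloud.

```python
# Toy metadata-indexed store: data are retrieved by what they are
# (instrument, batch, date), not by a file name someone must remember.
STORE = [
    {"instrument": "HPLC-7", "batch_id": "L-0042",
     "date": "2019-03-01", "payload": "<raw chromatogram bytes>"},
    {"instrument": "HPLC-7", "batch_id": "L-0043",
     "date": "2019-03-02", "payload": "<raw chromatogram bytes>"},
]

def query(**criteria: str) -> list[dict]:
    """Return every stored object whose metadata match all criteria."""
    return [obj for obj in STORE
            if all(obj.get(field) == value
                   for field, value in criteria.items())]

# Analysis code asks for "all HPLC-7 runs on batch L-0042" and never
# needs to know where, or under what name, the file was saved.
for obj in query(instrument="HPLC-7", batch_id="L-0042"):
    print(obj["date"], obj["payload"])
```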
The IDF Scientific Data Cloud was the last step in creating a complete data ecosystem, Bulusu said. "We've been able to achieve data convergence, thus speed and quality."