Data Product Thinking: Bayer Builds Products with Users in Mind
By Allison Proffitt
December 8, 2022 | Last month at the Bio-IT World Conference & Expo Europe, Alexandra Grebe de Barron, senior data product owner, digital transformation & IT, introduced the audience to Bayer’s Real-World Data Store, a human- and machine-friendly self-service shop that facilitates real-world evidence generation across the life-long patient pathway.
At Bayer, de Barron said, the goal is to understand the entire patient pathway from pre-diagnosis to diagnosis to treatment. Shortening this odyssey is important, she said. “For the patient, that’s time lost to get a cure for the disease or treatment for the disease.”
But Bayer is also looking at the bigger picture. Once treatments exist, she said, the pharma looks at the healthcare setting. Are treatments affordable and accessible? Does the company predict, and later see, good adherence? For chronic diseases, can treatments reach patients while preventative measures may still improve outcomes?
In addition, they are looking at the pre-diagnostic period. For Bayer to offer digital healthcare services that matter to the patient, the pharma needs to understand what patients care about, what quality of life markers are most valuable. For planning drug development, Bayer needs to understand trial recruitment, make use of decentralized studies, and identify patient-relevant endpoints. Finally, the pharma needs to make commercialization and communication decisions with the patient in mind.
The answer for every one of these questions, de Barron said, lies in real-world data.
The Answers Come with Questions
But real-world data sources may be even broader than the questions that need answers. Data come from electronic health records, lab tests, pathology reports, provider notes, pharmacist notes, medical claims data, molecular profiling, family histories, fitness trackers and wearables, social determinants of health, environmental factors, social media posts, and more.
In the past, teams at Bayer worked fairly autonomously to choose and use real-world data sources. Data were scattered across multiple data repositories, and not all teams could access the data sources Bayer had purchased. Patient data analysis is complex, both technically and from a compliance standpoint, and teams spent a lot of time learning to work with the data and understanding the various coding schemes, de Barron said.
“License owners” were reluctant to share data outside of their teams because they are accountable for its correct use, she explained. They wonder: “Why should I now share the data [that is already purchased] with others, because I’ll get a lot of questions on the data and it’s a lot of work for me to take care that they use the data in the right way.”
But Bayer, on the whole, wanted a better return on investment for these real-world data purchases. “For Bayer as a company, it would, of course, make sense if this data is broadly used,” de Barron said.
So she and her team set out to build a system to enable that.
Data Products in Theory
“We were very intrigued by the data mesh principles that were published by Zhamak Dehghani from Thoughtworks,” de Barron said. The “data as a product” approach resonated with the way Bayer approached the RWD Store.
Now Bayer defines a successful data product as one that drives value, is feasible, and is usable from both a technical and business perspective. “We only do this work if there is a good use case behind that—if you can really create business value with that,” de Barron said. The new approach also prioritizes data products that serve many users instead of niche tools.
The result, she said, is a shift from application-centric thinking to data-centric thinking, with data exposed to the many users and many domains. “The system where it was created is secondary in the end,” she said.
De Barron has found it helpful to think “a little bit like a startup company that wants to bring this product to the market.” With that mindset, user delight is prioritized: creating data products that are easy to use, understandable, accessible, secure, interoperable, trustworthy, and valuable. She recommends the design thinking method and Google’s HEART framework as a measure of user experience. HEART stands for Happiness, Engagement, Adoption, Retention, and Task success (how easily users can complete a task).
Finally, she said, it’s also important to run the right communication campaign. “Even if you develop a great product, if nobody knows about it, this can die very fast.” De Barron’s team has created SharePoint slides, presented to teams, and built a real-world community within Bayer.
Creating the Store
With this thinking in place, Bayer developed the Real-World Data Store, which currently houses three products: RWD Dashboard, Medical Definition Library, and RWD Assets. COLID, the Corporate-Linked Data Catalog created by Bayer and now open source, serves as the foundational data registry and marketplace for the RWD Store. (For a comparison between COLID and other data management systems, see Comparing Open Source Research Data Management Tools.)
The RWD Dashboard is an overview of the 150 RWD sources available within Bayer and their metadata, contributed by 30 data stewards. The RWD Dashboard includes details about how data sources have been historically used within Bayer, so anyone exploring the Dashboard can see prior use cases. Each dataset is tied to an individual, so interested users know whom to contact for more information.
The Medical Definition Library is a central source for medical definitions that makes patient cohort descriptions consistent. Instead of googling medical terms and copying codes from ICD-9, ICD-10, or SNOMED into protocols, teams can now see centralized lists of codes for various medical terms, linked to the original dictionary. It’s a lot of work to set up these libraries, de Barron said. “Really each project team does it on their own. We wanted to have a central place where people can register these medical definitions and share them with others.” This product is built on a platform already used within the company for ontology management and dictionary management. Again, terms are linked to the projects and data sources where they have been used within Bayer, so potential users can see terms in different use cases and understand context: in different geographies, as a primary diagnosis vs. a comorbidity, etc.
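To make the idea of a shared definition registry concrete, here is a minimal Python sketch. It is not Bayer's implementation, which the talk did not describe at this level. The class names, the `used_in` field, and the project label are all hypothetical, and the type 2 diabetes codes are included only as an illustrative example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Code:
    """One code in a source dictionary (hypothetical structure)."""
    system: str  # e.g. "ICD-9-CM", "ICD-10-CM", "SNOMED CT"
    value: str


@dataclass
class MedicalDefinition:
    """A medical term, the codes that identify it, and where it was used."""
    term: str
    codes: set[Code] = field(default_factory=set)
    used_in: list[str] = field(default_factory=list)  # hypothetical project labels

    def codes_for(self, system: str) -> list[str]:
        """Return the sorted code values that belong to one dictionary."""
        return sorted(c.value for c in self.codes if c.system == system)


class DefinitionLibrary:
    """Central place to register definitions once and look them up by term."""

    def __init__(self) -> None:
        self._definitions: dict[str, MedicalDefinition] = {}

    def register(self, definition: MedicalDefinition) -> None:
        key = definition.term.lower()
        if key in self._definitions:
            raise ValueError(f"'{definition.term}' is already registered")
        self._definitions[key] = definition

    def lookup(self, term: str) -> MedicalDefinition:
        return self._definitions[term.lower()]


# Illustrative entry only; a real cohort definition would list many more codes.
library = DefinitionLibrary()
library.register(
    MedicalDefinition(
        term="Type 2 diabetes",
        codes={Code("ICD-10-CM", "E11.9"), Code("ICD-9-CM", "250.00")},
        used_in=["example-cohort-study"],
    )
)
icd10_codes = library.lookup("type 2 diabetes").codes_for("ICD-10-CM")
```

Keeping the dictionary name on every code is what lets one definition span several coding systems, and the `used_in` list mirrors the article's point that each term is linked to the projects that have already used it.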
Finally, RWD Assets is the cloud environment that gives data scientists access to the available data. The RWD Assets re-usable data service began as a pilot project to host Explorys EHR data, covering more than 60 million unique patients, in a data analytics environment. “We had in our mind to create this re-usable data service,” de Barron said. The team connected an AWS cloud infrastructure to Science@Scale, Bayer’s internal analytics cloud environment.
The team built an initial quick solution that did not scale well. Learning from that effort, they separated storage and compute and currently use Parquet for data storage and Snowflake as a big data query engine.
De Barron reports large savings with RWD Assets, a cost decrease of more than 80%, as well as increased performance. “This gave us a lot of confidence,” she said, and the team migrated legacy datasets from Cloudera and OMOP CDM from Redshift into the RWD Store and connected the RWD Store to the S3 buckets housing MIMIC III and MIMIC IV.
“Now people call us and ask, ‘We have a new real-world dataset; can you make it available for us?’ And we can usually in one or two weeks.”
De Barron has already identified the next data product her team will develop. With 10 to 15 years of longitudinal real-world data in some cases, she envisions a data product that will provide corrected longitudinal patient cohorts. This will require standardizing across historically evolving patient diagnosis code dictionaries and confirming that individual patients are identified correctly over time.
“We want to make a little data product to make the historization of these dictionaries and have one look and feel for our consumers,” she said.
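As a rough illustration of what standardizing across evolving code dictionaries can involve, the Python sketch below maps diagnosis records coded in different dictionary generations onto one harmonized concept and orders them per patient. It is an assumption-laden toy and not the planned Bayer product: the crosswalk table, the concept label, the record layout, and the `harmonize` function are all hypothetical, and the two atrial fibrillation codes serve only as an example.

```python
from datetime import date

# Hypothetical crosswalk: (dictionary, code) -> harmonized concept label.
CROSSWALK = {
    ("ICD-9-CM", "427.31"): "atrial_fibrillation",
    ("ICD-10-CM", "I48.91"): "atrial_fibrillation",
}


def harmonize(records):
    """Group diagnosis dates by (patient, concept), oldest first.

    Each record is a tuple: (patient_id, dictionary, code, date_seen).
    Codes missing from the crosswalk are skipped here; in practice they
    would need review rather than silent removal.
    """
    timeline = {}
    for patient_id, system, code, seen_on in records:
        concept = CROSSWALK.get((system, code))
        if concept is None:
            continue
        timeline.setdefault((patient_id, concept), []).append(seen_on)
    return {key: sorted(dates) for key, dates in timeline.items()}


# Made-up records: one patient coded under two dictionary generations.
records = [
    ("p1", "ICD-10-CM", "I48.91", date(2018, 3, 2)),
    ("p1", "ICD-9-CM", "427.31", date(2012, 6, 14)),
    ("p1", "ICD-9-CM", "999.99", date(2013, 1, 1)),  # not in the crosswalk
]
history = harmonize(records)
```

After harmonization, the consumer sees a single continuous history for the patient regardless of which dictionary was in force when each record was created, which is the "one look and feel" the quote describes.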