Foundations for Change: A Framework for Transforming R&D Outcomes With AI

Contributed Commentary by Dr. Ben Sidders, Biorelate

November 8, 2024 | For AI- and data-driven drug discovery to have a tangible impact on R&D success, a change in culture is as important as any investment at the data, model, and validation levels. Until quite recently, the perceived value of AI in drug discovery and development failed to match the hype. Now, however, a number of point solutions are having a positive impact within aspects of R&D, providing pointers about how to successfully embed AI at its core. In target discovery, for instance, knowledge graphs are now proving adept at integrating a vast number of data sources into a queryable structure, forming the basis for informed and relatively unbiased target prioritisation decisions.

Challenges remain, however. Predicting synergistic drug combinations has been the topic of extensive research with only limited success and almost no translational relevance. Nor are we any nearer to being able to predict the effect of a drug on a given patient without first running a clinical trial. To achieve a tangible transformation of R&D outcomes, a structured and integrated approach to AI’s role and reach is necessary, spanning Data, Model, Culture, and Validation considerations.

Data Challenges

Thus far, AI has found most success where the dataset is large, complete, and in many cases has been generated specifically to solve the problem at hand. The UNI foundation model for computational pathology, for instance, was trained on more than 100 million images from 20 tissue types. In contrast, one of the largest datasets available to train models for drug combination synergy prediction has 910 combinations of 118 drugs, roughly five orders of magnitude fewer data points. This problem is further exacerbated when we look at data from clinical trial cohorts, which are often sparse and inconsistent in what is measured. For example, one trial might collect demographics and data for a specific blood-based biomarker; another might also collect genomic data.

Biomedical knowledge is often locked away in text, such as the scientific literature or patents. Extracting knowledge from this unstructured data is key. Historically, this has been achieved with natural language processing, but the rise of large language models is advancing us toward a human-level understanding of the text. To make use of the output, however, it remains critical to map the model findings to well-curated ontologies.
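
As a rough illustration of that mapping step, the sketch below normalises free-text mentions and looks them up in a small synonym index. It is a minimal example under stated assumptions, not a description of any particular pipeline: the `EX:` identifiers, the two-entry ontology, and the function names are all hypothetical, and a production system would draw on a curated ontology and far more robust matching.

```python
# Hypothetical mini-ontology: the "EX:" identifiers are placeholders,
# not identifiers from any real ontology.
ONTOLOGY = {
    "EX:0001": {
        "label": "epidermal growth factor receptor",
        "synonyms": {"EGFR", "ERBB1", "HER1"},
    },
    "EX:0002": {
        "label": "tumor protein p53",
        "synonyms": {"TP53", "p53"},
    },
}


def normalise(text):
    """Lower-case, treat hyphens as spaces, and collapse whitespace."""
    return " ".join(text.lower().replace("-", " ").split())


def build_index(ontology):
    """Map every normalised label and synonym to its ontology identifier."""
    index = {}
    for term_id, entry in ontology.items():
        for name in {entry["label"], *entry["synonyms"]}:
            index[normalise(name)] = term_id
    return index


def map_mentions(mentions, index):
    """Return {mention: identifier or None} for each extracted mention."""
    return {mention: index.get(normalise(mention)) for mention in mentions}


index = build_index(ONTOLOGY)
mapped = map_mentions(["EGFR", "HER1", "Tumor  protein P53", "kinase Q"], index)
```

Mentions with no match come back as `None`, so they can be routed to manual curation rather than silently dropped.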

The underlying issue is that pharma’s data, particularly that from clinical trials, was not generated for AI. To exploit data in a meaningful way using AI, companies must develop a data strategy—and be willing to fund and generate data on clinical cohorts if possible—to build useful data of the required scale.

AI Models

While AI models excel at classification and predictive problems, if AI is to revolutionize drug discovery, it must incorporate causality. Predicting that a drug might work in a new indication is valuable, but it is not the same as explaining why the drug will work in that indication. To support internal and regulatory decision-making, it is essential to have explainable biology that supports a mechanistic understanding of the particular drug or biology.

The integration of prior knowledge and data-driven insights offers a promising solution. AI combined with highly accurate causal relationships can distil both a broader array of targets with strong promise and a mechanistic understanding of their biological role in disease.

Cause-and-effect relationships can be mined from the literature and created from experimental data. These relationships, defining the regulatory interactions between two biological entities, can be combined into structural causal models—a framework to represent and analyse the causal relationships between variables. Such models provide a systematic way to model how changes in one variable can lead to changes in another. These could be used during the training process of more expansive foundation models, but also to build specific mechanistic models that further describe the output from an upstream finding.
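
To make the idea concrete, here is a toy structural causal model written in plain Python. It is only a sketch: the variable names, the linear equations, and the coefficients are invented for illustration and do not describe any real pathway. Each variable is computed from its parents, and an intervention replaces a variable's equation with a fixed value.

```python
# Toy structural equations: each variable is a function of its parents.
# Names and coefficients are illustrative assumptions, not real biology.
STRUCTURAL_EQUATIONS = {
    "drug_dose": lambda v: 0.0,
    "kinase_activity": lambda v: 1.0 - 0.75 * v["drug_dose"],
    "tf_activity": lambda v: 0.5 * v["kinase_activity"],
    "proliferation": lambda v: 2.0 * v["tf_activity"],
}

# Causal (topological) order: parents are always evaluated before children.
ORDER = ["drug_dose", "kinase_activity", "tf_activity", "proliferation"]


def simulate(interventions=None):
    """Evaluate the model, overriding any intervened-on variables."""
    interventions = interventions or {}
    values = {}
    for name in ORDER:
        if name in interventions:
            # An intervention cuts the variable off from its parents.
            values[name] = interventions[name]
        else:
            values[name] = STRUCTURAL_EQUATIONS[name](values)
    return values


baseline = simulate()
treated = simulate({"drug_dose": 1.0})
knockout = simulate({"kinase_activity": 0.0})
```

Comparing `baseline` with `treated` shows how a change in one variable propagates downstream, and the intermediate values give a traceable account of why the outcome changed, which is the kind of mechanistic explanation the text calls for.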

Acceptable Validation

The output from all AI solutions should be validated, experimentally if appropriate, with two provisos. First, the R&D function should be set up so that all data feeds back to the AI model. This helps to mitigate some of the challenges described above, while ensuring that the model can be continually improved.

Second, there needs to be a triage-based validation model. While an AI system is able to identify hundreds of targets, the challenge is to stay open to “left-field” opportunities that AI might highlight. Orthogonal in silico approaches might be used to go from 1,000 to 100 targets, but to go from 100 to 10, the team should adopt the quickest, highest-throughput experiment that yields the next rung of supporting evidence.
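
The funnel described above can be sketched as two ranking stages. This is a hypothetical illustration only: the scores are random numbers standing in for real in silico predictions and assay readouts, and `run_quick_assay` is an invented placeholder for whatever high-throughput experiment a team would actually choose.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible
assay_calls = 0


def triage(candidates, score_fn, keep):
    """Rank candidates by score_fn and keep the top `keep`."""
    return sorted(candidates, key=score_fn, reverse=True)[:keep]


def run_quick_assay(target):
    """Placeholder for a fast experiment; returns a made-up readout."""
    global assay_calls
    assay_calls += 1
    return rng.random()


# 1,000 candidate targets with a stand-in in silico score each.
targets = [
    {"name": f"target_{i}", "in_silico": rng.random()} for i in range(1000)
]

# Stage 1: cheap computational filter, 1,000 -> 100.
stage1 = triage(targets, lambda t: t["in_silico"], keep=100)

# Stage 2: experimental filter, 100 -> 10. The assay runs only on the
# survivors of stage 1, because sorted() calls the key once per item.
stage2 = triage(stage1, run_quick_assay, keep=10)
```

The point of the design is cost: the expensive step is applied to 100 candidates rather than 1,000, while the cheap step keeps the initial net wide.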

Cultural Change

Underlying many of the data, model, and validation issues up to now has been the culture of the organization and its failure to fully adapt to an AI-driven way of thinking or working.

There are increasing efforts to bridge this gap, and upskilling or recruiting talent with AI expertise is essential. At the same time, data scientists must be educated in the decision-making process of R&D and must understand or develop methods that directly support it. More could also be done to build the understanding that AI will raise the productivity of all R&D researchers, and is therefore an opportunity and not a threat.

Everyone Has a Role

Before long—certainly within a decade—we can expect every major decision taken along the drug R&D pipeline to be accelerated by unprecedented access to knowledge. But this assumes that companies have put the full suite of measures in place. For their part, data scientists will need to develop actionable models with causality at their heart. Biologists will need to determine how to effectively integrate data science into their workflows. Finally, heads of R&D will need to orchestrate more seamless integration and symbiosis between the two sciences.

Dr. Ben Sidders, Chief Scientific Officer at Biorelate, has been working at the forefront of pharma data science for the last two decades. Formerly Executive Director and Head of Early Data Science within Oncology R&D at AstraZeneca, Ben also previously spent eight years at Pfizer, and has extensive experience of many aspects of drug discovery for major pharma. He can be reached at ben.sidders@biorelate.com.