Regeneron’s Platform Play for Protein Prediction at Scale
By Allison Proffitt
June 18, 2024 | AlphaFold was revolutionary to protein scientists and their work, but as soon as the new capabilities became clear, scientific demand exploded. Regeneron engineers used the AlphaFold2 model to create a custom, cloud-based solution to allow analysis of thousands of protein binding partners in parallel.
The project won the team a Bio-IT World Innovative Practices Award, sharing the “Informatics to Achieve Operational Excellence” category with two other Regeneron projects. Cuie Hu, director of Computational Engineering at Regeneron Pharmaceuticals, presented the work in the awards presentation at the Bio-IT World Conference & Expo in May.
Previously, computational solutions predicted the structure of a single protein at a time, Hu explained. “If they needed to predict structures of, say, five proteins, they would have to rerun the same job five times,” she said. Hu’s engineering group built a platform using the AlphaFold model that would let Regeneron scientists predict the structures of multiple proteins or multiple protein-protein interactions simultaneously.
The platform uses AlphaFold2 to analyze smaller proteins and OpenFold for larger proteins.
The engineers saved time and money in multiple ways. The first step of both the AlphaFold2 model and OpenFold is multiple sequence alignment (MSA), which chops up a protein into smaller segments and scans public databases of known proteins for matching sections. This search step does not need a GPU, Hu pointed out, and can run efficiently and cost-effectively on CPUs. The structure prediction step is computationally intense, though, and does need to run on GPUs, Hu said.
But this still only predicts the structure of a single protein at a time.
“When you have proteins in the system, they are never alone,” Hu said. “If they are going to, say, attack a virus they are going to bind to a protein on the surface of the virus first. It’s a protein-protein interaction. For our scientists to understand the disease mechanisms, for them to develop drugs and test them, they need to understand the interactions between the proteins.”
That was a problem of volume that Hu’s group wanted to address.
AlphaFold Multimer is an extension of AlphaFold2 that predicts protein-protein interactions. After enabling that, Hu’s group added custom features to predict the structures of thousands of protein pairs in parallel. For instance, the code was modified so that the MSA search step is not repeated for each pair, but done once, and then structure predictions are done in parallel. The team also optimized the code to remove steps that were redundant between stages. “At this point, the runtime of our solution is probably 50% of what we started with,” Hu said. “So it’s a great cost savings by adding those customizations.”
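The idea of running the MSA search once and reusing it across pairs can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Regeneron's code: `run_msa` and `plan_pair_jobs` are hypothetical names, the sequences are toy strings, and the "search" is a stand-in that simply counts how often it is called.

```python
from itertools import combinations

msa_calls = 0  # counts how many times the stand-in search runs


def run_msa(sequence):
    """Hypothetical stand-in for the CPU-bound MSA database search."""
    global msa_calls
    msa_calls += 1
    return f"alignment-for-{sequence}"


def plan_pair_jobs(proteins):
    """Run the MSA once per protein, then build one prediction job per pair."""
    # One search per unique protein, cached by name.
    msa_by_name = {name: run_msa(seq) for name, seq in proteins.items()}
    jobs = []
    for a, b in combinations(sorted(proteins), 2):
        # Each pair job reuses the cached alignments instead of searching again.
        jobs.append({"pair": (a, b), "msas": (msa_by_name[a], msa_by_name[b])})
    return jobs


# Toy input: four made-up sequences, not real proteins.
proteins = {"A": "MKT", "B": "GSH", "C": "LVP", "D": "QRE"}
jobs = plan_pair_jobs(proteins)
print(len(jobs), "pair jobs from", msa_calls, "MSA searches")
```

With four proteins there are six pairs. Repeating the search for both members of every pair would mean twelve searches, while the cached version runs four, and the gap widens as the number of proteins grows.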
Use Cases
“Now, our scientists can simply indicate which protein sequences they want to study, and the application will spawn thousands of jobs in parallel making it possible to predict interactions of thousands of protein pairs simultaneously. This change has reduced the computation time from months to a few days,” the team wrote in their application. Researchers can also hold one protein constant while screening thousands of candidates for the best binding partners.
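A one-against-many screen of this kind can be sketched as a fan-out of independent jobs. The example below is a rough illustration only: `predict_pair` and `screen` are hypothetical names, the score is a toy residue-match count rather than any real binding metric, and a local thread pool stands in for whatever cloud job scheduler the actual platform uses.

```python
from concurrent.futures import ThreadPoolExecutor


def predict_pair(target, candidate):
    """Hypothetical stand-in for one structure prediction job."""
    # Toy score: number of positions where the residues match.
    # A real job would run the model and report a confidence metric.
    score = sum(1 for x, y in zip(target, candidate) if x == y)
    return candidate, score


def screen(target, candidates, max_workers=4):
    """Hold the target fixed and score every candidate in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One independent job per (target, candidate) pair.
        results = list(pool.map(lambda c: predict_pair(target, c), candidates))
    # Rank candidates from highest to lowest score.
    return sorted(results, key=lambda r: r[1], reverse=True)


# Toy sequences, not real proteins.
ranked = screen("MKTAY", ["MKTAA", "GGGGG", "MKSAA"])
print(ranked)
```

Because each pair is independent of the others, the jobs can be spread across as many workers as are available.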
In her presentation, Hu went on to outline the types of questions Regeneron researchers could now efficiently explore using the platform to model: protein-protein interactions including antibody-antigen, cell-cell, antibody-viral capsid binding, and transmembrane protein activity; antibody structures; and protein characterization.
“What’s unique about our project is that we didn’t wait for the scientist to come to us and go, ‘We want this. Give us this.’ We anticipated what they needed, put a solution together, and now they’re actually using our solution in drug development. Our team and I are proud to be making this little contribution to Regeneron’s drug development to make a patient’s life a little better.”