Automating FAIR, Clean Data Generation from Flow Cytometry at Regeneron
By Allison Proffitt
June 6, 2024 | “In any of my endeavors, regardless of the technology and the process and the workflow, the human element is always the one that shines through. It’s always the one that sticks, and it’s always the one that takes it to the next level,” said Ronald Realubit, Principal Scientific Business Analyst (Therapeutic Ab Development) at Regeneron, in his awards presentation at the Bio-IT World Conference & Expo.
Realubit punctuated that statement with a map of the Regeneron team members behind the Automated High-Throughput Flow Cytometry Data Processing Pipeline project, highlighting 38 team members by name. The pipeline was one of three Regeneron projects that won a 2024 Bio-IT World Innovative Practices Award, all sharing the “Informatics to Achieve Operational Excellence” category.
Flow cytometry raw data requires extensive processing, the team explained in their entry, and collecting and viewing the processed data is burdensome and time consuming, yet absolutely necessary for quality control. The challenge was to automate a non-standardized workflow in a way that lets scientists drive the configuration and automation while still letting them explore their data to make decisions.
Both manual and automated processing had to coexist to support demand from the R&D program, the authors wrote in their entry. Balancing the two was key to the innovative approach.
Research IT took the lead on the project, Realubit said, which combined multiple automation steps. The first component was a custom raw data processing script that enabled flexibility for a variety of multicolor panels. The next step automated the execution of this script for multiple plates or datasets. The team used a JSON file as both a trigger for script execution and instructions. The automation here supports high throughput by scaling resources, parallel processing, and pulling raw data from an instrument data lake.
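The entry does not describe the JSON file's contents or how the script is launched, so the following is only a minimal Python sketch of the general pattern: a JSON file dropped in a watched folder acts as both the trigger and the instructions, and pending plates are processed in parallel. The field names (`plate_id`, `panel`), the folder layout, and the `process_plate` stub are all hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def process_plate(job: dict) -> dict:
    """Hypothetical stand-in for the raw data processing script."""
    # A real script would process the plate's raw events here; this stub
    # only echoes the instructions it received.
    return {"plate_id": job["plate_id"], "panel": job["panel"], "status": "processed"}


def load_jobs(trigger_dir: Path) -> list:
    """Read every JSON file in the folder; each file is a trigger and its instructions."""
    return [json.loads(path.read_text()) for path in sorted(trigger_dir.glob("*.json"))]


def run_pending(trigger_dir: Path, max_workers: int = 4) -> list:
    """Process all pending plates in parallel, returning results in job order."""
    jobs = load_jobs(trigger_dir)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_plate, jobs))
```

Because the instructions travel inside the trigger file, a scientist could in principle change the panel or plate list without touching the script itself, which is one plausible reading of how configuration stays in scientists' hands.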
There was an immediate need, the authors wrote, for a location shared between the scientists and the automated analysis steps. “The innovative use of this shared network location enabled a data management workflow that supports both manual and automated data processing steps.”
An instrument data lake automatically ingests both raw and processed data with their metadata and serves as a single data source for all automation and dashboards, while also providing workflows to move those data to next steps. Workflows add metadata from user-driven dashboards (not just instruments) and apply them to the relevant raw and processed data. Workflows generate files that report screening findings and trigger downstream drug discovery steps. Workflows transform PDF images into string-encoded images contained in JSON. And workflows manage the use of ZIP files to move data faster through the pipeline, then automatically unzip them for storage in the instrument data lake.
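Two of these workflow steps can be illustrated with Python's standard library. This is a sketch under stated assumptions, not Regeneron's implementation: the article says only that images are "string encoded" inside JSON, so Base64 is an assumed encoding, and the JSON keys and function names are hypothetical.

```python
import base64
import io
import json
import zipfile


def encode_file_as_json(name: str, payload: bytes) -> str:
    """Wrap binary content (e.g. a PDF image) in JSON as a Base64 string (assumed encoding)."""
    text = base64.b64encode(payload).decode("ascii")
    return json.dumps({"filename": name, "encoding": "base64", "data": text})


def decode_file_from_json(document: str) -> bytes:
    """Recover the original bytes from the JSON wrapper."""
    return base64.b64decode(json.loads(document)["data"])


def zip_files(files: dict) -> bytes:
    """Bundle files into one in-memory ZIP archive for transfer."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, payload in files.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


def unzip_files(blob: bytes) -> dict:
    """Unpack the archive again so each file can be stored individually."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}
```

Encoding the image as text lets it travel in the same JSON document as its metadata, while zipping trades a little CPU time for fewer, smaller transfers before the files are unpacked for storage.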
Finally, the entire pipeline is set up as a series of dashboards, with Quality Control as the ultimate dashboard. “Now, technically this is not groundbreaking, but you have to imagine the scientific relevance of such simple technology,” Realubit said. “I’m really proud about the basics of data management that we’re able to operationalize in our diverse scientific pool.” These dashboards are not simply data visualization tools, but data management tools that collect metadata, surface other relevant data, and trigger data processing automation.
Culturally FAIR
The project ROI has been significant. It cuts the time scientists spend by as much as 14-fold and increases Regeneron’s antibody screening throughput. Automation also improves data processing reproducibility, and the pipeline delivers a quality control experience that improves data integrity by making it easier to identify operational or reagent issues, the authors wrote.
“The pipeline proved that applied FAIR data can deliver a solution for a specific business need,” the application authors wrote. “This was not a project to create FAIR data for the sake of creating FAIR data. This project represented a use case where a business need was met by the application of FAIR data to create the quality control dashboard, a specific final deliverable.”
But the greatest impact, Realubit said, is in how the project has changed data culture at Regeneron. “Now with this project, before assay development [scientists] have to be trained on data management,” he said. There is “a foundation for increasing data literacy of our scientists because of this pipeline. At the point of data creation at Regeneron, we are creating clean and FAIR datasets. That’s huge value.”