Bio-IT World Virtual: First Plenary, AI Workshops, Globus For Data Transfer
October 6, 2020 | The Bio-IT World Conference & Expo Virtual launched today with an opening plenary keynote from Drs. Susan Gregurick, Associate Director for Data Science at NIH, and Rebecca Baker, Director of the NIH HEAL Initiative. Gregurick outlined NIH’s efforts to make its vast stores of data available to researchers via a single login and to encourage interoperability across the many institutes that make up NIH. Baker shared how those efforts are enabling the mission of HEAL (Helping to End Addiction Long-term), the NIH’s effort to fight opioid dependence. With the COVID-19 pandemic demanding so much of researchers worldwide, and social isolation amplifying the existing opioid crisis, there has never been a more important time to be able to access, share, and mine data.
Workshops dominated the rest of day one. Nigel Greene, Director & Head, Data Science & Artificial Intelligence, Drug Safety & Metabolism at AstraZeneca Pharmaceuticals, outlined the application of data science and AI to improve candidate selection in drug discovery. Therapeutic index is often uncertain at candidate nomination, and the dose makes the poison, he reminded the audience. But dosing also affects compliance: a drug taken once a day will see far better adherence than one taken three times a day.
Data science can shed light on many of these early questions, Greene argued, but good data science must be built on plentiful, high-quality data. He encouraged the industry to do the very hard work of gathering data from public and internal sources, particularly highlighting in vivo data that can be mined to drive decisions and wearable and sensor data that can capture patient information. He also emphasized the importance of quality: when you’re doing machine learning, he said, the quality of the data shapes the quality of the system’s predictions.
Data and the knowledge gained from it are a company’s most valuable assets, he said, but he warned that data generation is growing exponentially and will exceed our capacity to digest it all. This is where artificial intelligence will be essential, he said. Drug discovery is a multi-parameter problem that is beyond the human mind’s ability to fully manage. AI and machine learning are here to stay.
Pfizer’s Peter Henstock offered a two-hour crash course on artificial intelligence (AI), or what he describes as the “science of optimizing effective models from different types of data.” His focus was on supervised learning, where computers are trained on the relationship between data and its labels, and unsupervised learning, where distance-based “k-means clustering” is often employed to group data points into buckets intelligible to humans.
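To give a flavor of the unsupervised side Henstock described, here is a minimal k-means sketch in Python. It assumes scikit-learn and uses synthetic data as a stand-in for real compound or patient features; none of it comes from the workshop itself.

# Minimal k-means sketch (illustrative; not Henstock's workshop code).
# Synthetic 2-D points stand in for real features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate data with three hidden groups
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=42)

# Distance-based clustering into k=3 buckets
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print("Cluster sizes:", np.bincount(labels))
print("Cluster centers:\n", kmeans.cluster_centers_)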
Workshop attendees learned “the good, the bad and the ugly” about various clustering approaches, as well as the t-SNE technique and its successor UMAP, which is better at preserving the global structure of data and, he noted, the only known tool that can visualize a large virtual library of compounds and show their ADME (absorption, distribution, metabolism, and excretion) properties. Henstock also did a deep dive on the basic supervised learning methods, including linear regression, k-nearest neighbors, and random forest, and introduced his students to concepts such as the “bias-variance tradeoff” and the “bagging” (aka bootstrap aggregation) technique, which keeps algorithms from overfitting the training data. When making high-stakes predictions, such as whether to amputate a limb or whether a tumor is malignant or benign, he advised paying attention to the receiver operating characteristic (ROC) curve, which evaluates binary classifiers by their true-positive and false-positive rates.
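As an illustration of that kind of ROC analysis, a brief scikit-learn sketch follows; the public breast cancer dataset and the random forest settings are assumptions chosen for the example, not material from the workshop.

# Illustrative ROC sketch: train a random forest on a public benign/malignant
# dataset and evaluate it by true- and false-positive rates.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]          # probability of class 1 (benign here)

fpr, tpr, thresholds = roc_curve(y_test, scores)  # false- vs. true-positive rates
print("ROC AUC:", roc_auc_score(y_test, scores))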
While AI and machine learning (ML) are virtually synonymous, deep learning stands on its own with accuracy that outperforms traditional ML, says Henstock. Backpropagation, short for backward propagation of errors, is the workhorse algorithm here. Deep learning capabilities include automatic speech translation, recognition of street signs (Google Maps), and reading traffic signs better than the average human.
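For readers unfamiliar with backpropagation, here is a minimal NumPy sketch on a toy XOR problem; the tiny architecture, learning rate, and data are purely illustrative assumptions, not Henstock’s material.

# Tiny backpropagation sketch on XOR (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer of 4 sigmoid units
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for step in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: propagate the output error back through each layer
    d_out = (out - y) * out * (1 - out)   # squared-error gradient at the output
    d_h = (d_out @ W2.T) * h * (1 - h)    # error propagated to the hidden layer

    # Gradient-descent updates
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out.ravel(), 2))  # should approach [0, 1, 1, 0]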
The use of digital biomarkers and wearables in pharma R&D and clinical trials was the subject of another workshop, led by Danielle Bradnan (Lux Research), Graham Jones (Novartis) and Ariel Dowling (Takeda Pharmaceuticals). Smart and hybrid watches are leading the pack, although the vast majority of conditions addressable by the collected data are still being discussed in a research context. Remote monitoring (aka telehealth) and connected devices are good fits with a healthcare paradigm that is shifting toward value-based care and emerging care locations that include not just the home but also the gym, the commute to work, and vacation destinations, says Bradnan.
Soberingly low patient adherence rates with pharmaceuticals may take some patient-centered “empathy mapping,” according to Jones, encompassing situational awareness (the usefulness of interventions), lifestyle congruence, and a frictionless experience. He highlighted the role of patient personas (rule following, researching, and disengaging) in improving adherence rates among heart failure patients, as well as the potential of the smartphone-based Captivate app and linked web portal for conducting remote research. Potential use cases include identifying user errors with closed-loop systems for drug delivery.
Dowling shared Takeda’s experience with wearables in clinical trials via a pair of case studies, including clinical validation of a portable EEG device in a narcolepsy study and use of a digital biomarker (gait variability) as a primary endpoint in an early-phase trial quantifying a drug’s effectiveness in preventing falls in patients with Parkinson’s disease and cognitive impairment. Adoption of digital biomarkers in studies is increasing, she says, but it will be a while before they are used for assessing outcomes in pivotal trials requiring regulatory approval rather than for internal decision-making about whether to move forward with a drug candidate.
In another workshop, Brigitte Raumann, project manager for Globus, a research data management service from the University of Chicago and a finalist in Bio-IT World’s 2019 Best of Show awards, demonstrated how Globus solves many of the problems of moving data from instruments into storage and out again for collaboration and analysis. All of that moving data needs to be access controlled and encrypted, and the Globus service can meet those needs across several different use cases.
For example, Raumann highlighted NYU Langone Medical Center, which has a great deal of on-premises storage but still needs to burst to the cloud for compute occasionally; the SIGNAL clinical trial, which gathered large medical imaging datasets from many sites for analysis at a central location; a cancer research network, which moves and replicates large datasets around the world; and finally Argonne National Laboratory, which is doing serial crystallography on SARS-CoV-2 and cycling analyzed findings immediately back into experiment design. For each of these organizations, Globus enables smooth, safe, and efficient data transfer.
Along with Raumann and Rachana Ananthakrishnan, executive director of Globus, Mike Cianfrocco of the University of Michigan and James Weatherell of Harvard Medical School joined the course. Globus has been our go-to resource for making data securely available to collaborators, Weatherell said; it has helped us simplify the process of data collaboration. And at Michigan, Cianfrocco reported that his group has offloaded all user authentication and data transfer to Globus when delivering hands-on cryo-EM tutorials.
Globus does not store data or change any existing storage permissions. Instead, it orchestrates and moves data on behalf of the users through one of several interfaces: a RESTful API, a command-line interface, or a web interface. Globus supports POSIX-compliant file systems as well as cloud storage connectors including Box, Google Cloud, Wasabi, AWS, IBM Cloud, and more. The next storage connector Raumann expects to announce is iRODS.
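To show what the programmatic route looks like, here is a minimal sketch using the Globus Python SDK (globus_sdk); the endpoint IDs, paths, and token are placeholders, and the Globus Auth step is abbreviated, so treat it as an outline rather than a working recipe.

# Hedged sketch of a Globus transfer via the Python SDK (globus_sdk).
# Endpoint UUIDs, paths, and the token below are placeholders, not real values.
import globus_sdk

SOURCE_ENDPOINT_ID = "aaaaaaaa-1111-2222-3333-bbbbbbbbbbbb"   # placeholder
DEST_ENDPOINT_ID = "cccccccc-4444-5555-6666-dddddddddddd"     # placeholder
TRANSFER_TOKEN = "..."  # obtained via a Globus Auth flow (omitted here)

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(TRANSFER_TOKEN)
)

# Describe the transfer: Globus moves the data on the user's behalf;
# it never stores the data itself or alters storage permissions.
task = globus_sdk.TransferData(
    tc, SOURCE_ENDPOINT_ID, DEST_ENDPOINT_ID,
    label="Instrument-to-storage sketch", sync_level="checksum",
)
task.add_item("/instrument/run42/", "/archive/run42/", recursive=True)

result = tc.submit_transfer(task)
print("Submitted transfer task:", result["task_id"])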