State of the Industry: Checking in on AI Trends and Progress with Fernanda Foertter
By Allison Proffitt
September 27, 2022 | In 2021, Fernanda Foertter spoke at the Bio-IT World Conference & Expo during the annual Trends from the Trenches session headlined by Chris Dagdigian. She didn’t mince words.
AI “is not ready for most of us,” she opened. “If any of you are under any impression that somehow you’re going to say, ‘I have all this data and I’m going to implement AI at my organization and we’re going to cure cancer,’ you are absolutely dead wrong.”
In the latest episode of Bio-IT World’s Trends from the Trenches podcast, host Stan Gloss, founder of BioTeam, gets an update from Foertter about her AI experiences in the past year and whether we’ve made any real progress.
Foertter’s career began in physics and moved to genomics immediately after graduate school. She then spent time at several industry leaders: Oak Ridge National Laboratory, NVIDIA, and BioTeam. From there she turned to hardware and biological sciences startups, and she now serves as a director at Voltron Data, using Apache Arrow to make data easy to work with and accessible.
Gloss asked if she still thinks AI is a pipe dream. “It feels like it’s a pipe dream to be able to get these things right [today]. I don’t think it will be; I think as we get more technology, I think as we get the ability to regularize and get data in a way that looks as harmonized as possible, things will improve. But the problem is getting to that point.”
Creating models is easy, Foertter said, echoing a point she made in 2021, but having a dataset that is findable, accessible, interoperable, and reusable (FAIR) is still the hard part. Great models have already been built by NVIDIA and many others, and she encourages groups to borrow what they can.
“But that prep time—impossible, impossible [to avoid], unless you’re able to generate all new data.”
However, she doesn’t believe that all of a company’s historical data needs to be harmonized and cleaned. Instead, Foertter advocates imputing large datasets from smaller, complete sets. She recommends hiring a statistician to help you understand how much data you need to train a model. “In the end, AI is 80% linear regression. And then there’s that 20% that’s neural networks and more complicated models.” A statistician can help determine if you could use just 10%, 5%, or even 1% of your data and impute the rest.
An insurance company, for example, may not need to curate everything. “They just need to curate a few high-fidelity individuals and then use that to infer what other individuals might be,” she envisions, “and use that as a new training model.”
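To make the imputation idea concrete, here is a minimal sketch in Python. It is not Foertter’s method or any company’s pipeline: the data is synthetic, the 5% curated fraction is an arbitrary choice, a single-predictor linear regression stands in for whatever model a statistician would actually select, and the function names `fit_line` and `impute_missing` are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute_missing(xs, ys, slope, intercept):
    """Keep observed values; replace None with the regression prediction."""
    return [slope * x + intercept if y is None else y for x, y in zip(xs, ys)]


# Synthetic stand-in for a large dataset (assumption: one predictor,
# a linear relationship, and Gaussian noise).
random.seed(0)
n = 1000
xs = [random.uniform(0.0, 10.0) for _ in range(n)]
ys_true = [2.0 * x + 3.0 + random.gauss(0.0, 0.5) for x in xs]

# Pretend only 5% of the records were carefully curated; the rest are missing.
curated = sorted(random.sample(range(n), n // 20))
curated_set = set(curated)
ys_observed = [ys_true[i] if i in curated_set else None for i in range(n)]

# Fit on the small curated subset, then impute everything else.
slope, intercept = fit_line(
    [xs[i] for i in curated], [ys_true[i] for i in curated]
)
ys_filled = impute_missing(xs, ys_observed, slope, intercept)

# On synthetic data we can check the imputed values against the ground truth.
rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(ys_filled, ys_true)) / n)
print(f"curated rows: {len(curated)} of {n}")
print(f"fitted slope={slope:.2f}, intercept={intercept:.2f}, RMSE={rmse:.2f}")
```

On real data there is no ground truth to compare against, which is where a statistician’s judgment about how much curated data is enough comes in.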
Foertter also highlighted the value of creating edge cases. “That’s where I see a lot of the work being done now,” she said. Using self-driving cars as an example, Foertter said she can record reams and reams of standard driving data. “But there will be one day where there is a clown on a unicycle carrying an elephant, and that’s not in the training set and the car won’t know what to do.” Generating enough of these edge-case scenarios is becoming more and more important, she said.
Getting it Right, Getting it Wrong
Gloss asked Foertter to break down what she sees companies doing right and wrong with AI in the life sciences space, and again she spoke with candor.
First and foremost, any company with a human end customer should be working with an ethicist, she said. Someone needs to be thinking carefully about all the risks of manipulating the data.
Next, “Stop buying AI startups,” she advises. Companies are much better off partnering with a startup, giving it some real data to play with, and seeing if the startup’s models actually survive.
Finally, she says, we must learn to share our data. “What’s going to change the world is when we start learning how to share data with one another. The differentiator is going to be how you do the service, and not necessarily the data that you hold. People who understand that and see that will actually benefit far more than people who are thinking that their data is gold, and they have a lot more than the next person.”
Trends from the Trenches Podcast
Bio-IT World’s Trends from the Trenches podcast delivers your insider’s look at the science, technology, and executive trends driving the life sciences through conversations with industry leaders. As host, BioTeam co-founder Stan Gloss brings years of industry experience in science, data, and technology to conversations exploring what is driving data and discovery, and what’s coming next.
Catch up on earlier episodes on NIH’s Strategic Plan for Data Science, building AI/ML models for drug discovery, the evolution of supercomputing, digitization vs. digital transformation at Alnylam, AWS’s advice on digital transformation, NCI’s Commons of Commons approach to data management, and George Church on the value of neurodiversity.