AI in Healthcare: How To Assess What Works
By Allison Proffitt
February 18, 2025 | NASHVILLE—At a panel discussion this week at the ViVE conference in Nashville, Tenn., researchers and physicians discussed the promise and hype of AI. Moderated by Alexander Morgan, partner at Khosla Ventures, the wide-ranging discussion covered which AI tools are ready for use by physicians, which classes of AI are not yet delivering, and how to assess what’s right for you.
Morgan asked his panelists which AI tools are working at scale: which are “ready for prime time”? Michael Pfeffer, Chief Information and Digital Officer at Stanford Medicine, immediately suggested AI scribes. This is not a particularly new use case, Pfeffer said. “We’ve been trying to figure out how to write notes by listening to a conversation for 15 years.” Now, though, the technology is working. In the Stanford Medicine organization, Pfeffer reported that AI scribes have been scaled across more than 700 physicians. “They love it!” he added. “We have a pretty sophisticated way of measuring burnout, and it showed lower burnout scores.” The future potential of the technology is also exciting, he said. Stanford Medicine uses DAX Copilot, a Microsoft product, Pfeffer reported, and he foresees it becoming a platform more than just an app.
There are many AI scribe vendors in the space today. The differentiators, as Pfeffer sees them, are whether the technology works, whether it scales well, and whether the company has longevity. “It takes a long time to implement,” he said. Every physician is trained on the technology. “That’s where we’re seeing value; you have to have the training to use it.”
Rohit Chandra, EVP and Chief Digital Officer at Cleveland Clinic, agreed that AI scribes are a particularly nice use case. “It’s a perfect product in that the technology is ready and it has the correct and convenient capability to deliver. It’s excellent for patients, excellent for providers.” Like Pfeffer, Chandra encouraged the audience to consider longevity—not just in vendors but in the technologies overall. AI is a long game, he said, and as we solve these initial problems well, there will be bigger and bigger problems that can be transformed with AI.
Suchi Saria, CEO and founder of Bayesian Health, has a slightly different perspective. A professor of engineering and public health as well as director of the AI and Healthcare Lab at Johns Hopkins University, Saria is on both the research side and the startup side of the AI coin. Medical scribing and dictation is a “first line application,” Saria said. “As you see the amount of data that’s collected in the EMR and the use of that data, the reality is patients are only getting more complex. Our margins are declining as health system. Systems are growing… We need a way to leverage this data to bring efficiency in the way we operate,” she said.
Bayesian Health uses data in major electronic medical record (EMR) systems to help predict costly complications such as sepsis, deterioration, and pressure ulcers so frontline providers can prioritize attention to high-risk cases. But Saria emphasizes the goal of predicting complications accurately and without “alert fatigue.”
Smoke and Mirrors
Morgan next asked which tools are more marketing than material. “I’m excited about agents, but I don’t think they’re ready for prime time yet,” Pfeffer said. “I still think clinical decision support is hard; summarization of things is hard. We don’t have ways to monitor the output of GenAI. So how can you say for certain that when you summarize the chart, it’s going to be accurate enough? We still have work to do in these spaces.”
Then sharing, perhaps, the “Unpopular Opinion” of the day, Pfeffer said that any tool touting human-in-the-loop is a “huge red flag.” “If you’re expecting the humans to verify all of this AI all the time, it’s never going to work!”
Chandra does not believe any AI is ready for frontline clinical care. Echoing Pfeffer’s concern, he recalled various app pitches where he asked whether the developers were ready for a lawsuit of “some flavor of malpractice. They avoid eye contact.”
That’s not to say either panelist was not hopeful about the possibilities coming soon. In fact, Pfeffer said one of the most successful programs Stanford Medicine has rolled out is to offer secure access to many models via a portal for all employees and clinicians. Users can access one of about ten models and they can put patient information in them and explore.
“People love it!” he said. “They’re using it. With no other technology have we had that opportunity. The creativity that’s happening is coming from the ground floor on a lot of this stuff and I think that’s really exciting. We’re learning how people are using it, and through that we’re developing the next solutions based on how our population sees these models being successful.”
Measures of Success
As these new tools and models come on the scene, value measures are key, Morgan said. How should we measure success in an AI tool?
Pfeffer referred to his earlier metric of physician burnout. Stanford uses the FURM framework for evaluating Fair, Useful, and Reliable AI Models in healthcare systems, developed by Nigam H. Shah, Chief Data Scientist at Stanford Health Care. When evaluating AI scribe tools, for instance, Pfeffer said he did not look simply for speed. Instead, he wanted to see that it was making a physician’s life and work easier, and he measured those outcomes with the burnout metric and by physician turnover. “One measure could be if I turn it off and I get hundreds of angry emails, you know?” he quipped. “That doesn’t happen that often in technology.”
Chandra pointed out that he does not expect perfection “on day one” of rolling out a new AI tool, but he does look for transformation over the long term.
Value isn’t only measured in money, Pfeffer added. “There are going to be things you decide to do as a health system that’s the right thing for patients even if it’s not financially great.” But there must be a financial balance somewhere. “When you look at balancing your portfolio, if you always put first what’s the best thing for the clinician, then you’re going to be moving in the right direction.”
Saria echoed similar sentiments, having been part of three large health systems as well as on the developer side. “I often [tell my team] to think about how to make sure we make lives easy for every stakeholder that we’re touching,” she said. For the IT stakeholders, tools need to be easy to maintain and easy to integrate. For frontline users, tools should be easy to use; they need to simplify users’ lives. And for administrative stakeholders, tools must fulfill the key value levers that brought you to these stakeholders in the first place.
Saria values honest value delivery above all else. “I feel like sometimes, there’s so much hoopla. Startup life feels a little bit like going to prom. There’s always show and tell and fancy feathers and who’s raising how much and who has this press release. I find it all a big, silly show,” she said. “I think it’s just so much easier if you just focus on the bottom line value to your key stakeholders, make sure you’re delivering that [value] time after time after time.”