Multi-Lab Preclinical Model Aims To Solve The Reproducibility Crisis
By Deborah Borfitz
October 18, 2023 | A new preclinical system put to the test by a network of labs to adaptively study a handful of stroke treatments has succeeded in demonstrating that high-quality science is “feasible, practical, and less expensive than doing things the wrong way,” according to Patrick D. Lyden, M.D., professor of physiology and neuroscience and of neurology at the Keck School of Medicine of the University of Southern California. The approach mimics typical features of clinical trials (a prespecified sample size and endpoint, treatment concealment, randomization, and blinded assessment of outcomes) while also detecting futility before a study reaches its planned conclusion.
The multi-arm, multi-stage (MAMS) trial was conducted by the Stroke Preclinical Assessment Network (SPAN), comprising six labs that did the testing and a coordinating center, headed by Lyden, that masked the drugs, performed centralized randomization, assessed blinded outcomes, collected the data, and ran the statistics, he says. Six candidate treatments were assessed in combination with intravascular revascularization, a model that mimics thrombectomy in humans, but only one (uric acid) met the primary outcome and is now being moved into clinical trials by the University of Iowa.
For decades now, it has been acknowledged that showing benefit in more than one lab is crucial to addressing the reproducibility crisis, says Lyden. Translational scientists have also long been advised to use animal models that resemble typical stroke patients who are generally older, have comorbid conditions, and are not all males.
SPAN’s study labs and coordinating center succeeded in implementing these widespread recommendations after responding to separate requests for applications from the National Institute of Neurological Disorders and Stroke (NINDS).
Because the first year of the awarded grants coincided with the start of the COVID pandemic, Lyden and his team unexpectedly found themselves with a lot of free time to figure out how to overcome dozens of operational barriers. He and his longtime colleagues Jessica Lamb and Karisma Nagarkatti literally pulled out a big piece of blank paper and sketched out all the particulars, Lyden says.
Simply finding the right type of labels to affix to the vials was an ordeal, since the team had to ascertain which type wouldn’t freeze, melt, or fall off during shipment and storage, continues Lyden, offering one example of the operational decisions involved. They also had to be sure the ink was available in a colorblind-friendly palette and that the labels could be printed without destroying the backing. Many hundreds of similar operational decisions had to be researched and resolved.
The protocols are spelled out in a supplement to the research article recently published in Science Translational Medicine (DOI: 10.1126/scitranslmed.adg8656), Lyden reports. Anyone interested in replicating the model, which is the idea here, has now effectively been handed the roadmap. An even more detailed paper, with step-by-step instructions for setting up a lab network, is also soon to be posted on the grassroots Bio-protocol Exchange platform, which advocates for research reproducibility.
Costly Distractions
Interest in stroke research is high, which is perhaps unsurprising given that the disease is one of the leading causes of disability in the world. It also didn’t hurt that the field had an early win with the pivotal, NINDS-supported trial showing that treating stroke patients with tissue plasminogen activator (tPA) within three hours of symptom onset significantly reduces disability, says Lyden.
But the translational gaps between the animal models used in preclinical trials and the human population they are meant to represent are common to the broader world of research, including Alzheimer’s disease, epilepsy, and traumatic brain injury, Lyden says. Results of many basic scientific studies have likewise proven difficult or impossible to reproduce, fueling his hope that the success of the model system created for acute ischemic stroke will be replicated across all domains of study.
Scientists have long agreed that animal models of a disease should resemble their human counterparts (meaning, in the case of stroke, rodents of both sexes that are older, hypertensive, or have diet-induced obesity or hyperglycemia) so that treatments destined to fail do so earlier in the drug development process. Yet researchers persist in “doing it wrong,” distracted by the desire to complete experiments and publish results quickly and cheaply, says Lyden.
Speed is typically associated with the use of young, preferentially male animals, on the theory that females, owing to their estrus cycle, respond differently to the disease insult, he continues. “This is certainly true in stroke where at different points in the cycle female rats might have a different size stroke, but at the end of that experiment if you use only young, male animals you know [only] that the drug works in the equivalent of a teenage male who is unlikely to get a stroke.”
On the other hand, “we don’t really know if elderly mice are more like an elderly human than a young mouse,” meaning there may not really be a benefit to mirroring age in the animal models, adds Lyden. The rationale is clearer with hypertension and diabetes because the physiology of the disease in mice resembles that in humans.
Minimizing Biases
The approach taken by the SPAN could be implemented with almost any preclinical model imaginable, including ones where animals are modeling Alzheimer’s disease or organoids are being used rather than mice, says Lyden. In all those scenarios, for example, treatment concealment is rarely done simply because it is hard to do.
Blinding, where readouts might be done by an investigator who doesn’t know which animals received placebo versus the treatment, “is a little more widespread and acceptable,” he says. Concealment is different: it hides which drug is being used. SPAN accomplished this with a centralized system that placed the different drug treatments into identical vials coded with a number corresponding to the specific drug or placebo, a mapping known only to the coordinating center, which wasn’t doing any of the testing. The six labs were emailed instructions regarding which coded vial to use.
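A minimal Python sketch conveys how such a coding scheme might work; the treatment names, code range, and function here are hypothetical illustrations, not SPAN’s actual system:

```python
import random

# Illustrative sketch: only the coordinating center ever holds the
# mapping from vial codes to treatments.
TREATMENTS = ["placebo", "uric_acid", "candidate_B",
              "candidate_C", "candidate_D", "candidate_E"]  # hypothetical names

def build_code_map(treatments, seed=None):
    """Assign each treatment a random numeric vial code.

    The returned mapping stays at the coordinating center; testing labs
    see only the codes printed on otherwise identical vials.
    """
    rng = random.Random(seed)
    codes = rng.sample(range(100, 1000), k=len(treatments))
    return dict(zip(codes, treatments))

code_map = build_code_map(TREATMENTS)
# A lab is emailed only an instruction such as "use vial 347"; decoding
# it requires code_map, which the testing labs never receive.
```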
One of the six treatments involved a medical device that could not be concealed.
The randomization process has effectively been standardized under the new preclinical model. Animals in the lab have for years been picked out of a box “at random,” notes Lyden, which doesn’t in fact ensure that each one has an equal chance of receiving any of the treatments under study. Consciously or subconsciously, researchers may pick a more active animal to get the placebo. Similarly, surgeon-investigators are subject to potential unconscious influences, such as how steadily they control their hands, how they administer anesthesia, or even how they respond to the temperature of the room. All of these sources of bias are mitigated by concealing the identity of the drug from the surgeons and through centralized randomization.
The reality in some labs at present is that researchers “keep doing more and more animals until they get the answer they want and the animals that are discarded are never known about,” a practice referred to as attrition bias. The SPAN approach involved having the six labs affix a numbered ear tag to each mouse on arrival and enter those numbers into a database. When it was time to randomize, they first completed an “intention to treat” form indicating the identification number, sex, and weight of the animals, along with the intended surgery date.
Randomization was then initiated by the coordinating center. “There was no opportunity for subterfuge,” says Lyden.
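As a rough illustration of how pre-registration plus central assignment closes that door, consider this hedged Python sketch; the form fields, function name, and values are invented for the example, not taken from SPAN’s database:

```python
import random

def randomize_registered(itt_forms, arms, seed=None):
    """Illustrative centralized randomization over pre-registered animals.

    `itt_forms` holds the intention-to-treat records the labs submit
    before surgery. Registering every animal up front means none can be
    silently discarded later (attrition bias); assigning arms centrally
    denies the surgeon any chance to steer animals toward a treatment.
    """
    rng = random.Random(seed)
    return {form["ear_tag"]: rng.choice(arms) for form in itt_forms}

assignments = randomize_registered(
    [{"ear_tag": 1023, "sex": "F", "weight_g": 27,
      "surgery_date": "2021-03-15"}],     # example ITT form
    arms=["vial_347", "vial_512"],        # coded vials, identities concealed
)
```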
Adaptive Design
The statistical method used to evaluate the efficacy of the treatments—each proposed by one of the NINDS-selected study labs—was motivated by a desire to minimize the number of animals required to get an answer, Lyden says. The MAMS methodology, derived from an adaptive study design used in clinical trials for the past two decades, involved interim analyses based on prespecified criteria for futility and efficacy once 25%, 50%, and 75% of the total sample (corresponding to stages 1, 2, and 3) had been randomized.
At each of those time points, enrollment stopped for any intervention found to be ineffective. None of the interventions were eliminated at stage 1, which included only young, healthy mice. After stage 2, which included aging mice, spontaneously hypertensive rats, and diet-induced obese mice, half of the interventions were eliminated. And at the end of stage 3, which included the same three comorbidity models, two of the three remaining treatments were eliminated.
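A schematic Python sketch captures this stage-gated stopping logic; the binary outcomes, raw rate-difference criterion, and toy data below are simplifying assumptions for illustration, not SPAN’s prespecified boundaries or results:

```python
def interim_look(arms, control, futility_margin=0.0):
    """Illustrative MAMS interim analysis at one stage boundary.

    `arms` maps each treatment arm to its 0/1 primary-outcome results so
    far; `control` holds the control arm's results. Arms whose observed
    success rate fails to beat control by `futility_margin` stop
    enrolling; survivors continue to the next stage.
    """
    def rate(xs):
        return sum(xs) / len(xs)

    surviving = {}
    for name, outcomes in arms.items():
        if rate(outcomes) - rate(control) > futility_margin:
            surviving[name] = outcomes  # continues to the next stage
        else:
            print(f"{name}: stopped for futility")
    return surviving

# Looks would occur when 25%, 50%, and 75% of the total sample had been
# randomized; a real analysis uses prespecified statistical boundaries,
# not a raw rate difference.
survivors = interim_look(
    {"arm_A": [1, 0, 1, 1], "arm_B": [0, 0, 1, 0]},  # toy data
    control=[0, 1, 0, 0],
)
```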
Only uric acid continued to stage 4 for confirmation of benefit in young, healthy rats with no comorbidities, and at the study’s conclusion it was found to be a “potent cerebral protectant.” The caveat here, Lyden is quick to point out, is that the compound showed benefit only on the primary endpoint, the corner test for asymmetric turning, and on none of the secondary endpoints, which included a grid walk test, a hanging wire test, and an MRI scan to look at stroke size.
Lyden considers this a “cautionary red flag,” and says it merits asking why that happened and what it means. His best guess is that the secondary endpoints were inadequate. But it could also be that “there’s something funny about uric acid.”
Another cause for pause is that uric acid has previously been studied in people and failed to meet its primary endpoint in one clinical trial. “It’s a confusing and puzzling story, so we did not in the paper endorse it unequivocally [and] wholeheartedly,” says Lyden.
SPAN 2.0 is underway to evaluate five new acute ischemic stroke treatments, he notes. This study will again have six labs doing the testing, and its second stage is now launching.