New Math Model Could Impact The Study Of Rapidly Evolving Diseases
By Paul Nicolaus
May 20, 2019 | Researchers at Florida State University have developed a computational model with the potential to change the way scientists approach population genetics and the spread of diseases that evolve quickly in response to different environments.
"This method is the first application of fractional calculus to population genetics," said Florida State University postdoctoral mathematician Somayeh Mashayekhi. It allows for variation in simulations designed to identify the origins of genetic variations in a species.
Population genetics came about in the early 1900s, explained Florida State University computational biologist Peter Beerli, and for nearly a century people mainly used frequency calculations to forecast the future. In the 1980s, however, British mathematician John Kingman came up with a retrospective way of looking at the history of a population by tracing sample individuals into the past.
Known as the n-coalescent, Kingman's method allowed scientists to calculate the probabilities of relationships and gain insight into the interactions that shape variability in a species over time. Since the 1980s, Beerli said, this has essentially been the state of the art in population genetics, and people have developed many tools based on this model.
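To make the retrospective idea concrete, here is a minimal sketch of the standard Kingman coalescent for a single sample, not the researchers' f-coalescent. It assumes the textbook result that, while k lineages remain, the waiting time until two of them merge is exponentially distributed with rate k(k-1)/2 in coalescent time units. The function names are hypothetical and chosen only for illustration.

```python
import random


def kingman_coalescent_times(n, rng):
    """Waiting times between successive coalescent events for n sampled lineages.

    Assumption (textbook Kingman coalescent): with k lineages remaining,
    the time to the next merger is exponential with rate k*(k-1)/2.
    """
    times = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2
        times.append(rng.expovariate(rate))
    return times


def expected_tmrca(n):
    """Expected time to the most recent common ancestor: 2 * (1 - 1/n)."""
    return 2 * (1 - 1 / n)


# Trace 10 sampled individuals back to their common ancestor many times
# and compare the simulated average with the analytical expectation.
rng = random.Random(1)
n = 10
replicates = 20000
mean_tmrca = sum(
    sum(kingman_coalescent_times(n, rng)) for _ in range(replicates)
) / replicates

print(f"simulated mean TMRCA: {mean_tmrca:.3f}")
print(f"expected mean TMRCA:  {expected_tmrca(n):.3f}")
```

Every lineage in this sketch is treated identically, which is the homogeneity assumption discussed in the article.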
The problem is that the n-coalescent makes some assumptions about individual populations. When considering an influenza strain, for example, the assumption is that all influenza viruses encounter exactly the same problem in every human, Beerli explained. In other words, it assumes a similar environment for all individuals. But not every human reacts the same way when infected with a pathogen.
"If you want to follow the history of the pathogen, then you should actually take into account that the environment for the pathogen is heterogeneous," he said, and fractional calculus makes it possible to introduce this heterogeneity into the similar but more limited n-coalescent model. The result, according to the researchers, is a better sense of when different genetic variations emerge in a population.
The coalescent is a fundamental idea in population genetics that makes it easier to think about how related individuals are in populations and across many populations, explained Daniel Bolnick, a professor of ecology and evolutionary biology at the University of Connecticut who was not involved in the study. When the tool was first introduced in the 80s, "it completely revolutionized the field of population genetics," he said, and "a lot of how we think about evolution and biology today draws on this idea of the coalescent process."
It assumed, however, that the extent to which some individuals leave more offspring than others is just happenstance and has nothing to do with the genes that they carry. "They needed to make that assumption for the math to work because otherwise it got too complicated," he said. It's an odd assumption to make, though, considering one of the fundamental ideas of evolution in biology is that natural selection takes place and individuals carrying certain genes are more likely to contribute to the next generation than others.
So this fundamental evolutionary model, the coalescent, essentially says that everybody has an equal lottery ticket to contribute to the next generation. Not everybody wins that lottery, but everybody has an equal ticket. It fundamentally contradicts the idea of natural selection. The recently published work, according to Bolnick, "is the first paper to really effectively crack that disconnect" and "bridge these two ideas."
The Florida State researchers "are offering a set of mathematical tools that will allow us to do these same calculations but now with the possibility that some individuals are going to be much more likely to reproduce than other individuals. It's not just that some individuals win the lottery and others don't; it's that some have multiple lottery tickets. And that previously wasn't available as a method."
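The lottery analogy can be illustrated with a toy sampling sketch. This is not the f-coalescent itself; it only contrasts drawing parents with equal weights against drawing them with unequal weights. The ticket counts and the function name are made up for illustration.

```python
import random
from collections import Counter


def offspring_counts(tickets, n_offspring, rng):
    """Draw a parent for each offspring; a parent's chance is proportional to its tickets."""
    parents = rng.choices(range(len(tickets)), weights=tickets, k=n_offspring)
    counts = Counter(parents)
    return [counts.get(i, 0) for i in range(len(tickets))]


rng = random.Random(42)
n_offspring = 10000

# Standard coalescent view: every individual holds exactly one ticket.
equal = offspring_counts([1] * 10, n_offspring, rng)

# Hypothetical alternative: individual 0 holds five tickets, the rest hold one.
unequal = offspring_counts([5] + [1] * 9, n_offspring, rng)

print("equal tickets:  ", equal)
print("unequal tickets:", unequal)
```

With equal tickets, differences in offspring number come from chance alone; with unequal tickets, individual 0 is expected to parent about 5/14 of the next generation.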
New Model Could Improve Estimates
In their study, which was published March 26 in the journal Proceedings of the National Academy of Sciences (doi: 10.1073/pnas.1810239116), Beerli and Mashayekhi applied their new model (called the f-coalescent for its use of fractional calculus) to three datasets that included the mitochondrial sequence data of humpback whales, the mitochondrial data of a malaria parasite, and the complete genome data of an H1N1 influenza virus strain.
Although environmental heterogeneity seemed to make little difference for humpback whales, the influenza and malaria data indicated that heterogeneity ought to be considered when analyzing pathogens that evolve quickly.
Humpback whales have a typical lifespan of several decades and have more than one offspring over time. That means if the ocean is, say, a bit colder one year than the next it might not impact the total number of offspring for this organism all that much, Beerli explained.
Consider malaria, on the other hand. The parasite that causes this disease is a microscopic, single-celled organism, and if that single parasite is not getting into the right mosquito and is not getting into the right human, malaria will not persist, which means the genetic material of that particular organism would be lost. The same is true for influenza. Some human immune systems are more adept than others at fighting off influenza particles.
"As a result, we would expect that the environment has a much bigger influence on malaria and influenza than on the humpback whales," he said. "For humpback whales, we are not able to reject the Kingman coalescent process that's based on standard Poisson theory, but we are able to essentially say the Kingman process is not very good in talking about the history of influenza and malaria pathogens."
In that sense, Beerli and Mashayekhi believe their method could offer improvements over the n-coalescent model and lead to more appropriate estimates. "If we look at the results, then we also see that the standard procedures underestimate the diversity in these species," Beerli said, "and as a result we would say the strains that we see for influenza are not that variable or they are not that numerous if you used the standard coalescent process."
Paper Brings Natural Selection Back
For the sake of mathematical simplicity, this branch of evolutionary biology ignored natural selection because researchers couldn't figure out how to work it in, Bolnick said, "and this paper is providing a way to work natural selection into the picture again."
Bolnick's research interests focus on how evolution maintains genetic variation within species. Most recently, he has studied how parasites and their hosts co-evolve, and how their antagonism shapes variation in host immunity. He is familiar with coalescent models as well as the MIGRATE software that the Florida State researchers are updating with this particular step forward.
The MIGRATE software is freely available, he said, and it aims to estimate two of the key variables in population genetics. "When you want to understand why populations are different from each other genetically, you need to know how much they exchange genes" because when individuals move from one place to another they bring their alleles with them and that tends to make two populations more similar to each other.
Another main process that's going on is genetic drift, or the tendency for small populations to change their gene frequencies through time just because of a random sampling of events. "Some individuals leave more offspring in the long run than others just by random chance, and that leads to changes in the frequencies of genotypes through time," he said. "That can lead to populations looking different from each other just because of this happenstance of who managed to reproduce and who didn't within each population."
What that depends on is a term called the effective population size, which is the number of individuals, on average, who are contributing to the next generation. (Not everyone who is alive at one point in time is actually contributing effectively to the next generation, so the effective population size is less than the actual number of individuals.)
Genetic drift is strong (it changes populations quickly) when effective population sizes are small, because random chance has a large impact. When effective population sizes are large and lots of individuals are reproducing, on the other hand, the randomness averages out and doesn't have much of an effect.
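The effect of population size on drift can be seen in a small simulation. The sketch below assumes an idealized haploid Wright-Fisher model with no selection, mutation, or migration, in which the census size equals the effective population size. The model choice, function names, and parameter values are illustrative assumptions and do not come from the study.

```python
import random
import statistics


def wright_fisher_final_freq(pop_size, generations, start_freq, rng):
    """Allele frequency after random resampling for a number of generations."""
    freq = start_freq
    for _ in range(generations):
        # Each individual in the next generation inherits the allele
        # with probability equal to its current frequency.
        carriers = sum(1 for _ in range(pop_size) if rng.random() < freq)
        freq = carriers / pop_size
    return freq


def drift_spread(pop_size, generations=20, replicates=100, start_freq=0.5, seed=0):
    """Standard deviation of the final allele frequency across replicate populations."""
    rng = random.Random(seed)
    finals = [
        wright_fisher_final_freq(pop_size, generations, start_freq, rng)
        for _ in range(replicates)
    ]
    return statistics.pstdev(finals)


small = drift_spread(10)    # small population: frequencies wander widely
large = drift_spread(1000)  # large population: frequencies stay near 0.5

print(f"spread with 10 individuals:   {small:.3f}")
print(f"spread with 1000 individuals: {large:.3f}")
```

Replicate populations that all start at the same frequency end up far more different from one another when the population is small, which is the pattern described above.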
The strength of genetic drift and the strength of migration are two key factors that determine how different populations are from each other, Bolnick said. The MIGRATE software allows users to provide genetic data from different populations and then attempts to estimate the effective population size in each one, along with the extent of the migration, or movement, between them.
The software is a nice tool, he added, "but it's the underlying idea and the underlying math" from the Florida State University researchers "that's more important here."
Still Work to Be Done
"I think there's some work still to do to connect this to the kinds of things that I’m most interested in," Bolnick added, such as examining natural selection in the wild and figuring out what genes natural selection acts on. "This is laying the groundwork to be able to do a better job of estimating that, but they're not quite there yet," he said. This is, however, opening up a new line of research and providing a tool to start to use to solve the problem.
Bolnick envisions a lot of follow-up work to this paper and believes it will end up being implemented in software. Researchers like him who are collecting genetic data will be able to plug those data into these mathematical tools and calculate numbers of interest that reveal how quickly evolution is occurring, for example, and other aspects that researchers previously had to just ignore. "So that's exciting to be able to make those calculations," he added.
Beerli and Mashayekhi agree that there's still work to be done. At the moment, the published theory pertains to a single population but needs to be broadened to account for factors that may impact shifting populations.
If you think about influenza, there are large reservoirs in Southern China, for example, and the American population is another reservoir. The two exchange migrants as sick tourists and business travelers fly from one location to the other. "Our current model cannot deal with these types of situations," Beerli said, highlighting a deficiency. "We want to develop the model from one population to more than one population," he added, in addition to working on the computational cost.
"Down the road, we hope that we can develop our method a little bit more," he said. Large scale experiments or large-scale data sets could make it possible to look at the long-term history of, say, influenza parasites. "It's still kind of an open question whether we actually make considerable mistakes if we tried to forecast what's happening using the traditional theory or not," he added. The goal is to compare the forecasting abilities of the standard processes with their new model to figure out which method is more effective.
While there isn't necessarily an immediate benefit to the recently published findings, the paper could wind up helping other researchers figure out ways to take variability of environment into account. "Don't forget," Beerli noted, that "currently all papers that use computer programs in population genetics assume that every population is homogeneous."
Paul Nicolaus is a freelance writer specializing in science, nature, and health. Learn more at www.nicolauswriting.com.