Whole Genome Covid-19 Sequencing Study Proves WGS Ready for Therapeutic Development

By Allison Proffitt

April 14, 2022 | Last month an international team of researchers published whole genome sequencing results comparing thousands of individuals with critical Covid-19 to a control group, discovering and replicating 23 independent variants that significantly predispose an individual to critical Covid-19.

The consortium data came from the GenOMICC study, 23andMe, and the Covid-19 Human Genetics Initiative with additional validation data coming from UK Biobank, AncestryDNA, Penn Medicine Biobank, and Geisinger Health Systems.

“Genomics England did the whole genome sequencing and a genome-wide association study where we tried to associate particular genomic variants in the genome with critical-ness by using the individual cases and participants in another project that we completed a few years ago, the 100,000 Genomes Project,” explained Athanasios Kousathanas, first author on the paper and Principal Genomics Data Scientist at Genomics England.

The Nature paper (DOI:10.1038/s41586-022-04576-6) reveals 23 genomic variants that predispose an individual to severe Covid-19 disease. Five critical Covid-19-associated variants have direct roles in interferon signaling and one risk variant disrupts a nuclear localization signal important for the antiviral effect of interferon. The data also suggests a potentially therapeutic role for mucins in the development of critical Covid-19. Several of these findings are amenable to therapeutic targeting, the authors write, though they warn that, “large-scale randomized trials will be essential before translating our findings into clinical practice.”

The findings have been shared with pharma companies since September, said Kousathanas, “enabling work in clinical trials for particular drugs. There was an urgency. We didn’t want results in 10 years, we wanted them now in terms of loci and targets for treatment.”

But for the Genomics England team, the work represents more than therapeutic targets for critical Covid-19, as valuable as those targets may be. The March 2022 paper is the second to come out of this Covid-19 dataset. The goal of the study was to compare the genomes of patients with severe Covid-19 and patients with cases of mild disease, Kousathanas explained. The first publication (DOI: 10.1038/s41586-020-03065-y) reported on 1,339 cases, and this second dataset expands on that. A third publication is expected, explained Francis Carpenter, Senior Product Manager at Genomics England: a final analysis of roughly 30,000 patients, half with critical disease and half with mild Covid-19, plus controls. Those data are now being analyzed.

In total, this paper represents whole genome sequences for more than 55,000 individuals: 7,491 were critically ill patients gathered from 224 intensive care units within the UK health system; 48,400 were controls pulled from the 100,000 Genomes Project or patients with mild Covid-19.

This particular analysis was notable, Carpenter explained, because the same analysis spanned the entire cohort. “In the 100,000 Genomes Project we recruited roughly 100,000 participants,” Carpenter said, “but there’s actually not that many analyses that work across all 100,000 of these participants. For example, within the 100,000 Genomes we might have a few thousand with this kind of cancer. Most of the analysis would focus on a certain set of those cases and a certain set of controls. We’d be more often working in the hundreds to low thousands of cases and controls.”

This dataset—both in size and in the significant urgency of the work—raised some new challenges.

For other large cohorts it may take several months or years to do aggregation, said Kousathanas. In this case the work was done in one month—with analyses repeated again and again because of the novelty of the disease.

“Because we wanted to do this analysis very fast and get the results out quickly to make a difference in this pandemic, we had to do things that would be done in a year, we did them in one month. And we did them multiple times!”

So the team just quit eating and sleeping to get it done?

Well, yes, the work represents significant sacrifice of sleep. But Carpenter also said that while the analysis team worked around the clock to complete the work, the experience has also yielded fruit in how projects like this could be sped up in the future.

Experience at Scale

“We’re certainly learning a huge amount about how to make—from an infrastructure perspective—those kinds of workflows performant and scalable,” Carpenter said. “A lot of the transition at the moment is between moving from working from on-premises high-performance compute clusters that have a sort of fixed set of memory and cores—you can buy some more but ultimately it’s limited—and moving much more toward a cloud computing model with scalable compute on demand.” In Genomics England’s case, this means working with Amazon Web Services. “We can recruit thousands of CPUs when we want to run a big analysis in a short time frame, and then spin down and not be paying for them indefinitely. The ability to recruit and auto-scale computational power on demand according to what is needed for the analysis is why a lot of these analyses are moving to the cloud.”

The sequencing and processing itself also introduced some challenges. The sequencing was done on HiSeq and NovaSeq platforms from Illumina, with older 100,000 Genomes samples having been aligned on older pipelines and new Covid-19 samples being aligned on new platforms. This created batch effects, Kousathanas explained, that needed to be addressed.

“The way we managed to control that was we had individuals that were processed with both pipelines… so we could basically figure out the locations of the genome that had these batch effects,” he said. “In the end we had between 8 and 15 million variants that we tested, but what we started with was more than half a billion variants.”
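The idea Kousathanas describes, using individuals processed with both pipelines to locate batch-affected sites, can be illustrated with a minimal sketch. This is not the team's actual pipeline: the function names, the genotype encoding, the toy data, and the 5% discordance threshold are all assumptions made for illustration. The sketch compares the genotype calls each pipeline produced for the same individuals and flags variants where the two disagree too often.

```python
from typing import Dict, List, Optional

# Hypothetical encoding: each variant maps to one genotype per individual,
# coded as the alternate-allele count (0, 1, 2); None marks a missing call.
Calls = Dict[str, List[Optional[int]]]


def discordance_rate(a: List[Optional[int]], b: List[Optional[int]]) -> float:
    """Fraction of individuals whose genotype differs between two pipelines."""
    compared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not compared:
        return 1.0  # nothing comparable: treat the site as unreliable
    mismatches = sum(1 for x, y in compared if x != y)
    return mismatches / len(compared)


def flag_batch_affected(old_calls: Calls, new_calls: Calls,
                        max_discordance: float = 0.05) -> List[str]:
    """Return variants whose calls disagree too often between pipelines.

    The 0.05 threshold is an illustrative assumption, not a published value.
    """
    shared = sorted(old_calls.keys() & new_calls.keys())
    return [v for v in shared
            if discordance_rate(old_calls[v], new_calls[v]) > max_discordance]


# Toy data: four individuals run through both pipelines, two variants.
old_pipeline = {"chr1:1000:A:G": [0, 1, 2, 0], "chr1:2000:C:T": [0, 0, 1, 1]}
new_pipeline = {"chr1:1000:A:G": [0, 1, 2, 0], "chr1:2000:C:T": [1, 1, 1, 2]}

flagged = flag_batch_affected(old_pipeline, new_pipeline)
print(flagged)
```

In this toy example only the second variant is flagged, because three of its four calls differ between pipelines. A real analysis would work from VCF-scale data and would likely combine several quality filters rather than a single discordance cutoff.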

Illumina worked directly with Genomics England to address some of these challenges, Kousathanas said, even changing software source code to make the processing more scalable. “We had a lot of help!”

Finally, Kousathanas points out the new cultural bar set by the work. “We have a new environment where we can summon resources as we want, so this makes things faster to set up and run. We’re trying to make communication between different scientists easier so, for example, this genetic analysis will be possible,” he said. “The fact is that whole genome sequencing, right now, is a new technology, but the more of these projects that we deliver, the more we’ll become mature.”

Whole Genome Futures

The study is an important use case for whole genome sequencing, not just in research but in making healthcare decisions, Kousathanas and Carpenter said.

“There’s an argument that maybe genetic studies take time to find the causal mechanisms. It takes time to use this knowledge or it’s difficult to use this knowledge to create treatments. There could be doubts on the usefulness on this particular type of approach and the amount of expense for this,” Kousathanas acknowledged. “I hope that this proves the value of this type of analysis definitely.”

Carpenter agreed. “For me, one of the very important things to come out of this was the end-to-end proof of value. Genomics has not just done an internal research project on an existing dataset at very large scale, but from inception through to publishing a set of significant findings that I’m sure will change treatment.”