The Genomics of Infectious Diseases

August 7, 2014

By Aaron Krol 
 
Twenty years ago, at the Institute for Genomic Research (TIGR) in Rockville, Maryland, Claire Fraser became one of the first scientists in the world to take part in sequencing an entire genome. In 1995, she was part of a large team that produced the whole genome of Haemophilus influenzae, the first free-living organism ever sequenced. That project was also the first appearance of whole genome shotgun assembly, a now widely used method by which a genome is split into random fragments, then reassembled by a computer. Fraser would go on to become TIGR’s president, leading the teams that sequenced the bacteria behind syphilis and Lyme disease, and eventually the first plant genome, Arabidopsis thaliana.
 
When the genome of H. influenzae was first sequenced, 14 Sanger sequencing machines had to run every day for three months to generate all the DNA data, and a custom-built software program, TIGR ASSEMBLER, took 30 hours to stitch the genetic fragments back together. Today, as director of the Institute for Genome Sciences (IGS) at the University of Maryland in Baltimore, Fraser oversees a battery of sequencers and a bioinformatics core that can turn out thousands of times that amount of sequence in a single day. What hasn’t changed is her interest in the agents of infectious disease, a research area that has been embraced by the genomics revolution.
 
21st century genomics, a field that has transformed our understanding of biology at an incredible pace, has coincided with a period when chronic disease consumes the attention of the biomedical community. With the worst infectious diseases largely eliminated from the developed world, and conditions like cancer and heart disease overwhelming healthcare resources, the highest-profile genomic projects have mainly focused on the genetic factors behind chronic illness. But for billions of people around the globe, infectious disease is still a daily reality, and emerging problems like antibiotic resistance threaten to reverse a century of gains even in the wealthiest countries. In this environment, pathogen researchers are still racing to catch up with data-hungry approaches, like population genetics, transcriptomics, and metagenomics, which are almost routine in other fields.
 
This summer, Fraser’s IGS received a $15.2 million, five-year grant from the National Institute of Allergy and Infectious Diseases (NIAID) to establish a Genome Center for Infectious Diseases (GCID), which will be dedicated to bringing new genomics technologies to bear on the bacteria, fungi, and parasites that cause disease in humans and livestock. The GCID will be the third NIAID-funded pathogen-sequencing center Fraser has overseen, through a decade when projects have evolved from creating single reference genomes, to using genetic markers to track complex interactions between hosts and pathogens over time.
 
“It’s been very exciting to see how the field has evolved,” Fraser tells Bio-IT World. “With the first round of funding that began in 2004, the projects we took on were often requests to sequence one isolate of a particular pathogen… Ten years ago, I’m not sure we would have been able to anticipate where we’d be today.”
 
Growing Ambitions
 
Fraser’s first NIAID-funded center, originally located at TIGR, was called the Microbial Sequencing Center, and it existed largely to save other institutions the expense of going after new bacterial genomes. “The microbial genetics revolution was in full swing, and NIAID was getting a lot of applications requesting money to sequence high-priority pathogens,” says Fraser. “Back in 2004, these were still very expensive projects, and NIAID was concerned about inadvertent duplication of effort.” By concentrating new sequencing projects at a few institutes, NIAID hoped to collect important reference genomes without wasting funds.
 
By 2009, when the Microbial Sequencing Center was replaced with a Genome Sequencing Center for Infectious Diseases, redundancy was no longer an issue; in fact, Fraser’s team was now expected to produce genomes on an industrial scale. Three GSCIDs were established: at the IGS in Baltimore, the Broad Institute in Cambridge, and the J. Craig Venter Institute in La Jolla, California. As the cost of sequencing plummeted, these Centers began tackling hundreds of clinical isolates from the same species, to identify important sources of variation in pathogen populations.
 
With this comparative information, the IGS could start to draw meaningful distinctions between strains of the same organism. Projects under the GSCID have included an effort to map the phylogeny of strains of Clostridium difficile, a common hospital-acquired infection; and a search for mutations in Acinetobacter baumannii that confer resistance to carbapenems, a class of last-resort antibiotics. Related work has also given rise to new publicly available software tools, such as Phylomark, which the IGS created to search sets of whole genomes for DNA markers that can be used to trace the phylogenetic relationships between different strains of bacteria.
 
While large comparative studies like these have edged researchers closer to a functional understanding of pathogen genetics, the focus has remained on bulk collection of data. NIAID requires the Center to make all data public as quickly as possible, mainly through NCBI (National Center for Biotechnology Information) websites. This has contributed to an exponential growth of pathogen genomes in popular databases like GenBank, but little of this data has so far translated into new approaches to fighting infectious disease.
 
Pathogens in the Body 
 
The new GCID, which will again join similar centers at the Broad and the Venter Institute, represents more of a philosophical shift. The IGS will still produce genomic data, in greater and greater volumes, but sequencing is increasingly just the first step toward a deeper understanding of how pathogens survive, thrive, and cause disease. “I think there was a realization that getting at important biological questions was going to require us to do more than just generate DNA sequence,” says Fraser. “[Whole genome sequencing] is obviously a very powerful tool, but it is just one tool that scientists can use today to tackle important questions in infectious disease research.”
 
Fraser’s own research interests have lately centered on the human microbiome, the collection of microbes that live in and on us. As new metagenomic tools are built to handle the muddle of genetic information that comes from sequencing these communities, the microbiome has become a hot topic of research in connection with chronic diseases like diabetes and autoimmune disorders. Much less attention, however, has been paid to its role in infectious disease.
 
“That needs to change,” says Fraser. “It’s surprising to me that it’s been the case.” While her personal research on the microbiome has looked at chronic diseases like Crohn’s and obesity, she has also studied whether infections by bacteria like Shigella and Salmonella can be impacted by certain conditions in the gut microbiome. “If you want to understand the outcome of infectious disease, you have to look beyond just the virulence of a given isolate,” she adds. “Depending on the host context in which pathogens are causing infection — and in the case of enteric infections, [that includes] the composition of the microbiome — you really need to take a more systems approach.”
 
One arm of the new GCID, led by David Rasko, will work expressly on these microbiome-pathogen interactions. Rasko’s group will take advantage of bacterial vaccine studies that are ongoing at the University of Maryland, which provide a rare opportunity to look at patients purposefully infected with disease-causing bacteria. By sequencing the microbiota of these patients before, during, and after infection, the GCID hopes to discover whether factors in the human microbiome can have a positive effect on vaccine response.
 
“We are hoping to find that there may be a certain composition, or dominant members of a community, which correlate with a high degree of protection,” says Fraser. “We could then go on to some very specific studies to address whether those are properties that could be transferred, through fecal transplantation or development of new probiotics.”
 
While Fraser stresses that any therapy inspired by this research will be a much longer-term goal, the suggestion does indicate a maturing field. Not long ago, it would have been unimaginable to sequence the random assortment of bacteria in the human gut, repeat this process in enough people to pick out genes associated with positive disease outcomes, and try to match those genes back to the species of bacteria that carry them. Today, with better metagenomics tools, high-throughput sequencers, and a growing log of bacterial reference genomes, it’s a project worth pursuing.
 
Emerging Diseases 
 
With the support of the NIAID, it’s not surprising that the GSCID has often taken part in the response to urgent public health problems. Among other projects, the Center has undertaken a large phylogenetic study of the cholera bacterium in Haiti to help trace the source of the massive 2010 outbreak, and run transcriptomic studies of human and monkey cells challenged with the emerging MERS coronavirus.
 
In the same vein, the new GCID grant will include funding for research into artemisinin resistance in Plasmodium falciparum, the parasite that causes the deadliest type of malaria. Artemisinin combination therapy remains the state-of-the-art treatment for malaria, but a growing number of cases in parts of Southeast Asia are resisting the standard three-day course of treatment. Researchers fear that artemisinin-resistant Plasmodium will soon reach major centers of travel in Thailand or India, at which point it will be nearly impossible to stop its spread to Africa, where the vast majority of malaria deaths occur each year.
 
In collaboration with Chris Plowe of the University of Maryland’s Center for Vaccine Development, and an international network of field investigators, the GCID will try to determine where and how drug-resistant malaria emerges. “The IGS investigator on this project, Joana Silva, has been interested in population genomics and evolutionary biology,” says Fraser. “Are these mutations arising independently? Are they somehow being transferred? By looking at longitudinal studies, can you begin to see drug resistance develop as a series of steps, and if so, can you identify key mutations that may predict the emergence of artemisinin resistance?”
 
Genomic studies of P. falciparum are always a difficult proposition. While the parasite was first sequenced in 2002 — by a consortium that included Claire Fraser at TIGR — its genome has always been hard to read and assemble, thanks to an unusual dependence on just two of the four DNA bases. As Joana Silva, who along with Julie Hotopp will lead the GCID’s parasitic tropical diseases arm, tells Bio-IT World, “it took six years to complete the original genome using Sanger sequencing technologies, and the primary reason is the high AT content. This parasite is about 80% AT.”
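
For readers who want to see what a figure like "80% AT" measures, the short Python sketch below counts the share of A and T bases in a sequence. The function name and the ten-base example string are invented for illustration; they are not taken from the Plasmodium genome or from any IGS software.

```python
def at_content(seq: str) -> float:
    """Return the fraction of A/T bases among the A, C, G, T bases in seq.

    Hypothetical helper for illustration only. Characters other than
    A, C, G, or T (for example N for an unknown base) are ignored.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return at / total


# Invented toy fragment: 8 of its 10 bases are A or T.
toy_fragment = "ATTAGATCAT"
print(f"AT content: {at_content(toy_fragment):.0%}")
```

When a genome leans this heavily on two bases, many stretches of it look alike, which is one reason fragments are harder to place uniquely during assembly.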
 
Sequencing Plasmodium genomes rapidly and reliably enough for population studies has required some innovations at the IGS. The Institute has developed a hybrid approach using both ultra-high-throughput Illumina sequencers, and the long-reading PacBio instrument. By splitting the genome into long DNA fragments, in the range of ten thousand bases, PacBio sequencers can more accurately order the AT-rich genome in a basic scaffold, which can then be filled in for accuracy with short Illumina reads at very high coverage.
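
The idea of filling in a long-read scaffold with short reads can be sketched in a few lines of Python. This is a toy under strong assumptions, not the IGS pipeline: the short reads are assumed to be already aligned to known positions in the scaffold, insertions and deletions are ignored, and the function name `polish_scaffold` and all sequences are made up. Each scaffold base is replaced only when a strict majority of the overlapping short reads agree on a different base.

```python
from collections import Counter


def polish_scaffold(scaffold: str, aligned_reads: list[tuple[int, str]]) -> str:
    """Correct a scaffold by per-position majority vote of short reads.

    aligned_reads holds (start_position, read_sequence) pairs. Alignment is
    assumed to have been done already; real tools must also handle indels.
    """
    # One vote counter per scaffold position.
    piles = [Counter() for _ in scaffold]
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < len(scaffold):
                piles[pos][base] += 1

    polished = []
    for base, pile in zip(scaffold, piles):
        if pile:
            top_base, votes = pile.most_common(1)[0]
            # Overwrite only when more than half of the covering reads agree.
            if votes * 2 > sum(pile.values()):
                base = top_base
        polished.append(base)
    return "".join(polished)


# Invented example: the scaffold has one wrong base (C at index 4).
scaffold = "ATTACATCAT"
short_reads = [(0, "ATTAG"), (2, "TAGAT"), (3, "AGATC"), (5, "ATCAT")]
print(polish_scaffold(scaffold, short_reads))
```

The higher the short-read coverage at each position, the more reliable the vote becomes, which is why the short reads are sequenced at very high coverage.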
 
The GCID will take that approach with dozens of cryopreserved samples, says Silva, “to create genomic references for different locations in the globe, in particular in Africa and Southeast Asia.” These references can then be used as a baseline for later population studies, to help infer the geographic origins of migrating strains, or simply to get a better picture of local Plasmodium diversity. “Once we generate genome assemblies, we can also annotate them,” adds Silva, “and that data is disseminated by a conjunction of GenBank databases and EuPathDB, which is a database for eukaryotic pathogens.”
 
A whole-genome view of the parasite might also help to predict where drug-resistant strains are likely to arise in the future. By comparing the full scope of genetic variation in different areas, the GCID hopes to recreate the paths by which malaria travels. “The idea is to determine if there are standard migration routes for the parasite,” says Silva. “If we look at Cambodia as a source,” where certain key resistance mutations have been found to originate, “what tends to be the strongest path of migration of parasites in the region? That might inform how to keep track of the spread of resistance mutations.”
 
The team led by Silva, in conjunction with Chris Plowe, will also be testing a new SNP chip developed by Plowe’s lab, which looks at over 33,000 genomic variants including those in and around key drug resistance loci in the Plasmodium genome. This chip takes advantage of the recent discovery that a large number of mutations in separate artemisinin-resistant parasites all occur in a single domain of a single gene, coding for the kelch 13 (K13) protein. The mechanism of resistance that might be conferred by these mutations is still mysterious, but they have become a core focus of malaria control programs.
 
While the new SNP chip will be of more limited use than whole genome sequencing, it could be a powerful tool for surveillance, and it can be used on a much greater scale. As starting material, this chip relies on drops of malaria-infected blood collected on paper, and the ability to do genome-wide analyses on these easily obtainable clinical samples makes it possible to relate parasite genetic differences to responses to drugs and vaccines in large clinical trials and field studies. Silva expects to run the chip on thousands of samples in the course of her work under the GCID grant.
 
The Course of Infection 
 
Yet another piece of the malaria project will take advantage of a vaccine trial run by the company Sanaria, which is attempting to inoculate children in Tanzania against the disease using frozen sporozoites, a stage of the Plasmodium lifecycle. Sanaria is supplying the GCID with blood samples from participants in the trial, including controls who did not receive the vaccine but still did not develop malaria.
 
The GCID plans to use peripheral blood mononuclear cells (PBMCs) from similar samples from previous malaria vaccine trials to look at host RNA expression during a malaria infection, to see whether specific genes are expressed more or less in humans with greater resistance to the parasite. “We selected a few children that never had an episode of malaria [during the trial], and others who had two or more, and we’re going to expose the PBMCs from these children to lysates of Plasmodium,” says Silva. “We’re trying to get at the host part of the equation, and see what acquired immunity might look like in these two groups.”
 
The turn toward transcriptomics is a promising direction for infectious disease research. No pathogen works in isolation; its life in the body is a constant battle with host cells, and action on both sides can change the course of a disease. Measuring RNA expression at discrete moments gives a more holistic, and more longitudinal, view of an infection.
 
Transcriptomics has, however, seen less use in infectious disease research than in studies of chronic conditions like cancer, largely because pathogen RNA is very hard to recover. “When you are infecting mammalian hosts, and trying to capture RNA from both host and pathogen in the same sample, you need to be convinced that you can see enough signal from the pathogen against a very large background of human host RNA,” explains Fraser.
 
A previous GSCID project on chlamydia took a new approach to this problem by collecting both host and pathogen RNA together, then enriching the total mRNA content without trying to separate out the human material. This resulted in a mixed mRNA sample that could be sequenced sensitively enough to preserve the bacterial transcriptome.
 
With this proof of concept established, the GCID will be making much more liberal use of RNA sequencing in future projects. “Our fungal project,” the third arm of the Center, “is essentially all transcriptomics, which is very much a shift in how we’re using the same technology,” says Fraser. “We’re able to look at the infectious disease process through a somewhat different lens.”
 
All these changes amount to a more complete understanding of how pathogens operate than was possible even a few years ago. “The projects are much more based on doing biology,” says Fraser, rather than blindly churning out data.
 
In the days of TIGR, efforts to bring genomics to infectious disease were somewhat abstract. Scientists like Fraser built unfamiliar genomes, one by one, with little sense of how they functioned or which genes would prove significant. Today, through programs like the GCID, the field is finally coming full circle to the kind of research that has always led to advances in fighting disease: observing these organisms in their own environments, and learning what it takes for them to survive.