Sequencing Reveals Gene Regulation in E. coli
By Bio-IT World Staff
November 7, 2012 | Eric Schadt and colleagues at Mount Sinai School of Medicine and Harvard Medical School have used single-molecule real-time (SMRT) DNA sequencing to determine mechanisms of gene regulation in the E. coli strain responsible for the deadly outbreak in Germany in May-June 2011. Published online today in Nature Biotechnology, the findings provide new insights into the role of epigenetic DNA base modifications in driving molecular processes in the outbreak strain.
In 2011, the E. coli outbreak affected thousands of people; 50 died and nearly 1,000 suffered kidney failure from hemolytic-uremic syndrome. At the time, Eric Schadt was among the researchers around the world sequencing the DNA of the outbreak strain, along with 11 related strains, to identify the outbreak strain in detail and shed light on its evolutionary origins. (BGI was the first group to sequence the strain, using a Life Technologies instrument, though many groups worked on the problem.) Despite the valuable findings from that initial work, the DNA sequence alone did not fully explain the unusually high virulence seen in the outbreak.
Now, Schadt, director of the Institute for Genomics and Multiscale Biology at Mount Sinai School of Medicine and CSO of Pacific Biosciences, and co-author Matthew Waldor, professor of Medicine at Harvard Medical School, have applied PacBio’s SMRT sequencing genome-wide to examine the chemical modifications to individual DNA bases.
“The information content of the genetic code is not limited to the primary nucleotide sequence of A’s, G’s, C’s and T’s,” said Schadt, in a press release. “Individual DNA bases can be chemically modified, changing how proteins interact with that particular sequence and as a result having significant functional consequences. Without genome-wide DNA base modification information, you simply don’t have a complete picture of all the variation and the phenotypic variability that we see.”
Despite the importance of capturing base modification information during genomic analysis, limitations in sequencing technology have made it difficult for scientists to study the many types of base modifications that occur in nature. The team used the PacBio RS system from Pacific Biosciences, which collects data on base modifications at the same time as it collects DNA sequence data. The researchers applied algorithms described in a paper published online October 23 in Genome Research, the first paper to statistically model PacBio sequencing data for epigenetic analysis.
Waldor commented: “By enabling genome-wide detection of chemical modifications of DNA, PacBio sequencing opens a new window in our understanding of epigenetic mechanisms in bacteria.”
The researchers detected widespread base modifications in the genome of the E. coli outbreak strain, identifying roughly 50,000 methyladenine-modified bases. In collaboration with Richard Roberts, CSO at New England Biolabs, Schadt and team discovered a series of enzymes that appeared to target specific DNA sequence motifs throughout the genome as they made their chemical changes. For example, the well-known Dam methyltransferase was directly observed to target the A residue in the sequence motif GATC, while a different, novel methyltransferase acts on the ACCACC motif.
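To illustrate what "targeting a sequence motif" means at the level of individual bases, here is a minimal sketch, not the study's method, that lists every base in a sequence that a motif-specific methyltransferase could modify. It assumes a simple exact-match motif and a known target position within it; the function names are hypothetical. For Dam, the modified base is the A at position 1 (0-based) of GATC; the article does not say which base in ACCACC or CTGCAG is modified, so those motifs are not used here. Because the enzyme can act on either DNA strand, the sketch also searches for the reverse complement of the motif.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(seq, pattern):
    """0-based start positions of every (possibly overlapping) match."""
    starts = []
    start = seq.find(pattern)
    while start != -1:
        starts.append(start)
        start = seq.find(pattern, start + 1)
    return starts


def target_sites(seq, motif, offset):
    """Hypothetical helper: list (position, strand) pairs for the base at
    `offset` within each occurrence of `motif`, on both strands.

    Positions are 0-based coordinates on the forward strand; for '-' sites
    the modified base is the complement of the forward-strand base there.
    """
    seq = seq.upper()
    k = len(motif)
    sites = set()
    # Forward-strand occurrences: the target base sits `offset` bases in.
    for p in find_all(seq, motif):
        sites.add((p + offset, "+"))
    # Reverse-strand occurrences appear as the reverse complement on the
    # forward strand; the motif reads right-to-left from p + k - 1.
    for p in find_all(seq, reverse_complement(motif)):
        sites.add((p + k - 1 - offset, "-"))
    return sorted(sites)


# Dam methylates the adenine in GATC (0-based offset 1). GATC is
# palindromic, so each site carries one target A on each strand.
print(target_sites("TTGATCAA", "GATC", 1))
```

For the toy sequence above this prints `[(3, '+'), (4, '-')]`: the A at position 3 on the top strand, and the A on the bottom strand opposite the T at position 4.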
“We found a whole array of methylase enzymes that were making modifications by targeting different sequence motifs,” said Schadt. “It almost appears like another language. The DNA bases targeted for modification are highly non-random, and the targeting had a broad effect on the transcription of genes.”
The team followed up the base modification study with RNA sequencing to determine how the genes responsible for these epigenetic marks were affecting the transcriptome. “The accepted dogma for the primary role of restriction modification systems is defense of host cells from invasion by foreign DNA,” Schadt says. “However, we found that these modifications had a very significant impact on the transcription of genes, and that the genes being affected were enriched in a number of different pathways.”
Notably, the team found marked enrichment for pathways linked to bacterial growth and motility. Throughout the organism’s genome, many pathways were up- or down-regulated by one of the methylases found in a mobile element next to the Shiga toxin gene, a gene known to affect virulence. That methylase, which targets the motif CTGCAG, was specific to the outbreak strain and absent from several non-virulent strains that were also studied. In addition, when this methylase was removed in a knockout experiment, the researchers observed structural changes in the genome of the outbreak strain, suggesting that it may further affect processes associated with virulence and pathogenicity.
This research highlights the importance of generating more dimensions of data (such as DNA, DNA base modifications, RNA, and proteins) and then integrating them to form a multiscale view of the organism’s biology. “Living systems are composed of lots of pieces interacting in very complex ways,” Schadt says. “To understand such systems, we need to take into account more of the information on a global level, not just a single protein level. This is how to see the whole picture of an organism’s biology.”
Study authors include scientists from Harvard Medical School, Howard Hughes Medical Institute, University of Minnesota, New England Biolabs, University of Michigan, Pacific Biosciences, and Stanford University.