Major Update to the Human Reference Genome
By Bio-IT World Staff
December 26, 2013 | The Genome Reference Consortium (GRC), the research collaborative responsible for maintaining comprehensive reference genomes for humans, mice, and zebrafish, has released its first complete update to the human reference genome in nearly half a decade. Reference genomes are crucial to genomic research: they provide a baseline for assembling sequenced fragments into a full picture of the genome, and for comparing rare variants against "normal" sequences. Assembling a reference genome is a far more complex task than even ordinary whole genome sequencing, which can rely on existing reference genomes to align fragments and fill in difficult-to-capture gaps. For that reason, a large international team, including the Wellcome Trust Sanger Institute, the European Bioinformatics Institute, the Genome Institute at Washington University, the National Center for Biotechnology Information (NCBI), and others, cooperates to maintain this publicly available resource.
The new reference genome, dubbed GRCh38, corrects various assembly errors in its predecessor, closes or shrinks many of the gaps in the genome, contains the first rough sequences of the centromeres at the heart of each chromosome, and adds new "alternate" sequences for areas where no single reference can adequately describe a "normal" human genome. GRCh38 includes alternate sequences for over 250 regions, typically between two and six per region; the record by far belongs to the LRC/KIR complex, a gene cluster on chromosome 19 involved in the immune system, for which 35 alternate sequences are provided. Only 72 regions had alternate sequences in the previous reference genome, so the new resource is much more powerful in describing the most diverse areas of the human genetic code.
The GRC does not annotate its reference genomes, so for now GRCh38 is available only as raw nucleotide sequence, which can be downloaded through the NCBI. However, the NCBI genome browser, along with other public tools like Ensembl and the UC Santa Cruz Genome Browser, will shortly be updated to link the GRCh38 sequences with specific genes and their functions. GRCh38 is the first full update to the human reference genome since GRCh37 was released in February 2009. Although patches to GRCh37 have kept the reference genome up to date in the meantime, the new release is a welcome upgrade that will contribute to countless studies of human genetics.
This April, Bio-IT World spoke with NCBI Coordinator of Variation Resources Deanna Church about the GRC and the reference genome. You can read the full interview here.