Complete Genomics Makes 29 Genomes Public
By Bio-IT World Staff
April 7, 2011 | Complete Genomics announced yesterday that it has added 29 high-coverage, complete human genome sequences to its public genomic repository. Combined with the 40 genome datasets that Complete Genomics released on Feb. 3, 2011, this provides the research community with a public data repository of 69 complete human genome datasets.
The 69 genomes included in this public dataset were drawn from two resources housed at the Coriell Institute for Medical Research: the National Institute of General Medical Sciences (NIGMS) Human Genetic Repository and the NHGRI Sample Repository for Human Genetic Research. These 29 newly available datasets include the previously announced Puerto Rican trio from the NHGRI Repository and a 17-member, three-generation CEPH pedigree from the NIGMS Repository, as well as nine ethnically diverse samples from the NHGRI Repository, representing Tuscans from Italy, Gujarati Indians from Houston, Texas, and Maasai from Kinyawa, Kenya. Most of these samples have been previously analyzed as part of the International HapMap Project or the 1000 Genomes Project.
“We are delighted to see how quickly our public multi-genome repository has been embraced by the global research community,” said Clifford Reid, chairman, president and CEO of Complete Genomics, in a press release. “Since we launched this initiative on Feb. 3 at Advances in Genome Biology and Technology (AGBT), 30 terabytes of data have been downloaded from our website by more than 550 unique IP addresses. As this data is also available on Bionimbus and DNAnexus mirror sites, the total number of downloads is likely much higher.” (See “The Complete Picture at Marco Island,” Bio-IT World.)
This new dataset continues to meet Complete Genomics’ high-quality data standards. Across the genomes sequenced in this dataset, the median genome call rate is 97.1 percent and the median exome call rate is 96.3 percent.
Because many researchers are interested in how Complete Genomics’ data compares with other publicly available datasets, the company has performed the following comparisons. Complete Genomics was able to make a call at a median of 99.34 percent of the HapMap 1 SNP loci genotyped using the Illumina Infinium assay; of these calls, the SNP concordance rate was 99.94 percent. Complete Genomics was also able to make a call at a median of 99.45 percent of the HapMap 3 SNP loci; of these, 99.73 percent were called concordantly. Finally, Complete Genomics was able to make a call at a median of 97.28 percent of the 1000 Genomes Project Pilot 2 (high-depth sequencing of trios) SNP loci; of these, 99.70 percent were called concordantly.
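For readers unfamiliar with the two metrics, the following is a minimal sketch of how a call rate and a concordance rate could be computed for a single sample against a reference genotype set. It is illustrative only and is not Complete Genomics' actual pipeline: the function name, the "A/G"-style genotype strings, and the example loci are all assumptions made for this sketch. The figures quoted above are medians of such per-sample values.

```python
from typing import Dict, Optional, Tuple


def normalize(genotype: str) -> Tuple[str, ...]:
    """Treat genotypes as unordered allele pairs, so 'A/G' equals 'G/A'."""
    return tuple(sorted(genotype.split("/")))


def call_and_concordance(
    reference: Dict[str, str],
    calls: Dict[str, Optional[str]],
) -> Tuple[float, float]:
    """Return (call_rate, concordance_rate) for one sample.

    call_rate: fraction of reference loci where a genotype was called.
    concordance_rate: among called loci only, the fraction whose
    genotype matches the reference genotype.
    """
    if not reference:
        raise ValueError("reference must contain at least one locus")

    # A locus counts as called if it is present and not None (no-call).
    called = [locus for locus in reference if calls.get(locus) is not None]
    call_rate = len(called) / len(reference)

    if not called:
        return call_rate, float("nan")

    concordant = sum(
        1
        for locus in called
        if normalize(calls[locus]) == normalize(reference[locus])
    )
    return call_rate, concordant / len(called)


# Hypothetical toy data: four reference loci, one no-call, one discordant call.
reference = {"rs1": "A/G", "rs2": "C/C", "rs3": "T/T", "rs4": "G/G"}
calls = {"rs1": "G/A", "rs2": "C/C", "rs3": None, "rs4": "G/T"}

call_rate, concordance = call_and_concordance(reference, calls)
print(f"call rate: {call_rate:.2%}, concordance: {concordance:.2%}")
```

Note that concordance is measured only over loci where a call was made, which is why the article reports the two percentages separately.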
Complete Genomics is posting variant reports for each sample, which include SNPs, insertions/deletions, copy number variations and structural variations, and is also providing the read alignments supporting those calls, as well as coverage information and quality scores. This dataset is available on both human genome reference builds 36 and 37.
This new data is available for download from either the Complete Genomics website at http://www.completegenomics.com/sequence-data/download-data or the Bionimbus mirror site at http://www.bionimbus.org under “public data”. DNAnexus customers can also gain access to this dataset through that cloud-based platform.