Assembly
This site provides a data set based on the February 2009 Homo sapiens high-coverage assembly GRCh37 from the Genome Reference Consortium (GRC). This assembly was used by UCSC to create their hg19 database. The data set consists of gene models built from the genewise alignments of the human proteome as well as from alignments of human cDNAs using the cDNA2genome model of exonerate.
This release of the assembly has the following properties:
- 27478 contigs.
- contig length total 3.2 Gb.
- chromosome length total 3.1 Gb.
It also includes nine haplotypic regions, mainly in the MHC region of chromosome 6.
Patches
As the GRC maintains and improves the assembly, patches are being introduced. Currently, assembly patches are of two types:
- Novel patch: new sequences that add alternative sequence at a locus and will remain as haplotypes in the next major assembly release by the GRC
- Fix patch: sequences that correct the reference sequence and will replace the given region of the reference assembly at the next major assembly release by the GRC
Gene annotation
The Ensembl human gene annotations have been updated using Ensembl's automatic annotation pipeline. The updated annotation incorporates new protein and cDNA sequences which have become publicly available since the last GRCh37 genebuild (March 2009).
This archive displays a joint gene set based on the merge between the automatic annotation from Ensembl and a freeze of the manual annotation from Havana (first published in Vega Release 55). Transcripts from the two annotation sources are merged if they share the same internal exon-intron boundaries (i.e., have an identical splicing pattern), with slight differences in the terminal exons allowed. Importantly, all Havana transcripts are included in the final Ensembl/Havana merged (GENCODE) gene set. See the Summary table below for the corresponding GENCODE version number. The Consensus Coding Sequence (CCDS) identifiers have also been mapped to the annotations. More information is available from the CCDS project.
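The merge criterion described above can be illustrated with a minimal sketch. It is not the Ensembl/Havana merge pipeline; it only shows the idea of comparing internal exon-intron boundaries while ignoring the outer ends of the terminal exons. The function names, the coordinate convention (1-based, inclusive, same chromosome and strand) and the example transcripts are all assumptions made for illustration, and single-exon transcripts are deliberately left unmatched because the text does not describe that case.

```python
from typing import List, Tuple

# Hypothetical representation: an exon is (start, end), 1-based and inclusive.
# Both transcripts are assumed to lie on the same chromosome and strand.
Exon = Tuple[int, int]


def intron_chain(exons: List[Exon]) -> Tuple[Tuple[int, int], ...]:
    """Return the introns implied by consecutive exons, as (start, end) pairs."""
    ordered = sorted(exons)
    return tuple(
        (left[1] + 1, right[0] - 1) for left, right in zip(ordered, ordered[1:])
    )


def share_splicing_pattern(a: List[Exon], b: List[Exon]) -> bool:
    """True if two multi-exon transcripts have identical internal boundaries.

    The outer start of the first exon and the outer end of the last exon never
    enter the intron chain, so differences there are tolerated.
    """
    chain_a = intron_chain(a)
    chain_b = intron_chain(b)
    return len(chain_a) > 0 and chain_a == chain_b


# Made-up coordinates, for illustration only.
ensembl_tx = [(100, 200), (300, 400), (500, 650)]
havana_tx = [(80, 200), (300, 400), (500, 700)]   # differs only at the outer ends
other_tx = [(100, 200), (310, 400), (500, 650)]   # one internal boundary differs

print(share_splicing_pattern(ensembl_tx, havana_tx))
print(share_splicing_pattern(ensembl_tx, other_tx))
```

In this sketch any difference at the outer ends is accepted, whereas the text allows only slight differences in the terminal exons; a closer model would also bound how far those ends may diverge.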
More information
General information about this species can be found in Wikipedia.
Statistics
Summary
Assembly | GRCh37.p13 (Genome Reference Consortium Human Reference 37), INSDC Assembly GCA_000001405.14, Feb 2009 |
Base Pairs | 3,098,825,702 |
Golden Path Length | 3,098,825,702 |
Annotation provider | Ensembl |
Annotation method | Full genebuild |
Genebuild started | Jul 2010 |
Genebuild released | Apr 2011 |
Genebuild last updated/patched | Sep 2013 |
Database version | 113.37 |
Gencode version | GENCODE 19 |
Gene counts (Primary assembly)
Coding genes | 20,805 (excl 463 readthrough) |
Non coding genes | 22,966 |
Small non coding genes | 7,057 |
Long non coding genes | 13,870 (excl 184 readthrough) |
Misc non coding genes | 2,039 |
Pseudogenes | 14,181 (excl 4 readthrough) |
Gene transcripts | 196,668 |
Gene counts (Alternative sequence)
Coding genes | 2,606 (excl 37 readthrough) |
Non coding genes | 1,436 |
Small non coding genes | 517 |
Long non coding genes | 783 (excl 24 readthrough) |
Misc non coding genes | 136 |
Pseudogenes | 1,730 |
Gene transcripts | 18,303 |
Other
Genscan gene predictions | 48,597 |
Short Variants | 1,087,806,087 |
Structural variants | 7,608,658 |
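As a quick consistency check on the gene counts above, the small, long and misc non coding counts can be summed and compared with the reported non coding totals. The short sketch below does this for both the primary assembly and the alternative sequence; the dictionary layout and function name are illustrative assumptions, and the numbers are copied from the tables.

```python
# Non coding gene counts copied from the tables above.
primary = {"small": 7057, "long": 13870, "misc": 2039, "total": 22966}
alternative = {"small": 517, "long": 783, "misc": 136, "total": 1436}


def subclasses_sum_to_total(counts: dict) -> bool:
    """Check that small + long + misc equals the reported non coding total."""
    return counts["small"] + counts["long"] + counts["misc"] == counts["total"]


print(subclasses_sum_to_total(primary))
print(subclasses_sum_to_total(alternative))
```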