Chromosome Reports below are synced with the current NCBI genome assembly.
NCBI now integrates dbSNP entries with other sequence and mapping resources via BLAST and E-PCR analysis. This analysis associates all SNPs with a nucleotide sequence record and/or sequence contig. RefSNPs that have been mapped to a contig will be annotated as a 'variation' feature on the appropriate contig. This variation annotation will include a /db_xref qualifier (i.e. database pointer) back to dbSNP. If the SNP is in a gene region, it is now annotated on the appropriate NCBI Reference Sequence mRNA record. If the variation is inferred to change the protein peptide sequence (a non-synonymous substitution), it is annotated as a variation feature on the protein sequence as well.
Currently 2,165,448 (80.1%) of human refSNPs have been mapped onto the current human genome sequence assembly. The other 19.9% reside in regions of extensive repetitive sequence. Submissions for mouse and mosquito have been mapped to their respective genomes as described in Table 1.
Hits to genome | Human | Mouse | Mosquito |
---|---|---|---|
1 | 1,959,054 | 93,047 | 417,483 |
2 | 84,772 | 235 | 4,808 |
3-10 | 121,624 | 578 | 6,08 |
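As a rough check on the table above, the following sketch sums the human and mouse columns and reports the share of mapped refSNPs that hit the genome exactly once. The dictionary layout and the function name `unique_fraction` are illustrative assumptions, and the mosquito column is omitted because its last entry is incomplete in the source.

```python
# Counts copied from the "Hits to genome" table (human and mouse only;
# the mosquito 3-10 entry is incomplete in the source, so it is left out).
hits = {
    "human": {"1": 1_959_054, "2": 84_772, "3-10": 121_624},
    "mouse": {"1": 93_047, "2": 235, "3-10": 578},
}


def unique_fraction(counts):
    """Fraction of mapped refSNPs that hit the genome exactly once."""
    total = sum(counts.values())
    return counts["1"] / total


for species, counts in hits.items():
    total = sum(counts.values())
    share = unique_fraction(counts)
    print(f"{species}: {total:,} mapped, {share:.1%} with a single hit")
```

The human column sums to 2,165,450, which differs by two from the 2,165,448 mapped refSNPs quoted in the text; the sketch uses the table values as given.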
dbSNP submissions for other species are currently being mapped into GenBank accession coordinates. The figure to the left shows the per-nucleotide distribution of sequence diversity for the human genome. The mean density for the genome is 0.000771 SNPs per base (7.708 SNPs per 10 kb). The full report for the human genome is available from the dbSNP FTP site. A similar report of variation density has also been generated for the mouse genome. At the current survey density, the mouse has a variation density of XXXXX SNPs per base (XXXXX SNPs per 10 kb).
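The two density figures for the human genome are the same quantity in different units. A minimal sketch of the conversion, with hypothetical function names, is:

```python
def mean_density(n_snps, n_bases):
    """Mean variation density in SNPs per base."""
    return n_snps / n_bases


def per_base_to_per_10kb(snps_per_base):
    """Convert a per-base density to SNPs per 10 kb (10,000 bases)."""
    return snps_per_base * 10_000


# The rounded per-base value quoted in the text.
human_per_base = 0.000771
print(f"{per_base_to_per_10kb(human_per_base):.2f} SNPs per 10 kb")
```

The rounded per-base value gives about 7.71 SNPs per 10 kb; the quoted 7.708 presumably comes from an unrounded per-base density.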
The figure to the right shows a plot of variation density on 10 kb segments of human genome sequence (excluding fragment gaps). As shown here, an appreciable fraction of the genome sequence consists of "deserts" with respect to variation, with no variations observed over 10 kb. Windows of invariant sequence remain a feature at larger window sizes as well (up to 500 kb, data not shown). The distribution plotted here has a long tail to the right (truncated in this figure), as occasional regions of the genome exhibit levels of diversity several orders of magnitude larger than the genome as a whole. Variation rates also vary from chromosome to chromosome (data not shown).
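A windowed density of this kind can be sketched as follows. This is an illustration on made-up positions, not the procedure used to produce the figure: the function names are hypothetical, positions are assumed to be 0-based, and the sketch does not exclude fragment gaps as the plotted analysis does.

```python
def window_counts(positions, sequence_length, window=10_000):
    """Count variations in consecutive fixed-size windows.

    positions are assumed to be 0-based coordinates on one sequence.
    """
    n_windows = -(-sequence_length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        counts[pos // window] += 1
    return counts


def desert_fraction(counts):
    """Fraction of windows with no observed variation."""
    return sum(1 for c in counts if c == 0) / len(counts)


# Toy data: five variations on a 50 kb sequence.
positions = [120, 4_500, 9_999, 31_000, 31_050]
counts = window_counts(positions, sequence_length=50_000)
print(counts)                   # variations per 10 kb window
print(desert_fraction(counts))  # share of windows with zero variations
```

Changing `window` to a larger value (for example 500,000) gives the corresponding counts for larger window sizes.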
The inter-SNP distance distribution of mapped refSNPs is shown to the left. This is a cumulative probability distribution, tabulated over all human genome data, giving the probability that the distance between two neighboring variations is less than or equal to some value L. Distances were binned into windows of 50 base pairs.
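The cumulative distribution described above can be sketched for a single sequence as follows. The function name and the toy positions are assumptions for illustration; the bins are chosen so that each reported value is the probability that a distance is at most L.

```python
def inter_snp_cdf(positions, bin_size=50):
    """Empirical cumulative distribution of distances between neighbors.

    Returns a list of (L, P(distance <= L)) pairs, one per bin, where
    bin i covers distances in (i * bin_size, (i + 1) * bin_size].
    """
    ordered = sorted(set(positions))  # distinct positions, so distances >= 1
    distances = [b - a for a, b in zip(ordered, ordered[1:])]
    if not distances:
        return []
    n_bins = (max(distances) - 1) // bin_size + 1
    counts = [0] * n_bins
    for d in distances:
        counts[(d - 1) // bin_size] += 1
    cdf = []
    running = 0
    for i, c in enumerate(counts):
        running += c
        cdf.append(((i + 1) * bin_size, running / len(distances)))
    return cdf


# Toy data: neighbor distances are 30, 50, 220 and 600 bp.
cdf = inter_snp_cdf([100, 130, 180, 400, 1000])
for upper, prob in cdf:
    print(upper, prob)
```

A genome-wide version would pool the distances from every chromosome before accumulating, without taking distances across chromosome boundaries.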
The table below provides statistics for the refSNPs mapped to the human genome.
Chromosome (Chromosome Report for ordered list of refSNPs) | RefSNPs on chromosome | Chromosome Length (Mb) | Mean Intermarker Distance (kb) | Map View |
---|---|---|---|---|
1 | 142,629 | 252.5 | 1.77 | |
2 | 114,530 | 238.0 | 2.08 | |
3 | 100,670 | 204.4 | 2.03 | |
4 | 86,173 | 189.6 | 2.20 | |
5 | 117,256 | 180.5 | 1.54 | |
6 | 106,080 | 178.0 | 1.68 | |
7 | 85,708 | 160.4 | 1.87 | |
8 | 60,246 | 143.4 | 2.38 | |
9 | 64,928 | 131.7 | 2.03 | |
10 | 67,505 | 139.7 | 2.07 | |
11 | 92,726 | 141.2 | 1.52 | |
12 | 67,263 | 138.4 | 2.06 | |
13 | 61,001 | 116.7 | 1.91 | |
14 | 50,904 | 105.4 | 2.07 | |
15 | 44,587 | 98.9 | 2.22 | |
16 | 47,532 | 92.6 | 1.95 | |
17 | 40,743 | 83.3 | 2.05 | |
18 | 54,813 | 81.3 | 1.48 | |
19 | 32,039 | 77.1 | 2.41 | |
20 | 33,693 | 61.8 | 1.84 | |
21 | 26,284 | 46.2 | 1.76 | |
22 | 31,765 | 47.2 | 1.49 | |
X | 35,933 | 150.3 | 4.18 | |
Y | 3,723 | 59 | 15.85 |
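The tabulated mean intermarker distances appear to equal chromosome length divided by the number of refSNPs on the chromosome. This is an inference from the numbers, not something the source states, and the sketch below checks it for a few rows with a hypothetical helper.

```python
# (refSNP count, chromosome length in Mb, tabulated mean distance in kb)
# for a few rows copied from the table above.
rows = {
    "1": (142_629, 252.5, 1.77),
    "8": (60_246, 143.4, 2.38),
    "22": (31_765, 47.2, 1.49),
    "X": (35_933, 150.3, 4.18),
    "Y": (3_723, 59.0, 15.85),
}


def mean_intermarker_kb(n_refsnps, length_mb):
    """Mean spacing in kb, assuming distance = length / refSNP count."""
    return length_mb * 1000 / n_refsnps


for chrom, (n, length_mb, tabulated) in rows.items():
    computed = mean_intermarker_kb(n, length_mb)
    print(f"chr{chrom}: computed {computed:.2f} kb, tabulated {tabulated} kb")
```

For these rows the computed values agree with the table to two decimal places, which is also why the sparsely covered X and Y chromosomes show the largest spacing.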