UniGene

UniGene is an experimental system for automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters. Each UniGene cluster contains sequences that represent a unique gene, together with related information such as the tissue types in which the gene has been expressed and its map location.
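The actual UniGene build procedure is more involved than this, but the core idea of partitioning sequences into non-redundant clusters can be sketched as a connected-components problem. Assuming we already have a list of pairwise links between sequences (for example, significant sequence overlaps; how those links are computed is outside this sketch), a disjoint-set (union-find) structure places every sequence in exactly one cluster. The IDs, the `UnionFind` class and the `cluster` function are hypothetical illustrations, not part of any UniGene software.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register an unseen item as a singleton set.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Union by size: attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster(sequences, links):
    """Return clusters as sorted lists of IDs (connected components of the link graph)."""
    uf = UnionFind()
    for seq in sequences:
        uf.find(seq)  # every sequence starts in its own cluster
    for a, b in links:
        uf.union(a, b)
    groups = defaultdict(list)
    for seq in sequences:
        groups[uf.find(seq)].append(seq)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Hypothetical IDs: one mRNA linked to two ESTs, plus two unlinked sequences.
    seqs = ["mRNA_A", "EST_1", "EST_2", "EST_3", "mRNA_B"]
    links = [("mRNA_A", "EST_1"), ("EST_1", "EST_2")]
    print(cluster(seqs, links))
```

Because clusters are connected components, two sequences can share a cluster without overlapping each other directly, as long as a chain of links joins them.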

UniGene entries per species, grouped by phylum and class:

Chordata
  Mammalia: 25,713; 15,694; 54,560; 46,544; 3,154; 40,468; 24,028
  Aves: 21,447
  Amphibia: 24,087; 15,137
  Actinopterygii: 23,471; 14,241; 8,154; 1,081; 2,326
  Ascidiacea: 14,370
Echinodermata
  Echinoidea: 2,614
Arthropoda
  Insecta: 14,653; 5,900; 2,071; 14,611
Nematoda
  Chromadorea: 15,943
Platyhelminthes
  Trematoda: 1,195
Cnidaria
  Hydrozoa: 6,709
Embryophyta
  Bryopsida: 6,962
  Coniferopsida: 13,037
  Eudicotyledons: 21,133; 13,586; 2,206; 9,660; 8,187; 5,075; 6,509; 5,300; 2,710; 8,638; 12,518
  Liliopsida: 12,220; 33,722; 4,771; 8,146; 24,854; 14,277
Chlorophyta
  Chlorophyceae: 6,077
Mycetozoa
  Dictyosteliida: 3,860
Apicomplexa
  Coccidia: 7,169
Ascomycota
  Sordariomycetes: 5,721; 3,217

In addition to sequences of well-characterized genes, hundreds of thousands of novel expressed sequence tag (EST) sequences have been included. Consequently, the collection may be useful to the community as a resource for gene discovery. UniGene has also been used by experimentalists to select reagents for gene mapping projects and large-scale expression analysis.

However, note that the procedures for automated sequence clustering are still under development, and the results may change as improvements are made. Feedback from users has been especially useful in identifying problems, and we encourage you to report any you encounter.

Note also that no attempt has been made to produce contigs or consensus sequences. There are several reasons why the sequences of a cluster may not actually form a single contig. For example, all splice variants of a gene are placed in the same cluster. Moreover, EST-containing clusters often include 5' and 3' reads from the same cDNA clone, and these reads do not always overlap.
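To illustrate why the sequences of one gene's group need not form a single contig, the sketch below assumes each read has been placed at hypothetical start/end coordinates on a hypothetical 2,000-nt transcript (UniGene itself does not compute such placements) and merges overlapping reads into contigs. A 5' read and a 3' read from the same clone, with no read covering the middle of the transcript, yield two separate contigs even though a clustering step (for example, one that links reads sharing a clone ID) would keep them together. The `contigs` function is a hypothetical helper, not UniGene code.

```python
def contigs(intervals):
    """Merge overlapping half-open [start, end) read intervals into contig spans."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            # This read overlaps the current contig, so extend the contig.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # No overlap: the read starts a new contig.
            merged.append([start, end])
    return [tuple(span) for span in merged]


if __name__ == "__main__":
    # Hypothetical placements on a 2,000-nt transcript: 5' and 3' reads of one clone.
    clone_reads = {"5' read": (0, 450), "3' read": (1550, 2000)}
    print(contigs(clone_reads.values()))  # two contigs, despite one shared clone
```

Adding a read that spans the gap (for example, one placed at 400 to 1,600) would merge everything into one contig, which is why contig structure depends on coverage rather than on cluster membership.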

Sequences from animals including human, rat, mouse, cow, zebrafish, clawed frog, fruit fly and mosquito have been processed, as have sequences from plants including wheat, rice, barley, maize and thale cress. These species were chosen because they have the greatest amounts of EST data available and represent a variety of taxa. Additional organisms may be added in the future.

The UniGene datasets are available for download by FTP.
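As a minimal sketch of reading such downloaded files, the parser below assumes a line-oriented flat file in which each cluster record is a series of `KEY value` lines ending with a `//` line, where a key such as `SEQUENCE` may repeat and some values contain `NAME=value;` pairs. This layout, the field names and the sample record are assumptions for illustration only; check the README distributed with the files for the actual format. `parse_records` and `parse_fields` are hypothetical helpers.

```python
from collections import defaultdict

# Hypothetical sample in the assumed layout (not real UniGene data).
SAMPLE = """\
ID          Xx.1
TITLE       hypothetical example gene
SEQUENCE    ACC=XX000001; CLONE=clone1; END=5'
SEQUENCE    ACC=XX000002; CLONE=clone1; END=3'
//
ID          Xx.2
TITLE       another hypothetical gene
SEQUENCE    ACC=XX000003
//
"""


def parse_records(lines):
    """Yield one dict per record; every key maps to a list so repeated keys accumulate."""
    record = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if line == "//":
            if record:
                yield dict(record)
            record = defaultdict(list)
            continue
        if not line:
            continue
        # The key is the first token; the rest of the line (trimmed) is the value.
        key, _, value = line.partition(" ")
        record[key].append(value.strip())
    if record:  # tolerate a missing final terminator
        yield dict(record)


def parse_fields(value):
    """Split a value like "ACC=XX000001; CLONE=clone1" into a dict."""
    fields = {}
    for part in value.split(";"):
        name, sep, val = part.strip().partition("=")
        if sep:
            fields[name] = val
    return fields


if __name__ == "__main__":
    for rec in parse_records(SAMPLE.splitlines()):
        clones = [parse_fields(s).get("CLONE") for s in rec.get("SEQUENCE", [])]
        print(rec["ID"][0], rec["TITLE"][0], clones)
```

Storing every key as a list keeps single-valued fields (such as `ID`) and repeated ones (such as `SEQUENCE`) uniform, at the cost of indexing `[0]` for the former.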

Descriptions of the UniGene transcript-based and genome-based build procedures are available.



UniGene References

Pontius JU, Wagner L, Schuler GD. UniGene: a unified view of the transcriptome. In: The NCBI Handbook. Bethesda (MD): National Center for Biotechnology Information; 2003.

Wheeler DL, et al. Database resources of the National Center for Biotechnology. Nucleic Acids Res. 2003;31:28-33.

Schuler GD. Pieces of the puzzle: expressed sequence tags and the catalog of human genes. J Mol Med. 1997;75:694-698.

Schuler GD, et al. A gene map of the human genome. Science. 1996;274:540-546.

Boguski MS, Schuler GD. ESTablishing a human transcript map. Nat Genet. 1995;10:369-371.