UniGene is an experimental system for automatically partitioning
GenBank sequences into a non-redundant set of gene-oriented
clusters. Each UniGene cluster contains sequences that represent a
unique gene, as well as related information such as the tissue types
in which the gene has been expressed and its map location.
Chordata |
| Mammalia | 25,713 | 15,694 | 54,560 | 46,544 | 3,154 | 40,468 | 24,028 |
| Aves | 21,447 |
| Amphibia | 24,087 | 15,137 |
| Actinopterygii | 23,471 | 14,241 | 8,154 | 1,081 | 2,326 |
| Ascidiacea | 14,370 |
Embryophyta |
| Bryopsida | 6,962 |
| Coniferopsida | 13,037 |
| Eudicotyledons | 21,133 | 13,586 | 2,206 | 9,660 | 8,187 | 5,075 | 6,509 | 5,300 | 2,710 | 8,638 | 12,518 |
| Liliopsida | 12,220 | 33,722 | 4,771 | 8,146 | 24,854 | 14,277 |
In addition to sequences of well-characterized genes, hundreds of
thousands of novel expressed sequence tag (EST) sequences have been
included. Consequently, the collection may be of use to the community
as a resource for gene discovery. UniGene has also been used by
experimentalists to select reagents for gene mapping projects and
large-scale expression analysis.
However, it should be noted that the procedures for automated
sequence clustering are still under development and the results may
change from time to time as improvements are made. Feedback from users
has been especially useful in identifying problems and we encourage
you to report any problems you encounter.
It should also be noted that no attempt has been made to produce
contigs or consensus sequences. There are several reasons why the
sequences of a set may not actually form a single contig. For example,
all of the splicing variants for a gene are put into the same
set. Moreover, EST-containing sets often contain 5' and 3' reads from
the same cDNA clone, but these sequences do not always overlap.
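
The point about non-overlapping reads can be illustrated with a small sketch. This is not the UniGene build procedure; it is a minimal union-find example, assuming hypothetical read IDs, hypothetical clone IDs, and a precomputed list of overlapping pairs. It shows how a set can hold sequences that share a clone of origin without sharing any sequence, so the set need not assemble into one contig.

```python
from collections import defaultdict


def cluster(reads, overlaps):
    """Group read IDs into sets using union-find.

    reads: dict mapping read ID -> clone ID (None when no clone is known)
    overlaps: iterable of (read_id, read_id) pairs that share sequence
    Both inputs are hypothetical illustrations, not a UniGene format.
    """
    parent = {r: r for r in reads}

    def find(x):
        # Walk to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Link reads that overlap in sequence.
    for a, b in overlaps:
        union(a, b)

    # Link reads from the same cDNA clone, whether or not they overlap.
    first_by_clone = {}
    for read, clone in reads.items():
        if clone is None:
            continue
        if clone in first_by_clone:
            union(first_by_clone[clone], read)
        else:
            first_by_clone[clone] = read

    groups = defaultdict(set)
    for r in reads:
        groups[find(r)].add(r)
    return sorted(sorted(g) for g in groups.values())


reads = {
    "est1_5prime": "cloneA",
    "est1_3prime": "cloneA",  # same clone, no overlap with the 5' read
    "mrna1": None,
    "est2": "cloneB",
}
overlaps = [("est1_5prime", "mrna1")]
clusters = cluster(reads, overlaps)
```

Here the 3' read joins the set only through its clone ID, so the resulting set contains three sequences but no single contig spanning all of them.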
Currently, animal sequences from human, rat, mouse, cow, zebrafish,
clawed frog, fruit fly and mosquito have been processed, along with
plant sequences from wheat, rice, barley, maize and cress. These species
were chosen because they have the greatest amounts of EST data
available and span a range of taxa. Additional organisms
may be added in the future.
A representation of the UniGene datasets is available by FTP.
Descriptions of the UniGene transcript-based and genome-based build
procedures are available.
UniGene References
Pontius JU, Wagner L, Schuler GD. UniGene: a unified view of the transcriptome. In: The NCBI Handbook. Bethesda (MD): National Center for Biotechnology Information; 2003.
Wheeler DL, et al. Database Resources of the National Center for Biotechnology. Nucl Acids Res 31:28-33; 2003.
Schuler GD. Pieces of the puzzle: expressed sequence tags and the catalog of human genes. J Mol Med 75:694-698; 1997.
Schuler GD, et al. A gene map of the human genome. Science 274:540-546; 1996.
Boguski MS, Schuler GD. ESTablishing a human transcript map. Nature Genetics 10:369-371; 1995.