ProtEST - Protein matches for ESTs

 The ProtEST System

The nucleotide sequences in UniGene are matched with possible translational products through sequence comparison using BLASTX with -e 1e-6 (an expectation-value cutoff of 1e-6). The sequences are compared with proteins from eight organisms, and the best match in each organism is recorded. A UniGene nucleotide sequence can thus have up to eight matches in ProtEST.
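The selection step described above can be sketched as follows. This is a minimal illustration, not the ProtEST implementation: the `Hit` record, its field names, the sample identifiers, and the choice of bit score as the ranking criterion are all assumptions, since the text only says that the "best match" in each organism is recorded.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 1e-6  # mirrors the "-e 1e-6" setting quoted in the text


@dataclass(frozen=True)
class Hit:
    """One BLASTX hit. Field names are illustrative, not a ProtEST schema."""
    nucleotide_acc: str
    protein_id: str
    organism: str
    e_value: float
    bit_score: float


def best_hit_per_organism(hits):
    """Return {organism: Hit}, keeping the highest-scoring hit per organism.

    Hits whose E-value exceeds the cutoff are ignored. Ranking by bit score
    is an assumption made for this sketch.
    """
    best = {}
    for hit in hits:
        if hit.e_value > E_VALUE_CUTOFF:
            continue
        current = best.get(hit.organism)
        if current is None or hit.bit_score > current.bit_score:
            best[hit.organism] = hit
    return best


# Hypothetical hits for a single nucleotide sequence.
hits = [
    Hit("EST_A", "P_HUMAN_1", "Homo sapiens", 1e-40, 160.0),
    Hit("EST_A", "P_HUMAN_2", "Homo sapiens", 1e-20, 95.0),
    Hit("EST_A", "P_MOUSE_1", "Mus musculus", 1e-35, 150.0),
    Hit("EST_A", "P_YEAST_1", "Saccharomyces cerevisiae", 1e-3, 40.0),
]
best = best_hit_per_organism(hits)
```

With eight organisms in the protein set, the returned dictionary can hold at most eight entries per nucleotide sequence, which matches the limit stated above.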

To exclude protein sequences that are strictly conceptual translations or models, the proteins used in ProtEST are those originating from the structural databases Swiss-Prot, PIR, PDB, or PRF. The proteins represent eight organisms, namely:

Homo sapiens
Mus musculus
Rattus norvegicus
Drosophila melanogaster
Caenorhabditis elegans 
Saccharomyces cerevisiae 
Arabidopsis thaliana
Escherichia coli

The dataset is currently being updated to include RefSeq proteins for these organisms as well.

Most of the proteins are redundant: they are represented in more than one organism, more than once in an organism, or as a sub-sequence of another protein. All of these cases are combined into one ProtEST record.
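One way to picture this merging is the sketch below. It is only an illustration under strong assumptions: the text does not say how redundancy is detected, so the sketch treats two proteins as redundant when one sequence is identical to, or an exact sub-sequence of, the other. The function name `collapse_redundant` and the sample identifiers are hypothetical.

```python
def collapse_redundant(proteins):
    """Group proteins into records keyed by a representative protein ID.

    `proteins` maps protein ID -> amino-acid sequence. A protein joins an
    existing record when its sequence is identical to, or an exact
    sub-sequence of, that record's representative (an assumption of this
    sketch; the real criterion is not stated in the text).
    """
    # Process longest sequences first so that any containing sequence
    # becomes a representative before its sub-sequences are examined.
    ordered = sorted(proteins.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    records = {}
    for protein_id, sequence in ordered:
        for representative_id in records:
            if sequence in proteins[representative_id]:
                records[representative_id].append(protein_id)
                break
        else:
            records[protein_id] = [protein_id]
    return records


# Hypothetical input: B duplicates A, C is a fragment of A, D is unrelated.
proteins = {
    "A": "MKTAYIAKQR",
    "B": "MKTAYIAKQR",
    "C": "TAYIA",
    "D": "MSLLQ",
}
records = collapse_redundant(proteins)
```

The pairwise containment check is quadratic in the number of proteins, which is acceptable for an illustration but would need indexing for a large protein set.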

The ProtEST page summarizes one unique ProtEST protein and the UniGene nucleotide sequences that include the protein as one of their eight organism-specific matches. When UniGene protein similarities are updated, the new values are incorporated into ProtEST. The ProtEST web site is accessible from any UniGene cluster or sequence page.

The ProtEST page is in a tabular format and, for each nucleotide match, includes the UniGene cluster ID, the nucleotide's GenBank accession, and the percent identity in the alignable region. A link to the sequence trace is also provided when the trace is available in the NCBI Trace Archive. A schematic of the alignment indicates the alignable region and provides a hyperlink to the BLAST alignment. The entries in the table can be sorted by:

percent identity in the alignable region
length of the alignable region
origin of the alignable region
end point of the alignable region
UniGene cluster ID
GenBank accession code

Rows can also be omitted by choosing a cut-off for the percent identity of the alignment or for the length of the alignment, or by restricting the organisms of the nucleotide sequences.
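The sorting and filtering options above can be mimicked on a local copy of the table, as in the following sketch. The `Row` fields, the sort-key names, the inclusive length calculation, and the sample cluster IDs and accessions are all hypothetical; they are not taken from any ProtEST export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One nucleotide match in the table. Field names are illustrative."""
    cluster_id: str
    accession: str
    organism: str
    percent_identity: float
    align_start: int  # origin of the alignable region
    align_end: int    # end point of the alignable region

    @property
    def align_length(self):
        # Inclusive length, assuming 1-based coordinates.
        return self.align_end - self.align_start + 1


# One key function per sort option listed in the text.
SORT_KEYS = {
    "identity": lambda r: r.percent_identity,
    "length": lambda r: r.align_length,
    "origin": lambda r: r.align_start,
    "end": lambda r: r.align_end,
    "cluster": lambda r: r.cluster_id,
    "accession": lambda r: r.accession,
}


def sort_rows(rows, key, descending=False):
    """Return a new list of rows sorted by one of the SORT_KEYS names."""
    return sorted(rows, key=SORT_KEYS[key], reverse=descending)


def filter_rows(rows, min_identity=0.0, min_length=0, organisms=None):
    """Keep rows that pass the identity, length, and organism restrictions."""
    return [
        r for r in rows
        if r.percent_identity >= min_identity
        and r.align_length >= min_length
        and (organisms is None or r.organism in organisms)
    ]


# Hypothetical table rows.
rows = [
    Row("Xx.1", "ACC0001", "Homo sapiens", 98.0, 10, 209),
    Row("Xx.2", "ACC0002", "Mus musculus", 85.5, 1, 120),
    Row("Xx.3", "ACC0003", "Homo sapiens", 62.0, 50, 99),
]
kept = filter_rows(rows, min_identity=80.0, min_length=100)
by_length = sort_rows(kept, "length", descending=True)
```

Returning new lists rather than sorting in place keeps the original table intact, so several views of the same rows can be produced side by side.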

Because new EST sequences are continually added to UniGene, the ProtEST pages are updated continually as well.



