ProtEST - Protein matches for ESTs
The nucleotide sequences in UniGene are matched with possible
translational products through sequence comparison using BLASTX with
an E-value cutoff of 1e-6 (-e 1e-6). The sequences are compared with
proteins from eight
organisms and the best match in each organism is recorded. UniGene
nucleotide sequences can thus have up to eight matches in ProtEST.
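The selection step described above can be sketched in a few lines of Python. This is only an illustration, not the actual ProtEST pipeline: the Hit record and its field names are hypothetical, and "best" is assumed here to mean the highest bit score among hits that pass the 1e-6 E-value cutoff, since the text does not say how the best match is ranked.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 1e-6  # threshold stated in the text (BLASTX -e 1e-6)


@dataclass(frozen=True)
class Hit:
    """One BLASTX hit (hypothetical record layout, for illustration only)."""
    query: str        # UniGene nucleotide sequence identifier
    organism: str     # organism of the matched protein
    protein: str      # matched protein identifier
    evalue: float
    bit_score: float


def best_hits_per_organism(hits):
    """Return {query: {organism: Hit}}, keeping one best hit per organism.

    Assumption: "best" means highest bit score among hits whose E-value
    does not exceed the cutoff.
    """
    best = {}
    for hit in hits:
        if hit.evalue > E_VALUE_CUTOFF:
            continue  # too weak to count as a match
        per_organism = best.setdefault(hit.query, {})
        current = per_organism.get(hit.organism)
        if current is None or hit.bit_score > current.bit_score:
            per_organism[hit.organism] = hit
    return best
```

With eight organisms in the protein set, each inner dictionary holds at most eight entries, which corresponds to the "up to eight matches" limit mentioned above.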
In order to exclude protein sequences which are strictly conceptual
translations or models, the proteins used in ProtEST are those
originating from the structural databases Swiss-Prot, PIR, PDB, or PRF.
The proteins represent eight organisms, namely:
Homo sapiens
Mus musculus
Rattus norvegicus
Drosophila melanogaster
Caenorhabditis elegans
Saccharomyces cerevisiae
Arabidopsis thaliana
Escherichia coli
The dataset is currently being updated to include RefSeq proteins for
these organisms as well.
Most of the proteins are redundant, being represented in more than one
organism, more than once in an organism, or as a sub-sequence of
another protein. All of these cases are combined into one ProtEST
record.
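As a rough illustration of this merging, the sketch below collapses proteins whose sequence is identical to, or an exact sub-sequence of, a longer protein. The text does not say how redundancy is actually detected, so exact substring matching is an assumption made here for simplicity, and the function name and input layout are hypothetical.

```python
def collapse_redundant(proteins):
    """Group redundant proteins into single records.

    proteins: dict mapping protein ID -> amino-acid sequence.
    Returns a dict mapping a representative ID -> list of member IDs.

    Assumption: a protein is redundant if its sequence is identical to,
    or an exact substring of, an already chosen representative.
    """
    # Visit the longest sequences first so that a containing protein is
    # chosen as representative before its fragments are examined.
    ordered = sorted(proteins.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    records = {}
    for protein_id, sequence in ordered:
        for representative in records:
            if sequence in proteins[representative]:
                records[representative].append(protein_id)
                break
        else:
            # No representative contains this sequence: start a new record.
            records[protein_id] = [protein_id]
    return records
```

Each key of the returned dictionary plays the role of one combined record, and its list holds every protein folded into it, including copies from other organisms.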
The ProtEST page summarizes a unique ProtEST protein together with
the UniGene nucleotide sequences that include that protein as one of
their eight organism-specific matches.
When UniGene protein similarities are updated, the new values
are incorporated into ProtEST. The ProtEST web site is accessible
from any UniGene cluster or sequence page.
The ProtEST page is in tabular format. For each nucleotide match, it
includes the UniGene cluster ID, the nucleotide's GenBank accession,
and the percent identity in the alignable region. A link to the
sequence trace is also provided when one is available in the NCBI
Trace Archive. A schematic of the alignment indicates the alignable
region and provides a hyperlink to the BLAST alignment. The entries
in the table can be sorted by:
percent identity in the alignable region
length of the alignable region
origin of the alignable region
end point of the alignable region
UniGene cluster ID
GenBank accession code
Rows can also be omitted by choosing a cut-off for the percent
identity of the alignment or for the length of the alignment, or by
restricting the organisms of the nucleotide sequences.
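The sorting and filtering options above can be mimicked on a local copy of the table. The following is a minimal sketch, not the web site's implementation: the Row fields, the sort-key names, and the view function are hypothetical, the alignment length is assumed to be computed from 1-based inclusive coordinates, and sorting identity and length in descending order is a choice made here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One nucleotide match in the table (hypothetical field names)."""
    cluster_id: str          # UniGene cluster ID
    accession: str           # GenBank accession
    organism: str            # organism of the nucleotide sequence
    percent_identity: float  # identity in the alignable region
    start: int               # origin of the alignable region
    end: int                 # end point of the alignable region

    @property
    def length(self):
        # Assumes 1-based, inclusive coordinates.
        return self.end - self.start + 1


# One sort key per option listed in the text.
SORT_KEYS = {
    "identity": lambda r: -r.percent_identity,  # highest identity first
    "length": lambda r: -r.length,              # longest alignment first
    "origin": lambda r: r.start,
    "end": lambda r: r.end,
    "cluster": lambda r: r.cluster_id,
    "accession": lambda r: r.accession,
}


def view(rows, sort_by="identity", min_identity=0.0, min_length=0,
         organisms=None):
    """Drop rows below the cut-offs, then sort the remainder."""
    kept = [
        r for r in rows
        if r.percent_identity >= min_identity
        and r.length >= min_length
        and (organisms is None or r.organism in organisms)
    ]
    return sorted(kept, key=SORT_KEYS[sort_by])
```

For example, view(rows, sort_by="length", organisms={"Homo sapiens"}) would keep only human sequences and list the longest alignments first.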
As new EST sequences are continually added to UniGene, the ProtEST
pages are updated accordingly.