Tools for Data Mining

BLAST
Standard tool for sequence analysis

BLink
BLAST Link

CDART
Conserved Domain Architecture Retrieval Tool

CD search
Conserved Domain Database search

CGAP
Cancer Genome Anatomy Project

Cn3D
View 3-dimensional structures


COGs
Clusters of Orthologous Groups

Electronic PCR
Search your DNA sequence for sequence tagged sites (STSs)

Entrez Gene
Gene-based view of the data from a wide range of genomes

Entrez Genomes
Whole genomes of over 1000 organisms

GEO
Gene Expression Omnibus

Map Viewer
Interactive chromosome viewer


Model Maker
View evidence used to build a gene model

ORF Finder
Open reading frames

Organism Specific Resources
Bee, Cat, Chicken, Cow, etc.

ProtEST
Protein matches for ESTs

Retrovirus Resources
Enveloped RNA viruses

SAGEmap
Serial Analysis of Gene Expression Tag to Gene Mapping

Sequin
A DNA Sequence Submission and Update Tool

SKY/M-FISH & CGH Database
Share and compare molecular cytogenetic data

VAST search
Structure similarity search

VecScreen
Vector contamination identifier

TaxPlot
Protein homologs in Complete Microbial / Eukaryotic genomes


UniGene DDD
Gene-oriented clusters

Tools - Nucleotide Sequence Analysis
The Basic Local Alignment Search Tool (BLAST), for comparing gene and protein sequences against others in public databases, now comes in several types, including PSI-BLAST, PHI-BLAST, and BLAST 2 Sequences. Specialized BLASTs are also available for human, microbial, malaria, and other genomes, as well as for vector contamination, immunoglobulins, and tentative human consensus sequences.
Electronic PCR - allows you to search your DNA sequence for sequence tagged sites (STSs) that have been used as landmarks in various types of genomic maps. It compares the query sequence against data in NCBI's UniSTS, a unified, non-redundant view of STSs from a wide range of sources.
Entrez Gene - each Entrez Gene record encapsulates a wide range of information for a given gene and organism. When possible, the information includes results of analyses that have been done on the sequence data. The amount and type of information presented depend on what is available for a particular gene and organism and can include: (1) graphic summary of the genomic context, intron/exon structure, and flanking genes, (2) link to a graphic view of the mRNA sequence, which in turn shows biological features such as CDS, SNPs, etc., (3) links to gene ontology and phenotypic information, (4) links to corresponding protein sequence data and conserved domains, (5) links to related resources, such as mutation databases. Entrez Gene is a successor to LocusLink.
Model Maker - allows you to view the evidence (mRNAs, ESTs, and gene predictions) that was aligned to assembled genomic sequence to build a gene model and to edit the model by selecting or removing putative exons. You can then view the mRNA sequence and potential ORFs for the edited model and save the mRNA sequence data for use in other programs. Model Maker is accessible from sequence maps that were analyzed at NCBI and displayed in Map Viewer.
ORF Finder - identifies all possible ORFs in a DNA sequence by locating the standard and alternative stop and start codons. The deduced amino acid sequences can then be used as BLAST queries against GenBank. ORF Finder is also packaged in the sequence submission software Sequin.
Organism Specific Resources - Bee, Cat, Chicken, Cow, etc.
Retrovirus Resources - a collection of resources specifically designed to support the research of retroviruses. Resources include a genotyping tool that uses the BLAST algorithm to identify the genotype of a query sequence; an alignment tool for global alignment of multiple sequences; an HIV-1 automatic sequence annotation tool; and annotated maps of 16 retroviruses viewable in GenBank, FASTA, and graphic formats, with links to associated sequence records.
SAGEmap - provides a tool for performing statistical tests designed specifically for differential-type analyses of SAGE (Serial Analysis of Gene Expression) data. The data include SAGE libraries generated by individual labs as well as those generated by the Cancer Genome Anatomy Project (CGAP), which have been submitted to Gene Expression Omnibus (GEO). Gene expression profiles that compare the expression in different SAGE libraries are also available on the Entrez GEO Profiles pages. It is possible to enter a query sequence in the SAGEmap resource to determine what SAGE tags are in the sequence, then map to associated SAGEtag records and view the expression of those tags in different CGAP SAGE libraries.
Spidey - aligns one or more mRNA sequences to a single genomic sequence. Spidey will try to determine the exon/intron structure, returning one or more models of the genomic structure, including the genomic/mRNA alignments for each exon.
VecScreen - a tool for identifying segments of a nucleic acid sequence that may be of vector, linker, or adapter origin prior to sequence analysis or submission. VecScreen was developed to combat the problem of vector contamination in public sequence databases.
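
As a rough illustration of the ORF Finder entry above, the following Python sketch scans all six reading frames of a DNA sequence for open reading frames. It is not NCBI's implementation: it assumes only the standard start codon (ATG) and the three standard stop codons, ignores the alternative codons the tool also supports, and the function names and the min_codons parameter are hypothetical.

```python
# Minimal six-frame ORF scan. Assumption: standard genetic code only
# (ATG start; TAA, TAG, TGA stops). Alternative codons are not modelled.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=2):
    """Return (strand, frame, start, end, orf) tuples.

    Coordinates are 0-based, end-exclusive, and relative to the strand
    that was scanned. min_codons counts the stop codon as well.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START:
                    start = i  # first in-frame start opens an ORF
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATAGCC"):
        print(orf)
```

Each reported sequence could then be translated and used as a protein query, which is the step the entry describes as searching GenBank with BLAST.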

Tools - Protein Sequence Analysis
The Basic Local Alignment Search Tool (BLAST), for comparing gene and protein sequences against others in public databases, now comes in several types, including PSI-BLAST, PHI-BLAST, and BLAST 2 Sequences. Specialized BLASTs are also available for human, microbial, malaria, and other genomes, as well as for vector contamination, immunoglobulins, and tentative human consensus sequences.
BLink - ("BLAST Link") displays the results of BLAST searches that have been done for every protein sequence in the Entrez Proteins data domain.
CD Search - searches the Conserved Domain Database with Reverse Position-Specific BLAST (RPS-BLAST).
CDART - when given a protein query sequence, CDART displays the functional domains that make up the protein and lists proteins with similar domain architectures.
ProtEST - a tool that presents a graphical view of matches between nucleotide sequences in UniGene and possible translational products. To generate the alignments, the 6-frame translations of mRNA and EST sequences in UniGene are compared to protein sequences using BLASTX with an E-value cutoff of 1e-6 (-e 1e-6). The translated nucleotide sequences are compared with proteins from eight model organisms, and the best match in each organism is recorded. Each UniGene nucleotide sequence can thus have up to eight matches in ProtEST.
TaxPlot - a tool for 3-way comparisons of genomes on the basis of the protein sequences they encode. To use TaxPlot, one selects a reference genome to which two other genomes are compared. Pre-computed BLAST results are then used to plot a point for each predicted protein in the reference genome, based on the best alignment with proteins in each of the two genomes being compared.
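
The TaxPlot idea of turning pre-computed BLAST results into one point per reference protein can be sketched as below. This is only an illustration under stated assumptions: the hit tuples (query, subject, score), the function names, and the choice to plot a missing hit as 0.0 and to skip proteins with no hit in either genome are all hypothetical, not taken from TaxPlot itself.

```python
def best_scores(hits):
    """Map each query protein to its best (highest) alignment score.

    hits: iterable of (query_id, subject_id, score) tuples, an assumed
    layout for pre-computed BLAST results against one genome.
    """
    best = {}
    for query, _subject, score in hits:
        if score > best.get(query, float("-inf")):
            best[query] = score
    return best


def taxplot_points(reference_proteins, best_a, best_b):
    """Return {protein: (x, y)} for a two-genome comparison.

    x is the best score against genome A and y against genome B.
    Assumption: a missing hit is plotted as 0.0, and proteins with no
    hit in either genome are left out.
    """
    points = {}
    for protein in reference_proteins:
        x = best_a.get(protein, 0.0)
        y = best_b.get(protein, 0.0)
        if x or y:
            points[protein] = (x, y)
    return points


if __name__ == "__main__":
    hits_a = [("p1", "a1", 50.0), ("p1", "a2", 80.0), ("p2", "a3", 30.0)]
    hits_b = [("p1", "b1", 60.0)]
    print(taxplot_points(["p1", "p2", "p3"], best_scores(hits_a), best_scores(hits_b)))
```

Points near the diagonal correspond to reference proteins that align about equally well to both compared genomes; points far from it are closer to one genome than the other.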

Tools - Structures
Cn3D - Cn3D is a helper application for your web browser that allows you to view 3-dimensional structures from NCBI's Entrez retrieval service. Cn3D runs on Windows, Macintosh, and Unix.
VAST Search - VAST Search is NCBI's structure-structure similarity search service. It compares 3D coordinates of a newly determined protein structure to those in the MMDB/PDB database.
CD Search - searches the Conserved Domain Database with Reverse Position-Specific BLAST (RPS-BLAST).

Tools - Genome Analysis
Entrez Genomes - whole genomes of over 1000 organisms. The genomes represent both completely sequenced organisms and those for which sequencing is in progress. All three main domains of life - bacteria, archaea, and eukaryota - are represented, as well as many viruses, phages, viroids, plasmids, and organelles. Entrez Genomes provides graphical overviews of complete genomes/chromosomes and the ability to explore regions of interest in progressively greater detail.
COGs - Clusters of Orthologous Groups - a natural system of gene families from complete genomes. Clusters of Orthologous Groups (COGs) were delineated by comparing protein sequences encoded in 43 complete genomes, representing 30 major phylogenetic lineages. Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain.
Map Viewer - shows integrated views of chromosome maps for many organisms, including human and numerous other vertebrates, invertebrates, fungi, protozoa, and plants. Map Viewer is used to view assembled genomes (either draft or complete) and is a valuable tool for the identification and localization of genes and other biological features. Multiple map displays are aligned based on shared marker and gene names when available, and sequence map displays are based on a common sequence coordinate system. Sequence data for chromosome regions of interest can be downloaded, biological annotations can be viewed in graphical format and/or downloaded in tabular format, and gene models can be manipulated in the associated Model Maker tool.
SKY/M-FISH & CGH Database - The NCI and NCBI SKY/M-FISH and CGH Database is a repository of publicly submitted data from Spectral Karyotyping (SKY), Multiplex Fluorescence In Situ Hybridization (M-FISH), and Comparative Genomic Hybridization (CGH), which are complementary fluorescent molecular cytogenetic techniques. SKY/M-FISH permits the simultaneous visualization of each human or mouse chromosome in a different color, facilitating the identification of chromosomal aberrations; CGH can be used to generate a map of DNA copy number changes in tumor genomes. The database is a collaborative project with the National Cancer Institute. (data submission instructions...)
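
The COGs entry above states that each cluster must contain proteins from at least 3 lineages. A minimal Python sketch of that membership rule follows. It covers only the lineage-count check, not the sequence comparisons used to build the clusters, and the function name, the lineage_of mapping, and the example identifiers are hypothetical.

```python
def spans_enough_lineages(cluster, lineage_of, min_lineages=3):
    """Return True if the proteins in a candidate cluster come from at
    least min_lineages distinct phylogenetic lineages.

    cluster: iterable of protein identifiers.
    lineage_of: dict mapping each protein identifier to a lineage name
    (an assumed input; paralogs share the lineage of their genome).
    """
    return len({lineage_of[protein] for protein in cluster}) >= min_lineages


if __name__ == "__main__":
    lineage_of = {
        "protA1": "lineage_1",
        "protA2": "lineage_1",  # paralog in the same lineage
        "protB1": "lineage_2",
        "protC1": "lineage_3",
    }
    print(spans_enough_lineages(["protA1", "protA2", "protB1"], lineage_of))
    print(spans_enough_lineages(["protA1", "protB1", "protC1"], lineage_of))
```

Paralogs from the same lineage add members to a cluster but do not add to its lineage count, which is why the check uses a set of lineages rather than the number of proteins.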

Tools - Gene Expression
GEO Gene Expression Omnibus - The Gene Expression Omnibus (GEO) provides several tools to assist with the visualization and exploration of GEO data. Datasets may be viewed as hierarchical cluster heat maps, providing insight into the relationships between samples and co-regulated genes. Individual gene expression profiles showing significant differences between experimental subsets may be located using average subset rank value comparisons. Related gene expression profiles may be identified on the basis of sequence similarity, profile similarity, or homology. Indicators of dataset normalization quality are provided as distribution graphs, and by flagging outliers. Links to other NCBI sequence, mapping and publication database resources are provided where possible.
SAGEmap - provides a tool for performing statistical tests designed specifically for differential-type analyses of SAGE (Serial Analysis of Gene Expression) data. The data include SAGE libraries generated by individual labs as well as those generated by the Cancer Genome Anatomy Project (CGAP), which have been submitted to Gene Expression Omnibus (GEO). Gene expression profiles that compare the expression in different SAGE libraries are also available on the Entrez GEO Profiles pages. It is possible to enter a query sequence in the SAGEmap resource to determine what SAGE tags are in the sequence, then map to associated SAGEtag records and view the expression of those tags in different CGAP SAGE libraries.
The Cancer Genome Anatomy Project (CGAP) - aims to decipher the molecular anatomy of cancer cells. CGAP develops profiles of cancer cells by comparing gene expression in normal, precancerous, and malignant cells from a wide variety of tissues.
UniGene DDD - Digital Differential Display - an online tool to compare computed gene expression profiles between selected cDNA libraries. Using a statistical test, genes whose expression levels differ significantly from one tissue to the next are identified and shown to the user. Additional information about UniGene is above, including a list of organisms represented.
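
The UniGene DDD entry above mentions a statistical test without naming it. As a hedged sketch of how such a comparison between two cDNA library pools might look, the Python code below applies a two-sided Fisher exact test to a 2x2 table of sequence counts. The choice of test, the function name, and the example counts are assumptions made for illustration only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the 2x2 table

                     gene of interest   all other genes
        pool A              a                  b
        pool B              c                  d

    The p-value sums the hypergeometric probabilities of every table
    with the same margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Probability of seeing x gene-of-interest sequences in pool A.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


if __name__ == "__main__":
    # Hypothetical counts: 30 of 10000 sequences in pool A and
    # 5 of 12000 sequences in pool B belong to the gene of interest.
    print(fisher_exact_two_sided(30, 9970, 5, 11995))
```

A small p-value would flag the gene as differing in apparent expression between the two pools, subject to whatever significance threshold and multiple-testing treatment the analysis adopts.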



Revised: October 26, 2004.