NCBI Resource Guide |
Each link in this Resource Guide leads to a brief description of the resource on this page, then to the resource itself. An Alphabetical Quicklinks Table provides direct links to resources and bypasses the descriptions. |
The "new" icon indicates a resource that has become available in the last 12 months. |
About NCBI | Overview |
About NCBI - The science behind our resources. An introduction for researchers, educators and the public. Includes a Science Primer, with plain language introductions to bioinformatics, genome mapping, molecular modeling, SNPs, ESTs, microarray technology, molecular genetics, pharmacogenomics, and phylogenetics. |
Programs and Services - basic research, databases and software, outreach and education |
NCBI Handbook - an online book, written by NCBI staff, that discusses the many resources available at NCBI. Each chapter is devoted to one service; after a brief overview on using the resource, there is an account of how the resource works, including topics such as how data are included in a database, database design, query processing, and how the different resources relate to each other. |
What's New - recently released resources and enhancements to existing resources |
NCBI News - announcements about new resources, enhancements to existing resources, staff publications, tutorials, FAQs |
Exhibit Schedule - NCBI exhibits at upcoming conferences |
Postdoctoral Fellowships - general information, application procedure |
Organizational Structure - functions of the three NCBI branches: Computational Biology Branch (CBB), Information Engineering Branch (IEB), and Information Resources Branch (IRB) |
Board of Scientific Counselors - advises the NIH Director, the Deputy Director for Intramural Research, the NLM Director, and the NCBI Director about the intramural research and development programs of the NCBI. |
Contact Information - postal address, phone, e-mail addresses for various services |
NCBI Announcements Email Lists - Receive announcements about changes and updates to a variety of NCBI services. In addition to a general NCBI-announce list, topic-specific e-mail lists are available for BLAST, GenBank, dbSNP, Genomes, LinkOut, RefSeq, Sequin, and Entrez Utilities (for making WWW Links to Entrez). Information on how to subscribe is provided. |
Statistics for NCBI Resources - A page listing statistics that are available for selected NCBI resources, including number of records present in various databases, number of genomes available at NCBI and statistics for the individual genomes, and server usage. |
Site Search - Search the NCBI web site and display results in various formats. The default Homepage view sorts NCBI pages based on the number of other NCBI pages that link to them. The NCBI Site Search function is part of the Entrez system (described below). Therefore, the search features described in the Entrez help document also apply to the site search function. |
GenBank | Overview |
General Information |
What is GenBank? - a database of nucleotide sequences from >130,000 organisms. Records that are annotated with coding region (CDS) features also include amino acid translations. GenBank belongs to an international collaboration of sequence databases (described below), which also includes EMBL and DDBJ. GenBank is updated daily in NCBI search systems, and a full release is issued on the FTP site approximately the 15th of every February, April, June, August, October, and December. Each full release contains all the data present in GenBank as of the cutoff date specified in the release notes (described below). The FTP site also provides daily cumulative and non-cumulative update files (more about the FTP site below). |
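The release schedule described above can be sketched as simple date arithmetic. This is only an illustration: the function name `next_nominal_release` is hypothetical, and because the text says releases appear "approximately" on the 15th, the computed date is a nominal estimate rather than an actual release date.

```python
from datetime import date

# Release months as stated in the text: February, April, June,
# August, October, and December.
RELEASE_MONTHS = (2, 4, 6, 8, 10, 12)
RELEASE_DAY = 15  # "approximately the 15th"; the real date may shift


def next_nominal_release(today: date) -> date:
    """Return the first nominal release date on or after `today`."""
    for month in RELEASE_MONTHS:
        candidate = date(today.year, month, RELEASE_DAY)
        if candidate >= today:
            return candidate
    # After 15 December, the next nominal release falls in February
    # of the following year.
    return date(today.year + 1, RELEASE_MONTHS[0], RELEASE_DAY)
```

For example, `next_nominal_release(date(2005, 3, 1))` gives 15 April 2005 under these assumptions.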
Sample Record - detailed description of each field in a GenBank record. Includes, for example, information about accession number formats, sequence identifiers (GI number and accession.version), a listing of GenBank divisions, and more. Describes some commonly annotated biological features, such as CDS, and provides links to documents that list and define the complete set of biological features that can be annotated on sequence records. Includes a link to a sequence revision history tool that can be used to track changes that have occurred to the sequence data in a record. Also lists the Entrez search field(s) that can be used to search each part of a sequence record. |
GenBank Divisions - summary of GenBank divisions, including abbreviations, full spellings, information about what the GenBank divisions are, and what they are not. (This information is part of the GenBank sample record, described above.) |
Access GenBank - through Entrez Nucleotides. Search by accession number, author name, organism, gene/protein name, and a variety of other text terms. Additional information about Entrez is below. Use BLAST for sequence similarity searches against GenBank and other databases. E-mail access to BLAST is provided through the BLAST server. An option to download the GenBank full release and updates via FTP is also available. |
Growth Statistics (graph) - see also Release Notes sections 2.2.6 (per division statistics), 2.2.7 (per organism statistics), 2.2.8 (growth of GenBank) |
GenBank Release Notes - A document that accompanies each full release (described above) of the GenBank database. The release notes describe the format and content of the flat files that comprise the release. They also include notices of recent and upcoming changes, information about GenBank divisions, growth statistics, citing GenBank, and more. |
Genetic Codes - synopsis of 17 genetic codes; used to ensure correct translation of coding sequences in GenBank records. |
GenBank Bionet Newsgroup - A moderated list that includes announcements of new GenBank releases, recent and upcoming changes, and discussion among subscribers. For information on how to subscribe by e-mail, see the NCBI Email Lists page. |
GenBank Submissions |
General Information |
Submission Software Programs |
Special Types of Submissions to GenBank |
Genomes, Alignments, ESTs, GSSs, HTGs, STSs, WGS |
Other Types of Data Submissions (Other NCBI databases, separate from GenBank, to which data can be submitted) |
International Nucleotide Sequence Database Collaboration |
GenBank, DDBJ, EMBL - Overview of collaborative projects and links to home pages. The GenBank, DDBJ (DNA Data Bank of Japan), and EMBL (European Molecular Biology Laboratory) databases share data on a daily basis and are therefore equivalent. The record formats and search systems might differ among the databases, but the accession numbers, sequence data, and annotations are the same in all of them. E.g., you can retrieve the record with accession number U12345 from GenBank, DDBJ, or EMBL and it will contain the same sequence data, references, etc. in all three databases. |
DDBJ/EMBL/GenBank Feature Table - feature table formats and standards used in the annotation of sequence records by the collaborating databases; makes possible sharing of data; includes detailed appendices such as:
|
FTP GenBank and Daily Updates |
GenBank flat file format - see sample GenBank record and detailed description in GenBank release notes; download most recent full release (described above) and daily cumulative or non-cumulative update files. |
ASN.1 format - Abstract Syntax Notation 1, an International Standards Organization (ISO) data representation format; download most recent full release (described above) and daily cumulative or non-cumulative update files. (more on ASN.1) |
FASTA format - definition line followed by sequence data only (example); see readme file for database descriptions, including nt.Z (daily updated non-redundant BLAST nucleotide database, contains GenBank+EMBL+DDBJ+PDB sequences, but no EST, STS, GSS, or HTGS sequences), nr.Z (daily updated non-redundant proteins), est.Z, gss.Z, htg.Z, sts.Z, and others. |
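As a rough illustration of the FASTA layout described above (a definition line followed by sequence data only), the sketch below reads records into a dictionary. The function name `parse_fasta` and the example sequences are invented for illustration; the only format detail assumed beyond the text is that each definition line begins with ">".

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each definition line (without the leading '>') to its sequence."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new definition line starts a new record.
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence data may span several lines; join them.
            records[header] += line
    return records


# Hypothetical input, not taken from any NCBI file.
example = [
    ">seq1 hypothetical example sequence",
    "ATGGCC",
    "TTAA",
    ">seq2 another hypothetical sequence",
    "GGGCCC",
]
parsed = parse_fasta(example)
```

The files on the FTP site are large and compressed, so a real workflow would stream a decompressed file line by line rather than hold a list in memory.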
Molecular Databases | Overview |
Nucleotide Sequences |
Entrez Nucleotides - combines data from a number of source databases, including GenBank, RefSeq, TPA, and PDB. Data can be searched by accession number, author name, organism, gene/protein name, and a variety of other text terms. Additional information about Entrez is given below. For retrieval of large data sets, Batch Entrez (described below) is available. |
GenBank - a database of nucleotide sequences from >130,000 organisms. Records that are annotated with coding region (CDS) features also include amino acid translations. GenBank belongs to an international collaboration of sequence databases (described above), which also includes EMBL and DDBJ. A sample record, which provides a detailed description of each field in a GenBank record, is also available. A variety of sequence records exist in GenBank, such as characterized genes that have been well-studied and annotated, batch produced sequences (ESTs, GSSs, STSs), high throughput genomic sequences, complete genomes, and more. Additional information about GenBank is given in the GenBank Overview section of this guide. |
RefSeq - NCBI database of Reference Sequences. Curated, non-redundant set including genomic DNA contigs, mRNAs and proteins for known genes, mRNAs and proteins for gene models, and entire chromosomes. Accession numbers have the format of two letters, an underscore bar, and six digits. Nucleotide sequence records have accessions: NT_123456, NM_123456, NC_123456, NG_123456, XM_123456, XR_123456 (more info about accession numbers and access). Additional details about RefSeq are provided in the NCBI Handbook, which is available online in the Entrez Books database. |
Third Party Annotation (TPA) database - a database of experimentally supported annotations on assemblies of sequences already present in DDBJ/EMBL/GenBank. Whereas DDBJ/EMBL/GenBank contains primary sequence data and corresponding annotations submitted by the laboratories that did the sequencing, the TPA database contains third-party assemblies of primary data with experimentally supported annotation that has been published in a peer-reviewed scientific journal. Details about how to submit data, as well as examples of what can and cannot be submitted to TPA, are provided on the TPA home page.
Note: Although TPA records are derived from DDBJ/EMBL/GenBank, TPA is actually a separate database. Therefore, TPA records are not present in the GenBank FTP files, but will be available in separate FTP files. |
dbEST - database of expressed sequence tags; short, single pass read cDNA (mRNA) sequences. Also includes cDNA sequences from differential display experiments and RACE experiments. Note: EST sequences are available from two sources: dbEST and the EST division of GenBank. The sequences and accession numbers in both sources are the same but the record formats differ. (data submission instructions...) |
dbGSS - database of genome survey sequences; short, single pass read genomic sequences, exon trapped sequences, cosmid/BAC/YAC ends, others. Note: GSS sequences are available from two sources: dbGSS and the GSS division of GenBank. The sequences and accession numbers in both sources are the same but the record formats differ. (data submission instructions...) |
dbMHC - A new NCBI resource that provides a platform for genetic and clinical data related to the human Major Histocompatibility Complex (MHC) where users can submit, edit, view, exchange, and analyze MHC data. |
dbSNP - database of single nucleotide polymorphisms, small-scale insertions/deletions, polymorphic repetitive elements, and microsatellite variation. dbSNP includes polymorphism data that are experimentally derived or computationally derived, as well as hybrid data determined by the alignment of an experimentally derived molecule to genomic sequence data. Currently, dbSNP comprises 4 general classes of submissions: (a) The SNP Consortium (TSC) - candidate SNPs identified either by sequencing with the reduced representation shotgun strategy or by alignment of random reads to genomic sequence; (b) Overlaps - candidate SNPs identified in sequence overlaps between individual BACs or PACs; (c) ESTs - SNPs identified in EST clusters, including those identified by the Cancer Genome Anatomy Project (described below); (d) Other - SNPs identified after screening larger numbers of chromosomes, which include many with alleles of lower frequency (1%-20%). (data submission instructions) To receive announcements about updates and new features to dbSNP, see the NCBI Email Lists page. Note: Although dbSNP is a separate database from GenBank, SNP records include cross-references to GenBank records. |
dbSTS - database of sequence tagged sites; short sequences that are operationally unique in the genome, used to generate mapping reagents. Note: STS sequences are available from two sources: dbSTS and the STS division of GenBank. The sequences and accession numbers in both sources are the same but the record formats differ. (data submission instructions...) |
UniSTS - a unified, non-redundant view of sequence tagged sites (STSs). UniSTS integrates marker and mapping data from a variety of public resources. If two or more markers have different names but the same primer pair, a single STS record is presented for the primer pair and all the marker names are shown. Each UniSTS record displays the primer sequences, product size, mapping information, and cross references to LocusLink, dbSNP, RHdb, GDB, MGD, and the Map Viewer. The marker report also lists GenBank and RefSeq records that contain the primer sequences, as determined by Electronic PCR (e-PCR). Data sources include dbSTS, RHdb, GDB, various human maps (Genethon genetic map, Marshfield genetic map, Whitehead RH map, Whitehead YAC map, Stanford RH map, NHGRI chr 7 physical map, WashU chrX physical map), various mouse maps (Whitehead RH map, Whitehead YAC map, Jackson laboratory's MGD map). |
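The merging rule described above (one STS record per primer pair, listing every marker name that shares it) can be sketched as a grouping step. The marker names and primer sequences below are invented, and `group_by_primer_pair` is a hypothetical helper, not part of UniSTS.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (marker name, forward primer, reverse primer); all values are invented.
markers: List[Tuple[str, str, str]] = [
    ("MARKER_A", "ACGTACGT", "TTGGCCAA"),
    ("MARKER_B", "ACGTACGT", "TTGGCCAA"),  # same primers, different name
    ("MARKER_C", "GGGTTTAA", "CCATGGAT"),
]


def group_by_primer_pair(
    entries: List[Tuple[str, str, str]],
) -> Dict[Tuple[str, str], List[str]]:
    """Collect all marker names that share the same primer pair."""
    grouped: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for name, forward, reverse in entries:
        grouped[(forward, reverse)].append(name)
    return dict(grouped)


sts_records = group_by_primer_pair(markers)
```

Here three markers collapse into two records, one of which carries both `MARKER_A` and `MARKER_B`.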
UniGene - ESTs and full-length mRNA sequences organized into clusters that each represent a unique known or putative gene within the organism from which the sequences were obtained. UniGene clusters are annotated with mapping and expression information when possible (e.g., for human), and include cross-references to other resources. Sequence data can be downloaded by cluster through the UniGene web pages, or the complete data set can be downloaded from the repository/UniGene directory of the FTP site. In addition, UniGene DDD (described below) can be used to show differential expression of genes between cDNA libraries. The organisms represented in UniGene are listed on the UniGene home page. |
HomoloGene - a gene homology tool that compares nucleotide sequences between pairs of organisms in order to identify putative orthologs. Curated orthologs are incorporated from a variety of sources via LocusLink. Organisms represented are listed on the HomoloGene home page. |
Mammalian Gene Collection (MGC) - The NIH Mammalian Gene Collection (MGC) is a trans-NIH initiative that seeks to identify and sequence a representative full open reading frame (FL-ORF) clone for each human, mouse, and rat gene. The MGC project entails the production of cDNA libraries and sequences, database and repository development, as well as the support of research for improved library construction, sequencing, and analytic technologies. All the resources generated by the MGC are publicly accessible to the biomedical research community. |
Trace Archive - a repository of the raw sequence traces generated by large sequencing projects. It allows retrieval of both the sequence file and the underlying data which generated the file. In the case of projects that rely on a Whole Genome Shotgun (WGS) strategy, the Trace Archive will be the sole source of raw sequence data. NCBI will be exchanging data regularly with the Ensembl Trace Server. The Trace Archive can be searched by using MegaBLAST (described below), or by entering a term in the search box at the top of the Trace Archive Page. (data submission instructions...) |
Assembly Archive - links the raw sequence information found in the Trace Archive with assembly information found in publicly available sequence repositories (GenBank/EMBL/DDBJ). The Assembly Viewer allows a user to see the multiple sequence alignments as well as the actual sequence chromatogram. |
UniVec - a database that can be used to quickly identify segments within nucleic acid sequences which may be of vector origin. Screening using UniVec is efficient because a large number of redundant sub-sequences have been eliminated to create a database that contains only one copy of every unique sequence segment from a large number of vectors. The VecScreen tool, described below (under sequence analysis tools), can be used to compare a query sequence against the UniVec database in order to identify possible vector contamination. |
Genomes - Resources in the Genomes and Maps section contain the nucleotide sequences for a variety of genomes. Examples of the genomes available include: >1000 organisms in Entrez Genomes, human, mouse, rat, zebrafish, Drosophila, nematode, plant genomes, yeast, malaria, microbial genomes, viruses, viroids, plasmids, eukaryotic organelles. |
Nucleotide Sequence Analysis - various tools are available for analyzing nucleotide sequences and are described below. |
Protein Sequences |
Entrez Proteins - search protein sequence records (from GenPept + RefSeq + Swiss-Prot + PIR + RPF + PDB) by accession number, author name, organism, gene/protein name, and a variety of other text terms. Additional information about Entrez is given below. For retrieval of large data sets, Batch Entrez (described below) is available. Entrez Proteins also includes BLink ("BLAST Link"), a feature which displays the results of BLAST searches that have been done for every protein sequence in the Entrez Proteins data domain. To access it, follow the BLink link displayed beside any hit in the results of an Entrez Proteins search. More information about BLink is provided below. |
RefSeq - NCBI database of Reference Sequences. Curated, non-redundant set including genomic DNA contigs, mRNAs and proteins for known genes, mRNAs and proteins for gene models, and entire chromosomes. Accession numbers have the format of two letters, an underscore bar, and six digits. Protein sequence records have accessions: NP_123456 or XP_123456 (more info about accession numbers and access). |
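The accession format described here (two letters, an underscore, six digits) lends itself to a small pattern check. The sketch below uses only the prefixes listed in this guide for nucleotide records (NT, NM, NC, NG, XM, XR) and protein records (NP, XP). The function name is hypothetical, and real accessions may carry a version suffix or differ in length, so treat the pattern as illustrative rather than a validator.

```python
import re

# Prefixes as listed in this guide.
NUCLEOTIDE_PREFIXES = {"NT", "NM", "NC", "NG", "XM", "XR"}
PROTEIN_PREFIXES = {"NP", "XP"}

# Two uppercase letters, an underscore, and exactly six digits.
ACCESSION_PATTERN = re.compile(r"([A-Z]{2})_([0-9]{6})")


def classify_refseq_accession(accession: str) -> str:
    """Classify an accession by the prefix rules described in the guide."""
    match = ACCESSION_PATTERN.fullmatch(accession)
    if match is None:
        return "not in the described format"
    prefix = match.group(1)
    if prefix in NUCLEOTIDE_PREFIXES:
        return "nucleotide"
    if prefix in PROTEIN_PREFIXES:
        return "protein"
    return "unrecognized prefix"
```

For instance, `classify_refseq_accession("NM_123456")` returns `"nucleotide"`, while a GenBank-style accession such as `U12345` does not match the described format.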
FTP GenPept - download the "relxxx.fsa_aa.gz" file. The filename stands for "Release number XXX FASTA formatted amino acid translations". The translations are extracted from GenBank/EMBL/DDBJ records that are annotated with one or more CDS features. |
Conserved Domain Database (CDD) - a collection of sequence alignments and profiles representing protein domains conserved in molecular evolution. It includes domains from Smart and Pfam, as well as domains contributed by NCBI researchers. It also includes alignments of the domains to known 3-dimensional protein structures in the MMDB database (described below). CDD can be used to identify conserved domains in a protein query sequence, using the CD-Search service (described below). In addition, the CDART tool (described below) uses CDD and RPS-BLAST (described below) to retrieve proteins with similar domain architectures. |
HIV Interactions - The HIV-1, Human Protein Interaction Database contains information about known interactions of HIV-1 proteins with proteins from human hosts. It provides annotated bibliographies of published reports of protein interactions, with links to the corresponding PubMed records and sequence data. More information about this database is provided under "Literature Databases". |
PROW - Protein Resources on the Web - short authoritative guides on the approximately 200 human CD cell-surface molecules. Peer-reviewed; provides approximately 20 standardized categories of information (biochemical function, ligands, etc.) for each CD antigen. |
Protein Sequence Analysis - various tools are available for analyzing protein sequences and are described below. |
Proteomes |
Structures |
Structure Home - general information about the NCBI Structure Group and its research projects, as well as access to the Molecular Modeling Database (MMDB) and related tools to search and display structures. |
MMDB: Molecular Modeling Database - a database of three-dimensional biomolecular structures derived from X-ray crystallography and NMR spectroscopy. MMDB is a subset of three-dimensional structures obtained from the Brookhaven Protein DataBank (PDB), excluding theoretical models. MMDB reorganizes and validates the information in a way that enables cross-referencing between the chemistry and the three-dimensional structure of macromolecules. Its data specification includes a description of a biopolymer's spatial structure, a description of how it is organized chemically, and a set of pointers linking the two. By integrating chemical, sequence, and structure information, MMDB is designed to serve as a resource for structure-based homology modeling and protein structure prediction. MMDB records are stored in ASN.1 format and can be displayed with the Cn3D, Rasmol, or Kinemage viewers. In addition, similar structures within the database have been identified using VAST, and new structures can be compared against the database using VASTsearch. |
3D Domains Database - compact structural domains identified automatically in MMDB, Entrez's macromolecular three-dimensional structure database. These domains are identified by searching for breakpoints in the structure between major secondary structure elements so that the ratio of intra- to inter-domain contacts falls above a set threshold. 3D Domains are the units of comparison for structure neighbor ("related structures") calculations using the VAST algorithm. |
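The contact-ratio criterion mentioned above can be illustrated with a toy calculation. This is not the MMDB algorithm: the contact list, the residue-level breakpoint, the threshold value, and the function name `contact_ratio` are all assumptions made for illustration, and the guide does not state how contacts are defined or what threshold is used.

```python
from typing import List, Tuple


def contact_ratio(contacts: List[Tuple[int, int]], split_at: int) -> float:
    """Ratio of intra- to inter-domain contacts for one candidate breakpoint.

    Residues numbered below `split_at` form one candidate domain and the
    remaining residues form the other (a simplifying assumption).
    """
    intra = 0
    inter = 0
    for i, j in contacts:
        if (i < split_at) == (j < split_at):
            intra += 1  # both residues fall on the same side
        else:
            inter += 1  # the contact crosses the breakpoint
    if inter == 0:
        return float("inf")  # no contacts cross the breakpoint
    return intra / inter


# Invented contacts between residue positions.
contacts = [(1, 2), (2, 3), (1, 3), (5, 6), (6, 7), (5, 7), (3, 5)]
THRESHOLD = 3.0  # hypothetical value; the real threshold is not given here

good_split = contact_ratio(contacts, split_at=4)  # 6 intra, 1 inter
poor_split = contact_ratio(contacts, split_at=2)  # 5 intra, 2 inter
```

With these invented numbers, splitting at position 4 clears the hypothetical threshold, while splitting at position 2 does not.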
Conserved Domain Database (CDD) - a collection of sequence alignments and profiles representing protein domains conserved in molecular evolution. It includes domains from Smart and Pfam, as well as domains contributed by NCBI researchers. It also includes alignments of the domains to known 3-dimensional protein structures in the MMDB database (described above). CDD can be used to identify conserved domains in a protein query sequence, using the CD-Search service (described below). In addition, the CDART tool (described below) uses CDD and RPS-BLAST (described below) to retrieve proteins with similar domain architectures. |
PubChem - contains the chemical structures of small organic molecules and information on their biological activities. It is intended to support the Molecular Libraries and Imaging component of the NIH Roadmap Initiative. PubChem's chemical structure database may be searched on the basis of descriptive terms, chemical properties, and structural similarity. When possible, PubChem's chemical structure records are linked to other NCBI databases, including the PubMed scientific literature database and NCBI's protein 3D structure database. PubChem also contains the results of high-throughput biological screening experiments. PubChem is organized as three linked databases within the Entrez/PubMed information retrieval system. |
Structure-Related Tools - in addition to the structure databases described above, NCBI offers several tools: |
|
|
|
|
Genes |
Entrez Gene - Entrez Gene provides a gene-based view of the data from a wide range of genomes. It supplies key connections in the nexus of map, sequence, expression, structure, functional, and homology data. Each record represents a single gene from a given organism. The minimum set of data in a gene record includes a unique identifier or GeneID assigned by NCBI, a preferred symbol, and any of sequence information, map information, or official nomenclature from an authority list. In addition, a gene record can also include expression, structure, functional, and homology data, when available. Entrez Gene includes data from all organisms that have RefSeq genome records (with NC_* accessions, see more info above), and can also include data from recognized genome-specific databases that provide NCBI with information about genes (preferably with defining sequence) or mapped phenotypes. Entrez Gene can be considered as the successor to LocusLink (described below). |
GeneRIF - Gene References into Function (GeneRIFs) provide a simple mechanism to allow scientists to add to the functional annotation of loci described in Entrez Gene. They appear as annotated bibliographies in Entrez Gene records, and consist of brief statements on gene function with links to the corresponding PubMed records (example: human MLH1). The GeneRIF help page describes the simple steps needed to submit information. GeneRIFs are also added to the Entrez Gene records by the MEDLINE Indexing Staff of the National Library of Medicine. GeneRIFs are currently available for a subset of organisms in Entrez Gene, and will be provided for the loci of other organisms as the development of Entrez Gene continues. |
LocusLink - provides a single query interface to curated sequence and descriptive information about genetic loci. LocusLink issues a stable ID for each locus and presents information on official nomenclature, aliases, sequence accession numbers, phenotypes, EC numbers, OMIM numbers, UniGene clusters, map information, and relevant web sites. LocusLink is a collaborative effort among NCBI, Human Gene Nomenclature Committee, OMIM, and others. LocusLink currently contains data for a number of species such as human, mouse, rat, zebrafish, nematode, fruit fly, cow, sea urchin, African clawed frog, and HIV-1. Organisms can be searched together or separately.
Data for these organisms are also available in the Entrez Gene database (described above), which can be considered as a successor to LocusLink. The major differences between LocusLink and Entrez Gene are the scope of data and the search interface. Entrez Gene contains data from all organisms with RefSeq (described below) genome records. It also uses the Entrez search system, and therefore offers helpful functions such as Preview/Index, History, and LinkOut that are available for other Entrez databases. |
UniGene - ESTs and full-length mRNA sequences organized into clusters that each represent a unique known or putative gene within the organism from which the sequences were obtained. UniGene clusters are annotated with mapping and expression information when possible (e.g., for human), and include cross-references to other resources. Sequence data can be downloaded by cluster through the UniGene web pages, or the complete data set can be downloaded from the repository/UniGene directory of the FTP site. In addition, UniGene DDD (described below) can be used to show differential expression of genes between cDNA libraries. The organisms represented in UniGene are listed on the UniGene home page. |
HomoloGene - a gene homology tool that compares nucleotide sequences between pairs of organisms in order to identify putative orthologs. Curated orthologs are incorporated from a variety of sources via LocusLink. Organisms represented are listed on the HomoloGene home page. |
Mammalian Gene Collection (MGC) - The NIH Mammalian Gene Collection (MGC) is a trans-NIH initiative that seeks to identify and sequence a representative full open reading frame (FL-ORF) clone for each human, mouse, and rat gene. The MGC project entails the production of cDNA libraries and sequences, database and repository development, as well as the support of research for improved library construction, sequencing, and analytic technologies. All the resources generated by the MGC are publicly accessible to the biomedical research community. |
HIV Interactions - The HIV-1, Human Protein Interaction Database contains information about known interactions of HIV-1 proteins with proteins from human hosts. It provides annotated bibliographies of published reports of protein interactions, with links to the corresponding PubMed records and sequence data. More information about this database is provided under "Literature Databases". |
Expression |
Gene Expression Omnibus (GEO) - a gene expression and hybridization array data repository, as well as a curated, online resource for gene expression data browsing, query and retrieval. GEO was the first fully public high-throughput gene expression data repository, and became operational in July 2000. Many types of gene expression data, from platforms such as spotted microarray (microarray), high-density oligonucleotide array (HDA), hybridization filter (filter), and serial analysis of gene expression (SAGE), are accepted, accessioned, and archived as public data sets. GEO data can be accessed through several search and browsing tools on the GEO home page, Entrez (via Entrez GEO Profiles and Entrez GDS (GEO DataSets)), and the FTP site. The Tools/Gene Expression section of this file provides information about data visualization and exploration capabilities available in GEO. |
Expression-Related Tools - in addition to the GEO database, described above, NCBI offers several tools: |
|
|
|
Taxonomy |
NCBI Taxonomy Database Home - general information about the Taxonomy project, including taxonomic resources and a list of outside curators collaborating with NCBI taxonomists. The NCBI Taxonomy Database contains the names and lineages of >130,000 organisms, both living and extinct, that are represented in the genetic databases with at least one nucleotide or protein sequence. New organisms are added to the database as sequence data are deposited for them. The purpose of the taxonomy project at NCBI is to build a consistent phylogenetic taxonomy for the sequence databases. |
Taxonomy Browser - The search bar on the Taxonomy home page allows you to browse the NCBI taxonomy database. Enter the scientific or common name of a species (e.g., Canis familiaris or dog) or a higher taxon (e.g., Canidae) to view that organism or taxon's lineage; retrieve the available nucleotide, protein, structure, and genome records; and browse up and down the taxonomic tree. (Tip: For the broadest search results, select the "token set" option in the search bar, which searches for any string, whether in the beginning, middle, or end of a word.) Entrez also provides an interface for browsing the taxonomy database, and offers features such as the Common Tree function, which allows you to build a tree for your own selection of organisms or taxa (more...). |
Taxonomy BLAST - an implementation of Gapped BLAST (2.x) that groups hits by source organism, according to information in NCBI's Taxonomy database. Species are listed in order of sequence similarity to the query sequence, with the strongest match listed first. Three report views are available:
|
TaxPlot - a tool for 3-way comparisons of genomes on the basis of the protein sequences they encode. To use TaxPlot, one selects a reference genome to which two other genomes are compared. Pre-computed BLAST results are then used to plot a point for each predicted protein in the reference genome, based on the best alignment with proteins in each of the two genomes being compared. |
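The plotting idea described above can be sketched as follows: for each protein in the reference genome, take the best alignment score against each of the two compared genomes and use the pair as a point. The hit tuples, genome labels, and function name are invented, and treating a missing hit as a score of 0 is an assumption of this sketch rather than documented TaxPlot behavior.

```python
from typing import Dict, List, Tuple

# (reference protein, compared genome, alignment score); all values invented.
hits: List[Tuple[str, str, float]] = [
    ("protA", "genome1", 250.0),
    ("protA", "genome1", 90.0),
    ("protA", "genome2", 120.0),
    ("protB", "genome2", 300.0),
]


def taxplot_points(
    blast_hits: List[Tuple[str, str, float]], genome_x: str, genome_y: str
) -> Dict[str, Tuple[float, float]]:
    """Return one (x, y) point per reference protein from its best scores."""
    best: Dict[str, Dict[str, float]] = {}
    for protein, genome, score in blast_hits:
        if genome not in (genome_x, genome_y):
            continue  # ignore hits to genomes outside the comparison
        # Assumption: a protein with no hit in a genome keeps a score of 0.
        scores = best.setdefault(protein, {genome_x: 0.0, genome_y: 0.0})
        scores[genome] = max(scores[genome], score)
    return {p: (s[genome_x], s[genome_y]) for p, s in best.items()}


points = taxplot_points(hits, "genome1", "genome2")
```

A protein whose point lies far from the diagonal is more similar to its counterpart in one compared genome than in the other.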
Literature Databases | Overview |
PubMed - A database of citations and abstracts for biomedical literature. These citations are from MEDLINE and additional life science journals. PubMed also includes links to many sites providing full text articles and other related resources. PubMed is accessible through the Entrez search and retrieval system (described below). |
PubMed Central - a digital archive of life sciences journal literature managed by the National Center for Biotechnology Information (NCBI) at the U.S. National Library of Medicine (NLM). It is not a journal publisher. Access to PubMed Central (PMC) is free and unrestricted. |
OMIM - Online Mendelian Inheritance in Man - continuously updated catalog of human genes and genetic disorders, with links to associated literature references, sequence records, maps, and related databases. |
Entrez Books - In collaboration with book publishers, the NCBI is adapting textbooks for the web and linking them to PubMed, the biomedical bibliographic database. The idea is to provide background information to PubMed, so that users can explore unfamiliar concepts found in PubMed search results. |
HIV Interactions - The HIV-1, Human Protein Interaction Database contains information about known interactions of HIV-1 proteins with proteins from human hosts. RefSeq protein sequence records serve as anchors for collecting published information about interactions between HIV-1 and human proteins. Each HIV Interactions database record lists an HIV protein and the human proteins with which it has been found to interact. In turn, the LocusLink and Entrez Gene records for each human protein contain annotated HIV-1 Interactions bibliographies, which consist of brief statements on protein interactions with links to the corresponding PubMed records and sequence data. The HIV Interactions database is a collaborative project among the developers of RefSeq (description) and Entrez Gene (description), and is similar in concept to GeneRIF (description). In contrast to GeneRIFs for single genes, however, the publications cited in the HIV Interactions Database contain statements about binding between two proteins rather than statements about the function of a single gene. |
Genomes and Maps | Overview |
organism collections (including Entrez Genomes, Map Viewer, and UniGene), human, mouse, rat, zebrafish, Drosophila, nematode, plant genomes, yeast, malaria, microbial genomes, viruses, viroids, plasmids, eukaryotic organelles |
Organism Collections |
Entrez Genomes - whole genomes of over 1000 organisms. The genomes represent both completely sequenced organisms and those for which sequencing is in progress. All three main domains of life - bacteria, archaea, and eukaryota - are represented, as well as many viruses, phages, viroids, plasmids, and organelles. Entrez Genomes provides graphical overviews of complete genomes/chromosomes, and the ability to explore regions of interest in progressively greater detail. ProtTables and TaxTables are provided for organisms on which analyses have been done by NCBI staff. In addition, the Map Viewer, a software component of Entrez Genomes, provides views of integrated chromosome maps for a variety of organisms (see additional information about the Map Viewer below). |
Genomes Announcements - To receive announcements about recently completed genomes, see the NCBI Email Lists page. |
Map Viewer - The Map Viewer is a software component of Entrez Genomes (described above) that provides special browsing capabilities for a subset of organisms. It allows you to view and search an organism's complete genome, display chromosome maps, and zoom into progressively greater levels of detail, down to the sequence data for a region of interest. If multiple maps are available for a chromosome, it displays them aligned to each other based on shared marker and gene names, and, for the sequence maps, based on a common sequence coordinate system. The organisms currently represented in the Map Viewer are listed on the Map Viewer home page and in the Map Viewer help document, which provides general information on how to use that tool. The number and types of available maps vary by organism, and are described in the "data and search tips" file provided for each organism. |
Entrez Gene - Entrez Gene provides a gene-based view of the data from a wide range of genomes. It supplies key connections in the nexus of map, sequence, expression, structure, functional, and homology data. Each record represents a single gene from a given organism. The minimum set of data in a gene record includes a unique identifier or GeneID assigned by NCBI, a preferred symbol, and any of sequence information, map information, or official nomenclature from an authority list. In addition, a gene record can also include expression, structure, functional, and homology data, when available. Entrez Gene includes data from all organisms that have RefSeq genome records (with NC_* accessions, see more info above), and can also include data from recognized genome-specific databases that provide NCBI with information about genes (preferably with defining sequence) or mapped phenotypes. Entrez Gene can be considered as the successor to LocusLink (described below). |
UniGene - ESTs and full-length mRNA sequences organized into clusters that each represent a unique known or putative gene within the organism from which the sequences were obtained. UniGene clusters are annotated with mapping and expression information when possible (e.g., for human), and include cross-references to other resources. Sequence data can be downloaded by cluster through the UniGene web pages, or the complete data set can be downloaded from the repository/UniGene directory of the FTP site. In addition, UniGene DDD (described below) can be used to show differential expression of genes between cDNA libraries. The organisms represented in UniGene are listed on the UniGene home page. |
Download Genomes <350 KB via Entrez Genomes pages for individual organisms |
Download Genomes >350 KB from the NCBI ftp site - see FTP information below; ftp links are also available from Entrez Genomes pages for individual organisms |
Genome Sequencing Centers - list of genome sequencing centers and the organisms on which they work |
Human Genome |
Guide, Chromosomes, Sequences, Genes, BLAST, Clones, Genome Maps, Mapped Markers, Cytogenetics, Gene Expression, Genetic Variation, Disorders, Cancer Research, FTP |
Guide |
|
|
|
|
Chromosomes |
|
|
Sequences |
|
|
|
|
Genes |
|
|
|
|
BLAST against human genomic sequence data |
|
Clones |
NCBI does not distribute clones. However, some NCBI resources contain information about clones and the sources from which they can be obtained. |
|
|
|
|
Clone Information for Other (Non-human) Organisms - Some organisms have additional clone information resources. For example, the resources available for the mouse genome include several items mentioned above, plus a CloneFinder, described below. In addition, many records in dbEST (described above) include information about clone sources such as the I.M.A.G.E. consortium. |
Genome Maps |
|
|
|
|
|
|
|
|
Mapped Markers |
|
|
|
|
|
|
|
|
Cytogenetics |
|
|
|
|
Gene Expression |
|
|
|
|
Genetic Variation |
|
|
|
|
Disorders |
|
|
|
|
Cancer Research |
|
|
|
|
|
|
FTP |
|
Mouse Genome |
Guide, Chromosomes, Sequences, Genes, Clones, Maps and Mapped Markers, Cytogenetics, BLAST, FTP |
Guide |
|
Chromosomes |
|
|
Sequences |
|
Genes |
|
|
Clones |
|
|
|
Maps and Mapped Markers |
|
|
|
Cytogenetics |
|
BLAST |
|
FTP |
|
Rat Genome |
Rat Genome Resources Guide - brings together information on diverse rat-related resources from multiple centers: sequence, mapping, and clone information as well as pointers to strain and mutant resources. |
Map Viewer - integrated chromosome maps - The Map Viewer is a software component of Entrez Genomes that displays one or more maps which have been aligned to each other based on shared marker and gene names, and, for the sequence maps, based on a common sequence coordinate system. The maps that are currently available for rat are described in the Rattus norvegicus data and search tips document. The Map Viewer help document provides general information on how to use that tool. |
LocusLink - provides a single query interface to curated sequence and descriptive information about genetic loci. LocusLink issues a stable ID for each locus and presents information on official nomenclature, sequence accession numbers, UniGene clusters, map information, and relevant web sites. LocusLink currently contains data for a number of species such as human, mouse, rat, zebrafish, nematode, fruit fly, cow, sea urchin, African clawed frog, and HIV-1. Organisms can be searched together or separately. |
BLAST against the rat genome - Nucleotide or protein query sequences can be used. A variety of database choices are provided. |
UniGene - ESTs and full-length mRNA sequences organized into clusters that each represent a unique known or putative gene within the organism from which the sequences were obtained. Additional information about UniGene is provided above. |
HomoloGene - a gene homology tool that compares nucleotide sequences between pairs of organisms, including human, mouse, rat, zebrafish, and fruit fly, in order to identify putative orthologs. Curated orthologs are incorporated from a variety of sources via LocusLink. |
Cow Genome |
UniGene - ESTs and full-length mRNA sequences organized into clusters that each represent a unique known or putative gene within the organism from which the sequences were obtained. Additional information about UniGene is provided above. |
Zebrafish Genome |
Zebrafish Genome Resources Guide - brings together information on diverse zebrafish-related resources from multiple centers: sequence, mapping, and clone information as well as pointers to strain and mutant resources. |
LocusLink - provides a single query interface to curated sequence and descriptive information about genetic loci. LocusLink issues a stable ID for each locus and presents information on official nomenclature, sequence accession numbers, UniGene clusters, map information, and relevant web sites. LocusLink currently contains data for a number of species such as human, mouse, rat, zebrafish, nematode, fruit fly, cow, sea urchin, African clawed frog, and HIV-1. Organisms can be searched together or separately. |
Map Viewer - integrated chromosome maps - The Map Viewer is a software component of Entrez Genomes that displays one or more maps which have been aligned to each other based on shared marker and gene names, and, for the sequence maps, based on a common sequence coordinate system. The maps that are currently available for Danio rerio are described in the Danio rerio genome data and search tips document. The Map Viewer help document provides general information on how to use that tool. |
UniGene - ESTs and full-length mRNA sequences organized into clusters that each represent a unique known or putative gene within the organism from which the sequences were obtained. Additional information about UniGene is provided above. |
HomoloGene - a gene homology tool that compares nucleotide sequences between pairs of organisms, including human, mouse, rat, zebrafish, and fruit fly, in order to identify putative orthologs. Curated orthologs are incorporated from a variety of sources via LocusLink. |
Drosophila Genome |
Drosophila melanogaster Home Page - provides an overview of available resources for that organism, graphically displays all the chromosomes (to scale), and allows you to search both cytogenetic and sequence data across the whole genome through the Entrez Genomes browser. Entrez Genomes presents a unified graphical view of maps (genetic and physical) and sequence data for an organism. After you search for a term such as a gene symbol, it presents a graphic Genome View of search results, from which you can zoom into progressively more detailed Map Views of the region of interest, and link to sequence data and associated resources that contain additional detail. |
Map Viewer - integrated chromosome maps - The Map Viewer is a software component of Entrez Genomes that displays one or more maps which have been aligned to each other based on shared marker and gene names, and, for the sequence maps, based on a common sequence coordinate system. The sequence and cytogenetic maps that are currently available for Drosophila are described in the Drosophila melanogaster genome data and search tips document. The Map Viewer help document provides general information on how to use that tool. |
LocusLink - provides a single query interface to curated sequence and descriptive information about genetic loci. LocusLink issues a stable ID for each locus and presents information on official nomenclature, sequence accession numbers, UniGene clusters, map information, and relevant web sites. LocusLink currently contains data for a number of species such as human, mouse, rat, zebrafish, nematode, fruit fly, cow, sea urchin, African clawed frog, and HIV-1. Organisms can be searched together or separately. |
HomoloGene - a gene homology tool that compares nucleotide sequences between pairs of organisms, including human, mouse, rat, zebrafish, and fruit fly, in order to identify putative orthologs. Curated orthologs are incorporated from a variety of sources via LocusLink. |
BLAST against Drosophila melanogaster genome sequence
|
FTP Site - see additional information about the genomes FTP directories, below |
Nematode Genome |
Caenorhabditis elegans Home Page - Graphical representation of chromosomes that can be viewed in their entirety or explored in progressively greater detail in the Map Viewer (described above). Home page also includes links to many related resources, such as sequencing centers, other nematode sequencing projects, related databases, etc. |
FTP Site - the chromosome data sets are available for FTP in a variety of formats, including GenBank, FASTA, ASN.1, and others, in the genbank/genomes/C_elegans/ directory of the NCBI FTP site (ftp://ftp.ncbi.nih.gov/). An NCBI-curated version of the data is available in the genomes/C_elegans/ directory. (See the additional note in the FTP section, below, about the two different FTP directories.) |
Plant Genomes |
Plant Genomes Central - provides access to data from large-scale sequencing projects, genetic maps, and large-scale EST sequencing projects. All organism names on the page are linked to the corresponding taxonomic information in NCBI's Taxonomy database (described above). In addition, organisms listed under "large-scale sequencing projects" and "genetic maps" are represented in the Map Viewer (described above). Organisms listed under "large-scale EST sequencing projects" are linked to their EST sequences in Entrez (described above). |
UniGene - ESTs and full-length mRNA sequences organized into clusters that each represent a unique known or putative gene within the organism from which the sequences were obtained. Additional information about UniGene is provided above. |
Yeast Genome |
Saccharomyces cerevisiae Home Page - baker's yeast - graphical representation of chromosomes that can be viewed in their entirety or explored in progressively greater detail in Entrez Genomes (described above), with links to associated sequence data. Home page also includes links to many related resources, such as sequencing centers, other fungi sequencing projects, related databases, etc. |
Schizosaccharomyces pombe Home Page - fission yeast - similar to the home page for Saccharomyces cerevisiae, described above. |
COGs - Clusters of Orthologous Groups - natural system of gene families from complete genomes. Clusters of Orthologous Groups (COGs) were delineated by comparing protein sequences encoded in complete unicellular genomes representing 30 major phylogenetic lineages. Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain. The Initial Version of COGs includes 44 organisms. The Updated Version of COGs includes 66 organisms in the Unicellular Clusters, plus Eukaryotic Clusters (called KOGs). More organisms will be added in the future. |
BLAST against the Saccharomyces cerevisiae or Schizosaccharomyces pombe genome sequences
|
FTP Saccharomyces cerevisiae Chromosomes |
Malaria Genome |
Malaria Genetics & Genomics - provides data and information relevant to malaria genetics and genomics. Resources include organism specific sequence BLAST databases (Plasmodium falciparum only, all Plasmodium, and all Toxoplasma), genome maps, linkage markers, and information about genetic studies. Links are provided for other malaria web sites and genetic data on related apicomplexan parasites, including Toxoplasma gondii. |
Map Viewer - The Map Viewer (described above) provides graphical views and search capabilities for both Plasmodium falciparum and Anopheles gambiae (malaria mosquito). |
BLAST against Malaria sequences
|
FTP
|
Microbial Genomes |
Entrez Genomes - Graphical representation of complete bacterial genomes that can be viewed in their entirety or explored in progressively greater detail; links to associated sequence data. A "ProtTable" of protein coding genes is provided for each bacterium. There are also links to a "TaxTable," showing the distribution of BLAST protein homologs by taxa (sequences grouped by superkingdom), and to a distribution of BLAST protein homologs by 3-D structure (sequences with known structure). Additional information about Entrez Genomes is also provided above. |
Completed Microbial Genomes Sequencing Projects - completed sequencing projects, with links to NCBI graphical views (Entrez Genomes), sequencing centers, and various analyses that have been done on the genomes at NCBI (e.g., TaxTable, COG Table, 3-D Neighbors, and more). |
In-Progress Microbial Genomes Sequencing Projects - ongoing sequencing projects, with links to sequencing centers and to BLASTable data. |
COGs - Clusters of Orthologous Groups - natural system of gene families from complete genomes. Clusters of Orthologous Groups (COGs) were delineated by comparing protein sequences encoded in complete unicellular genomes representing 30 major phylogenetic lineages. Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain. The Initial Version of COGs includes 44 organisms. The Updated Version of COGs includes 66 organisms in the Unicellular Clusters, plus Eukaryotic Clusters (called KOGs). More organisms will be added in the future. |
BLAST against Microbial Genomes - sequences from selected completed and unfinished eukaryotic and prokaryotic genomes; partial genomic sequences have been graciously provided by the sequencing centers or extracted from GenBank. NCBI encourages sequencing centers to submit partially sequenced genomes to be included in this BLAST page. Data can be submitted via ftp, after contacting genomes@ncbi.nlm.nih.gov to set up an account. |
FTP - download complete bacterial genomes in a variety of formats, including GenBank flat file (*.gbk), GenBank summary file (*.gbs), FASTA Nucleic Acid file (*.fna), FASTA Amino Acid file (*.faa), Protein Table (*.ptt), and others.
(See additional note in the FTP section, below, about the two different FTP directories) |
Viral Genomes |
Entrez Genomes - Graphical representation of complete viral genomes that can be viewed in their entirety or explored in progressively greater detail; links to associated sequence data. A summary of Coding Regions (described above) is provided for each virus. Additional information about Entrez Genomes is also provided above. |
Retrovirus Resources - A collection of resources specifically designed to support the research of retroviruses. Resources include a genotyping tool that uses the BLAST algorithm to identify the genotype of a query sequence; an alignment tool for global alignment of multiple sequences; an HIV-1 automatic sequence annotation tool; and annotated maps of 16 retroviruses viewable in GenBank, FASTA, and graphic formats, with links to associated sequence records. |
Viral Reference Sequences - A collection of reference sequences for more than 1000 viral genomes. |
HIV Interactions - The HIV-1, Human Protein Interaction Database contains information about known interactions of HIV-1 proteins with proteins from human hosts. It provides annotated bibliographies of published reports of protein interactions, with links to the corresponding PubMed records and sequence data. More information about this database is provided under "Literature Databases". |
Viroid Genomes |
Entrez Genomes - Graphical representation of complete viroid genomes that can be viewed in their entirety or explored in progressively greater detail; links to associated sequence data. A summary of Coding Regions (described above) is provided for each viroid. Additional information about Entrez Genomes is also provided above. |
Plasmids |
Entrez Genomes - Graphical representation of complete plasmids that can be viewed in their entirety or explored in progressively greater detail; links to associated sequence data. A summary of Coding Regions (described above) is provided for each plasmid. Additional information about Entrez Genomes is also provided above. |
Eukaryotic Organelles |
Eukaryotic Organelles Home Page - Provides an overview of eukaryotic organelles; a description of the Organelle Reference Sequences project (part of RefSeq, see above); and links to (a) lists of completely sequenced organelles shown in taxonomic hierarchy and alphabetically by organism, (b) gene and RNA order in metazoan mitochondria, and (c) related web sites. |
Entrez Genomes - Graphical representation of complete eukaryotic organelles that can be viewed in their entirety or explored in progressively greater detail; links to associated sequence data. A summary of Coding Regions (described above) is provided for each organelle. Additional information about Entrez Genomes is also provided above. |
Tools | Overview |
Data Retrieval - Text Term Searching |
Entrez - provides integrated access to nucleotide and protein sequence data from >130,000 organisms, along with 3D protein structures, genomic mapping information, PubMed MEDLINE, and more. Sequence data are combined from various sources, including GenBank, EMBL, DDBJ, RefSeq, PIR-International, PRF, Swiss-Prot, and PDB. A Data Model provides a schematic illustration of the connections between the many data types in Entrez.
|
Batch Entrez - allows you to retrieve a large number of nucleotide sequences or protein sequences from Entrez, in a batch mode, by importing a file containing a list of the desired GI or accession numbers. Search results are saved directly to a local disk file on your computer. |
Entrez Utilities - Entrez Programming Utilities, also called E-Utilities, are tools that provide access to Entrez data outside of the regular web query interface. They represent a method of making WWW links to Entrez. Each utility performs a specialized retrieval task, and can be used simply by writing a specially formatted URL. For example, EFetch retrieves records in the requested format from a list of one or more primary IDs or from the user's environment. The E-Utilities web page describes the available utilities and links to a brief help document for each one. E-Utilities can be helpful for retrieving search results for future use in another environment. To receive announcements about Entrez Utilities, see the NCBI Email Lists page. |
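As an illustration of the "specially formatted URL" idea, the sketch below assembles an EFetch URL from a database name and a list of IDs. It only builds the string and makes no network request. The base address, the parameter names (`db`, `id`, `rettype`, `retmode`), and the example accession are assumptions that do not come from this guide; check them against the E-Utilities help documents before relying on them.

```python
from urllib.parse import urlencode

# Assumed base address of the E-Utilities service (not stated in this guide).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def efetch_url(db, ids, rettype, retmode="text"):
    """Build an EFetch URL for one or more primary IDs."""
    params = {
        "db": db,                              # Entrez database to query
        "id": ",".join(str(i) for i in ids),   # comma-separated ID list
        "rettype": rettype,                    # requested record format
        "retmode": retmode,                    # requested output mode
    }
    # safe="," keeps the ID separators readable instead of encoding them as %2C.
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params, safe=",")

# Illustrative accession only; substitute the IDs you actually need.
url = efetch_url("nucleotide", ["NM_000249"], "fasta")
```

The resulting string can be pasted into a browser or passed to any HTTP client.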
LinkOut - a registry service to create links from specific articles, journals, or biological data in Entrez (described above) to resources on external web sites. Third parties can provide a URL, resource name, brief description of their web site, and specification of the NCBI data from which they would like to establish links. The specification can be written as a valid Boolean query to Entrez, or as a list of identifiers for specific articles or sequences. Entrez PubMed users can then select which external links are visible in their searches, through the NCBI Cubby service (described below). To receive announcements about updates and new features in LinkOut, see the NCBI Email Lists page. |
Cubby - allows Entrez users to store and update searches, and to customize their LinkOut (described above) display to include or exclude links to providers. The Cubby requires that your system accepts cookies. You must also complete a brief registration form in which you select a username and password. You will need those in order to access your "cubby." |
Query E-mail Server - The Query server, which provided e-mail access to a subset of Entrez databases, was discontinued on April 15, 2002 because of limited usage. Almost all Entrez searchers now use the WWW Entrez interface, described above. It provides access to more databases and more features than are possible through the e-mail interface. |
Citation Matcher - allows you to find the PubMed ID of any article in the PubMed database, given its bibliographic information (journal, volume, page, etc.). |
Sequence Similarity Searching |
|
BLAST Announcements - To receive announcements about updates and new features, and advance notices about upcoming changes in the NCBI BLAST service, see the NCBI Email Lists page. |
BLAST 2.x - A version of BLAST (Altschul, et al., 1997) that permits gaps in the alignments it produces. Assessments of statistical significance are based upon prior simulations using random sequences. (more...) |
QBLAST - A queuing system that allows users to retrieve Gapped BLAST results at their convenience and format their results multiple times with different formatting options. This system also allows the NCBI to more efficiently use computational resources, better serving the community. As of Fall 1999, the QBLAST system is used for all BLAST searches. (more...) |
MegaBLAST - permits searching with batches of ESTs or with large cDNA or genomic sequences. (more...) |
|
|
PHI-BLAST - Pattern Hit Initiated BLAST (Zhang, et al., 1998) - A program to search a protein database using a protein query, seeking only alignments that preserve a specified pattern contained within the query. (more...) |
PSI-BLAST - Position-Specific Iterated BLAST (Altschul, et al., 1997) - A program for searching protein databases using protein queries, in order to find other members of the same protein family. All statistically significant alignments found by BLAST are combined into a multiple alignment, from which a position-specific score matrix is constructed. This matrix is used to search the database for additional significant alignments, and the process may be iterated until no new alignments are found. (more...) |
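The step in which a multiple alignment is turned into a position-specific score matrix can be sketched as below. This is a deliberately simplified illustration and not the PSI-BLAST procedure itself: it assumes an ungapped alignment of equal-length sequences, a uniform background frequency, and a flat pseudocount, and it omits the database search and the iteration. The names `build_pssm` and `score` are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0):
    """Build a log-odds score matrix (one dict per column) from an ungapped alignment."""
    n_seqs = len(alignment)
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies
    pssm = []
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment]
        scores = {}
        for aa in AMINO_ACIDS:
            # Observed frequency in this column, smoothed with a pseudocount.
            freq = (column.count(aa) + pseudocount) / (
                n_seqs + pseudocount * len(AMINO_ACIDS)
            )
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm

def score(pssm, segment):
    """Score a segment of the same length as the matrix, column by column."""
    return sum(column[aa] for column, aa in zip(pssm, segment))

# Toy alignment of three made-up 4-residue sequences.
alignment = ["MKVL", "MKIL", "MRVL"]
pssm = build_pssm(alignment)
```

Residues that are conserved in a column receive positive scores there, so a sequence resembling the aligned family scores higher than an unrelated one.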
RPS-BLAST - Reverse Position-Specific BLAST - A program used to identify conserved domains in a protein query sequence. It does this by comparing a query protein sequence to position-specific score matrices that have been prepared from conserved domain alignments. The service is accessible through Conserved Domain Search (CD-Search), described below. A readme file provides additional detail about the RPS-BLAST program.
Note: RPS-BLAST is a "reverse" version of position-specific iterated BLAST (PSI-BLAST), described above. Both RPS-BLAST and PSI-BLAST use multiple alignments and position-specific score matrices (PSSMs) to derive conserved features of a protein family. However, RPS-BLAST compares a query sequence against a database of profiles prepared from ready-made alignments, while PSI-BLAST builds alignments starting from a single protein sequence. The programs also differ in purpose: RPS-BLAST is used to identify conserved domains in a query sequence, while PSI-BLAST is used to identify other members of the protein family to which a query sequence belongs. |
Taxonomy BLAST - an implementation of Gapped BLAST (2.x) that groups hits by source organism, according to information in NCBI's Taxonomy database. Species are listed in order of sequence similarity to the query sequence; the strongest match listed first. Three report views are available:
|
BLAST 2 Sequences - A BLAST-based tool for aligning two nucleotide or protein sequences, producing a pairwise DNA-DNA or protein-protein sequence comparison. (more...) |
IgBLAST - IgBLAST was developed to facilitate analysis of immunoglobulin sequences in GenBank. It allows blastp or blastn searches of either the nr database or a special database of Immunoglobulin (Ig) germline V (variable region) genes. Searches may be limited to either human or mouse genes. IgBLAST performs three main functions: (1) reports the variable, D, or J regions that most closely match the query sequence; (2) annotates the immunoglobulin domains (FWR1 through FWR3) according to Kabat et al.; and (3) for searches against the nucleotide nr or protein nr database, simplifies the process of identifying related sequences by matching the IgBLAST hits to the closest germline V genes. (more...) |
BLink - BLink ("BLAST Link") displays the results of BLAST searches that have been done for every protein sequence in the Entrez Proteins data domain. To access it, follow the BLink link displayed beside any hit in the results of an Entrez Proteins search. In contrast to Entrez's "Related Sequences" feature, which lists the titles of similar sequences, BLink displays the graphical output of pre-computed blastp results against the protein non-redundant (nr) database. The output includes the positions of up to 200 BLAST hits on the query sequence, scores, and alignments. (View sample BLink output for human MLH1 protein.) BLink offers a variety of display options, including the distribution of hits by taxonomic grouping, the best hit to each organism, the protein domains in the query sequence, similar sequences that have known 3-D structures, and more. Additional options allow you to specify which taxa you would like to exclude, increase or decrease the BLAST cutoff score, or filter the BLAST hits to show only those from a specific source database, such as RefSeq or Swiss-Prot. See the BLink help document for additional information. |
BLAST E-mail server - an e-mail-based sequence similarity search service; this was discontinued on June 17, 2002 because of limited usage. Most BLAST searches are now done through the BLAST web page. |
Network BLAST - a TCP/IP-based client-server version of BLAST. Makes a direct connection with the NCBI databases over the Internet to retrieve data. No web browser is required. Client software is available for PC, Mac, and Unix on the FTP site at ftp://ftp.ncbi.nih.gov/blast/blastcl3/ |
Stand-alone BLAST - download BLAST executables for local use from ftp://ftp.ncbi.nih.gov/blast/executables/. Binaries are provided for IRIX 6.2, Solaris 2.6, DEC OSF1 (ver. 4.0d), LINUX, and Win32 systems. Please read the README file in the ftp directory for more information. BLAST databases are also available for downloading. There is also some information on setting up stand-alone BLAST at the NHGRI site at http://genome.nhgri.nih.gov/blastall/blast_install. |
Nucleotide Sequence Analysis |
BLAST - see sequence similarity searching, above, for a complete list of BLAST programs. |
e-PCR - Electronic PCR - compares a query sequence to mapped sequence-tagged sites (STSs) to find a possible map location for the query sequence. e-PCR finds STSs in DNA sequences by searching for subsequences that closely match the PCR primers present in mapped markers. The subsequences must have the correct order, orientation, and spacing such that they could plausibly prime the amplification of a PCR product of the correct molecular weight. e-PCR searches against data in NCBI's UniSTS, described above. e-PCR can be used on the WWW, or the software can be downloaded from the /pub/schuler/e-PCR directory of the NCBI ftp site. |
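The order, orientation, and spacing test described above can be sketched as follows. This is a simplified illustration rather than the e-PCR program: it assumes exact primer matches (the real tool accepts close matches), searches only one strand of the query, and uses a hypothetical function name `find_sts` and made-up primers.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_sts(sequence, forward, reverse, expected_size, tolerance=50):
    """Return (start, end, size) for each plausible amplicon in `sequence`.

    The forward primer must occur first, followed downstream by the reverse
    complement of the reverse primer, with the whole span within `tolerance`
    bases of the expected product size.
    """
    rev_site = reverse_complement(reverse)
    hits = []
    start = sequence.find(forward)
    while start != -1:
        pos = sequence.find(rev_site, start + len(forward))
        while pos != -1:
            end = pos + len(rev_site)
            size = end - start
            if abs(size - expected_size) <= tolerance:
                hits.append((start, end, size))
            pos = sequence.find(rev_site, pos + 1)
        start = sequence.find(forward, start + 1)
    return hits

# Made-up query: forward primer site, a 10-base spacer, then the reverse primer site.
query = "GG" + "ACGTAC" + "T" * 10 + "CCTGAA" + "AA"
sts_hits = find_sts(query, "ACGTAC", "TTCAGG", expected_size=22, tolerance=2)
```

A hit is reported only when both primer sites are present in the right order and the implied product size is close to the size recorded for the marker.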
Entrez Gene - as described above, each Entrez Gene record encapsulates a wide range of information for a given gene and organism. When possible, the information includes results of analyses that have been done on the sequence data. The amount and type of information presented depend on what is available for a particular gene and organism and can include: (1) graphic summary of the genomic context, intron/exon structure, and flanking genes, (2) link to a graphic view of the mRNA sequence, which in turn shows biological features such as CDS, SNPs, etc., (3) links to gene ontology and phenotypic information, (4) links to corresponding protein sequence data and conserved domains, (5) links to related resources, such as mutation databases. (Entrez Gene is a successor to LocusLink, also described above.) |
Malaria Genetics and Genomics - provides data and information relevant to malaria genetics and genomics. Resources include organism specific sequence BLAST databases (Plasmodium falciparum only, all Plasmodium, and all Toxoplasma). More information about the malaria genome resources is provided above. |
Model Maker - allows you to view the evidence (mRNAs, ESTs, and gene predictions) that was aligned to assembled genomic sequence in order to build a gene model, and to edit the model by selecting or removing putative exons. You can then view the mRNA sequence and potential ORFs for the edited model, and save the mRNA sequence data for use in other programs. Model Maker is accessible from sequence maps that were analyzed at NCBI and displayed in Map Viewer (described above). To see an example, follow the "mm" link beside any gene annotated on the human "Gene_Sequence" map in the Map Viewer. (More info about human data in Map Viewer is given above.) |
ORF Finder - graphical analysis tool which finds all open reading frames of a selected minimum size in a user's sequence or in a sequence already in the database. Designed for prokaryotic sequences. Identifies all open reading frames using the standard or alternative genetic codes. The deduced amino acid sequence can be saved in various formats and searched against the sequence database using the WWW BLAST server. The ORF Finder is also packaged with the Sequin sequence submission software. The stand-alone program can be downloaded from the NCBI ftp site. |
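As a rough illustration of what finding all open reading frames of a minimum size involves, here is a short Python sketch. It is not ORF Finder's implementation. It assumes the standard genetic code, treats an ORF as running from the first ATG to the next in-frame stop codon, and measures the minimum size in nucleotides; the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def reverse_complement(seq):
    """Return the reverse complement; characters other than ACGT pass through."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def orfs_in_strand(seq, min_len):
    """Yield (frame, start, end) for each ATG..stop ORF of at least min_len nt."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon of a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    yield frame, start, i + 3
                start = None  # look for the next ORF in this frame


def find_orfs(seq, min_len=30):
    """Scan all six frames; return a list of (strand, frame, start, end).

    Coordinates on the "-" strand refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    found = [("+", f, s, e) for f, s, e in orfs_in_strand(seq, min_len)]
    found += [("-", f, s, e)
              for f, s, e in orfs_in_strand(reverse_complement(seq), min_len)]
    return found
```

Supporting an alternative genetic code in this sketch would mean swapping in a different stop-codon set (and, where applicable, additional start codons).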
ProtEST - a tool that presents a graphical view of matches between nucleotide sequences in UniGene and possible translational products. To generate the alignments, the 6-frame translations of mRNA and EST sequences in UniGene are compared to protein sequences using BLASTX with -e 1e-6. The translated nucleotide sequences are compared with proteins from eight model organisms and the best match in each organism is recorded. UniGene nucleotide sequences can thus have up to eight matches in ProtEST. |
Retroviruses Resources - A collection of resources specifically designed to support the research of retroviruses. Resources include a genotyping tool that uses the BLAST algorithm to identify the genotype of a query sequence; an alignment tool for global alignment of multiple sequences; an HIV-1 automatic sequence annotation tool; and annotated maps of 16 retroviruses viewable in GenBank, FASTA, and graphic formats, with links to associated sequence records. |
SAGEmap - SAGEmap provides a tool for performing statistical tests designed specifically for differential-type analyses of SAGE (Serial Analysis of Gene Expression) data. The data include SAGE libraries generated by individual labs as well as those generated by the Cancer Genome Anatomy Project (CGAP, described above), which have been submitted to Gene Expression Omnibus (GEO, described above). Gene expression profiles that compare the expression in different SAGE libraries are also available on the Entrez GEO Profiles pages. It is possible to enter a query sequence in the SAGEmap resource to determine what SAGE tags are in the sequence, then map to associated SAGEtag records and view the expression of those tags in different CGAP SAGE libraries. |
Spidey - mRNA-to-genomic alignment program that was designed to find good alignments regardless of intron size, and to avoid getting confused by nearby pseudogenes and paralogs. It uses a combination of alignment algorithms and heuristics to construct its models. Spidey has been optimized for both intraspecies and interspecies alignments. |
UniGene DDD - Digital Differential Display - an online tool to compare computed gene expression profiles between selected cDNA libraries. Using a statistical test, genes whose expression levels differ significantly from one tissue to the next are identified and shown to the user. Additional information about UniGene is in the Molecular Databases/Genes section. |
VecScreen - a tool for identifying segments of a nucleic acid sequence that may be of vector, linker or adapter origin prior to sequence analysis or submission. VecScreen was developed to combat the problem of vector contamination in public sequence databases. It is also useful to run a new sequence through VecScreen before performing any kind of analysis on the sequence, since the presence of vector sequences can lead to misleading BLAST hits, etc. VecScreen compares a query sequence against the UniVec database, described above. |
Protein Sequence Analysis |
BLAST - see sequence similarity searching, above, for a complete list of BLAST programs. |
BLink - BLink ("BLAST Link") displays the results of BLAST searches that have been done for every protein sequence in the Entrez Proteins data domain. To access it, follow the BLink link displayed beside any hit in the results of an Entrez Proteins search. In contrast to Entrez's "Related Sequences" feature, which lists the titles of similar sequences, BLink displays the graphical output of pre-computed blastp results against the protein non-redundant (nr) database. The output includes the positions of up to 200 BLAST hits on the query sequence, scores, and alignments. (View sample BLink output for human MLH1 protein.) BLink offers a variety of display options, including the distribution of hits by taxonomic grouping, the best hit to each organism, the protein domains in the query sequence, similar sequences that have known 3-D structures, and more. Additional options allow you to specify which taxa you would like to exclude, increase or decrease the BLAST cutoff score, or filter the BLAST hits to show only those from a specific source database, such as RefSeq or Swiss-Prot. See the BLink help document for additional information. |
CD-Search - The Conserved Domain Search Service (CD-Search) can be used to identify the conserved domains present in a protein sequence. CD-Search uses RPS-BLAST (described above) to compare a query sequence against position-specific score matrices that have been prepared from conserved domain alignments present in the Conserved Domain Database (CDD) (described above). Hits can be displayed as a pairwise alignment of the query sequence with a representative domain sequence, or as a multiple alignment. Alignments are also mapped to known 3-dimensional structures, and can be displayed using Cn3D (described above). In the Cn3D display, residues in sequence alignments are variously colored, based on their degree of conservation. |
COGnitor - compare your sequence to the COGs database (described above) to identify the cluster of orthologous groups to which it belongs. A stand-alone dignitor program is also available. It runs cognitor in batch mode, comparing a large group of proteins to the COGs database, and can be downloaded from the ftp site. |
Conserved Domain Architecture Retrieval Tool (CDART) - When given a protein query sequence, CDART displays the functional domains that make up the protein and lists proteins with similar domain architectures. The functional domains for a sequence are found by comparing the protein sequence to a database of conserved domain alignments, CDD (described above), using RPS-BLAST (described above). |
ProtEST - a tool that presents a graphical view of matches between nucleotide sequences in UniGene and possible translational products. To generate the alignments, the 6-frame translations of mRNA and EST sequences in UniGene are compared to protein sequences using BLASTX with -e 1e-6. The translated nucleotide sequences are compared with proteins from eight model organisms and the best match in each organism is recorded. UniGene nucleotide sequences can thus have up to eight matches in ProtEST. |
TaxPlot - a tool for 3-way comparisons of genomes on the basis of the protein sequences they encode. To use TaxPlot, one selects a reference genome to which two other genomes are compared. Pre-computed BLAST results are then used to plot a point for each predicted protein in the reference genome, based on the best alignment with proteins in each of the two genomes being compared. |
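The plotting step described for TaxPlot can be sketched in Python as follows. This is only an illustration under stated assumptions: the input is a hypothetical list of (reference protein, genome, score) tuples standing in for the pre-computed BLAST results, "best alignment" is taken to mean the highest score, and a protein with no hit in one of the genomes is given a coordinate of 0.0 on that axis.

```python
def taxplot_points(hits, genome_x, genome_y):
    """Map each reference protein to an (x, y) point.

    hits: iterable of (reference_protein, genome, score) tuples
          (hypothetical stand-in for pre-computed BLAST results).
    x is the best score against genome_x, y the best against genome_y.
    """
    best = {}
    for protein, genome, score in hits:
        if genome not in (genome_x, genome_y):
            continue  # ignore hits to genomes that are not being compared
        scores = best.setdefault(protein, {genome_x: 0.0, genome_y: 0.0})
        scores[genome] = max(scores[genome], score)
    return {p: (s[genome_x], s[genome_y]) for p, s in best.items()}
```

In this sketch, points far from the diagonal correspond to reference proteins with a much better match in one of the two compared genomes than in the other.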
3-D Structure Display and Similarity Searching |
Cn3D - "See in 3-D," a structure and sequence alignment viewer for NCBI databases. It allows viewing of 3-D structures and sequence-structure or structure-structure alignments. Cn3D can work as a helper application to your browser, or as a client-server application that retrieves structure records from MMDB (described above) directly over the internet. The Cn3D home page provides access to information on how to install the program, a tutorial to get started, and a comprehensive help document. |
VAST - Vector Alignment Search Tool - a computer algorithm developed at NCBI and used to identify similar protein 3-dimensional structures. The "structure neighbors" for every structure in MMDB are pre-computed and accessible via links on the MMDB Structure Summary pages. These neighbors can be used to identify distant homologs that cannot be recognized by sequence comparison alone. |
VAST Search - structure-structure similarity search service. Compares 3D coordinates of a newly determined protein structure to those in the MMDB/PDB database. VAST Search computes a list of structure neighbors that you may browse interactively, viewing superpositions and alignments by molecular graphics. |
CD-Search - The Conserved Domain Search Service (CD-Search) can be used to identify the conserved domains present in a protein sequence. CD-Search uses RPS-BLAST (described above) to compare a query sequence against position-specific score matrices that have been prepared from conserved domain alignments present in the Conserved Domain Database (CDD) (described above). Hits can be displayed as a pairwise alignment of the query sequence with a representative domain sequence, or as a multiple alignment. Alignments are also mapped to known 3-dimensional structures, and can be displayed using Cn3D (described above). In the Cn3D display, residues in sequence alignments are variously colored, based on their degree of conservation. |
Threading - As part of NCBI's Computational Biology Branch (described above), the Structure group, led by Dr. Steve Bryant, conducts research in protein threading. Protein threading predicts the three-dimensional structure of a protein sequence by threading it through known structures and calculating its energy. The experimental software developed by the NCBI Structure group is available on the FTP site. A readme file provides more information as well as references. |
Genome Analysis Tools |
Entrez Genomes - whole genomes of over 1000 organisms. The genomes represent both completely sequenced organisms and those for which sequencing is in progress. All three main domains of life - bacteria, archaea, and eukaryota - are represented, as well as many viruses, phages, viroids, plasmids, and organelles. Entrez Genomes provides graphical overviews of complete genomes/chromosomes, and the ability to explore regions of interest in progressively greater detail. ProtTables and TaxTables are provided for organisms on which analyses have been done by NCBI staff. |
Map Viewer - shows integrated views of chromosome maps for many organisms. Used to view the NCBI assembly of complete genomes, including human, Map Viewer is a valuable tool for the identification and localization of genes, particularly those that contribute to diseases. Additional information about Map Viewer is provided in the Genomes and Maps section of this guide. |
SKY/M-FISH & CGH Database - The NCI and NCBI SKY/M-FISH and CGH Database is a repository of publicly submitted data from Spectral Karyotyping (SKY), Multiplex Fluorescence In Situ Hybridization (M-FISH), and Comparative Genomic Hybridization (CGH), which are complementary fluorescent molecular cytogenetic techniques. SKY/M-FISH permits the simultaneous visualization of each human or mouse chromosome in a different color, facilitating the identification of chromosomal aberrations; CGH can be used to generate a map of DNA copy number changes in tumor genomes. Collaborative project with the National Cancer Institute. (data submission instructions...) |
Gene Expression Tools |
Gene Expression Omnibus (GEO) - provides several tools to assist with the visualization and exploration of GEO data. Datasets may be viewed as hierarchical cluster heat maps, providing insight into the relationships between samples and co-regulated genes. Individual gene expression profiles showing significant differences between experimental subsets may be located using average subset rank value comparisons. Related gene expression profiles may be identified on the basis of sequence similarity, profile similarity, or homology. Indicators of dataset normalization quality are provided as distribution graphs, and by flagging outliers. Links to other NCBI sequence, mapping and publication database resources are provided where possible. (More information about GEO is provided in the Molecular Databases/Gene Expression section of this file.) |
SAGEmap - SAGEmap provides a tool for performing statistical tests designed specifically for differential-type analyses of SAGE (Serial Analysis of Gene Expression) data. The data include SAGE libraries generated by individual labs as well as those generated by the Cancer Genome Anatomy Project (CGAP, described above), which have been submitted to Gene Expression Omnibus (GEO, described above). Gene expression profiles that compare the expression in different SAGE libraries are available on the Entrez GEO Profiles pages. It is also possible to enter a query sequence in the SAGEmap resource to determine what SAGE tags are in the sequence, then map to associated SAGEtag records and view the expression of those tags in different CGAP SAGE libraries. (More information about SAGEmap is provided in the Molecular Databases/Gene Expression section of this file.) |
Cancer Genome Anatomy Project (CGAP) - an interdisciplinary program to identify the human genes expressed in different cancerous states, based on cDNA (EST) libraries, and to determine the molecular profiles of normal, precancerous, and malignant cells. CGAP is a collaboration among the National Cancer Institute, the NCBI, and numerous research labs. (Related resources are listed under human genome/cancer research.) The following tools are provided by the National Cancer Institute (NCI) through their CGAP web page:
|
UniGene DDD - Digital Differential Display - an online tool to compare computed gene expression profiles between selected cDNA libraries. Using a statistical test, genes whose expression levels differ significantly from one tissue to the next are identified and shown to the user. Additional information about UniGene is in the Molecular Databases/Genes section. |
Research at NCBI | Overview |
Computational Biology Branch Home Page - Overview of the research program in the Computational Biology Branch (CBB) of NCBI and a list of Senior Investigators. The research programs focus on theoretical, analytical, and applied approaches to a broad range of fundamental problems in molecular biology, including biomolecular structures, genome analysis, theory of sequence analysis, hardware design, software and database design, and text retrieval and document analysis. |
Senior Investigators in PubMed - publications written by senior investigators in the NCBI Computational Biology Branch and represented in the PubMed database. The PubMed records include links to publisher web sites and/or full text articles when available. |
Seminar Schedule - Seminars held at NCBI on a wide range of molecular biology and mathematical topics. These seminars are open to the NIH community and the general public, and are presented by NCBI staff as well as visiting scientists. |
Postdoctoral Fellowships - general information, application procedure |
Software Engineering | Overview |
Information Engineering Branch Home Page - Overview of the functions of the Information Engineering Branch (IEB) of NCBI, which is responsible for designing and building NCBI's production software and databases. |
NCBI ToolBox - Supported software tools from IEB. Describes the three components of the ToolBox: data model, data encoding, and programming libraries. Provides access to documentation for the data model, C toolkit, C++ toolkit, NCBI Toolkit Source Browser, XML demo program, XML DTDs, and the FTP site. Additional information about the FTP site is provided below. |
R&D Projects - The IEB Research and Development Area is a place for IEB projects and datasets which may never become fully supported NCBI resources. This includes early prototypes of software, results of early or one-off analyses, tools that are fully functional but not integrated into the main, public NCBI systems, or datasets that may have some value but do not fit well into the main NCBI pages. |
ASN.1 - The software in the NCBI ToolBox is primarily designed to read Abstract Syntax Notation 1 (ASN.1) format records, an International Standards Organization (ISO) data representation format. The readme files in the toolbox and toolbox/ncbi_tools directories of the FTP site contain more information about the toolbox and ASN.1. An ASN.1 summary is also available. The ToolBox can produce data as either ASN.1, as before, or as XML (more about XML). Additional information about the ToolBox, documentation, and demo programs are available on the NCBI ToolBox page. |
Education | Overview |
News - keeping up with the changes at NCBI |
NCBI News - announcements about new resources, enhancements to existing resources, staff publications, tutorials, FAQs |
What's New - recently released resources and enhancements to existing resources |
NCBI Announcements Email Lists - Receive announcements about changes and updates to a variety of NCBI services. In addition to a general NCBI-announce list, topic-specific e-mail lists are available for BLAST, GenBank, dbSNP, Genomes, LinkOut, RefSeq, Sequin, and Entrez Utilities (for making WWW Links to Entrez). Information on how to subscribe is provided. |
Books |
Coffee Break - a collection of short reports on recent biological discoveries. Each report incorporates interactive tutorials that show how bioinformatics tools are used as a part of the research process. |
Genes and Disease - introduction to the relationship between genetic factors and human disease. Summary information for ~60 genetic diseases with links to related databases and organizations. |
NCBI Handbook - an online book, written by NCBI staff, that discusses the many resources available at NCBI. Each chapter is devoted to one service; after a brief overview on using the resource, there is an account of how the resource works, including topics such as how data are included in a database, database design, query processing, and how the different resources relate to each other. |
Entrez Books - In collaboration with book publishers, the NCBI is adapting textbooks for the web and linking them to PubMed, the biomedical bibliographic database. The idea is to provide background information to PubMed, so that users can explore unfamiliar concepts found in PubMed search results. |
Glossaries |
NCBI Handbook Glossary - part of the NCBI Handbook, described above. Includes a variety of terms pertaining to biological data and bioinformatics. |
BLAST Tutorial Glossary of Terms - includes terms pertaining to BLAST sequence similarity searching. |
FieldGuide Glossary - developed for the Field Guide course described below. |
Human Genome Build Glossary - accompanies the document that describes the NCBI Genomic Sequence Assembly and Annotation Process. |
Mouse Genome Build Glossary - accompanies the NCBI Mouse Contig Assembly and Annotation Process. |
NHGRI Talking Glossary of Genetic Terms - by the National Human Genome Research Institute (NHGRI). |
Tutorials |
Science Primer - The science behind our resources. An introduction for researchers, educators and the public. Provides plain language introductions to bioinformatics, genome mapping, molecular modeling, SNPs, ESTs, microarray technology, molecular genetics, pharmacogenomics, and phylogenetics. |
PubMed Tutorial - comprehensive instruction on using PubMed's various features |
Entrez Tutorial - shows users how to make use of the full power of the Entrez data retrieval system. Using a human gene as an example, it demonstrates the variety of information that can be gathered for a single gene across a number of Entrez databases. |
BLAST tutorials for new and veteran users |
BLAST Statistics |
3-D Protein Structure Tutorial: Cn3D structure viewing program |
Map Viewer Exercises - a chapter within the NCBI Handbook (described above). |
Coffee Break - a collection of short reports on recent biological discoveries. Each report incorporates interactive tutorials that show how bioinformatics tools are used as a part of the research process. |
Courses |
Field Guide to GenBank and NCBI Resources - three-hour lecture plus two-hour optional hands-on computer lab designed for end users with a science background. Presented at universities across the United States as well as on-site at NLM. |
Introduction to Molecular Biology Information Resources - a three-day Medical Library Association (MLA) CE Course designed for librarians who have little or no experience with molecular biology databases and search systems, and who handle occasional questions about those resources at the reference desk. Course format combines lecture, demonstration, and hands-on experience. |
NCBI Advanced Workshop for Bioinformatics Information Specialists - a five-day workshop designed for library staff with a science background who have full-time bioinformatics support positions. This includes bioinformatics librarians as well as scientists who have been hired by libraries to establish training and user support programs. Applicants must already have some experience with molecular biology databases and software programs. Course format combines lecture, demonstration, and hands-on experience. |
Additional Resources |
Cancer Information - a wide range of accurate, credible cancer information brought to you by the National Cancer Institute (NCI). CancerNet information is reviewed regularly by oncology experts and is based on the latest research. It includes information selected and organized for patients, health professionals, and basic researchers. |
Human Genome Project - an international research effort to characterize the genomes of human and selected model organisms through complete mapping and sequencing of their DNA; to develop technologies for genomic analysis; to examine the ethical, legal, and social implications of human genetics research; and to train scientists who will be able to utilize the tools and resources developed through the HGP to pursue biological studies that will improve human health. This link leads to the information provided on the National Human Genome Research Institute (NHGRI) web site. |
NHGRI Educational Resources - the National Human Genome Research Institute (NHGRI) provides a range of educational resources, including glossaries, fact sheets, multimedia educational kits, genetic education modules for use by teachers, and a variety of online materials. |
NIH Office of Science Education |
FTP Site | Overview |
Download Databases |
BLAST databases - a collection of databases formatted for use with the BLAST software. A readme file provides database descriptions. |
GenBank and Daily Updates |
RefSeq - NCBI database of Reference Sequences. Curated, non-redundant set including genomic DNA contigs, mRNAs and proteins for known genes, mRNAs and proteins for gene models, and entire chromosomes. Accession numbers have the format of two letters, an underscore bar, and six digits, for example: NT_123456, NM_123456, NP_123456, NC_123456, NG_123456, XM_123456, XR_123456, XP_123456 (more info about accession numbers and access). |
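The accession format described in this entry (two letters, an underscore, and six digits) can be checked with a short Python sketch. The function name and the decision to accept only the prefixes listed in the examples are assumptions for illustration; the pattern encodes only the format as stated here and is not an official validator.

```python
import re

# Two uppercase letters, an underscore, and exactly six digits,
# as described in this entry.
ACCESSION_RE = re.compile(r"([A-Z]{2})_([0-9]{6})")

# Prefixes taken from the examples listed in this entry.
KNOWN_PREFIXES = {"NT", "NM", "NP", "NC", "NG", "XM", "XR", "XP"}


def parse_refseq_accession(text):
    """Return (prefix, digits) if text fits the described format, else None."""
    match = ACCESSION_RE.fullmatch(text.strip())
    if match is None:
        return None
    prefix, digits = match.groups()
    if prefix not in KNOWN_PREFIXES:
        return None
    return prefix, digits
```

For example, `parse_refseq_accession("NM_123456")` yields the prefix and number separately, which is convenient when sorting downloaded records by molecule type.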
LocusLink - a collection of files from the LocusLink project, which is a curated, collaborative effort among NCBI, Human Gene Nomenclature Committee, OMIM, and other groups to collect sequence data and descriptive information about genetic loci. LocusLink currently contains data for a number of species such as human, mouse, rat, zebrafish, nematode, fruit fly, cow, sea urchin, African clawed frog, and HIV-1. Organisms can be searched together or separately. (See additional information about LocusLink above). |
dbSNP - database of single nucleotide polymorphisms, small-scale insertions/deletions, polymorphic repetitive elements, and microsatellite variation |
Taxonomy - data from the NCBI Taxonomy database (described above). Includes a UNIX compressed tar file called "taxdump.tar.Z" that is updated daily and contains a dump of the taxonomy information from Sybase. Note that the *.dmp files are not human-friendly files, but can be loaded into Sybase with the BCP facility. When you uncompress and untar the file, you will see several files, including a Readme file that contains more information. |
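For readers who want to read the *.dmp files without a database, the following Python sketch splits one row into its fields. It assumes the layout in which fields are separated by a tab, a vertical bar, and a tab, and each row ends with a tab and a vertical bar; that layout is an assumption here and should be checked against the Readme file that comes with the dump. The function name is hypothetical.

```python
def parse_dmp_line(line):
    """Split one row of a taxonomy *.dmp file into a list of fields.

    Assumed layout (verify against the dump's Readme): fields separated
    by "\\t|\\t", with each row terminated by "\\t|" and a newline.
    """
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]  # drop the row terminator
    return line.split("\t|\t")
```

Applied line by line to an open file, this gives plain lists of strings that can be loaded into any table structure; which column holds which value is described in the Readme file.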
Repository of databases - This FTP directory contains a mix of NCBI databases (e.g., UniGene, GeneMap, dbEST, dbGSS, dbSTS, OMIM) and a number of externally developed databases (e.g., EPD, TFD). The external databases are made available on the FTP site as a service to the scientific community. They are contributed by outside scientists and maintained independently of NCBI. All the files in the FTP directory of a non-NCBI database are placed there and maintained by the developers of that database. Questions about non-NCBI databases should be directed to the contacts listed in the readme or other background files for the individual databases. Note that additional NCBI databases are also found in the root directory of the FTP site (under the database name, such as GenBank, Gene, RefSeq), or in the "pub" directory (usually under the name of the primary resource developer). |
Download Genomes |
Human Genome Project Data - the ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/ directory contains one folder for each chromosome, which includes genomic contigs (NT_* records) built from finished and unfinished sequence data. The contigs are available in various formats, described below. The contig assembly and annotation process is described in a separate document.
|
Other Genomes - such as bacteria, nematode, mouse, and others can be downloaded from one of two directories:
Note: In some cases, an organism might be listed in both directories. This can happen for several reasons: (1) two versions of the genome are available - one in GenBank, and one in RefSeq; or (2) the organism's data was assembled at NCBI and was available from the "/genbank/genomes/" directory before the new "/genomes/" directory was set up. In the latter case, the data now exists in the new "/genomes/" directory, but a symbolic link was preserved in the original directory to facilitate user access. |
Download Software |
BLAST Programs |
NOTE: Preformatted BLAST databases are also available for downloading, in addition to the software listed above. A readme file provides database descriptions. |
Client/server programs |
Cn3D - "See in 3-D," a structure and sequence alignment viewer for NCBI databases. It allows viewing of 3-D structures and sequence-structure or structure-structure alignments. Cn3D can work as a helper application to your browser, or as a client-server application that retrieves structure records from MMDB (described above) directly over the internet. The Cn3D home page provides access to information on how to install the program, a tutorial to get started, and a comprehensive help document. |
NCBI Software ToolBox - set of software and data exchange specifications used by NCBI to produce portable, modular software for molecular biology. The software in the Toolbox is primarily designed to read Abstract Syntax Notation 1 (ASN.1) format records, an International Standards Organization (ISO) data representation format. The software is available to the public in the toolbox/ncbi_tools directory of NCBI's ftp site, and can be used in its own right or as a foundation for building tools with similar properties. The readme files in the toolbox and toolbox/ncbi_tools directories of the FTP site contain more information about the toolbox and ASN.1. An ASN.1 summary is also available. The ToolBox can produce data as either ASN.1, as before, or as XML (more about XML). Additional information about the ToolBox, documentation, and demo programs are available on the NCBI ToolBox page. Additional information about the Information Engineering Branch (IEB) of NCBI, which develops the ToolBox, is provided above, along with other items of interest to software developers. |
Software programs developed as personal projects by various NCBI scientists - the /pub directory of the FTP site contains programs such as MACAW (multiple sequence alignments) and e-PCR (description above). |
Revised October 1, 2004 |