Part 1. The Databases

1. GenBank: The Nucleotide Sequence Database
Ilene Mizrachi.
Created: October 9, 2002, Updated: July 27, 2004

International Collaboration

Confidentiality of Data

Direct Submissions

Bulk Submissions: High-Throughput Genomic Sequence (HTGS)

Whole Genome Shotgun Sequences (WGS)

Bulk Submissions: EST, STS, and GSS

Bulk Submissions: HTC and FLIC

Submission Tools

Sequence Data Flow and Processing: From Laboratory to GenBank

Microbial Genomes

Third Party Annotation (TPA) Sequence Database


2. PubMed: The Bibliographic Database
Kathi Canese, Jennifer Jentsch, and Carol Myers.
Created: October 9, 2002, Updated: August 13, 2003

Data Sources

Electronic Data Submission

Database Management and Hardware


How PubMed Queries Are Processed

Using PubMed

Additional PubMed Features


Links from PubMed

How to Create Hyperlinks to PubMed

Customer Support

3. Macromolecular Structure Databases
Eric Sayers and Steve Bryant.
Created: October 9, 2002, Updated: August 13, 2003

Content of the Molecular Modeling Database (MMDB)

Content of the Conserved Domain Database (CDD)

Finding and Viewing Structures

Finding and Viewing Structure Neighbors

Finding and Viewing Conserved Domains

Finding and Viewing Proteins with Similar Domain Architectures

Links Between Structure and Other Resources

Saving Output from Database Searches


Frequently Asked Questions


4. The Taxonomy Project
Scott Federhen.
Created: October 9, 2002, Updated: August 13, 2003

Adding to the Taxonomy Database

Using the Taxonomy Browser

The Taxonomy Database: TAXON

Nomenclature Issues

Taxonomy in Entrez: A Quick Tour

The Common Tree Viewer

Indexing Taxonomy in Entrez

The Taxonomy Statistics Page

Other Relevant References

NCBI Taxonomists

Contact Us

Appendix 1. TAXON nametypes.

Appendix 2. Functional classes of TAXON scientific names.

Appendix 3. Other TAXON data types.

5. The Single Nucleotide Polymorphism Database (dbSNP) of Nucleotide Sequence Variation
Adrienne Kitts and Stephen Sherry.
Created: October 9, 2002, Updated: August 13, 2003

Searching dbSNP

Submitted Content

Computed Content (The dbSNP Build Cycle)

Resource Integration

How to Create a Local Copy of dbSNP

Appendix 1. dbSNP report formats.

Appendix 2. Rules and methodology for mapping.

Appendix 3. 3D structure neighbor analysis.

6. The Gene Expression Omnibus (GEO): A Gene Expression and Hybridization Repository
Ron Edgar and Alex Lash.
Created: October 9, 2002, Updated: August 13, 2003

Site Description

Design and Implementation

Retrieving Data

Depositing Data

Search and Integration

Example of Retrieving Data

Future Directions

Frequently Asked Questions




7. Online Mendelian Inheritance in Man (OMIM): A Directory of Human Genes and Genetic Disorders
Donna Maglott, Joanna S. Amberger, and Ada Hamosh.
Created: October 9, 2002

Content and Access

Guide to OMIM Pages


Legal Statement


8. The NCBI BookShelf: Searchable Biomedical Books
Bart Trawick, Jeff Beck, and Jo McEntyre.
Created: October 9, 2002, Updated: August 13, 2003

Content Acquisition

How to Use the Books


The BookShelf Data Flow


Frequently Asked Questions

9. PubMed Central (PMC): An Archive for Literature from Life Sciences Journals
Jeff Beck and Ed Sequeira.
Created: October 9, 2002, Updated: August 13, 2003

A PubMed Central (PMC) Site Guide

Participation in PMC

Links to Other NCBI Resources

PMC Architecture

Data Flow: 1. SGML/XML Processing

Data Flow: 2. Loading the Database

Special Characters


Frequently Asked Questions

10. The SKY/CGH Database for Spectral Karyotyping and Comparative Genomic Hybridization Data
Turid Knutsen, Vasuki Gobu, Rodger Knaus, Thomas Ried, and Karl Sirotkin.
Created: October 9, 2002, Updated: August 13, 2003

Database Content

Data Analysis: Query Tools

Data Integration



11. The Major Histocompatibility Complex Database, dbMHC
Adrienne Kitts, Michael Feolo, and Wolfgang Helmberg.
Created: May 27, 2003, Updated: August 13, 2003

dbMHC Resources

Database Content

Integration with Other Resources


Part 2. Data Flow and Processing

12. Sequin: A Sequence Submission and Editing Tool
Jonathan Kans.
Created: October 9, 2002, Updated: August 13, 2003

Sequin: A Brief Overview

Sequence Submission

Packaging the Submissions

Viewing and Editing the Sequences

Computational Functions of Sequin

Advanced Topics


13. The Processing of Biological Sequence Data at NCBI
Karl Sirotkin, Tatiana Tatusova, Eugene Yaschenko, and Mark Cavanaugh.
Created: October 9, 2002

Data Flow Components

Data Flow Architecture

14. Genome Assembly and Annotation Process
Paul Kitts.
Created: October 9, 2002, Updated: August 13, 2003

Overview of the Genome Assembly and Annotation Process

The Input Data

Preparation of the Input Sequences

Alignment of Sequences to the Input Genomic Sequences

Genome Assembly

Annotation of Genes

Annotation of Other Features

Product Data Sets

Production of Maps That Display Genome Features

Public Release of Assembly and Models

Integration with Other Resources



Part 3. Querying and Linking the Data

15. The Entrez Search and Retrieval System
Jim Ostell.
Created: October 9, 2002, Updated: August 13, 2003

Entrez Design Principles

Entrez Is a Discovery System

Entrez Is Growing

How Entrez Works


16. The BLAST Sequence Analysis Tool
Tom Madden.
Created: October 9, 2002, Updated: August 13, 2003

How BLAST Works: The Basics

BLAST Scores and Statistics

BLAST Output: 1. The Traditional Report

BLAST Output: 2. The Hit Table

BLAST Output: 3. Structured Output


Appendix 1. FASTA identifiers.

Appendix 2. Readdb API.

Appendix 3. Excerpt from a demonstration program doblast.c.

Appendix 4. A function to print a view of a SeqAlign: MySeqAlignPrint.


17. LinkOut: Linking to External Resources from Entrez Databases
Kathy Kwan.
Created: October 9, 2002, Updated: August 13, 2003

How Is LinkOut Represented in Entrez?

How Does LinkOut Work?

Guides for LinkOut Providers

Communicating with LinkOut Providers

18. The Reference Sequence (RefSeq) Project
Kim D. Pruitt, Tatiana Tatusova, and James M. Ostell.
Created: October 9, 2002, Updated: August 13, 2003

Database Content: Background

Assembling and Maintaining the RefSeq Collection

Access and Retrieval

Related Reading

19. LocusLink: A Directory of Genes
Donna Maglott.
Created: October 9, 2002, Updated: August 12, 2003

How to Query LocusLink

A LocusLink Report: The Details

Maintenance and Reporting

Integration with Other Resources

More Information on LocusLink

20. Using the Map Viewer to Explore Genomes
Susan M. Dombrowski and Donna Maglott.
Created: October 9, 2002, Updated: August 13, 2003

Maintenance of Data

Methods of Access

Interpreting the Display

Customizing the Display

Associated Tools

Technical Details

Caveats for Using Evolving Data


21. UniGene: A Unified View of the Transcriptome
Joan U. Pontius, Lukas Wagner, and Gregory D. Schuler.
Created: October 9, 2002, Updated: August 13, 2003

Expressed Sequence Tags (ESTs)

Sequence Clusters

UniGene Cluster Browser

Protein Similarity Analysis

Digital Differential Display (DDD)



22. The Clusters of Orthologous Groups (COGs) Database: Phylogenetic Classification of Proteins from Complete Genomes
Eugene V. Koonin.
Created: October 9, 2002, Updated: August 13, 2003

Construction of the COGs

Phyletic Pattern Analysis in COGs

Description of the COGs Website

Future Directions

The COG Team


Part 4. User Support

23. User Services: Helping You Find Your Way
David Wheeler and Barbara Rapp.
Created: October 9, 2002, Updated: August 13, 2003

The User Services Team

Development of User Support Materials




24. Exercises: Using Map Viewer
David Wheeler, Kim Pruitt, Donna Maglott, Susan Dombrowski, and Andrei Gabrelian.
Created: November 4, 2002, Updated: August 13, 2003

1. How Do I Obtain the Genomic Sequence around My Gene of Interest?

2. If I Have Physical and/or Genetic Mapping Data, How Do I Use the Map Viewer to Find a Candidate Disease Gene in That Region?

3. How Can I Find and Display a Gene with the Map Viewer?

4. How Can I Analyze a Gene Using the Map Viewer?

5. How Can I Create My Own Transcript Models with the Map Viewer?

6. Using the Mouse Map Viewer

7. How Can I Find Members of a Gene Family Using the Map Viewer?

8. How Can I Find Genes Encoding a Protein Domain Using the Map Viewer?

Created: October 9, 2002, Updated: April 30, 2003

Copyright and Disclaimer