DOEGenomes.org
How Many Genes Are in the Human Genome?

Although the completion of the Human Genome Project was celebrated in April 2003 and sequencing of the human chromosomes is essentially "finished," the exact number of genes encoded by the genome is still unknown. October 2004 findings from The International Human Genome Sequencing Consortium, led in the United States by the National Human Genome Research Institute (NHGRI) and the Department of Energy (DOE), reduce the estimated number of human protein-coding genes from 35,000 to only 20,000-25,000, a surprisingly low number for our species (7). Consortium researchers have confirmed the existence of 19,599 protein-coding genes in the human genome and identified another 2,188 DNA segments that are predicted to be protein-coding genes.
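As a quick check on the figures above, the confirmed and predicted counts can be added and compared with the revised range. This is only a sketch: it assumes the two categories do not overlap, and the variable names are illustrative rather than taken from the consortium's paper.

```python
# Figures quoted in the paragraph above (October 2004 consortium findings).
confirmed_genes = 19_599   # protein-coding genes confirmed to exist
predicted_genes = 2_188    # additional DNA segments predicted to be genes

# Assumption: the two categories are disjoint, so they can simply be summed.
total_candidates = confirmed_genes + predicted_genes

# Revised estimate range quoted in the same paragraph.
range_low, range_high = 20_000, 25_000
within_range = range_low <= total_candidates <= range_high

print(total_candidates, within_range)
```

Under that assumption, the combined figure falls inside the 20,000-25,000 range.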

In 2003, estimates from gene-prediction programs suggested there might be 24,500 or fewer protein-coding genes (1). The Ensembl genome-annotation system estimates them at 23,299.

When analysis of the draft human genome sequence was published by the International Human Genome Sequencing Consortium on February 15, 2001, the paper estimated only about 30,000 to 40,000 protein-coding genes, much lower than previous estimates of around 100,000. This lower estimate came as a shock to many scientists because counting genes was viewed as a way of quantifying genetic complexity. With around 30,000, the human gene count would be only about one-half greater than that of the simple roundworm C. elegans at about 20,000 genes (2).

Studies since the publication of the draft genome sequence have generated widely different estimates. An analysis by scientists at Ohio State University suggested between 65,000 and 75,000 human genes (3), and another study published in Cell in August 2001 predicted a total of 42,000 (4).

Although the number of human genes is still uncertain, a winner of GeneSweep was announced in May 2003. GeneSweep was an informal gene-count betting pool that began at the 2000 Cold Spring Harbor Laboratory Genome Meeting. Bets ranged from around 26,000 to more than 150,000 genes. Since most gene-prediction programs were estimating the number of protein-coding genes at fewer than 30,000, GeneSweep officials decided to declare the contestant with the lowest bet (25,947 by Lee Rowen of the Institute for Systems Biology in Seattle) the winner (1).

It could be years before a truly reliable gene count is established. The reason for so much uncertainty is that predictions are derived from different computational methods and gene-finding programs. Some programs detect genes by looking for distinct patterns that define where a gene begins and ends ("ab initio" gene finding). Other programs look for genes by comparing segments of sequence with those of known genes and proteins (comparative gene finding). While ab initio gene finding tends to overestimate gene numbers by counting any segment that looks like a gene, comparative gene finding tends to underestimate because it can recognize only those genes similar to what scientists have seen before. Defining a gene is also problematic: small genes can be difficult to detect, one gene can code for several protein products, some genes code only for RNA, two genes can overlap, and there are many other complications (5).
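The difference between the two approaches can be illustrated with a toy sketch. It is not how real gene-finding programs work: here "ab initio" is reduced to scanning the forward strand for a start codon (ATG) followed in frame by a stop codon, and "comparative" is reduced to exact matching against a small list of known sequences. The function names, the example sequences, and the `min_codons` threshold are all hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def ab_initio_candidates(seq, min_codons=3):
    """Toy pattern scan: report every ATG...stop stretch in the three
    forward reading frames, whether or not it is a real gene."""
    seq = seq.upper()
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # candidate gene begins here
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    found.append(seq[start:i + 3])
                start = None                   # look for the next candidate
    return found

def comparative_candidates(seq, known_genes):
    """Toy comparison: report only sequences already in the known list
    (exact substring match stands in for a real similarity search)."""
    seq = seq.upper()
    return [gene for gene in known_genes if gene.upper() in seq]

# Hypothetical data: GENE_A is "already known", GENE_B is "novel".
GENE_A = "ATGAAACCCGGGTAA"
GENE_B = "ATGTTTGGGCCCTAG"
GENOME = "CC" + GENE_A + "TT" + GENE_B + "A"

by_pattern = ab_initio_candidates(GENOME)
by_comparison = comparative_candidates(GENOME, [GENE_A])

print(len(by_pattern), "candidates by pattern scan")
print(len(by_comparison), "candidates by comparison")
```

In this made-up example the pattern scan reports both stretches, while the comparison reports only the one already in the known list. The pattern scan would also report any chance ATG...stop stretch that is not a gene at all, which is the source of the overestimate described above.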

Even with improved genome analysis, computation alone is simply not enough to generate an accurate gene number. Clearly, gene predictions will have to be verified by labor-intensive work in the laboratory before the scientific community can reach any real consensus (6).


Related Web Sites

NCBI Human Genome - Release notes for the most current build of the human genome from the National Center for Biotechnology Information (NCBI) used in its genome browser called Map Viewer.
  • Homo sapiens Genome View - Browse the human genome using NCBI's Map Viewer. Access a tutorial (http://www.ornl.gov/hgmis/posters/chromosome/map.shtml) introducing some basics for using Map Viewer.

UCSC Human Genome Browser Gateway - Genome browser maintained by the Genome Bioinformatics Group of the University of California, Santa Cruz. Human genome data based on the most recent build available from NCBI.

Ensembl Human Genome - The most current human genome release available from the European Bioinformatics Institute's human genome browser. The Ensembl release is derived from the NCBI human genome build.


Related Articles

Nature Web Focus: The Human Genome - Nature Publishing Group maintains this Web site, which links to scientific articles reporting the finished genome sequence for each human chromosome.

Updated Summaries of Public Draft Human Genome Sequence

  • International Human Genome Sequencing Consortium. "Finishing the Euchromatic Sequence of the Human Genome." Nature 431, 931–945 (21 October 2004). Available online.
  • J. Schmutz, et al. "Human Genome: Quality Assessment of the Human Genome Sequence." Nature 429, 365–368 (27 May 2004). Available online.

Summary of Public Draft Human Genome Sequence
Eric S. Lander, et al. "Initial Sequencing and Analysis of the Human Genome." Nature 409, 860–921 (February 15, 2001). Available online.

Summary of Celera's Draft Human Genome Sequence
J. Craig Venter, et al. "The Sequence of the Human Genome." Science 291, 1304–1351 (February 16, 2001). Available online.

--. "The Nature of the Number." Nature Genetics 25, 127–28 (2000). (Editorial).

Samuel A. J. R. Aparicio. "How to Count...Human Genes." Nature Genetics 25, 129–30 (2000).

B. Ewing, and P. Green. "Analysis of Expressed Sequence Tags Indicates 35,000 Human Genes." Nature Genetics 25, 232–34 (2000).

Hugues Roest Crollius, et al. "Estimate of Human Gene Number Provided by Genome-Wide Analysis Using Tetraodon nigroviridis DNA Sequence." Nature Genetics 25, 235–38 (2000).

Feng Liang, et al. "Gene Index Analysis of the Human Genome Estimates Approximately 120,000 Genes." Nature Genetics 25, 239–40 (2000).


References

1. Elizabeth Pennisi. "A Low Number Wins the GeneSweep Pool." Science 300, 1484 (2003).

2. Jean-Michel Claverie. "Gene Number. What If There Are Only 30,000 Human Genes?" Science 291, 1255–7 (2001).

3. Helen Briggs. "Dispute Over Number of Human Genes." BBC News Online (2001).

4. Fred A. Wright, et al. "A Draft Annotation and Overview of the Human Genome." Genome Biology 2, 1–18 (2001).

5. Elizabeth Pennisi. "Gene Counters Struggle to Get the Right Answer." Science 301, 1040–1041 (2003).

6. Tom Hollon. "Human Genes: How Many?" The Scientist 15, 1 (2001).

7. Lincoln D. Stein. "Human Genome: End of the Beginning." Nature 431, 915–916 (21 October 2004). Available online.

Last modified: Wednesday, October 27, 2004

Base URL: www.ornl.gov/hgmis

Site sponsored by the U.S. Department of Energy Office of Science, Office of Biological and Environmental Research, Human Genome Program