Although the completion of the Human Genome Project was celebrated in April 2003 and sequencing of the human chromosomes is essentially "finished," the exact number of genes encoded by the genome is still unknown. Findings published in October 2004 by the International Human Genome Sequencing Consortium, led in the United States by the National Human Genome Research Institute (NHGRI) and the Department of Energy (DOE), reduced the estimated number of human protein-coding genes from 35,000 to only 20,000-25,000, a surprisingly low number for our species (7). Consortium researchers have confirmed the existence of 19,599 protein-coding genes in the human genome and identified another 2,188 DNA segments that are predicted to be protein-coding genes.

In 2003, estimates from gene-prediction programs suggested there might be 24,500 or fewer protein-coding genes (1). The Ensembl genome-annotation system estimates the number at 23,299.

When the International Human Genome Sequencing Consortium published its analysis of the draft human genome sequence on February 15, 2001, the paper estimated only about 30,000 to 40,000 protein-coding genes, much lower than previous estimates of around 100,000. This lower estimate came as a shock to many scientists because counting genes was viewed as a way of quantifying genetic complexity. With around 30,000 genes, the human gene count would be only about one-half greater than that of the simple roundworm C. elegans, which has about 20,000 genes (2).

Studies since the publication of the draft genome sequence have generated widely different estimates. An analysis by scientists at Ohio State University suggested between 65,000 and 75,000 human genes (3), and another study published in Cell in August 2001 predicted a total of 42,000 (4).

Although the number of human genes is still uncertain, a winner of GeneSweep was announced in May 2003. GeneSweep was an informal gene-count betting pool that began at the 2000 Cold Spring Harbor Laboratory Genome Meeting. Bets ranged from around 26,000 to more than 150,000 genes. Since most gene-prediction programs were estimating the number of protein-coding genes at fewer than 30,000, GeneSweep officials decided to declare the contestant with the lowest bet (25,947, by Lee Rowen of the Institute for Systems Biology in Seattle) the winner (1).

It could be years before a truly reliable gene count can be established. The reason for so much uncertainty is that predictions are derived from different computational methods and gene-finding programs. Some programs detect genes by looking for distinct patterns that define where a gene begins and ends ("ab initio" gene finding). Other programs look for genes by comparing segments of sequence with those of known genes and proteins (comparative gene finding). Ab initio gene finding tends to overestimate gene numbers because it counts any segment that looks like a gene, while comparative gene finding tends to underestimate them because it can recognize only genes similar to those scientists have seen before. Defining a gene is itself problematic: small genes can be difficult to detect, one gene can code for several protein products, some genes code only for RNA, two genes can overlap, and there are many other complications (5). Even with improved genome analysis, computation alone is not enough to generate an accurate gene number. Gene predictions will have to be verified by labor-intensive work in the laboratory before the scientific community can reach any real consensus (6).
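The contrast between the two gene-finding approaches can be made concrete with a deliberately tiny sketch. Everything in it is an illustrative assumption rather than a description of any real gene-prediction program: the toy sequences, the function names, the forward-strand-only scan for ATG...stop stretches, and the k-mer overlap score that stands in for a proper sequence comparison. Real human genes contain introns and other features that this toy ignores.

```python
# Toy illustration only: real gene finders model introns, splice sites,
# both strands, and statistical signals; none of that is attempted here.

START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Ab initio sketch: report every forward-strand ATG...stop stretch.

    Returns sorted (start, end) index pairs, end exclusive. min_codons
    counts the codons before the stop codon, including the ATG.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != START:
                i += 3
                continue
            j = i + 3
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOPS:
                j += 3
            # Keep the stretch only if a stop codon was actually reached.
            if j + 3 <= len(seq) and (j - i) // 3 >= min_codons:
                orfs.append((i, j + 3))
            i = j + 3  # resume scanning after the stop codon
    return sorted(orfs)


def kmers(seq, k=4):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=4):
    """Jaccard overlap of k-mer sets; a crude stand-in for alignment."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def comparative_filter(seq, orfs, known_genes, threshold=0.8):
    """Comparative sketch: keep candidates that resemble a known gene."""
    seq = seq.upper()
    return [
        (s, e) for s, e in orfs
        if any(similarity(seq[s:e], g.upper()) >= threshold
               for g in known_genes)
    ]


# Hypothetical sequences: one "gene" already in the reference set, one not.
KNOWN_GENE = "ATGGCCAAAGGGTGA"
NOVEL_GENE = "ATGTTTCCCTTTCCCTAG"
TOY_SEQUENCE = "CC" + KNOWN_GENE + "GCGC" + NOVEL_GENE + "AT"

candidates = find_orfs(TOY_SEQUENCE)
confirmed = comparative_filter(TOY_SEQUENCE, candidates, [KNOWN_GENE])
print(len(candidates), "ab initio candidates:", candidates)
print(len(confirmed), "with a known relative:", confirmed)
```

In this toy run the pattern-based scan reports two candidates, while the comparison step keeps only the one that resembles the reference gene. If the second stretch were a real but previously unseen gene, the comparative count would be too low; if it were a chance pattern, the ab initio count would be too high. Neither count can settle the question without laboratory verification.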
UCSC Human Genome Browser Gateway - Genome browser maintained by the Genome Bioinformatics Group of the University of California, Santa Cruz. Human genome data based on the most recent build available from NCBI.
Ensembl Human Genome - The most current human genome release available from the European Bioinformatics Institute's human genome browser. The Ensembl release is derived from the NCBI human genome build.
Nature Web Focus: The Human Genome - Nature Publishing Group maintains this Web site that links to scientific articles reporting the finished genome sequence for each human chromosome.
Updated Summaries of Public Draft Human Genome Sequence
Summary of Public Draft Human Genome Sequence
J. Craig Venter, et al. "The Sequence of the Human Genome." Science 291, 1304–1351 (February 16, 2001). Available online.
--. "The Nature of the Number." Nature Genetics 25, 127–28 (2000). (Editorial).
Samuel A. J. R. Aparicio. "How to Count...Human Genes." Nature Genetics 25, 129–30 (2000).
B. Ewing and P. Green. "Analysis of Expressed Sequence Tags Indicates 35,000 Human Genes." Nature Genetics 25, 232–34 (2000).
Hugues Roest Crollius, et al. "Estimate of Human Gene Number Provided by Genome-Wide Analysis Using Tetraodon nigroviridis DNA Sequence." Nature Genetics 25, 235–38 (2000).
Feng Liang, et al. "Gene Index Analysis of the Human Genome Estimates Approximately 120,000 Genes." Nature Genetics 25, 239–40 (2000).
1. Elizabeth Pennisi. "A Low Number Wins the GeneSweep Pool." Science 300, 1484 (2003).
2. Jean-Michel Claverie. "Gene Number. What If There Are Only 30,000 Human Genes?" Science 291, 1255–1257 (2001).
3. Helen Briggs. "Dispute Over Number of Human Genes." BBC News Online (2001).
4. Fred A. Wright, et al. "A Draft Annotation and Overview of the Human Genome." Genome Biology 2, 1–18 (2001).
5. Elizabeth Pennisi. "Gene Counters Struggle to Get the Right Answer." Science 301, 1040–1041 (2003).
6. Tom Hollon. "Human Genes: How Many?" The Scientist 15, 1 (2001).
7. Lincoln D. Stein. "Human Genome: End of the Beginning." Nature 431, 915–916 (October 21, 2004). Available online.
Last modified: Wednesday, October 27, 2004
Base URL: www.ornl.gov/hgmis
Site sponsored by the U.S. Department of Energy Office of Science, Office of Biological and Environmental Research, Human Genome Program