Centers for Disease Control and Prevention
Office of Genomics and Disease Prevention


This article was published with modifications in Nature Genetics, 2004

The Case for a Global Human Genome Epidemiology Initiative

Muin J. Khoury
Office of Genomics and Disease Prevention
Centers for Disease Control and Prevention
Atlanta, Georgia

Address correspondence to Dr Khoury, Office of Genomics and Disease Prevention, Centers for Disease Control and Prevention, 1600 Clifton Road, Mailstop E82, Atlanta, Georgia, USA 30333
Email: mkhoury@cdc.gov



Abstract

Several countries are currently considering or conducting large-scale cohort studies to assess the role of genes in the causation of common human diseases. One such study has been proposed for the United States. Here we make the case that what is urgently needed is a coordinated global initiative to standardize and integrate data from the many "cohort" studies currently being conducted around the world; develop and apply systematic methods for integrating results obtained from all types of epidemiologic studies and biologic data; and develop and implement an evidence-based process that uses epidemiologic data for conducting clinical trials and assessing the value of genomic information in improving health and preventing disease. Such a global initiative should accelerate the translation of human genome discoveries into population health benefits well ahead of what an individual cohort study in one country can achieve.

Dr. Francis Collins recently presented a strong case for the conduct of a very large population-based prospective cohort study in the United States to assess the role of genes and environment in the causation of common human diseases. (1) He argued that without such a study, the promise of the Human Genome Project and of genetic and environmental research for improving population health will remain out of reach. Although such a study is worthy of serious consideration, it will be expensive, will take years to plan and implement, and does not in itself guarantee that human genome discoveries will be translated into population health benefits. Here we make the case that what is urgently needed right now is a coordinated global initiative to conduct and synthesize human genome epidemiologic research to influence health policy and practice. We present the rationale for such a global initiative, which includes 1) the standardization and integration of data from the many "cohort" studies that are currently being conducted or considered around the world; 2) the development and application of systematic methods for integrating results obtained from all types of epidemiologic studies and biologic data; and 3) the development and implementation of an evidence-based process that uses epidemiologic data for conducting clinical trials and assessing the value of genomic information in improving health and preventing disease. Such a global initiative can be implemented by a network of collaborating centers and supported by public-private partnerships. This coordinated effort should accelerate the translation of human genome discoveries into population health benefits well ahead of what an individual cohort study in one country can achieve.


The need for global collaboration in population genomic cohort studies

Several large-scale epidemiologic cohort studies were initiated in the pre-genomic era to study disease incidence and prevalence, natural history, and risk factors. Investigators in these studies are now adding genetic risk factors to their investigations. (2) Examples of cohort studies that include genetics are the Framingham Heart Study, (3) the Atherosclerosis Risk in Communities (ARIC) study, (4) and the European Prospective Investigation into Cancer and Nutrition (EPIC). (5) In addition, the genomics era is inspiring the development of very large longitudinal cohort studies, including the study proposed by Collins, and even studies of entire populations, to establish repositories of biological materials ("biobanks") for the discovery and characterization of genes associated with common diseases. Such studies range from large random samples of adult populations, such as the UK Biobank (N=500,000) (6) and the CARTaGENE project in Quebec (N=60,000), (7) to the populations of entire countries, such as Iceland (N=100,000) (8) and Estonia (N=1,000,000), (9) to studies of twins in multiple countries (GenomEUtwin). (10)

In addition to promoting gene discovery, these biobanks will help quantify the occurrence of diseases in various populations and clarify their natural histories and risk factors, including gene-environment interactions. Large cohorts can also serve as the base for case-cohort studies, nested case-control studies, or even case-only studies. (2)

Collaboration across these cohort studies is crucial for at least two reasons: to validate initial findings by minimizing false alarms, and to increase statistical power to detect gene-environment interactions, especially for rarer health outcomes. Each of these studies could produce a large number of false-positive associations (type I errors) between various health outcomes and thousands of genetic variants, environmental factors, and their interactions. Therefore, hypothesis testing across sites will have to go hand in hand with the validation of results from hypothesis-generating studies.
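The scale of the false-positive problem can be illustrated with elementary arithmetic. The sketch below is a minimal illustration, assuming (hypothetically) that the tests are independent and that every null hypothesis is true, so that every "significant" result is a type I error; the function names and the numbers of tests are illustrative and not taken from any of the studies cited here.

```python
# Sketch: how many false-positive associations to expect when many
# hypotheses are tested at a fixed significance level.
# Assumptions (illustrative only): tests are independent and every null
# hypothesis is true, so every "significant" result is a type I error.


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of type I errors among n_tests true-null tests."""
    return n_tests * alpha


def prob_any_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Family-wise error rate: P(at least one type I error) for independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    for m in (10, 1_000, 100_000):  # hypothetical numbers of variant-outcome tests
        print(
            f"{m:>7,} tests: about {expected_false_positives(m):,.1f} false positives, "
            f"P(at least one) = {prob_any_false_positive(m):.3f}, "
            f"Bonferroni per-test alpha = {bonferroni_alpha(m):.1e}"
        )
```

A stricter per-test threshold such as Bonferroni's controls false alarms within one study but further erodes power, which is one reason replication across independent cohorts remains the complementary safeguard.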

The problem of type II errors, or poor statistical power, is equally if not more challenging. Consider for a moment the staggering implications of the interactions of numerous gene variants and their products, as well as epigenetic effects. Imagine that for a common disease only 10 genes contribute a substantial population attributable fraction. Even if variation at each locus can be classified in a dichotomous fashion (e.g., susceptible genotype vs. not), this classification will create 2 to the power 10, or 1,024, possible strata. Dichotomous classification based on just 20 genes will produce 2 to the power 20, or over a million, strata. This is methodologically challenging, especially when one considers interactions of these genes with other genes and environmental factors. (2) Emerging technology will allow us to study hundreds to thousands of genome variations, gene expression profiles, and protein patterns simultaneously. Therefore, no single cohort study, no matter how large, will have adequate statistical power to detect gene-environment interactions for the numerous gene variants, especially for rarer health outcomes. Appropriate pooled analyses will increase the chance of finding true associations of relevance to public health. Thus, the full potential of cohort studies to shed light on the occurrence, etiology, and natural history of complex diseases will likely be realized only by pooling and synthesis across multiple populations with different genetic, environmental, and sociocultural factors. To allow such integration, it is a relatively urgent matter to begin a careful worldwide collaborative effort that addresses harmonization and standardization across studies, because these studies currently employ a variety of designs, including population-based and family-based subject selection and multi-generation linkage. The studies also focus on different age groups and vary in sample size.
Integrating data across these studies will require developing approaches that facilitate pooled analyses and synthesis. We are already seeing the beginning of such a global movement across international boundaries with the establishment of P3G (Public Population Project in Genomics) by Dr Bartha Knoppers and her colleagues, (11) which so far includes three international studies from Europe and North America. An extension of this collaboration will undoubtedly build our knowledge base on the effects of genes and environments on human health much faster than one epidemiologic study in one country can.
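The stratification argument in this section (2 to the power 10 and 2 to the power 20 strata) can be checked with a few lines of code, and extended to show how thinly a single cohort's cases are spread. This is a hedged sketch: the 500,000 cohort size is the UK Biobank figure quoted above, but the 1% incidence and the function names are hypothetical choices for illustration only.

```python
# Sketch: joint genotype strata grow as 2**k when each of k loci is
# dichotomized (susceptible vs. not), as described in the text.
# Hypothetical assumptions: a 500,000-person cohort (the UK Biobank size
# quoted in the text) in which 1% develop the outcome of interest.


def n_strata(k_loci: int, levels_per_locus: int = 2) -> int:
    """Number of joint genotype strata when each locus has `levels_per_locus` classes."""
    return levels_per_locus ** k_loci


def mean_cases_per_stratum(n_cases: int, k_loci: int, binary_exposures: int = 0) -> float:
    """Average cases per stratum; each binary exposure doubles the number of strata."""
    return n_cases / (n_strata(k_loci) * 2 ** binary_exposures)


if __name__ == "__main__":
    cohort_size = 500_000
    n_cases = cohort_size // 100  # hypothetical 1% cumulative incidence -> 5,000 cases
    for k in (10, 20):
        print(
            f"{k} loci: {n_strata(k):,} strata, "
            f"{mean_cases_per_stratum(n_cases, k):.3f} cases per stratum on average, "
            f"{mean_cases_per_stratum(n_cases, k, binary_exposures=1):.3f} "
            "after adding one binary exposure"
        )
```

Under these assumptions, 10 loci leave fewer than five cases per stratum on average, and 20 loci leave most strata empty; real genotype distributions are skewed, so many strata would be sparser still.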

The need for systematic integration of all human genome epidemiology studies

In order to rapidly build our knowledge base on human genes and health, we need to be able to conduct epidemiologic studies in various populations and synthesize their results. This epidemiologic knowledge base should also be guided by the myriad of biologic data on gene function, interactions, and expression. Epidemiologic studies can be cohort, case-control, or cross-sectional in design. The strengths and limitations of each epidemiologic study design are well known. (12) Cohort studies are often erroneously perceived as inherently superior to case-control studies. Given the large differences in cost and time needed to conduct cohort studies, every effort should be made to conduct valid case-control studies based on a valid population sampling scheme of newly diagnosed cases in well-defined communities and appropriately selected controls. Well-designed population-based incident case-control studies can even be nested in a larger population cohort or in a population under surveillance. A planned epidemiologic approach can support simultaneous gene discovery and population-based inference of risks. (2) For example, case-control studies of population-based incident disease cases and their families provide a platform for conducting family-based linkage and association studies to discover new genes, and permit inferences regarding the contribution of these genes to the burden of disease in the underlying population. (13) The Cooperative Family Registries for Breast and Colorectal Cancer Research sponsored by the National Cancer Institute (NCI) reflect this philosophy. (14-15) Population-based case registries can support a number of study designs, including extended family studies, case-only studies, (2) case-parent trios, (13) and the case-control-family design. (16) Another example of a population-based case-control study in the United States is the ongoing CDC-sponsored National Birth Defects Prevention Study. (17) This study is conducted in 10 states to assess the role of genetic and environmental factors in the occurrence of major structural birth defects. Cases are ascertained from state-based birth defects surveillance systems, and controls are randomly selected from birth certificates or hospital medical records. It is the largest ongoing population-based collaborative study in the US, covering a base population of almost half a million births a year. (17) As of May 2004, the study included more than 12,000 cases and 4,000 controls (P. Honein, CDC, personal communication).
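The case-only design mentioned above can be made concrete. The standard case-only estimator uses a genotype-by-exposure table among cases alone; its odds ratio estimates the multiplicative gene-environment interaction, but only under the assumption that genotype and exposure are independent in the source population and that the outcome is rare. The sketch below is a minimal illustration with a Woolf (log-scale) confidence interval; the counts and names are hypothetical.

```python
from math import exp, log, sqrt

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def case_only_interaction(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """
    Case-only estimate of a multiplicative gene-environment interaction.

    2x2 table of genotype by exposure among CASES ONLY:
        a = susceptible genotype, exposed     b = susceptible genotype, unexposed
        c = other genotype, exposed           d = other genotype, unexposed

    Returns the case-only odds ratio with a Woolf (log-scale) 95% CI.
    Valid only if genotype and exposure are independent in the source
    population and the outcome is rare.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - Z95 * se_log)
    upper = exp(log(odds_ratio) + Z95 * se_log)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, not data from any study cited here.
    or_ce, lo, hi = case_only_interaction(a=60, b=140, c=200, d=600)
    print(f"case-only OR = {or_ce:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The design needs no controls, which is what makes it attractive for population-based case registries, but if genotype and exposure are correlated in the population the case-only odds ratio is biased as an interaction estimate.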

Integration of epidemiologic evidence should be based on the best science and methodology of meta-analysis and systematic review. Recent examples of successful global collaborations in conducting meta-analyses of gene-disease associations include the association of the CYP17 gene polymorphism with the risk of prostate cancer, (18) the effect of CCR5-Δ32 heterozygosity on the risk of perinatal HIV-1 infection, (19) and the association of the factor V Leiden mutation with hypertension in pregnancy and pre-eclampsia. (20)
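One common building block of such meta-analyses is inverse-variance pooling of log odds ratios. The sketch below shows a fixed-effect version with Cochran's Q and I² as heterogeneity summaries, assuming each study reports an odds ratio with a 95% confidence interval. It is an illustration, not the method used in the cited meta-analyses; when heterogeneity is substantial, a random-effects model (for example DerSimonian-Laird) is usually preferred. The study values are hypothetical.

```python
from math import exp, log, sqrt

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def se_from_ci(lower: float, upper: float) -> float:
    """Approximate SE of log(OR) recovered from a reported 95% confidence interval."""
    return (log(upper) - log(lower)) / (2 * Z95)


def fixed_effect_pool(studies: list[tuple[float, float, float]]) -> dict[str, float]:
    """
    Inverse-variance fixed-effect pooling of log odds ratios.

    `studies` holds (odds_ratio, ci_lower, ci_upper) for each study.
    Returns the pooled OR with its 95% CI, Cochran's Q, and I^2 (0-1 scale).
    """
    log_ors = [log(or_) for or_, _, _ in studies]
    weights = [1 / se_from_ci(lo, hi) ** 2 for _, lo, hi in studies]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / total_w
    se_pooled = sqrt(1 / total_w)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {
        "or": exp(pooled),
        "ci_lower": exp(pooled - Z95 * se_pooled),
        "ci_upper": exp(pooled + Z95 * se_pooled),
        "q": q,
        "i2": i2,
    }


if __name__ == "__main__":
    # Hypothetical per-study estimates (OR, lower 95% limit, upper 95% limit).
    example = [(1.30, 0.95, 1.78), (1.10, 0.85, 1.42), (1.45, 1.05, 2.00)]
    result = fixed_effect_pool(example)
    print(
        f"pooled OR = {result['or']:.2f} "
        f"(95% CI {result['ci_lower']:.2f}-{result['ci_upper']:.2f}), "
        f"Q = {result['q']:.2f}, I^2 = {100 * result['i2']:.0f}%"
    )
```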

In an attempt to develop a systematic approach to the integration of epidemiologic data on human genes, the CDC launched a global collaboration in 1998, the Human Genome Epidemiology Network, or HuGENet™. (21) This is an ongoing collaboration of individuals and organizations committed to assessing the impact of human genome variation on population health. Through collaboration, systematic reviews, training, and information dissemination, HuGENet™ continues to develop and apply systematic approaches to build the global knowledge base on the population characteristics of genes and their associations with various diseases. An important activity of HuGENet™ is the development of guidelines, recommendations, and methods for the appraisal and integration of epidemiologic data on the human genome along the continuum from genetic research to genetic testing. (22-24) As of May 2004, the network had about 700 collaborators from 40 countries. (21) Its website featured 26 reviews of specific gene-disease associations, 13 fact sheets, 45 e-journal club entries discussing the latest findings from single published studies, and 4 case studies for training purposes. (21) In addition, since October 2000, HuGENet™ has been continuously abstracting epidemiologic articles on human genes from the literature into an online database searchable by gene, health outcome, and interacting risk factor. As of May 3, 2004, the database contained 10,964 articles referencing 1,387 genes, 460 factors (personal or environmental), and 1,609 health outcomes and diseases. (21) During the past 3 years, we saw an increasing number of published papers on the human genome, most of which (86%) were on gene-disease associations, with a growing number on gene-gene and gene-environment interactions. (2) Because of methodologic differences between studies and the tendency toward publication bias, what is needed now more than ever is ongoing, rigorous systematic evaluation and review of the existing published and unpublished literature on genes and diseases. Systematic methods for conducting meta-analyses have been developed by collaborative efforts such as the Cochrane Collaboration, (25) and will be developed further as part of the HuGENet™ movement. While no one disputes the importance of one cohort study in one country, there is no way around the painstaking process of systematic evaluation and synthesis of many types of epidemiologic studies across many countries to quickly build our knowledge base and raise new hypotheses for further research.

The need for evidence-based processes that use epidemiologic information

Epidemiologic information obtained from the synthesis of cohort and case-control studies, along with biologic data, needs to be used in an evidence-based process that assesses the value of genomic information for health care and disease prevention. Ideally, epidemiologic studies should provide a starting point for controlled clinical trials that assess the effectiveness of different interventions. As of May 2004, more than 1,000 genetic tests were either on the market or in various stages of research and development. (26) While most of these tests are for the diagnosis of rare genetic conditions, (27) in the coming decades we will see the emergence of new tests for predicting the risk of common diseases in otherwise healthy people, in order to guide decisions about preventive interventions or therapies. (28)

Despite cautionary notes about the value of genetic testing for predicting future disease and targeting interventions, a number of companies in the United States and the United Kingdom currently offer testing for multiple genetic polymorphisms as part of genomic profiling for susceptibility to various conditions, including obesity, cardiovascular disease, infectious diseases, and autoimmunity. (28) Although testing for common genetic polymorphisms is not currently ready for clinical practice, (29) we need to anticipate that future use of such testing should be based on scientific evaluation of the test's validity and utility. The framework for this evaluation, established by two committees, covers the test's analytic validity, clinical validity, clinical utility, and ethical, legal and social issues (ACCE). (30,31) This framework has been applied as part of the CDC-sponsored ACCE project. (32)
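Clinical validity is usually summarized with measures such as clinical sensitivity, specificity, and positive and negative predictive values. The sketch below shows, via Bayes' theorem, why predictive value depends heavily on how common the condition is in the tested population, which bears directly on tests offered to otherwise healthy people. The test characteristics and prevalences are hypothetical, chosen only to illustrate the calculation.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """
    Positive and negative predictive values of a test via Bayes' theorem.
    `prevalence` is the proportion of the tested population who have
    (or will develop) the condition.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical genotype-based test: sensitivity 0.50, specificity 0.90.
    for prevalence in (0.01, 0.05, 0.20):
        ppv, npv = predictive_values(0.50, 0.90, prevalence)
        print(f"prevalence {prevalence:.0%}: PPV = {ppv:.2f}, NPV = {npv:.3f}")
```

With these illustrative values, most positive results at low prevalence are false positives, even though the test's sensitivity and specificity are unchanged.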

The epidemiologic approach is especially relevant to pharmacogenomics, an emerging field that promises customized treatment or chemoprevention on the basis of genetic variation. (33) With the Food and Drug Administration's 2003 approval of the first DNA-based test for detecting variants in the factor V and factor II genes, intended to aid the management of persons at increased risk of deep vein thrombosis, (34) we expect the pace of pharmacogenomics development to accelerate. Epidemiologic parameters are important in the clinical trials and cost-effectiveness analyses that determine the added value of pharmacogenomic testing.

For example, an epidemiologic interaction between factor V Leiden and oral contraceptive use was found in a case-control study of risk factors for venous thrombosis. (35) While oral contraceptive use alone increased the risk of venous thrombosis about 4-fold and factor V Leiden alone about 7-fold, their joint effect was a more than 30-fold increase. Despite the high relative risk for this combination, the absolute risk among women with factor V Leiden who used oral contraceptives was relatively low (about 28 per 10,000 person-years), because the incidence of venous thrombosis in the population is low. Whether it is beneficial to screen women for factor V Leiden before prescribing oral contraceptives remains controversial. Venous thrombosis is relatively rare, and mortality from venous thrombosis is low in young women. (36) More than half a million women would need to be screened for factor V Leiden, and tens of thousands of women would be denied oral contraceptives, to prevent a single death. In addition to medical and financial considerations, there are issues related to quality of life and to the risk of morbidity and mortality from unwanted pregnancy. For healthy women contemplating oral contraceptive use, the risk-benefit balance does not currently favor screening. For asymptomatic women with family histories of multiple thromboses, there are no evidence-based guidelines, and decisions will have to be reached individually, without reliance on population-based recommendations. (37)
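The relative risks in this example can be compared with what additive and multiplicative models of joint action would predict, and combined with the absolute risk to show why a large relative risk can coexist with a low absolute risk. The sketch below uses the approximate figures from the text (about 4 for oral contraceptives alone, about 7 for factor V Leiden alone, and 30 as a round value for the "more than 30-fold" joint effect); the implied baseline it computes is a rough consistency check, not a figure reported by the study.

```python
def interaction_summary(rr_g: float, rr_e: float, rr_ge: float) -> dict[str, float]:
    """
    Compare an observed joint relative risk with additive and multiplicative
    expectations. All RRs are relative to the doubly unexposed group
    (non-carriers not using oral contraceptives).
    """
    return {
        "expected_multiplicative": rr_g * rr_e,
        "expected_additive": rr_g + rr_e - 1,
        "reri": rr_ge - rr_g - rr_e + 1,  # relative excess risk due to interaction
        "ratio_of_rrs": rr_ge / (rr_g * rr_e),  # multiplicative interaction index
    }


def implied_baseline(absolute_risk_joint: float, rr_ge: float) -> float:
    """Baseline incidence implied by an absolute joint risk and a joint RR."""
    return absolute_risk_joint / rr_ge


if __name__ == "__main__":
    # Approximate figures from the text: factor V Leiden alone ~7-fold,
    # oral contraceptives alone ~4-fold, joint "more than 30-fold" (30 used here).
    summary = interaction_summary(rr_g=7.0, rr_e=4.0, rr_ge=30.0)
    for name, value in summary.items():
        print(f"{name}: {value:.2f}")
    # ~28 per 10,000 person-years among carriers using oral contraceptives
    baseline = implied_baseline(28 / 10_000, 30.0)
    print(f"implied baseline: {baseline * 10_000:.2f} per 10,000 person-years")
```

With these round figures, the joint effect slightly exceeds the multiplicative expectation (28) and is three times the additive expectation (10), while the implied baseline incidence is just under 1 per 10,000 person-years, which is why the absolute risk remains low.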

This example illustrates why it is essential that epidemiologic data continue to be collected and analysed to inform clinical trials and, ultimately, decision-making for health practice in the population. However, the current incompleteness of epidemiologic information on the relation of genes and diseases will surely limit our ability to conduct an accurate evidence-based evaluation of genetic tests for use in practice. This makes the case for global collaboration in integrating epidemiologic evidence even more urgent. While single cohort studies are being conducted around the world, we can begin to synthesize the incomplete epidemiologic knowledge base for use in policy and practice. Networks of collaborators can use established guidelines from groups such as the US Preventive Services Task Force (38) to conduct systematic reviews of the literature and use these reviews in interim practice guidelines on genetic tests. These reviews will also uncover gaps in our epidemiologic knowledge base that can be filled by new research from ongoing studies.


Concluding remarks

We need a serious deliberation on the merits of a US-based cohort study of genes and environment. However, at best, one very large study in one country will be very costly, will take years to plan and implement, and may not by itself provide all the answers we need to understand the role of genes and environments in disease occurrence and to use this information for population health benefits. It is high time that we begin a serious discussion about the development of a global public health genomics initiative that builds on the currently fragmented efforts of population genomics research around the world. In particular, we need to begin to build a robust process that allows data from many cohort studies (biobanks) to be integrated and synthesized through common standardized platforms for data collection and joint analyses. We also need to develop and integrate data obtained from all valid epidemiologic study designs, notably population-based incident case-control studies. Systematic reviews and synthesis of epidemiologic data for policy and guideline development take time and need to be encouraged, valued, and allocated sufficient resources. This global initiative should build on and consolidate existing collaborations such as P3G (11) and HuGENet™ (21) discussed earlier, but will require additional thoughtful discussion and resources.

One way to implement this initiative is through a network of collaborating centers worldwide, supported by public-private partnerships. Centers will require input and expertise from multiple disciplines (epidemiology, social science, medicine, medical genetics, molecular genetics, ethics, economics, and others). These units could be based in academic medical or public health institutions, private organizations, or governmental organizations worldwide. They could be part of larger centers for public health genetics (such as the Public Health Genetics Unit in Cambridge (39)) that translate the results of clinical and epidemiologic work into policy recommendations, workforce development, and clinical and public health practice. Some suggested activities of these human genome epidemiology collaborating centers are listed in Table 1.

While the discussion around the proposed US cohort study continues over the next few years, we should pursue the development of a global partnership with collaborators from governmental, academic, community, professional, international, and private organizations around the world. We should initiate a dialogue, through a series of international meetings, on the development of a global public health genetics initiative. These two activities should be viewed as complementary rather than competing efforts. The time is right to take bold initial steps to launch a public health-oriented genomics initiative that will take us a long way toward translating human genome discoveries into population health benefits for citizens of the 21st century.


Table 1: Suggested activities of collaborating human genome epidemiology centers

  1. Conduct primary population genomics research, singly and in collaboration. The strongest emphasis is on population characterization of genes in relation to the burden of disease (prevalence, gene-disease associations and interactions)
  2. Conduct systematic evaluations of published and unpublished literature using standardized analytic approaches
  3. Disseminate results of systematic evaluation using peer-reviewed literature, and through online information systems
  4. Conduct training and develop workshops for professionals in human genome epidemiology research and in the evaluation of genomic applications in population health
  5. Develop succinct summaries to “translate” results of primary research, synthesis and evaluation into information that could be usable by providers and the general public
  6. Collaborate with other units to conduct primary research and systematic evaluations of specific topics
  7. Attend periodic meetings with other centers to share primary research and systematic evaluations, teach or acquire new skills, and engage in networking and collaboration


References

  1. Collins FS. The case for a US prospective cohort study of genes and environment. Nature 2004;429:475-477.
  2. Khoury MJ, Millikan R, Little J, Gwinn M. The emergence of epidemiology in the genomics age. Int J Epidemiol 2004 (in press)
  3. National Heart, Lung, and Blood Institute. The Framingham Heart Study: 50 years of research success. Website accessed May, 2004, http://www.nhlbi.nih.gov/about/framingham/
  4. The ARIC investigators. The Atherosclerosis Risk in Communities (ARIC) Study: design and objectives. Am J Epidemiol. 1989 Apr;129(4):687-702.
  5. Riboli E, Hunt KJ, Slimani N, et al. European Prospective Investigation into Cancer and Nutrition (EPIC): study populations and data collection. Public Health Nutr. 2002;5:1113-24.
  6. Wright AF, Carothers AD, Campbell H. Gene-environment interactions--the BioBank UK study. Pharmacogenomics J. 2002;2:75-82.
  7. CARTaGENE project. Website accessed May, 2004 at http://www.cartagene.qc.ca/en/index.htm
  8. Hakonarson H, Gulcher JR, Stefansson K. deCODE genetics, Inc. Pharmacogenomics. 2003 Mar;4(2):209-15.
  9. Estonian Genome Project. Website accessed May, 2004 http://www.geenivaramu.ee/index.php?show=main&lang=eng
  10. GenomEUtwin project. Website accessed May, 2004 at http://www.genomeutwin.org/
  11. Public Population Project in Genomics. Website accessed May, 2004 at http://www.p3gconsortium.org/index.cfm
  12. Khoury MJ, Beaty TH, Cohen BH. Fundamentals of genetic epidemiology. Oxford University Press, New York, 1993.
  13. Thomas DC. Statistical issues in the design and analysis of gene-disease association studies. In Khoury MJ, Little J, Burke W (eds). Human genome epidemiology: a scientific foundation for using genetic information to improve health and prevent disease. Oxford University Press, New York, 2004: 92-110.
  14. Peel DJ, Ziogas A, Fox EA et al. Characterization of hereditary nonpolyposis colorectal cancer families from a population-based series of cases. JNCI 2000;92:1517-22.
  15. Daly MB, Offit K, Li F, et al. Participation in the cooperative family registry for breast and ovarian cancer studies: issues of informed consent. JNCI 2000;92:452-6.
  16. Hopper JL. Commentary: Case-control-family designs: a paradigm for future epidemiology research? Int J Epidemiol. 2003 Feb;32(1):48-50.
  17. Yoon PW, Rasmussen SA, Lynberg MC, et al. The National Birth Defects Prevention Study. Public Health Rep. 2001;116 Suppl 1:32-40.
  18. Ntais C, Polycarpou A, Ioannidis JPA. Association of the CYP17 gene polymorphism with the risk of prostate cancer: a meta-analysis. Cancer Epidemiol Biomarkers Prev 2003;12:120-6.
  19. Contopoulos-Ioannidis DG, O'Brien TR, Goedert JJ, et al. Effect of CCR5-Δ32 heterozygosity on the risk of perinatal HIV-1 infection: a meta-analysis. JAIDS 2003;32:70-76.
  20. Kosmas IP, Tatsioni A, Ioannidis JPA. Association of Leiden mutation in factor V gene with hypertension in pregnancy and pre-eclampsia: a meta-analysis. J Hypertens 2003;21:1221-8.
  21. Centers for Disease Control and Prevention. The Human Genome Epidemiology Network (HuGENet™). Website accessed May, 2004 at http://www.cdc.gov/genomics/hugenet/default.htm
  22. Khoury MJ. Epidemiology and the continuum from genetic research to genetic testing. Am J Epidemiol. 2002;156:297-9.
  23. Little J, Bradley L, Bray MS, et al. Reporting, appraising, and integrating data on genotype prevalence and gene-disease associations. Am J Epidemiol. 2002;156:300-10.
  24. Burke W, Atkins D, Gwinn M, et al. Genetic test evaluation: information needs of clinicians, policy makers, and the public. Am J Epidemiol. 2002;156:311-8.
  25. The Cochrane collaboration. Website accessed May, 2004 at http://www.cochrane.org/index0.htm
  26. GeneTests website accessed May, 2004 at http://www.genetests.org/
  27. Yoon PW, Chen B, Faucett A, et al. Public health impact of genetic tests at the end of the 20th century. Genet Med. 2001;3:405-10.
  28. Khoury MJ. Genetics and genomics in practice: the continuum from genetic disease to genetic information in health and disease. Genet Med. 2003;5:261-8.
  29. Haga SB, Khoury MJ, Burke W. Genomic profiling to promote a healthy lifestyle: not ready for prime time. Nat Genet. 2003;34(4):347-50.
  30. NIH-DOE Task force on genetic testing: Promoting safe and effective genetic testing in the United States. Final report 1997 http://www.genome.gov/10001733
  31. Secretary’s Advisory Committee on Genetic Testing: Enhancing the Oversight of Genetic Tests: Recommendations of the SACGT, 2000 http://www4.od.nih.gov/oba/sacgt/gtdocuments.html
  32. Haddow JE, Palomaki GE. ACCE: a model for evaluating data on emerging genetic tests. In: Khoury MJ, Little J, Burke W (eds). Human genome epidemiology: a scientific foundation for using genetic information to improve health and prevent disease. Oxford University Press, New York, 2004:217-233.
  33. Veenstra D. The interface between epidemiology and pharmacogenomics. In Khoury MJ, Little J, Burke W (eds). Human genome epidemiology: a scientific foundation for using genetic information to improve health and prevent disease. Oxford University Press, New York, 2004:234-46.
  34. Food and Drug Administration (FDA). News release, December 17, 2003 http://www.fda.gov/bbs/topics/NEWS/2003/NEW00998.html
  35. Vandenbroucke JP, Koster T, Briet E, Reitsma PH, Bertina RM, Rosendaal FR. Increased risk of venous thrombosis in oral-contraceptive users who are carriers of factor V Leiden mutation. Lancet 1994;344:1453-7.
  36. Sass AE, Neufeld EJ. Risk factors for thromboembolism in teens: when should I test? Curr Opin Pediatr. 2002;14:370-8.
  37. Khoury MJ, McCabe LL, McCabe ER. Population screening in the genomics age. N Engl J Med 2003;348:50-58.
  38. United States Preventive Services Task Force. Website accessed May 2004 at http://www.ahrq.gov/clinic/uspstfix.htm
  39. Public Health Genetics Unit, Cambridge, United Kingdom. Website accessed in May 2004 at http://www.phgu.org.uk/index.php
Last Updated September 17, 2004