This chapter was published, with modifications, in Modern Epidemiology (2nd ed.), Rothman K and Greenland S (editors), 1997.

Genetic Epidemiology

by Muin J. Khoury

Introduction

From Genes to Public Health

Traditional Epidemiologic Studies

Non-Traditional Epidemiologic Studies

Epidemiologic Approaches to Family Studies

Tables

References

Introduction

Genetic epidemiology is a relatively new discipline that seeks to elucidate the role of genetic factors, and their interaction with environmental factors, in the occurrence of disease in populations (Khoury et al., 1993). The term genetic epidemiology has appeared in the literature only recently (Figure). The growth of the field has been accompanied by the start in 1984 of a new journal; the publication of several books (Khoury et al., 1993; Morton, 1982); the explosion in molecular techniques; the increasing sophistication of statistical methods; and the emergence of the field of molecular epidemiology (Schulte and Perera, 1993).

In this chapter, I review the central role of genetic epidemiology along the continuum from genetic technology to public health applications. I then address how traditional epidemiologic approaches such as case-control studies can be used to assess the role of genetic factors in disease causation. I also review how nontraditional methods such as case-only designs, case-parental control studies, and affected relative-pair studies can be used to assess gene-environment interaction. Finally, I review epidemiologic approaches to family studies, with a special focus on linkage analysis.

From Genes to Public Health

A. Genetic technology and the Human Genome Project

Advances in molecular genetic technology have led to the Human Genome Project, a long-term initiative to map and sequence the human genome (Hoffman, 1994). In the next decade, most if not all of the estimated 50,000-100,000 human genes will be mapped and sequenced. So far, only a small fraction of these genes has been identified. Nevertheless, the number of detected human genes has more than quadrupled over the last 25 years (McKusick and Amberger, 1994). Although most identified human genes are associated with uncommon disorders, genes play important roles in the etiology and pathogenesis of most, if not all, human diseases. Genetic risk factors interact with the environment (broadly defined to include social, physical, chemical, biologic, and infectious agents) in causing various human diseases.

B. Genetic epidemiology and core public health functions

In its evaluation of the future of public health in the United States, the Institute of Medicine (1988) defined three core functions of public health: 1) assessment, 2) policy development, and 3) assurance. As shown in Table 1, the role of genetic epidemiology lies mostly in carrying out the assessment function as it applies to genetics in public health. The core functions of policy development and assurance are discussed elsewhere (Khoury et al., in press).

Molecular technology is being used in family studies to identify disease genes. These studies are mostly based on high-risk families with multiple affected individuals and rely on genetic analysis methods such as linkage and segregation analyses. A notable example is the intense search for breast cancer genes in high-risk families. Using linkage analysis in families with multiple affected members (King et al., 1993) and, more recently, direct sequencing of the gene (Futreal et al., 1994; Miki et al., 1994), investigators have identified a gene on chromosome 17 (BRCA1). Women who inherit a BRCA1 mutation have a more than 90% lifetime risk of developing either breast or ovarian cancer (Easton et al., 1995). Nevertheless, the contribution of the apparently numerous BRCA1 mutations to the overall risk for breast and ovarian cancer in various populations is unclear.

Using population-based and family-based studies, genetic epidemiologists assess the impact of genes on the occurrence of human diseases in populations. The scope of such studies, with examples, is shown in Table 2 and further elaborated in Sections II through V.

Briefly, in population-based epidemiologic studies, researchers assess the prevalence of disease susceptibility genotypes in a particular population. For example, the frequency of different BRCA1 mutations associated with breast cancer risk is being evaluated in various populations and ethnic groups. Genetic epidemiology can also be used to assess risk factors for germ-cell and somatic-cell mutations. Notable examples are the numerous epidemiologic studies of risk factors for trisomy 21 (Down syndrome), the most common form of aneuploidy present at birth (Khoury et al., 1993). Finally, an important function of genetic epidemiology is the conduct of association studies that examine the relation between alleles at different loci and disease risks in populations (Sections III and IV).

In family-based epidemiologic studies (Section V), researchers assess whether the disease aggregates in families, evaluate whether such aggregation is caused by environmental or genetic risk factors, and attempt to evaluate various modes of inheritance using segregation and linkage analyses. Details of such genetic analysis techniques are outside the scope of this chapter.

Traditional Epidemiologic Studies

Although cohort, cross-sectional, and case-control studies can all be used to assess genetic factors in disease, the case-control approach is particularly suited to genetic epidemiology, for several reasons: 1) unlike biologic markers of exposure (e.g., occupational, nutritional), genetic markers are stable indicators of host susceptibility; 2) case-control studies provide an opportunity to conduct a "fishing expedition" for the effects of several genes along with other risk factors, and to look for gene-environment interaction; and 3) case-control studies are suitable for many uncommon disease endpoints, such as birth defects and specific cancers.

Studies that assess unmeasured genetic factors

In this section, we consider inbreeding and racial/ethnic admixture studies.

Inbreeding studies: The overall impact of inbreeding is to increase the likelihood of homozygosity at every autosomal locus and thus increase the frequency of deleterious recessive genetic traits in the population. Inbreeding is expected to increase the frequency of autosomal recessive disorders (such as cystic fibrosis and phenylketonuria). For disorders with unknown etiology, inbreeding studies can be used to evaluate a recessive genetic component for the disease even if such a component cannot be directly measured. Although the prevalence of inbreeding has been declining worldwide, some populations continue to have a relatively high prevalence (20%-50%) of consanguineous marriages and are thus still suitable for evaluating inbreeding effects (Khlat and Khoury, 1991).
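The effect of inbreeding on recessive disorders can be made concrete with the standard population-genetics result that, for a recessive allele with frequency q and an inbreeding coefficient F (the probability that the two alleles at a locus are identical by descent), the expected frequency of affected homozygotes is q²(1 - F) + qF. The sketch below assumes a single biallelic locus; the function name and the allele frequency are illustrative, not taken from any study.

```python
def recessive_frequency(q: float, f: float) -> float:
    """Expected frequency of affected (aa) homozygotes under inbreeding.

    q: frequency of the recessive allele a (0 <= q <= 1)
    f: inbreeding coefficient of the offspring (0 <= f <= 1)
    With probability f the two alleles are identical by descent, so the
    child is affected if that single ancestral allele is a (probability q);
    otherwise the alleles are independent (affected with probability q**2).
    """
    return q * q * (1.0 - f) + q * f


if __name__ == "__main__":
    q = 0.01  # illustrative allele frequency (carrier frequency about 2%)
    baseline = recessive_frequency(q, 0.0)
    for label, f in [("first cousins", 1 / 16), ("second cousins", 1 / 64)]:
        risk = recessive_frequency(q, f)
        print(f"{label}: {risk:.6f} ({risk / baseline:.1f} x random mating)")
```

With q = 0.01, the expected frequency among offspring of first cousins is about seven times that under random mating; the rarer the allele, the larger this relative excess.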

In inbreeding studies, the "exposure" variable of interest is the inbreeding coefficient of the individual: the probability that, at any given autosomal locus, the individual carries two alleles that are identical by descent. For common patterns, inbreeding coefficients are known (e.g., 1/16 for offspring of first-cousin marriages, 1/32 for offspring of marriages between first cousins once removed, and 1/64 for offspring of second-cousin marriages). More generally, inbreeding coefficients can be calculated from extended pedigrees by using path methods or iterative computation. In designing inbreeding case-control studies, researchers must ensure they have appropriate control groups, because inbreeding is associated with a host of demographic, cultural, religious, and geographic factors that could also be related to the disease of interest (and thus are potential confounders). Although matching for such demographic variables may not always be necessary, appropriate stratification and adjustment are always warranted in analyzing inbreeding effects in case-control studies.
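The iterative computation can be sketched with the standard recursive definition of the kinship coefficient, using the fact that an individual's inbreeding coefficient equals the kinship coefficient of his or her parents. This is a minimal sketch, assuming a pedigree stored as a dictionary in which parents are listed before their children and founders (parents recorded as None) are unrelated and not inbred; the function name and individual identifiers are hypothetical.

```python
from functools import lru_cache
from typing import Dict, Optional, Tuple

Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]


def inbreeding_coefficient(pedigree: Pedigree, person: str) -> float:
    """Inbreeding coefficient F of `person` = kinship of his or her parents.

    `pedigree` maps each individual to (father, mother); founders map to
    (None, None). Parents must appear before their children (dict order).
    """
    order = {ind: k for k, ind in enumerate(pedigree)}

    @lru_cache(maxsize=None)
    def kinship(a: Optional[str], b: Optional[str]) -> float:
        # Probability that alleles drawn at random from a and b are IBD.
        if a is None or b is None:
            return 0.0  # unknown parent: treated as an unrelated founder
        if a == b:
            father, mother = pedigree[a]
            return 0.5 * (1.0 + kinship(father, mother))
        # Recurse through whoever is listed later; that person cannot be
        # an ancestor of the other.
        if order[a] < order[b]:
            a, b = b, a
        father, mother = pedigree[a]
        return 0.5 * (kinship(father, b) + kinship(mother, b))

    father, mother = pedigree[person]
    return kinship(father, mother)


if __name__ == "__main__":
    # Hypothetical pedigree: X is the child of first cousins P1 and P2.
    ped: Pedigree = {
        "GF": (None, None), "GM": (None, None),  # shared grandparents of P1, P2
        "A1": ("GF", "GM"), "A2": ("GF", "GM"),  # full siblings
        "S1": (None, None), "S2": (None, None),  # unrelated spouses
        "P1": ("A1", "S1"), "P2": ("A2", "S2"),  # first cousins
        "X": ("P1", "P2"),
    }
    print(inbreeding_coefficient(ped, "X"))  # 0.0625, i.e., 1/16
```

The recursion reproduces the tabulated value of 1/16 for offspring of first cousins; larger pedigrees with multiple loops of relationship are handled the same way.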

Inbreeding studies have generally been conducted on nonspecific outcomes such as mortality (Khoury et al., 1987) or morbidity (Freire-Maia and Elisbao, 1984), and have found modest inbreeding effects on these health indicators. Because such outcomes are nonspecific, inbreeding studies are better suited to assessing specific disease entities.

Admixture studies: When the incidence of a disease varies among racial/ethnic groups, admixture studies provide a useful epidemiologic tool with which to evaluate the relative importance of genetic factors. If two populations differ in the frequency of certain genetic traits, intergroup mating will increase the likelihood of heterozygosity in the offspring and, as a result, may either disrupt previously adapted genotypes or have positive effects. Admixture studies can thus point to important genetic determinants. For example, it has been proposed that genetic admixture may have played a role in the increased incidence of insulin-dependent diabetes mellitus (IDDM) among U.S. blacks compared with African blacks. To test this hypothesis, Reitnauer et al. (1982) conducted a case-control study of IDDM in which they estimated the degree of genetic admixture using nine polymorphic genetic markers (e.g., blood groups, serum proteins). They estimated that blacks with IDDM had a higher level of white genetic admixture than blacks without IDDM (21.4% vs. 17.9%). These results suggest that admixture with whites may contribute, in part, to the increase in IDDM rates seen among U.S. blacks. As in any case-control study, control subjects in admixture case-control studies should be chosen to reflect the population from which case subjects are derived. Admixture studies are also subject to confounding, which can arise if differing degrees of admixture are associated with socioeconomic, cultural, or exposure factors that are also associated with the disease.

The extent of racial admixture (i.e., the "exposure" variable in epidemiologic studies) has been measured in a number of ways, including the use of self-reported ethnicity or place of origin of ancestors, surnames, and genetic markers. The increasing number of DNA markers available can improve estimates of racial admixture and may increase the popularity of these types of studies in epidemiology (Moy et al., 1989).
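As an illustration of marker-based estimation, the sketch below assumes a simple two-population model in which the frequency of each marker allele in the admixed group is p_h = (1 - m)p1 + m·p2, where m is the proportion of ancestry from population 2, and estimates m by least squares across loci. This is one textbook estimator, chosen here for illustration; it is not necessarily the method used by Reitnauer et al. The function name and allele frequencies are hypothetical.

```python
from typing import Sequence


def admixture_proportion(p1: Sequence[float], p2: Sequence[float],
                         ph: Sequence[float]) -> float:
    """Least-squares estimate of the admixture proportion m.

    p1, p2: frequencies of one allele per locus in the two parental
    populations; ph: frequencies of the same alleles in the admixed group.
    Fits ph - p1 = m * (p2 - p1) through the origin across loci.
    """
    if not (len(p1) == len(p2) == len(ph)):
        raise ValueError("frequency lists must have equal length")
    num = sum((h - a) * (b - a) for a, b, h in zip(p1, p2, ph))
    den = sum((b - a) ** 2 for a, b in zip(p1, p2))
    if den == 0:
        raise ValueError("markers must differ in frequency between populations")
    return num / den


if __name__ == "__main__":
    pop1 = [0.10, 0.50, 0.30]    # hypothetical frequencies at three loci
    pop2 = [0.60, 0.10, 0.80]
    group = [0.20, 0.42, 0.40]   # constructed with m = 0.2
    print(round(admixture_proportion(pop1, pop2, group), 3))  # 0.2
```

In an admixture case-control study, the estimate would be computed separately for case and control subjects and the two values compared. Markers with large frequency differences between the parental populations carry the most weight in this estimator.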

Studies that assess measured genetic factors

Strategies

Two types of genetic markers can be used in epidemiologic studies: markers based on gene products, such as specific blood groups, HLA antigens, serum proteins, and enzyme systems, and markers based on direct analysis of the DNA (Khoury et al., 1993). The evaluation of the role of genetic factors in disease etiology is generally guided by a "candidate gene" approach, whereby genetic variation is examined at loci known or suspected to play some role in the pathogenesis of the disease.

For example, in a case-control study of cleft lip and palate, Ardinger et al. (1989) examined differences in DNA markers at several candidate genes selected because of their suggested role in palate formation in rodents. By comparing 80 nonsyndromic case subjects with cleft lip/palate and 102 control subjects, the authors found an association between clefting and genetic variation in the transforming growth factor alpha gene. Although this association is modest in magnitude (odds ratios of 2-3), it has now been reported in several populations, suggesting that this gene plays a role in the etiology of oral clefts in humans.
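Odds ratios such as these come from a 2-by-2 table of genotype (carrier versus noncarrier) by case status. The sketch below computes the odds ratio with an approximate 95% confidence interval using Woolf's logit method; the function name is illustrative, and the counts are hypothetical, not the data of Ardinger et al.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Woolf (logit) confidence interval for a 2-by-2 table.

    a: cases with the genotype      b: controls with the genotype
    c: cases without the genotype   d: controls without the genotype
    All cells must be positive for the Woolf interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


if __name__ == "__main__":
    # Hypothetical counts: 80 cases, 100 controls.
    or_, lo, hi = odds_ratio_ci(a=20, b=12, c=60, d=88)
    print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

With tables of this size, a modest odds ratio of 2 to 3 comes with a wide interval, which is one reason replication across populations matters.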

Another example is the evaluation of the association between Alzheimer's disease (AD) and the Apolipoprotein E (Apo-E) E4 allele. Mounting evidence suggests that the Apo-E E4 allele is strongly associated with both late-onset familial AD and the more common sporadic AD (Corder et al., 1993; Saunders et al., 1993; Tsai et al., 1994). Among families at high risk for late-onset AD, disease risk has been shown to increase with the number of E4 alleles; 47% of people heterozygous for the E4 allele and 91% of people homozygous for the E4 allele were shown to be affected (Corder et al., 1993). The risk ratios for heterozygotes and homozygotes were 2.8 and 8.1, respectively.

Methodologic issues

Confounding: A crucial consideration in genetic studies is the choice of an appropriate comparison group. The use of convenient comparison groups may lead to spurious findings as a result of confounding caused by unmeasured genetic and environmental factors. Race or ethnicity can be an important confounder in such studies. One example is the reported association between the genetic marker Gm3;5,13,14 and non-insulin-dependent diabetes mellitus among the Pima Indians (Knowler et al., 1988). In this cross-sectional study, individuals with the genetic marker had a lower prevalence of diabetes than those without the marker (8% vs. 29%). This marker, however, is an index of white admixture. When the analysis was stratified by degree of admixture, the association all but disappeared.
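The effect of stratifying on a confounder such as degree of admixture can be illustrated with the Mantel-Haenszel summary odds ratio. In the hypothetical counts below (not the Pima data), the marker and the disease are both more common in one stratum, so the crude odds ratio is elevated even though the marker has no association with disease within either stratum. Function names are illustrative.

```python
from typing import List, Tuple

# Each stratum: (a, b, c, d) =
# (marker+ with disease, marker+ without, marker- with disease, marker- without)
Table = Tuple[int, int, int, int]


def crude_or(strata: List[Table]) -> float:
    """Odds ratio after collapsing all strata into one 2-by-2 table."""
    a, b, c, d = (sum(t[i] for t in strata) for i in range(4))
    return (a * d) / (b * c)


def mantel_haenszel_or(strata: List[Table]) -> float:
    """Mantel-Haenszel odds ratio summarized across strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


if __name__ == "__main__":
    strata = [
        (40, 60, 20, 30),   # stratum 1: stratum-specific OR = 1
        (5, 45, 20, 180),   # stratum 2: stratum-specific OR = 1
    ]
    print(crude_or(strata))            # 2.25
    print(mantel_haenszel_or(strata))  # 1.0
```

The same logic applies in either direction: a crude association, positive or negative, can vanish once the analysis is stratified by the confounder.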

Genotypic misclassification: Indirect methods are often used to classify individuals' genotypes. For example, in a case-control study of bladder cancer, Cartwright et al. (1982) used dapsone loading followed by urinary measurement of different metabolites to classify subjects as slow or fast acetylators. Such indirect measures can lead to misclassification of the underlying genotypes. This misclassification is often nondifferential and thus tends to bias the relative risk toward unity. Occasionally, such misclassification may be differential if the measurement method is affected by disease status itself.
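The bias toward unity can be checked with a small calculation. Assuming a binary genotype classified with the same sensitivity and specificity in cases and controls (nondifferential misclassification), the sketch below computes the expected observed odds ratio from hypothetical true carrier frequencies; the sensitivity, specificity, and function names are illustrative.

```python
def observed_prevalence(true_prev: float, sens: float, spec: float) -> float:
    """Expected proportion classified as carriers by an imperfect test."""
    return true_prev * sens + (1.0 - true_prev) * (1.0 - spec)


def odds_ratio(p_cases: float, p_controls: float) -> float:
    """Odds ratio comparing carrier proportions in cases and controls."""
    return (p_cases / (1 - p_cases)) / (p_controls / (1 - p_controls))


if __name__ == "__main__":
    p_case, p_ctrl = 0.6, 0.4   # hypothetical true carrier frequencies
    sens, spec = 0.9, 0.8       # same in cases and controls
    true_or = odds_ratio(p_case, p_ctrl)
    obs_or = odds_ratio(observed_prevalence(p_case, sens, spec),
                        observed_prevalence(p_ctrl, sens, spec))
    print(f"true OR = {true_or:.2f}, observed OR = {obs_or:.2f}")
    # true OR = 2.25, observed OR = 1.77
```

If sensitivity or specificity differed by disease status (differential misclassification), the observed odds ratio could be biased in either direction.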

When a genotype is measured at the DNA level, misclassification can also be caused by linkage disequilibrium. Ideally, if the gene of interest has been sequenced, the presence of one or more mutations within the gene could be correlated with an altered gene product and case-control status. Many markers, however, reflect DNA variation in the general region of the gene, and investigators measure these markers instead of the disease susceptibility mutation itself. Marker alleles could be in linkage disequilibrium with disease alleles if the mutation arose relatively recently or if specific haplotypes confer a selective advantage. Over successive generations, recombination between the two loci erodes this association and eventually leads to independence between a marker allele and a disease allele in the same region. In the meantime, under linkage disequilibrium, a marker allele and a disease allele occur together more often than expected by chance. Thus, using a marker allele as a proxy for the disease allele in a case-control study presumably leads to nondifferential misclassification and a dilution of the odds ratio toward unity. The finding of an association of a certain magnitude between a DNA marker and disease may thus reflect an important etiologic role of the gene locus of interest rather than of the marker itself.
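The erosion of linkage disequilibrium by recombination follows a standard result: under random mating and in the absence of other forces, the disequilibrium coefficient D between two loci with recombination fraction θ decays as D_t = D_0(1 - θ)^t after t generations. The sketch below, with illustrative values and a hypothetical function name, shows why tightly linked markers can stay associated with a disease allele for many generations while loosely linked ones lose the association quickly.

```python
def remaining_ld(theta: float, generations: int) -> float:
    """Fraction of the initial disequilibrium D_0 left after t generations."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return (1.0 - theta) ** generations


if __name__ == "__main__":
    for theta in (0.001, 0.01, 0.1, 0.5):
        print(theta, [round(remaining_ld(theta, t), 3) for t in (10, 50, 100)])
```

For unlinked loci (θ = 0.5), the disequilibrium is halved every generation, whereas at θ = 0.01 roughly a third of it remains after 100 generations.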

Gene-environment interaction: A central theme of genetic epidemiology is that human disease is caused by interactions between genetic and environmental factors. Thus in the design and analysis of epidemiologic studies, such interaction needs to be explicitly considered. Examining the marginal association between a genotype and disease may mask the effect of biologic interaction between the genotype and other risk factors (Khoury et al., 1993).

To assess gene-environment interaction, researchers could display data in a 2-by-4 table (Table 3). For simplicity, we assume that an exposure is classified as being either present or absent, and that the underlying susceptibility genotype is also classified as present or absent. This genotype could reflect the presence of one or two alleles at one locus or a combination of alleles at multiple loci. Using unexposed subjects with no susceptibility genotype as the reference group, one can compute odds ratios for all other groups.
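The computation just described can be sketched as follows. Each cell of the 2-by-4 layout is keyed by (exposure, genotype), coded 1 for present and 0 for absent, and every odds ratio uses the (0, 0) group as the reference. The function name is illustrative, and the counts are hypothetical, constructed so that the exposure alone and the genotype alone carry no excess risk.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]  # (exposure, genotype), each 0 = absent, 1 = present


def odds_ratios_2x4(counts: Dict[Cell, Tuple[int, int]]) -> Dict[Cell, float]:
    """Odds ratio for each exposure-genotype group versus the (0, 0) group.

    counts maps (exposure, genotype) -> (cases, controls).
    """
    ref_cases, ref_controls = counts[(0, 0)]
    return {
        cell: (cases * ref_controls) / (controls * ref_cases)
        for cell, (cases, controls) in counts.items()
        if cell != (0, 0)
    }


if __name__ == "__main__":
    counts = {
        (0, 0): (50, 100),   # reference: unexposed, no susceptibility genotype
        (1, 0): (50, 100),   # exposure alone
        (0, 1): (25, 50),    # genotype alone
        (1, 1): (125, 50),   # both
    }
    print(odds_ratios_2x4(counts))
    # {(1, 0): 1.0, (0, 1): 1.0, (1, 1): 5.0}
```

Comparing the joint odds ratio with the two single-factor odds ratios, rather than looking at the genotype's marginal odds ratio alone, is what reveals the interaction.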

For example, in a recent case-control study, Hwang et al. (1995) assessed the interaction between maternal cigarette smoking and a transforming growth factor alpha (TGFA) polymorphism in the risk for oral clefts. They found evidence of interaction between maternal smoking and the TaqI polymorphism at this locus for cleft palate only. The results of crude analyses are shown in Table 4. The authors grouped persons carrying one or two copies of the TaqI variant allele as genotype-positive (+). As can be seen, the odds ratios for the exposure alone and for the genotype alone are close to unity, whereas the odds ratio for combined exposure to smoking and the genotype is 5.5 (95% CI 2.1-14.6), suggesting interaction between the exposure and the genotype.

From a study size perspective, case-control studies can readily be used to evaluate gene-environment interaction, and they are particularly useful for common exposures and genotypes. Sample size requirements, however, depend on the underlying model of interaction between the genotype and the exposure. For specific types of biologic interaction (e.g., when neither the exposure nor the genotype alone increases disease risk), the sample size needed to assess the marginal effect of the exposure (in a 2-by-2 table) will be more than adequate to test for interaction between the exposure and the interacting genotype (Khoury et al., 1995).

Finally, in epidemiologic studies of genotype-disease associations involving genetic and nongenetic risk factors, small p values can arise by chance alone. Such chance findings become increasingly likely in case-control studies that examine multiple markers at multiple loci. As in other areas of epidemiology, distinguishing spurious from causal associations depends on the consistency of the association across studies and on the presence of a biologically meaningful model underlying it. To reduce the impact of random error, empirical Bayes methods may be used (Greenland and Robins, 1991; Greenland and Poole, 1994).
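A minimal version of empirical Bayes adjustment treats the log odds ratios for many markers as draws from a common normal prior, estimates the prior mean and variance from the estimates themselves, and pulls each estimate toward the mean in proportion to its imprecision. The sketch below uses an unweighted mean and a simple method-of-moments variance; it illustrates the idea and is not the specific procedure of Greenland and Robins (1991) or Greenland and Poole (1994). The function name and inputs are hypothetical.

```python
import math
from statistics import fmean
from typing import List, Sequence, Tuple


def eb_shrink(log_ors: Sequence[float],
              variances: Sequence[float]) -> Tuple[float, float, List[float]]:
    """Shrink log odds ratios toward their mean under a normal-normal model.

    Returns (prior mean, prior variance, shrunken log odds ratios).
    """
    mu = fmean(log_ors)  # unweighted prior mean (simplifying assumption)
    # Method-of-moments estimate of the between-marker variance, floored at 0.
    tau2 = max(0.0, fmean([(b - mu) ** 2 - v for b, v in zip(log_ors, variances)]))
    shrunk = []
    for b, v in zip(log_ors, variances):
        weight = tau2 / (tau2 + v) if tau2 + v > 0 else 0.0  # reliability of b
        shrunk.append(mu + weight * (b - mu))
    return mu, tau2, shrunk


if __name__ == "__main__":
    ors = [2.5, 0.8, 1.1, 3.0, 0.9]        # hypothetical marker odds ratios
    var = [0.10, 0.05, 0.08, 0.40, 0.06]   # variances of the log odds ratios
    mu, tau2, post = eb_shrink([math.log(x) for x in ors], var)
    for raw, adj in zip(ors, post):
        print(f"{raw:.2f} -> {math.exp(adj):.2f}")
```

Imprecise estimates (large variances) are pulled furthest toward the common mean, which tempers extreme odds ratios that are most likely to be chance findings.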

Non-Traditional Epidemiologic Studies

Several nontraditional approaches have recently emerged in studies of genetic factors in disease. These approaches involve the use of an internal control group rather than an external one. Here, we will review 1) the case-only study, 2) the case-parental control study, and 3) the affected relative-pair study. Table 5 summarizes the features of these studies, including their assumptions, strengths and limitations.

Case-only studies

Increasingly, a case-only (case-series) design has been promoted as an approach to evaluating gene-environment interaction in disease etiology (Piegorsch et al., 1994; Begg and Zhang, 1994). In this method, investigators use case subjects only to assess the magnitude of the association between the exposure of interest and the susceptibility genotype. The basic setup for analysis is a 2-by-2 table (Table 3). Odds ratios and confidence intervals can be obtained by using standard crude analyses or logistic models that adjust for other covariates. The odds ratio relating the exposure and the genotype among case subjects only is a function of the odds ratios for the exposure alone, the genotype alone, and their joint effect in a standard case-control study.

As shown in Table 3,

$$\mathrm{OR}_{ca} = \frac{\mathrm{OR}_{ge}}{\mathrm{OR}_{e}\,\mathrm{OR}_{g}} \times \mathrm{OR}_{co},$$

where OR_ca is the case-only odds ratio; OR_co is the odds ratio relating the exposure and the susceptibility genotype among control subjects; and OR_e, OR_g, and OR_ge are the standard case-control odds ratios for the exposure alone, the genotype alone, and both together, each relative to unexposed subjects without the susceptibility genotype. If the genotype and the exposure are independent in the source population from which cases arose, the expected value of OR_co is unity, and the odds ratio obtained from a case-only study measures the departure from a multiplicative joint effect of the genotype and the exposure. Under the null hypothesis of multiplicative effects, OR_ca is expected to be unity; if the joint effect is more than multiplicative, OR_ca is expected to be greater than 1; and if the joint effect is less than multiplicative (e.g., additive), OR_ca is expected to be less than 1.
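The relation between the case-only and case-control odds ratios can be checked numerically. The sketch below uses hypothetical counts in which exposure and genotype are independent among controls; it computes OR_ca from cases alone, OR_co from controls alone, and OR_e, OR_g, and OR_ge from the full data, and confirms the identity stated above. Function names are illustrative.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]  # (exposure, genotype), 1 = present, 0 = absent


def cross_ratio(n: Dict[Cell, int]) -> float:
    """Exposure-genotype odds ratio within one group (cases or controls)."""
    return (n[(1, 1)] * n[(0, 0)]) / (n[(1, 0)] * n[(0, 1)])


def case_control_or(cases: Dict[Cell, int], controls: Dict[Cell, int],
                    cell: Cell) -> float:
    """Case-control OR for `cell` relative to unexposed, genotype-negative."""
    return (cases[cell] * controls[(0, 0)]) / (controls[cell] * cases[(0, 0)])


if __name__ == "__main__":
    cases = {(0, 0): 50, (1, 0): 50, (0, 1): 25, (1, 1): 125}
    controls = {(0, 0): 100, (1, 0): 100, (0, 1): 50, (1, 1): 50}

    or_ca = cross_ratio(cases)       # case-only odds ratio
    or_co = cross_ratio(controls)    # 1.0: independence among controls
    or_e = case_control_or(cases, controls, (1, 0))
    or_g = case_control_or(cases, controls, (0, 1))
    or_ge = case_control_or(cases, controls, (1, 1))
    print(or_ca, or_ge / (or_e * or_g) * or_co)  # 5.0 5.0
```

Here the joint effect (5.0) exceeds the product of the separate effects (1.0 × 1.0), and the case-only odds ratio recovers that fivefold departure from multiplicativity without using the controls at all.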

This approach provides a simple tool with which to screen for gene-environment interaction in disease etiology. It can be applied as a crude analysis of a 2-by-2 table or, when covariates and potential confounding factors need to be adjusted for, within a logistic regression model.

To illustrate, let us apply the case-only analysis to the data of Hwang et al. on the association among oral clefts, maternal smoking, and TGFA polymorphisms (Table 4). The case-only analysis shows a marked departure from a multiplicative relation between the genotype and the exposure. The OR_ca of 5.1 obtained from this analysis is comparable to the value OR₁₁/(OR₁₀ × OR₀₁) = 6.1 obtained from the full case-control data, which measures the same departure from multiplicative effects. The assumption of independence between exposure and genotype among control subjects is also reasonable in these data.

There are several methodologic issues involved in applying the case-only approach. First, the choice of cases is still subject to the usual rules of case selection for any case-control study. For example, the use of population-based incident cases allows researchers to generalize their findings.

Second, researchers must assume independence between exposure and genotype in order to apply this method. This assumption may seem reasonable for a wide variety of genes and exposures. Some genotypes, however, may make the exposure more or less likely through a biologic mechanism. For example, genetic variations in alcohol and aldehyde dehydrogenases, the main enzymes involved in alcohol metabolism, are suspected risk factors for alcoholism and alcohol-related liver damage. However, individuals whose alcohol metabolism is altered by this genetic variation may have an increased flushing response after alcohol ingestion and thus be less likely to drink, possibly leading to a negative correlation between alcohol exposure and alcohol dehydrogenase polymorphisms in different populations (Sherman et al., 1994).

Third, the case-only approach does not allow investigators to evaluate the independent effects of the exposure alone or the genotype alone, only the departure from multiplicative effects.

Fourth, as with a regular case-control study, associations may be due to linkage disequilibrium between the genetic marker and the true susceptibility allele(s) at a neighboring locus.

Finally, the measure obtained from this analysis can be interpreted only as a departure from a multiplicative relation, whereas departure from additivity may be of greater interest. Nevertheless, many biologically plausible modes of gene-environment interaction involve rather extreme positive departures from multiplicative effects (Khoury et al., 1995), which, when neither factor alone is protective, imply even greater positive departures from additivity.

Case-parental control studies

In this approach, the parents of case subjects are used as a control group to look for genetic markers that could be associated with increased disease risk or be in linkage disequilibrium with alleles at a neighboring locus (Khoury et al., 1993; Spielman et al., 1993; Schaid and Sommer, 1993). The method requires genotypic information on the parents of case subjects. In its simplest form, the genotype of each case subject is compared with the genotype of a fictitious control formed by the alleles that each parent did not transmit to the case. Because this is a matched analysis, one can construct a 2-by-2 table comparing case and control subjects with respect to the presence or absence of the allele (or genotype), as shown in Table 6. Odds ratios are obtained as in any matched-pair design, from the ratio of the two types of discordant pairs. This method can also be used to stratify case subjects according to the presence or absence of the pertinent interacting exposure, and odds ratios can be derived with or without the exposure (Table 6).
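The matched-pair analysis can be sketched as follows. Each case subject is paired with the fictitious control formed from the nontransmitted parental alleles, and only discordant pairs contribute: the odds ratio is the number of pairs in which only the case carries the allele divided by the number in which only the fictitious control does, and McNemar's chi-square tests whether these two counts differ. Applied to alleles transmitted versus not transmitted by each parent, rather than to genotypes, the same statistic is the transmission/disequilibrium test of Spielman et al. (1993). The function name and pair data are hypothetical.

```python
import math
from typing import Iterable, Tuple


def matched_pair_analysis(pairs: Iterable[Tuple[bool, bool]]):
    """Matched-pair OR and McNemar test for case/fictitious-control pairs.

    Each pair is (case carries allele, fictitious control carries allele).
    Returns (odds ratio, McNemar chi-square with 1 df, p-value).
    """
    b = c = 0
    for case_pos, control_pos in pairs:
        if case_pos and not control_pos:
            b += 1          # discordant: only the case carries the allele
        elif control_pos and not case_pos:
            c += 1          # discordant: only the control carries it
    if b + c == 0:
        raise ValueError("no discordant pairs")
    odds_ratio = b / c if c else math.inf
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return odds_ratio, chi2, p_value


if __name__ == "__main__":
    # Hypothetical: 30 concordant carriers, 40 concordant noncarriers,
    # 20 pairs where only the case carries the allele, 10 the reverse.
    pairs = ([(True, True)] * 30 + [(False, False)] * 40
             + [(True, False)] * 20 + [(False, True)] * 10)
    print(matched_pair_analysis(pairs))
```

Running the same function separately on exposed and unexposed case subjects gives the exposure-stratified odds ratios described above.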

Despite the method's simplicity, its main limitation is that the "control" group may not be representative of the underlying population at risk, especially if certain parental genotypes are associated with disease states that interfere with reproduction. Flanders and Khoury have proposed a method of analysis for this type of study (in press). The method is noniterative and yields a closed-form estimate of the risk ratio comparing the risk among those with a specific genotype with the risk among those with a comparison genotype. Essentially, for each combination of parental genotypes, the observed distribution of the offspring (case) genotype is compared with the distribution expected on the basis of Mendelian transmission probabilities.
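The comparison with Mendelian expectations can be illustrated for a single biallelic locus. Genotypes are coded as the number of copies (0, 1, or 2) of the allele of interest, and each parent transmits that allele with probability equal to half its copy number. The sketch sums the expected case genotype counts over hypothetical case-parent trios and sets them beside the observed counts; it shows only this observed-versus-expected comparison, not the Flanders-Khoury risk ratio estimator itself. Function names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def offspring_distribution(g_father: int, g_mother: int) -> Dict[int, float]:
    """Mendelian distribution of child genotype (copies of the allele)."""
    pf, pm = g_father / 2, g_mother / 2  # per-parent transmission probability
    return {
        0: (1 - pf) * (1 - pm),
        1: pf * (1 - pm) + (1 - pf) * pm,
        2: pf * pm,
    }


def observed_vs_expected(trios: Iterable[Tuple[int, int, int]]):
    """trios: (father genotype, mother genotype, case genotype)."""
    observed = Counter({0: 0, 1: 0, 2: 0})
    expected = {0: 0.0, 1: 0.0, 2: 0.0}
    for gf, gm, gc in trios:
        observed[gc] += 1
        for g, p in offspring_distribution(gf, gm).items():
            expected[g] += p
    return dict(observed), expected


if __name__ == "__main__":
    trios = [(1, 1, 2), (1, 1, 1), (1, 0, 1), (2, 1, 2), (1, 1, 2)]
    print(observed_vs_expected(trios))
    # ({0: 0, 1: 2, 2: 3}, {0: 1.25, 1: 2.5, 2: 1.25})
```

An excess of observed over expected case genotypes carrying the allele, within strata of parental mating type, is the signal such an analysis looks for.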

As with the case-only design, this approach does not allow one to assess the independent effect of the exposure, merely whether the effect of the genotype is different for persons with the exposure than for persons without the exposure. This effect variation is also measured on a multiplicative scale, as in the case-only design. Nevertheless, this approach is superior to the case-only design in that it permits one to assess the effect of the genotype (with and without the exposure), whereas the case-only approach does not.

Affected relative-pair studies

The third design, which also uses affected individuals only, has long been known in the genetics literature as the affected sib-pair or affected relative-pair method (Risch, 1990; Knapp et al., 1994). The method allows investigators to examine the genotypic distribution among pairs of affected relatives. Most commonly applied to sibs, it allows researchers to examine the number of alleles at any particular locus that are identical by descent between pairs of affected relatives. The set-up for the analysis for sibs is shown in Table 7. Under the assumption of no genetic linkage, the expected proportions of sib pairs sharing 0, 1, and 2 alleles identical by descent are 25%, 50%, and 25%, respectively. Departure from this distribution suggests linkage between the disease and the marker locus. This method has recently been used in the search for gene loci for Alzheimer's disease (Blossey et al., 1993) and non-insulin-dependent diabetes mellitus (Baroni et al., 1994).
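The comparison of observed identity-by-descent sharing with the 25%/50%/25% expectation can be sketched as a chi-square goodness-of-fit test with 2 degrees of freedom, together with the mean proportion of alleles shared (0.5 expected under no linkage). This is one simple test among several used for affected sib pairs; the function name and counts are hypothetical, and the sketch assumes identity-by-descent status is known for every pair.

```python
import math
from typing import Sequence


def sib_pair_ibd_test(counts: Sequence[int]):
    """counts = number of affected sib pairs sharing 0, 1, 2 alleles IBD."""
    n = sum(counts)
    expected = [0.25 * n, 0.50 * n, 0.25 * n]   # no-linkage expectation
    chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    p_value = math.exp(-chi2 / 2)               # exact upper tail for 2 df
    mean_sharing = (counts[1] + 2 * counts[2]) / (2 * n)
    return chi2, p_value, mean_sharing


if __name__ == "__main__":
    chi2, p, share = sib_pair_ibd_test([15, 45, 40])   # 100 hypothetical pairs
    print(f"chi2 = {chi2:.1f}, p = {p:.4f}, mean sharing = {share:.3f}")
    # chi2 = 13.5, p = 0.0012, mean sharing = 0.625
```

Stratifying the pairs by exposure status and applying the same test within each stratum is one way to carry out the gene-environment comparison described next.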

To look for gene-environment interaction using this method, researchers can stratify the affected individuals by their exposure status. In contrast to the two other association methods, the affected relative-pair study is a preliminary test for linkage that can be used effectively as a fishing expedition for candidate gene loci associated with increased disease susceptibility. Because it requires two affected members in each family, however, its use may reduce the number of case subjects available for the study. It often requires testing of the parents to infer whether or not the alleles shared among case subjects are identical by descent. The affected relative-pair study does not assess the effects of specific alleles on disease susceptibility; rather, it assesses linkage at the locus. It also cannot assess independent effects of exposures. Furthermore, because the affected relative-pair study derives its expected distributions from Mendelian transmission, any departure from random segregation and independent assortment could affect its results. Finally, selection factors, including survival, chronicity, and method of case ascertainment, may heavily affect the types of case subjects available for this analysis.

Epidemiologic Approaches to Family Studies

Family studies are often considered the key to understanding genetic and environmental etiology of human diseases (Dorman et al., 1988). Although family studies have been outside the realm of traditional epidemiology, an increasing number of epidemiologic investigations are including a familial component (Phillips et al., 1991).

Familial aggregation in case-control studies

Although many case-control studies include an assessment of family history, the analysis typically collapses family information into a yes/no variable that is treated as a risk factor. This casual approach, however, can lead to biased measurement of the extent of familial aggregation (Khoury and Flanders, in press). Family history is not an attribute of the subject alone; it depends on several factors, including family size, the age distribution of relatives and their genetic relationships to the subject, and the risk factor characteristics of each relative.

Instead of treating family history as an "exposure" in case-control studies, a better approach is to transform the case-control format into a cohort format. Here, cohorts of relatives of case and control subjects are reconstructed, and each relative is assessed for the presence or absence of disease and of disease risk factors (Table 8). For late-onset diseases, however, a simple calculation of the frequency of disease among relatives will not be adequate. Ideally, estimates of lifetime risks (or risks up to the time of the study) should be computed by using life-table analysis or other survival analysis methods.
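Life-table estimation of cumulative risk among relatives can be sketched with the Kaplan-Meier estimator. Each relative contributes an age: the age at onset if affected, otherwise the age at last observation (censoring). The cumulative risk by a given age is one minus the estimated disease-free probability. The function name and data are hypothetical, and the sketch ignores the correlation among relatives of the same proband.

```python
from typing import List, Sequence, Tuple


def cumulative_risk(ages: Sequence[float],
                    affected: Sequence[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier cumulative risk (1 - disease-free probability) by age.

    ages: age at onset for affected relatives, age at last observation
    (censoring) for unaffected relatives. Returns (age, cumulative risk)
    at each age at which at least one onset occurs.
    """
    data = sorted(zip(ages, affected))
    at_risk = len(data)
    disease_free = 1.0
    curve = []
    i = 0
    while i < len(data):
        age = data[i][0]
        onsets = seen = 0
        while i < len(data) and data[i][0] == age:  # group tied ages
            onsets += int(data[i][1])
            seen += 1
            i += 1
        if onsets:
            disease_free *= 1.0 - onsets / at_risk
            curve.append((age, 1.0 - disease_free))
        at_risk -= seen  # affected and censored relatives leave the risk set
    return curve


if __name__ == "__main__":
    ages = [40, 50, 55, 60, 65, 70]
    affected = [False, True, False, True, False, False]
    for age, risk in cumulative_risk(ages, affected):
        print(f"by age {age}: {risk:.3f}")
    # by age 50: 0.200
    # by age 60: 0.467
```

Two of these six relatives are affected (a crude proportion of 0.33), but the estimated cumulative risk by age 60 is about 0.47, because some unaffected relatives were observed only at younger ages.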

In these studies, the "exposure" variable of interest is being a relative of a case subject rather than of a control subject. This exposure variable can be stratified according to the degree of relationship (e.g., siblings, offspring, first cousins). For example, the risk for disease among first-degree relatives of case subjects can be compared with the risk for disease among first-degree relatives of control subjects, and measures of association such as risk ratios can be derived. Risk ratios can be adjusted for the presence of other known disease risk factors by using modified logistic regression models that account for the lack of independence among relatives (Khoury and Beaty, 1994). In addition, risk modeling among relatives of case and control subjects could also be incorporated into segregation analysis to fit various modes of inheritance to family data (Whittemore and Gong, 1994). A discussion of segregation analysis, however, is outside the scope of this review.

Thus, the incorporation of a familial component into traditional case-control studies can document the presence of familial aggregation and assess the extent to which such aggregation can be explained by risk factors for the disease that could be shared among relatives.

Other approaches use special clusters of relatives to assess, by design, the role of genetic and environmental factors. One such design is the twin study. The premise behind twin studies is that, because monozygotic (MZ) twins share all of their genes while dizygotic (DZ) twins share, on average, 50% of their genes, an excess of disease concordance among MZ twins compared with DZ twins may reflect a greater role of genetic factors. Such inferences are tentative, however, given possible confounding by shared environmental factors (intrauterine and postnatal) and selection factors, as well as the difficulty of conducting such studies. A more detailed review of the twin design and other study methodologies is available elsewhere (Khoury et al., 1993).

Linkage analysis in an epidemiologic study design

With the increasing availability of genetic markers on different chromosomes, linkage analysis and gene mapping techniques are increasingly used to evaluate whether a particular disease or trait cosegregates with a specific marker in high-risk families. Such studies can suggest that a disease susceptibility gene is located on a particular chromosome.

An alternative to formal linkage analysis is the affected sib pair method discussed in Section IV. This method can be formally extended to a family-based epidemiologic study. In such a study, probands (affected individuals) could come from registries of population-based incident cases (such as well-defined population registries of cancers and birth defects). The next step is to conduct cohort or nested case-control studies within families of probands.

In a cohort design, relatives of the proband (e.g., siblings) are followed for disease development. Disease risks are estimated according to the number of alleles (0, 1, or 2) that each relative shares identical by descent with the proband at each of a number of loci. Under the null hypothesis of no linkage between the disease and a locus, disease risks should be identical among these three groups (as shown in Table 9). Because many diseases are age-dependent, cumulative risks should be computed rather than simple disease proportions. This approach has been used to assess the risk for insulin-dependent diabetes mellitus among siblings of affected probands according to the number of alleles identical by descent at the HLA region (Lipton et al., 1992). Lipton et al. found that the cumulative risk ratio was 1.5 for siblings sharing one allele identical by descent with the proband and 2.8 for those sharing two alleles, suggesting genetic linkage between diabetes and a gene in the HLA region. This cohort design can evaluate linkage at multiple genetic loci, incorporate other risk factors for disease, and assess evidence of gene-environment interaction using the sharing indicator variable (0, 1, 2).

Alternatively, a nested case-control approach could be used. Instead of assessing all relatives, researchers can evaluate a sample of them; this approach can be cost-effective, especially when the incidence of disease is low. The usual odds ratio analysis can then be done as shown in Table 9, using the linkage indicator variable for each locus tested. With this approach, researchers can extend the affected sib-pair method to include unaffected relatives as well. This provides an opportunity to evaluate multiple risk factors for the disease within the context of a family study, as well as to test for gene-environment interaction.
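The recommendation to compute cumulative rather than crude risks can be illustrated with a Kaplan-Meier estimate for each IBD group. In this hypothetical Python sketch, each sibling is recorded as (age at onset or at last follow-up, affected flag) and grouped by the number of alleles shared IBD with the proband at one locus; the data, the age cutoff, and the function name are invented for illustration. Cumulative risk by a given age is one minus the Kaplan-Meier survival, and each group is compared with the zero-sharing group. The sketch ignores covariates and the correlation among siblings of the same proband, both of which a full analysis would need to handle.

```python
def km_cumulative_risk(records, age):
    """Kaplan-Meier cumulative risk of onset by `age`.

    records: list of (exit_age, affected); affected=True means onset at
    exit_age, False means censored (unaffected at last follow-up).
    """
    survival = 1.0
    onset_ages = sorted({a for a, affected in records if affected and a <= age})
    for t in onset_ages:
        at_risk = sum(1 for a, _ in records if a >= t)   # still under observation
        onsets = sum(1 for a, affected in records if affected and a == t)
        survival *= 1 - onsets / at_risk
    return 1 - survival


# Hypothetical siblings grouped by alleles shared IBD with the proband (0, 1, 2)
siblings_by_ibd = {
    0: [(5, False), (10, True), (15, False), (20, False), (25, False)],
    1: [(4, False), (11, True), (16, False), (23, False)],
    2: [(6, True), (9, True), (14, False), (16, False), (28, False)],
}

risk = {k: km_cumulative_risk(v, age=20) for k, v in siblings_by_ibd.items()}
for k in sorted(risk):
    print(f"IBD {k}: cumulative risk by age 20 = {risk[k]:.3f}, "
          f"ratio vs IBD 0 = {risk[k] / risk[0]:.2f}")
```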

Tables

References

Ardinger, H.H., Buetow, K.W., Bell, G.I. et al. Association of genetic variation at the transforming growth factor alpha gene with cleft lip and palate. Am J Hum Genet 1989;45:348-353.
Baroni, M.G., Alcolado, J.C., Gragnoli, C. et al. Affected sib-pair analysis of the GLUT1 glucose transporter gene locus in non-insulin dependent diabetes mellitus (NIDDM): evidence for no linkage. Hum Genet 1994;93:675-680.
Begg, C.B., Zhang, Z.F. Statistical analysis of molecular epidemiology studies employing case-series. Cancer Epidemiol Biomarkers Prev 1994;3:173-175.
Blossey, H., Commenges, D., Olson, J.M. Linkage analysis of Alzheimer's disease with methods using relative pairs. Genet Epidemiol 1993;10:377-382.
Cartwright, R.A., Glashan, R.W., Rogers, H.J., et al. Role of N-acetyl transferase phenotypes in bladder carcinogenesis: a pharmacogenetic epidemiological approach to bladder cancer. Lancet 1982;2:842-846.
Corder, E.H., Saunders, A.M., Strittmatter, W.J., et al. Gene dose of apolipoprotein E type 4 allele and the risk of Alzheimer's disease in late onset families. Science 1993;261:921-923.
Dorman, J.S., Trucco, M., LaPorte, R.E., et al. Family studies: the key to understanding the genetic and environmental etiology of chronic diseases? Genet Epidemiol 1988;5:303-310.
Easton, D.F., Ford, D., Bishop, D.T., et al. Breast and ovarian cancer incidence in BRCA1-mutation carriers. Am J Hum Genet 1995;56:265-271.
Flanders, W.D., Khoury, M.J. Analysis of case-parental control studies (submitted).
Freire-Maia, N., Elisbao, T. Inbreeding effects on morbidity: a review of the world literature. Am J Med Genet 1984;18:391-400.
Futreal, P.A., Liu, Q., Shattuck-Eidens, D., et al. BRCA1 mutations in primary breast and ovarian carcinomas. Science 1994;266:120-122.
Greenland, S., Robins, J.M. Empirical-Bayes adjustments for multiple comparisons are sometimes useful. Epidemiology 1991;2:244-251.
Greenland, S., Poole, C. Empirical Bayes and semi-Bayes approaches to occupational and environmental hazard surveillance. Arch Environ Health 1994;49:9-16.
Hoffman, E.P. The evolving genome project: current and future impact. Am J Hum Genet 1994;54:129-136.
Hwang, S.J., Beaty, T.H., Panny, S.R., et al. Association study of transforming growth factor alpha TaqI polymorphisms and oral clefts: indication of gene-environment interaction in a population-based sample of infants with birth defects. Am J Epidemiol 1995;141:629-636.
Institute of Medicine. The Future of Public Health. National Academy Press, Washington, DC, 1988.
Khlat, M., Khoury, M.J. Inbreeding and diseases: demographic, genetic and epidemiologic perspectives. Epidemiol Rev 1991;13:28-41.
Khoury, M.J., Cohen, B.H., Chase, G.A., et al. An epidemiologic approach to the evaluation of inbreeding effects on prereproductive mortality. Am J Epidemiol 1987;125:251-262.
Khoury, M.J., Beaty, T.H., Cohen, B.H. Fundamentals of Genetic Epidemiology. Oxford University Press, New York, 1993.
Khoury, M.J. Case-parental control method in the search for disease susceptibility genes. Am J Hum Genet 1994;55:414-415.
Khoury, M.J., Beaty, T.H. The case-control method in genetic epidemiology. Epidemiol Rev 1994;16:134-150.
Khoury, M.J., Beaty, T.H., Hwang, S.J. Detection of genotype-environment interaction in case-control studies of birth defects: how big a sample size? Teratology 1995;51:336-343.
Khoury, M.J. and the Genetics Working Group. From genes to public health: applications of genetics in disease prevention. Am J Public Health (submitted).
Khoury, M.J., Flanders, W.D. Bias in using family history as a risk factor in case-control studies of disease. Epidemiology (in press).
King, M.C., Rowell, S., Love, S.M. Inherited breast and ovarian cancer: what are the risks? What are the choices? JAMA 1993;269:1975-1980.
Knapp, M., Seuchter, S.A., Baur, M.P. Linkage analysis in nuclear families. 2. Relationship between affected sib-pair tests and lod score analysis. Hum Hered 1994;44:44-51.
Knowler, W.C., Williams, R.C., Pettitt, D.J., et al. Gm3,5,13,14 and type 2 diabetes mellitus: an association in American Indians with genetic admixture. Am J Hum Genet 1988;43:520-526.
Lipton, R.B., Atchison, J., Dorman, J.S. et al. Genetic, immunologic, and metabolic determinants of risk for insulin-dependent diabetes mellitus in families. Diabet Med 1992;9:224-232.
McKusick, V.A., Amberger, J.S. The morbid anatomy of the human genome: chromosomal location of mutations causing disease. J Med Genet 1994;31:265-279.
Miki, Y., Swensen, J., Shattuck-Eidens, D., et al. A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. Science 1994;266:66-71.
Morton, N.E. Outline of Genetic Epidemiology. S. Karger, Basel, 1982.
Moy, C.S., LaPorte, R.E., Dorman, J.S., et al. Heritage research: the next generation of migrant studies. Am J Epidemiol 1989;130:819-820.
Phillips, P.H., Linet, M.S., Harris, E.L. Assessment of family history information in case-control cancer studies. Am J Epidemiol 1991;133:757-765.
Piegorsch, W.W., Weinberg, C.R., Taylor, J.A. Non-hierarchical logistic models and case-only designs for assessing susceptibility in population-based case-control studies. Stat Med 1994;13:153-162.
Reitnauer, P.J., Go, R.C.P., Acton, R.T., et al. Evidence of genetic admixture as a determinant in the occurrence of insulin-dependent diabetes mellitus in U.S. blacks. Diabetes 1982;31:532-537.
Risch, N. Linkage strategies for genetically complex traits: II. The power of affected relative pairs. Am J Hum Genet 1990;46:229-241.
Saunders, A.M., Strittmatter, W.J., Schmechel, D., et al. Association of the apolipoprotein E allele E4 with late-onset familial and sporadic Alzheimer's disease. Neurology 1993;43:1467-1472.
Schaid, D.J., Sommer, S.S. Genotype relative risks: methods for design and analysis of candidate gene association studies. Am J Hum Genet 1993;53:1114-1126.
Schulte, P.A., Perera, F.P. (eds). Molecular Epidemiology: Principles and Practice, Academic Press, New York, 1993.
Sherman, D.I., Ward, R.J., Yoshida, A., et al. Alcohol and aldehyde dehydrogenase gene polymorphism and alcoholism. EXS 1994;71:291-300.
Spielman, R.S., McGinnis, R.E., Ewens, W.J. Transmission test for linkage disequilibrium: the insulin gene region and insulin dependent diabetes mellitus. Am J Hum Genet 1993;52:506-516.
Tsai, M.S., Tangalos, E.G., Petersen, R.C., et al. Apolipoprotein E: risk factor for Alzheimer disease. Am J Hum Genet 1994;54:643-649.
Whittemore, A.S., Gong, G. Segregation analysis of case-control data using generalized estimating equations. Biometrics 1994;50:1073-1087.

Address correspondence to Dr Khoury at
Office of Genomics and Disease Prevention
Centers for Disease Control and Prevention
4770 Buford Hwy, Mail Stop K28
Atlanta, Georgia 30341