Facing the Challenge of Complex Genotypes and Gene-Environment Interaction:
The Basic Epidemiologic Units in Case-Control and Case-Only Designs
Lorenzo D. Botto and Muin J. Khoury
Introduction
In this chapter, we focus on fundamental units of
epidemiologic analysis of studies that relate health outcomes with complex
genotypes and gene-environment interaction. The goal is to offer a practical
perspective that researchers might find useful as they design, analyze,
and present their studies, with emphasis on case-control and case-only
designs. In the first part of the chapter, we focus on case-control
studies and their core information (1). In particular,
we illustrate ways in which such core information can be clearly presented
to provide the fundamental measures of effect and impact, including
the relative risks for the multiple factors under study (alone and jointly);
the interaction effects; the exposure frequencies; and the attributable
fractions.
In the second part of the chapter we discuss the
potential role of well-designed disease registries as adjuncts or antecedents
of case-control studies, and suggest that they might be particularly
useful in the study of complex genotypes and interaction. In particular,
we discuss the notion that a disease registry, approached through a
case-only perspective, might be scanned for complex genotypes ranked
by potential attributable fraction for the disease. Finally, we discuss
the advantages and challenges of these approaches and their possible
integration in studying the causation of common multi-factorial conditions.
We view the perspective presented in this chapter as complementary to
the discussion in other sections of the book, in which methodologic
aspects of the detection of joint effects and interaction are systematically
presented. The approaches discussed here, particularly those related
to the case-only analysis of disease registries, could enhance but not
replace other strategies for the study of complex genotypes and gene-environment
interaction.
Investigating Interaction in Epidemiology
Investigating genetic and gene-environment interaction in epidemiology
raises definitional, methodologic, and practical questions. The meaning,
measurement, and modeling of the effect of multiple factors, the biologic
significance of epidemiologic assessment of interaction, and the appropriateness
of specific study designs are but a few topics that continue to engender
considerable debate (2-4). We will note briefly only
two such issues for their relevance in this discussion of genetic factors
and interaction.
First, bias and confounding in case-control studies
(the type of study discussed here in some detail), though always a concern,
can likely be reduced more easily when assessing genetic factors than
when assessing environmental factors such as diet or lifestyle (4).
Genotype can in principle be measured more precisely and
objectively than, for example, smoking or folic acid intake, which
are commonly assessed from a subject’s recall that may be
imprecise or biased by disease status. Thus, exposure misclassification,
both differential and non-differential, should decrease with a corresponding
improvement in the precision and validity of risk estimates. Also, the
stability of genotype over time is particularly valuable in case-control
studies in which the factors under study are measured months or years
after disease onset. Finally, genotypes for a given set of alleles are
likely to distribute randomly in the population (Mendelian randomization)
reducing the likelihood of spurious gene-environment or gene-gene associations
(at unlinked loci) (4). Genetic substructure in the
population remains a concern, but researchers have suggested strategies
that take such substructure into account, using for example a panel
of unrelated markers (5,6). These
considerations, combined with the known statistical efficiency of case-control
studies, have revived the interest in case-control studies as powerful
tools for the study of the effect of genotype on disease risk (4)
and in part prompted our emphasis on such studies.
The second aspect of interaction that has been discussed
extensively relates to which measures of effect are most informative
or useful. For example, in the case of two dichotomous factors one could
estimate the effect of each factor alone as well as the joint effect.
One could also estimate the departure of the joint effect from specific
models of interaction (e.g., additive or multiplicative). It can be useful
to note that the relation between individual and joint effects can take
different forms (7) which can depend on the biologic
mechanism underlying the interaction. However, it has been noted that
predicting the biologic mechanism from such epidemiologic data is difficult
and perhaps not productive (2).
With more than two factors under study, summary
measures of interaction and statistical models become more complicated,
and the ability to present the data and the primary measures of effect
acquires renewed value. The explosive growth of genetic technology and
the ever expanding catalogue of human genes (8,9)
is already leading to studies of increasing complexity. For example,
the risk for venous thrombosis is already being studied in relation
to variants of the Factor V, prothrombin, and 5,10-methylenetetrahydrofolate
reductase (MTHFR) genes, as well as to blood homocysteine levels and
oral contraceptive use (10-13). Similarly, the risk
for spina bifida is being studied in relation to variants of folate-related
genes (e.g., MTHFR, cystathionine-beta-synthase, methionine synthase,
and methionine synthase reductase) and blood levels of selected vitamins
(folate, B12) (14-17). Even (and perhaps particularly)
in such complex settings, an appreciation of the basic analytic unit
of epidemiologic analysis should help researchers develop a consistent
starting point for data presentation and assessment.
Population-Based Case-Control Studies and the 2 X 4 Table
The simplest case of interaction is perhaps that of two dichotomous
factors (e.g., presence or absence of a genotype, use or non-use of
a pill). For illustration, we present data from case-control settings
in which we assume the ideal conditions of an unbiased, unconfounded,
population-based, incident-case study. We will further assume that the
study’s odds ratios are valid estimations of relative risks.
Data from such a case-control study can be presented in a two-by-four
table (table 7-1). The same reference group is
used to compute three odds ratios (each factor alone and jointly). Such
odds ratios are the basic, direct measures of association.
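As a minimal sketch of the arithmetic behind such a table, the Python snippet below computes the three odds ratios against the shared reference group (neither factor present), together with Woolf-type 95 percent confidence intervals. The cell counts and the names `odds_ratio`, `table`, and `results` are hypothetical and serve only as illustration; they are not taken from any of the chapter's tables.

```python
import math

def odds_ratio(cases_exp, controls_exp, cases_ref, controls_ref, z=1.96):
    """Return (OR, lower, upper) versus the reference group, using a Woolf-type CI."""
    estimate = (cases_exp * controls_ref) / (controls_exp * cases_ref)
    se_log = math.sqrt(1 / cases_exp + 1 / controls_exp + 1 / cases_ref + 1 / controls_ref)
    return estimate, estimate * math.exp(-z * se_log), estimate * math.exp(z * se_log)

# Hypothetical counts, keyed by (genotype, exposure): (cases, controls)
table = {
    (1, 1): (30, 10),
    (1, 0): (20, 20),
    (0, 1): (60, 40),
    (0, 0): (50, 100),  # unexposed to both: the common reference group
}

ref_cases, ref_controls = table[(0, 0)]
results = {}
for key in [(1, 0), (0, 1), (1, 1)]:
    cases, controls = table[key]
    results[key] = odds_ratio(cases, controls, ref_cases, ref_controls)
    estimate, lower, upper = results[key]
    print(f"genotype={key[0]} exposure={key[1]}: "
          f"OR {estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Because every odds ratio uses the same reference row, the three estimates can be compared directly with one another.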
Presenting the data in a two-by-four table has several advantages (table 7-2).
The role of each factor is independently assessed both in terms of association
and of potential attributable fraction. In addition, the odds ratios
can be examined to assess their general relation (7)
and formally evaluated in terms of departure from specified models of
interaction (most commonly multiplicative or additive). The table also
provides the distribution of the exposures among controls, and helps
evaluate the dependence of factors in the underlying population (provided
the controls are representative of such population). Finally, a case-only
odds ratio can be easily derived and used as a comparison with findings
from case-only studies in the literature.
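The departure of the joint odds ratio from the multiplicative and additive expectations can be sketched as follows. The example uses the three odds ratios quoted later in the chapter for factor V Leiden and oral contraceptive use (6.9, 3.7, and 34.7); the function name `interaction_departures` and the dictionary keys are assumptions made for illustration, and the additive expectation is taken here as OR(G) + OR(E) - 1.

```python
def interaction_departures(or_g, or_e, or_ge):
    """Compare the joint odds ratio with multiplicative and additive expectations."""
    expected_multiplicative = or_g * or_e
    expected_additive = or_g + or_e - 1
    return {
        "expected_multiplicative": expected_multiplicative,
        "ratio_to_multiplicative": or_ge / expected_multiplicative,
        "expected_additive": expected_additive,
        "excess_over_additive": or_ge - expected_additive,
    }

# Odds ratios quoted in the chapter for genotype alone, exposure alone, and both
departures = interaction_departures(or_g=6.9, or_e=3.7, or_ge=34.7)
for name, value in departures.items():
    print(f"{name}: {value:.2f}")
```

These are derived quantities; the three odds ratios themselves remain the primary measures.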
The two-by-four table approach to presenting genetic and gene-environment
interactions is appealing for several reasons.
- It is efficient: it summarizes, without loss of detail, seven two-by-two
tables, and generates a comprehensive set of effect estimates that
none of the latter, individually, can match.
- It highlights potential sample size issues: cell sizes are directly
presented and confidence intervals show their effect on statistical
power.
- It emphasizes effect estimation over model testing: the relative
risk estimates associated with the joint and individual exposures
are the primary elements of an interaction, whereas departures from
specific models of interactions are derived parameters and explicitly
labeled as such.
In summary, the table provides the simplest epidemiologic equivalent
of the general statement that all effects on human health are attributable
to the joint effect of genes and the environment. Indeed, it can be
argued that the two-by-four table (and not the two-by-two table) is
the fundamental unit of epidemiologic analysis.
A simple application of the two-by-four table
We illustrate the two-by-four table approach using data from a case-control
study of venous thromboembolism in relation to factor V Leiden and oral
contraceptive use (18). When the original data are
so rearranged (table 7-3), one can clearly appreciate
certain key aspects of the interaction:
- The marginal and joint effects. For example, the odds ratios associated
with Factor V Leiden and oral contraceptive use alone (6.9 and 3.7,
respectively) can be contrasted with that associated with the combined
exposure (34.7).
- The potential attributable fractions. Provided the associations
are causal, one can note the potential public health relevance of
the findings (the computation of attributable fractions for two or
more factors was developed by several authors and has been summarized
(19)). The relatively high frequency in the population
of the gene variant (2.4 percent among controls) and of the joint
exposure (1.2 percent) translates into considerable population attributable
fractions for thromboembolic disease (5.5 and 15.7 percent, respectively).
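The figures quoted above can be retraced with a short calculation. The cell counts are not given in this section, so the counts below are an assumption: they were chosen because they reproduce the odds ratios, control frequencies, and attributable fractions quoted in the text, and table 7-3 should be consulted for the actual data. The attributable fraction is computed as AF = Fc × (OR - 1)/OR, the rearranged form of the relation attributed to Miettinen later in the chapter; the variable names are illustrative.

```python
# Assumed counts (cases, controls), chosen to be consistent with the figures
# quoted in the text; they are NOT taken from this section.
counts = {
    "factor V Leiden + OC": (25, 2),
    "factor V Leiden only": (10, 4),
    "OC only": (84, 63),
    "neither (reference)": (36, 100),
}

total_cases = sum(cases for cases, _ in counts.values())
total_controls = sum(controls for _, controls in counts.values())
ref_cases, ref_controls = counts["neither (reference)"]

summary = {}
for label, (cases, controls) in counts.items():
    if label == "neither (reference)":
        continue
    odds_ratio = (cases * ref_controls) / (controls * ref_cases)
    frac_cases = cases / total_cases              # Fc: exposure frequency among cases
    control_freq = controls / total_controls      # exposure frequency among controls
    attributable = frac_cases * (odds_ratio - 1) / odds_ratio
    summary[label] = (odds_ratio, control_freq, attributable)
    print(f"{label}: OR {odds_ratio:.1f}, "
          f"{control_freq:.1%} of controls, AF {attributable:.1%}")
```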
One can contrast such a presentation with a stratified analysis in
which the association between oral contraceptive use and venous
thrombosis is assessed separately among those with and without the Factor
V Leiden polymorphism (table 7-4). The latter
approach does not immediately provide information on individual and
joint effects, and tends to emphasize departure from a specific (multiplicative)
model of interaction. The two-by-four table does not have such a limitation
and provides the data to test for other, non-multiplicative models as well.
A further assessment of the data from the two-by-four table involves
the relation of the factors separately among cases and controls (table
7-4). Conceptually, one can split vertically the case-control study
into a case-only study and a control-only study and examine the respective
odds ratios. The case-only design in itself is an efficient and valid
approach to screening for interaction, provided that the fundamental
assumption of independence of exposure and genotype in the population
is justified (20, 21). The potential
role of such studies in the epidemiologic approach to complex diseases
has been reviewed (22,23) and will
be examined later in connection with the discussion of disease registries.
Also the association of risk factors among controls (control-only odds
ratio) can provide useful information, namely the dependencies of the
risk factors (genetic or environmental) in the underlying population.
Detecting such dependencies is important both as a clue for a biologic
relation between alleles at the loci under study and as a test of the
key assumption in the interpretation of case-only data.
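The vertical split described above can be sketched with hypothetical counts. The snippet computes the case-only and control-only odds ratios and checks the algebraic identity that links them: the case-only odds ratio equals the ratio of the joint odds ratio to the product of the individual odds ratios, multiplied by the control-only odds ratio. All counts and names are invented for illustration.

```python
def cross_product_ratio(n11, n10, n01, n00):
    """Odds ratio relating the two factors within one group (cases or controls)."""
    return (n11 * n00) / (n10 * n01)

# Hypothetical counts, keyed by (genotype, exposure)
cases = {(1, 1): 30, (1, 0): 20, (0, 1): 60, (0, 0): 50}
controls = {(1, 1): 10, (1, 0): 20, (0, 1): 40, (0, 0): 100}

case_only_or = cross_product_ratio(
    cases[(1, 1)], cases[(1, 0)], cases[(0, 1)], cases[(0, 0)])
control_only_or = cross_product_ratio(
    controls[(1, 1)], controls[(1, 0)], controls[(0, 1)], controls[(0, 0)])

def or_vs_reference(key):
    """Case-control odds ratio for one exposure row against the doubly unexposed."""
    return (cases[key] * controls[(0, 0)]) / (controls[key] * cases[(0, 0)])

multiplicative_ratio = or_vs_reference((1, 1)) / (
    or_vs_reference((1, 0)) * or_vs_reference((0, 1)))

print(f"case-only OR:         {case_only_or:.2f}")
print(f"control-only OR:      {control_only_or:.2f}")
print(f"multiplicative ratio: {multiplicative_ratio:.2f}")
```

With these made-up counts the case-only odds ratio is 1.25 although the joint odds ratio equals the product of the individual ones, because the two factors are associated among controls (control-only odds ratio 1.25). This is the situation in which the independence assumption fails and a case-only analysis alone would mislead.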
Three Factors: The 2 X 8 table
The points underscored by the two-by-four table are even clearer for
three factors—three genes, three environmental factors, or a combination
of genetic and environmental factors. With three dichotomous factors,
there are eight exposure combinations (23). Although
more complex, such a table still shows the primary epidemiologic parameters
(odds ratios and attributable fractions) associated with each factor
and combination of factors. Because all refer to the same reference
group, the relations between these measures are immediately evident;
if needed, one can also assess which model of interaction best fits
the data. Methodologic issues, such as sample size and exposure dependencies
among the controls, can also be assessed with relative ease. The contrast
with classic stratified analysis is even greater than in the case of
two factors. To present such stratified analysis, a minimum of four
tables is needed; because they have different reference groups, the
four odds ratios would not be directly comparable; and the overall interpretation
of the study is less immediately clear.
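The same calculation extends directly to three factors. The sketch below lists seven odds ratios for a two-by-eight layout, all against the single reference group unexposed to every factor, along with the exposure frequency among controls. The counts and the factor labels A, B, and C are hypothetical.

```python
# Hypothetical counts for three dichotomous factors (A, B, C): (cases, controls)
counts = {
    (0, 0, 0): (40, 120),  # reference: none of the three factors present
    (0, 0, 1): (30, 60),
    (0, 1, 0): (25, 50),
    (0, 1, 1): (20, 25),
    (1, 0, 0): (35, 70),
    (1, 0, 1): (30, 30),
    (1, 1, 0): (24, 24),
    (1, 1, 1): (36, 9),
}

ref_cases, ref_controls = counts[(0, 0, 0)]
total_controls = sum(n_controls for _, n_controls in counts.values())

odds_ratios = {}
for combo, (n_cases, n_controls) in counts.items():
    if combo == (0, 0, 0):
        continue
    odds_ratios[combo] = (n_cases * ref_controls) / (n_controls * ref_cases)
    print(f"A={combo[0]} B={combo[1]} C={combo[2]}: "
          f"OR {odds_ratios[combo]:.2f}, "
          f"{n_controls / total_controls:.1%} of controls")
```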
Increasing Complexity
The two-by-four or the two-by-eight table, though simple, may adequately
summarize some, but not all epidemiologic relations. Issues that come
into play in more complex situations include the following.
- The number of factors can increase. Even for dichotomous factors,
the number of exposure combinations grows quickly (2^n for n factors)
and the corresponding table rapidly becomes unwieldy.
- The relation between exposure and outcome can be other than dichotomous.
For example, the relation can be graded or continuous (dose-response)
as occurs with smoking and lung cancer or with obesity and hypertension.
In the general case of n exposures each with its dose-response curve,
the response surface is best described as a general n-dimensional
manifold which may not be meaningfully summarized by few discrete
odds ratios.
- As more factors are involved, their interaction may not be adequately
described by simple multiplicative or additive models.
These limitations highlight two issues that will increasingly confront
epidemiologists as they try to unravel the web of interaction in disease
causation. First, new or improved epidemiologic methods may be needed
to deal with such complex situations. For example, researchers have
suggested using a variety of regression models, including hierarchical
models, and neural networks, traditionally used in modeling the probability
of clinical outcomes (24,25), in
the study of multiple factors and interaction (26-29).
So far, these approaches have limitations: the output of regression
models, for example, is model-dependent; neural networks, though in
general less dependent on prior model specification (26-28),
may be limited in their ability explicitly to estimate dependencies
among risk factors (26,27).
The second issue relates to sample size. As the number of factors under
study increases so do the strata that have to be defined within the
study. With a fixed total number of subjects, increasing the number
of factors quickly reduces per-stratum size and the associated statistical
power. Thus, negative findings should be carefully interpreted. Strategies
to deal with this issue include conducting well-designed collaborative
studies that increase sample size but also deal effectively with extraneous
genetic heterogeneity.
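A rough sense of how quickly strata thin out can be obtained from a simple calculation. The sketch assumes, purely for illustration, a study of 1,000 subjects and independent dichotomous factors that each have a prevalence of 20 percent; under those assumptions the rarest stratum is the one exposed to all factors. The function name `strata_summary` is hypothetical.

```python
def strata_summary(total_subjects, prevalence, n_factors):
    """Number of exposure combinations and expected size of the rarest one.

    Assumes n_factors independent dichotomous factors with the same prevalence
    (below 50 percent), so the rarest stratum is the one exposed to all factors.
    """
    n_strata = 2 ** n_factors
    rarest_expected = total_subjects * prevalence ** n_factors
    return n_strata, rarest_expected

for n in range(1, 6):
    n_strata, rarest = strata_summary(total_subjects=1000, prevalence=0.2, n_factors=n)
    print(f"{n} factors: {n_strata} strata, "
          f"about {rarest:.1f} subjects expected in the rarest stratum")
```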
In conclusion, researchers are challenged to apply epidemiologic methods
to increasingly complex data on multiple factors and interaction. Carefully
conducted collaborative studies may provide adequate sample size. A
clear presentation and analysis of the core elements of these interactions
(the data distribution and the primary measures of association) may
increase the information that can be extracted from the data. In this
sense, the two-by-four table and its immediate extensions are fundamental,
simple, and useful tools for documenting and studying gene-environment
interaction.
Disease Registries and Case-Only Designs
Population-based case-control studies are fundamental tools in etiologic
studies, particularly for their ability to provide key parameters of
the human genome epidemiology of many conditions (4, 30). The challenges
of case-control studies, particularly the recruitment of an adequate
set of control subjects, and the refinement of case-only approaches
suggest novel approaches to studying the role of complex genotypes
in disease etiology. The availability of well-designed disease registries
provides a practical setting for case-only studies of common conditions
such as certain cancers and birth defects. Such case-only approaches
cannot replace, but can enhance, traditional case-control (or cohort)
studies, particularly in three key areas:
- Scanning for genotypes that potentially contribute the most to
disease in a population.
- Evaluating etiologic heterogeneity and genotype-phenotype correlations
among subsets of cases.
- Detecting supra-multiplicative effects of interacting alleles.
Scanning Genotypes by Potential Contribution to Disease in the Population
Studying the role of complex genotypes, i.e., the interaction of multiple
alleles at multiple loci, presents numerous challenges, including the
large number of possible allele combinations. In theory, m alleles at
n loci can generate m^n combinations (haplotypes): with 10 loci, two alleles
can generate in excess of 1,000 combinations, and three alleles nearly
60,000 combinations.
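The two figures quoted above follow from the count of combinations for m alleles at each of n loci, as this short check shows (the function name `n_combinations` is illustrative).

```python
def n_combinations(alleles_per_locus, n_loci):
    """Number of allele combinations for m alleles at each of n loci (m ** n)."""
    return alleles_per_locus ** n_loci

for m in (2, 3):
    print(f"{m} alleles at 10 loci: {n_combinations(m, 10):,} combinations")
```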
Given their potentially large number, which allele
combinations should one look at first? One approach is to focus first
on allele combinations that potentially contribute to the largest proportion
of disease in a population or, in epidemiologic terms, on those with
the highest potential population-attributable fraction. It is easy to
show that even though one cannot determine relative risks in case-only
studies, one can estimate the upper limit of a genotype’s attributable
fraction. Assuming causality, such potential maximum attributable fraction
is simply the frequency of the genotype among cases. This relation is
intuitively obvious, since if x percent of a random series of cases
has a particular exposure, then at most x percent can be caused by that
exposure. The formal relation Fc = AF × OR/(OR − 1) derives
directly from Miettinen’s formula for attributable fraction (31).
Thus, attributable fraction (AF) is, at most, as
high as the fraction of cases with the exposure—in this case the
genotype—of interest (Fc) but never higher, regardless
of how high the odds ratio or relative risk. The equation also illustrates
the non-linear relation between odds ratio and attributable fraction,
implying that variations in the upper range of odds ratios translate
into progressively smaller changes in attributable fraction (for variations
in the odds ratio between 10 and 1000, the fraction of exposed cases
differs from the attributable fraction by no more than one part in 10).
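The flattening of this relation at high odds ratios can be seen by rearranging the formula as AF = Fc × (OR - 1)/OR and evaluating it over a range of odds ratios. The value of Fc used below (20 percent of cases carrying the genotype) is hypothetical.

```python
def attributable_fraction(frac_cases_exposed, odds_ratio):
    """AF = Fc * (OR - 1) / OR, the relation in the text rearranged for AF."""
    return frac_cases_exposed * (odds_ratio - 1) / odds_ratio

fc = 0.20  # hypothetical fraction of cases with the genotype
for odds_ratio in (2, 5, 10, 100, 1000):
    af = attributable_fraction(fc, odds_ratio)
    print(f"OR {odds_ratio:>4}: AF = {af:.4f} ({af / fc:.1%} of Fc)")
```

Whatever the odds ratio, the attributable fraction stays below Fc, which is why the genotype frequency among cases can serve as its upper limit.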
One might argue that when the genotype frequency
in the population is unknown, little should be inferred from genotype
frequencies among cases. However, in the case of complex genotypes such
relation becomes interesting because, under the hypothesis of no effect,
few subjects are expected to have any given (complex) genotype, defined
as a certain combination of alleles at a number of loci. More precisely,
that number decreases multiplicatively with the number of loci considered
concurrently (Figure 1). For example, with five
loci and one common variant allele per locus with a frequency in the
population of 10 percent, one would expect that by chance alone, the
genotype with the five variant alleles would be found in 0.10^5, or one
in 100,000 people. The practical usefulness of such consideration is
that researchers can expect that complex genotypes observed with some
frequency among cases, even in a small percent of cases, might be likely
candidates for further study. Thus complex genotypes are one specific
scenario where examining case-only frequencies might help focus the
search for allele combinations with a potentially significant role in
disease causation.
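A minimal sketch of such a scan is given below. It first reproduces the five-locus example from the text, then ranks the carrier patterns seen in a small hypothetical registry by their frequency among cases (the upper limit of the attributable fraction) and compares each with the frequency expected by chance. The registry data, the assumed variant frequencies of 10 percent, and all names are invented for illustration, and loci are assumed to be independent in the population.

```python
from collections import Counter
from math import prod

def chance_frequency(genotype, variant_freqs):
    """Expected frequency of a carrier pattern if the loci are independent."""
    return prod(f if carried else 1 - f for carried, f in zip(genotype, variant_freqs))

# The example from the text: five loci, each variant present in 10 percent
five_variants = chance_frequency((1,) * 5, [0.10] * 5)
print(f"chance frequency of all five variants: {five_variants:.6f}")

# Hypothetical registry: one tuple of 0/1 carrier flags (three loci) per case
registry_cases = [(1, 1, 1)] * 6 + [(1, 0, 0)] * 20 + [(0, 0, 0)] * 74
assumed_freqs = [0.10, 0.10, 0.10]  # assumed population variant frequencies

n_cases = len(registry_cases)
ranking = sorted(
    ((genotype, count / n_cases)
     for genotype, count in Counter(registry_cases).items()
     if any(genotype)),                      # skip cases carrying no variant
    key=lambda item: item[1],
    reverse=True,
)
for genotype, frac in ranking:
    expected = chance_frequency(genotype, assumed_freqs)
    print(f"{genotype}: {frac:.1%} of cases (upper limit of AF), "
          f"{frac / expected:.0f} times the chance frequency")
```

A pattern that is rare by chance yet present in a visible share of cases would be a candidate for follow-up in a study with controls.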
Examining Homogeneous Subsets and Determinants of Severity or Phenotype
The case-only approach to the analysis of complex genotypes could also
help define smaller, more homogeneous subsets distinguished by phenotype,
disease progression, or severity (20). These more
homogeneous subsets can be compared with respect to genotypes to study
the possible relation between genotype and outcome. For example, one
might separate cases of first occurrence of venous thrombosis from those
of recurrence, or cases of myocardial infarction by age of onset. Such
analyses can provide clues to the genetic heterogeneity underlying common
disorders and help relate genotypic variation to clinically relevant
differences in outcome.
Searching For Supra-Multiplicative Interactions
The analysis of disease registries in a case-only fashion can provide
some indication of interaction among alleles using, for example, log-linear
models. Log-linear models have been used to test for higher order associations
in a multiplicity of settings, including associations between structural
anomalies in the same baby (32), between maternal
and fetal genotypes and disease (33-35), and between
genotype markers and disease (36,37).
In conjunction with prior information of linkage between alleles, the
results of log-linear modeling, for example, can provide some indication
of whether the joint effect of certain allele combinations differs from
that expected under a multiplicative null hypothesis, i.e., whether
the joint effect equals the product of each allele’s effect alone.
In this respect, such an approach is a natural extension of the well-known
case-only odds ratio (21), which measures the deviation
from simple multiplicative effects of two factors, and is subject to
similar interpretations and limitations (22,38).
Among the limitations of log-linear modeling are its sensitivity to
sparse data, which is a real concern in the analysis of complex genotypes,
and its assumption of a log-linear relation between factors. Moreover,
marginal effects of each allele cannot be measured. Nevertheless, the
context of complex genotypes with relatively common susceptibility alleles
is precisely where one might expect to find significant, supra-multiplicative
interactions if such genotypes contribute to disease.
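As one possible illustration of a log-linear analysis among cases, the sketch below fits the model containing all pairwise associations among three alleles but no three-way term, using iterative proportional fitting, and reports the likelihood-ratio statistic G² (1 degree of freedom for a 2 x 2 x 2 table). This is not necessarily how the cited studies fitted their models, which more commonly rely on Poisson regression software; the case counts and function names are hypothetical, and the sparse-data caveat noted above applies.

```python
import math
from itertools import product

CELLS = list(product((0, 1), repeat=3))

def fit_no_three_way(observed, iterations=200):
    """Fit the log-linear model with all two-way terms and no three-way term
    to a 2x2x2 table by iterative proportional fitting."""
    fitted = {cell: 1.0 for cell in CELLS}
    for _ in range(iterations):
        for i, j in [(0, 1), (0, 2), (1, 2)]:
            for a, b in product((0, 1), repeat=2):
                margin = [c for c in CELLS if c[i] == a and c[j] == b]
                obs_margin = sum(observed[c] for c in margin)
                fit_margin = sum(fitted[c] for c in margin)
                for c in margin:
                    # rescale so the fitted two-way margin matches the observed one
                    fitted[c] *= obs_margin / fit_margin
    return fitted

def g_squared(observed, fitted):
    """Likelihood-ratio statistic comparing observed and fitted counts."""
    return 2 * sum(o * math.log(o / fitted[c]) for c, o in observed.items() if o > 0)

# Hypothetical registry cases cross-classified by carrier status at three loci
cases_by_allele = {
    (0, 0, 0): 400, (0, 0, 1): 60, (0, 1, 0): 50, (0, 1, 1): 12,
    (1, 0, 0): 55, (1, 0, 1): 10, (1, 1, 0): 9, (1, 1, 1): 20,
}

fitted = fit_no_three_way(cases_by_allele)
g2 = g_squared(cases_by_allele, fitted)
print(f"G^2 for the three-way term: {g2:.2f} (compare with chi-square, 1 df)")
```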
Limitations of Scanning Disease Registries Using a Case-Only Approach
The main thrust of this discussion of case-only designs (table
7-5) is that, in the context of the study of complex genotypes in
disease etiology, well-designed disease registries can be informative
and relatively inexpensive resources that could complement and enhance
the value of traditional case-control or cohort studies. Provided the
key assumptions of case-only studies hold, such assessment of disease
registries could provide researchers with clues on the health effects
of certain complex genotypes, including their potential contribution
to disease in the population, their involvement in significant supra-multiplicative
interaction, and their relation to case subgroups with distinctive etiology
and severity of outcome. It should be noted that such an approach does
not pursue gene discovery in the manner of a genome scan. Rather, it
uses known allelic variation at candidate loci as a starting point to
examine the potential contribution to disease etiology.
However, the assessment of disease registries using case-only methods
is not an alternative to traditional studies that use population controls.
Its limitations, which stem from the limits of the case-only design,
should be recognized clearly:
- Case-only studies offer no information about marginal risks for
specific genotypes.
- They assess only deviations from purely multiplicative interactions,
which is only one of the possible scenarios in which different alleles
at different loci interact to modulate disease risk (7).
Important genetic effects, such as strong effects from single gene
variants, might not generate a signal. Other complex scenarios that
defy facile conclusions include interactions of gene variants that
increase disease risk with others that reduce risk.
- The validity of interaction assessment in case-only studies is exquisitely
sensitive to independence assumptions for the factors in the population
(39). One might imagine combinations that could
be expected to violate that independence, whether among genotype and
environmental factors (e.g., cigarette smoking and genes involved in
detoxification, due to selective attrition in the population), or
else among different genes. Also, independence among gene combinations
might be violated if population stratification induces correlation
between genes (even if the genotypes occur independently within each
subpopulation), though this problem might be solved by appropriate
stratification. Alternatively, dependencies between loci might occur
if the two loci are on the same chromosome, even in populations with
random mating, if mutations are relatively recent. So far, these appear
mostly to be theoretical concerns, for lack of empirical evidence
of such dependencies. Recent data, for example, suggest that this
is not a problem for the more commonly studied metabolic genes (40).
- Case-only studies also do not provide full information on the attributable
fraction for gene combinations, and they only estimate their upper
limit.
- Case-only studies require that cases represent a random or unselected
series of cases, as could be assembled by a population-based registry.
Series assembled from tertiary centers might be subject to selection
forces that might preclude valid inferences.
At the same time, one should note the potential advantages in speed,
efficiency, and precision of the two-tiered approach that begins with
case-only studies and uses their findings to design further studies.
Such advantages include the following:
- Researchers could complete their studies more rapidly, by examining
existing or easily developed case groups, as might be derived from
population-based disease registries. Several such registries, for
example, already exist for many conditions including cancers and birth
defects.
- The resources otherwise used to enroll and study convenient controls
could be used instead to expand the spectrum of candidate genes and
alleles among increasing numbers of cases.
- The effect estimates could gain in precision (no variance associated
with controls) and validity (no population stratification).
- Subsequent studies might be more efficient. For example, case-control
studies might use evidence from case-only studies to decide on reasonable
sample sizes (for cases and controls) that might vary by ethnicity
or disease subgroup.
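The gain in precision mentioned in the list above can be illustrated with the usual large-sample standard errors on the log scale. For the case-only odds ratio the variance is the sum of the reciprocals of the four case cells; for the case-control estimate of departure from multiplicativity it is the sum over all eight cells. The counts are hypothetical, and the comparison is meaningful only if the independence assumption of the case-only design holds.

```python
import math

# Hypothetical counts, keyed by (genotype, exposure)
cases = {(1, 1): 30, (1, 0): 20, (0, 1): 60, (0, 0): 50}
controls = {(1, 1): 10, (1, 0): 20, (0, 1): 40, (0, 0): 100}

# Standard error of the log case-only odds ratio: four case cells only
se_case_only = math.sqrt(sum(1 / n for n in cases.values()))

# Standard error of the log multiplicative-interaction ratio estimated from
# the full case-control table: all eight cells contribute
se_case_control = math.sqrt(
    sum(1 / n for n in cases.values()) + sum(1 / n for n in controls.values()))

print(f"SE, case-only estimate:    {se_case_only:.3f}")
print(f"SE, case-control estimate: {se_case_control:.3f}")
```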
In a broader perspective, one should realistically approach such screening
of case-only studies within the well-known expectations of a screening
process. Along with valid results, both false positives and false negatives
will occur, the former for example if linkage disequilibrium unrelated
to disease were present.
Final Considerations
It is tempting to speculate to what extent the conceptual framework
presented here can be transferred from genomics to proteomics. Whereas
the ability to detect multiple genetic variants with a single functional
test would appear to increase researchers’ ability to examine
an ever-widening web of metabolic networks, the independence assumptions
might be commonly violated with proteomics, because of feedback regulation
systems governing the transcription of genes into proteins.
Recruiting sufficient numbers of study participants
remains a basic issue. Although analytic techniques such as multi-factor
dimensionality reduction (41) are being suggested
as possible enhancements in the study of complex genotypes, sample size
requirements remain an inescapable challenge for researchers.
Finally, case-only approaches in no way diminish
but in fact underscore the need to tackle and solve the complex legal,
ethical, social, and practical issues of selecting, recruiting, and
testing representative samples of the population for genetic studies.
As researchers realize the synergy between traditional
and nontraditional studies, we should encourage a concerted effort
at developing, on the one hand, well-designed disease registries, and
on the other, representative samples from well-defined populations that
are large and accessible.
Summary
- In the study of interaction, it is useful to evaluate and present
information on both the marginal and joint exposures (gene-environment
combinations). Departure from specified models of interaction can
be informative but should not be the sole focus of the analysis. Key
information for each term of the interaction includes the frequency
of the gene-environment exposure (or the complex genotype) in the
reference population, and the disease-associated relative risks and
attributable fractions.
- The appreciation of certain epidemiologic units of analysis can
facilitate the systematic assessment and clear presentation of data
on multiple factors and interaction. In population-based case-control
studies, particularly in their simplest forms (with two dichotomous
factors), one such unit of analysis is the two-by-four table. Though
more complex situations can require more complex approaches, the two-by-four
table in many situations can provide a useful starting point for data
assessment, presentation, and analysis.
- Population-based disease registries can be important research resources.
Using analytic approaches derived from case-only methods, researchers
could scan such registries for complex genotypes and other exposure
combinations associated with the highest attributable fraction for
disease. Such analysis could also provide clues on the presence of
supra-multiplicative interaction, as well as of determinants of disease
severity and phenotype among population subgroups.
- Case-control and case-only studies are best viewed as complementary
rather than alternative approaches to the assessment of interaction.
Appreciating the basic units of epidemiologic analysis within each
study design, and using both designs synergistically, can contribute
to the efficient and systematic assessment of the role in disease
etiology of multiple factors, complex genotypes, and interaction.
References
- Botto LD, Khoury MJ. Commentary: facing the challenge
of gene-environment interaction: the two-by-four table
and beyond. Am J Epidemiol 2001;153:1016-20.
- Thompson WD. Effect modification and the limits
of biological inference from epidemiologic data. J Clin Epidemiol
1991;44:221-32.
- Greenland S, Rothman KJ. Concepts of interaction.
In: Greenland S, Rothman KJ, eds. Modern Epidemiology:
Lippincott-Philadelphia, 1998:329-342.
- Clayton D, McKeigue PM. Epidemiological methods
for studying genes and environmental factors in complex
diseases. Lancet 2001;358:1356-60.
- Pritchard JK, Stephens M, Rosenberg NA, Donnelly
P. Association mapping in structured populations. Am J Hum
Genet 2000;67:170-81.
- Satten GA, Flanders WD, Yang Q. Accounting for unmeasured
population substructure in case-control studies of
genetic association using a novel latent-class model. Am J Hum Genet
2001;68:466-77.
- Khoury MJ, Adams MJ, Jr., Flanders WD. An epidemiologic
approach to ecogenetics. Am J Hum Genet 1988;42:89-95.
- Hamosh A, Scott AF, Amberger J, Valle D, McKusick
VA. Online Mendelian Inheritance in Man (OMIM). Hum Mutat
2000;15:57-61.
- Collins FS, Patrinos A, Jordan E, Chakravarti A,
Gesteland R, Walters L. New goals for the U.S. Human Genome
Project: 1998-2003. Science 1998;282:682-9.
- Gerhardt A, Scharf RE, Beckmann MW, et al. Prothrombin
and factor V mutations in women with a history of
thrombosis during pregnancy and the puerperium. N Engl
J Med 2000;342:374-80.
- Akar N, Akar E, Akcay R, Avcu F, Yalcin A, Cin
S. Effect of methylenetetrahydrofolate reductase 677 C-T, 1298 A-C,
and 1317 T-C on factor V 1691 mutation in Turkish deep vein thrombosis
patients. Thromb Res 2000;97:163-7.
- Martinelli I, Taioli E, Bucciarelli P, Akhavan
S, Mannucci PM. Interaction between the G20210A mutation of the
prothrombin gene and oral contraceptive use in deep vein thrombosis.
Arteriosclerosis, Thrombosis & Vascular
Biology 1999;19:700-3.
- Cattaneo M, Chantarangkul V, Taioli E, Santos JH,
Tagliabue L. The G20210A mutation of the prothrombin gene in
patients with previous first episodes of deep-vein thrombosis: prevalence
and association with factor V G1691A,
methylenetetrahydrofolate reductase C677T and plasma prothrombin levels.
Thromb Res 1999;93:1-8.
- Botto LD, Yang Q. 5,10-Methylenetetrahydrofolate
reductase gene variants and congenital anomalies: a HuGE
review. Am J Epidemiol 2000;151:862-77.
- Christensen B, Arbour L, Tran P, et al. Genetic
polymorphisms in methylenetetrahydrofolate reductase and
methionine synthase, folate levels in red blood cells, and risk of
neural tube defects. Am J Med Genet
1999;84:151-7.
- Shaw GM, Rozen R, Finnell RH, Wasserman CR, Lammer
EJ. Maternal vitamin use, genetic variation of infant
methylenetetrahydrofolate reductase, and risk for spina bifida. Am
J Epidemiol 1998;148:30-7.
- Wilson A, Platt R, Wu Q, et al. A common variant
in methionine synthase reductase combined with low cobalamin
(vitamin B12) increases risk for spina bifida. Mol Genet Metab 1999;67:317-23.
- Vandenbroucke JP, Koster T, Briet E, Reitsma PH,
Bertina RM, Rosendaal FR. Increased risk of venous thrombosis in
oral-contraceptive users who are carriers of factor V Leiden mutation.
Lancet 1994;344:1453-7.
- Rockhill B, Newman B, Weinberg C. Use and misuse
of population attributable fractions. Am J Public Health
1998;88:15-9.
- Begg CB, Zhang ZF. Statistical analysis of molecular
epidemiology studies employing case-series. Cancer Epidemiol
Biomarkers Prev 1994;3:173-5.
- Piegorsch WW, Weinberg CR, Taylor JA. Non-hierarchical
logistic models and case-only designs for
assessing susceptibility in population-based case-control studies.
Stat Med 1994;13:153-62.
- Khoury MJ, Flanders WD. Nontraditional
epidemiologic approaches in the analysis of gene-environment interaction:
case-control studies with no controls. Am J Epidemiol 1996;144:207-13.
- Yang Q, Khoury MJ. Evolving methods in
genetic epidemiology. III. Gene-environment interaction in epidemiologic
research. Epidemiol Rev 1997;19:33-43.
- Ioannidis JP, McQueen PG, Goedert JJ, Kaslow
RA. Use of neural networks to model complex immunogenetic
associations of disease: human leukocyte antigen impact on the progression
of human immunodeficiency virus
infection. Am J Epidemiol 1998;147:464-71.
- Marchevsky AM, Patel S, Wiley KJ, et al.
Artificial neural networks and logistic regression as tools for prediction
of survival in patients with Stages I and II non-small cell lung cancer.
Mod Pathol 1998;11:618-25.
- Duh MS, Walker AM, Ayanian JZ. Epidemiologic
interpretation of artificial neural networks. Am J Epidemiol
1998;147:1112-22.
- Tu JV. Advantages and disadvantages of
using artificial neural networks versus logistic regression for predicting
medical outcomes. J Clin Epidemiol 1996;49:1225-31.
- Warner B, Misra M. Understanding neural
networks as statistical tools. Am Stat 1996;50:284-93.
- Aragaki CC, Greenland S, Probst-Hensch
N, Haile RW. Hierarchical modeling of gene-environment interactions:
estimating NAT2 genotype-specific dietary effects on adenomatous polyps.
Cancer Epidemiol Biomarkers Prev
1997;6:307-14.
- Khoury MJ, Little J. Human genome epidemiologic
reviews: the beginning of something HuGE. Am J Epidemiol
2000;151:2-3.
- Miettinen OS. Proportion of disease caused
or prevented by a given exposure, trait or intervention. Am J Epidemiol
1974;99:325-32.
- Beaty TH, Yang P, Khoury MJ, Harris EL,
Liang KY. Using log-linear models to test for associations among congenital
malformations. Am J Med Genet 1991;39:299-306.
- Wilcox AJ, Weinberg CR, Lie RT. Distinguishing
the effects of maternal and offspring genes through studies of
"case-parent triads". Am J Epidemiol 1998;148:893-901.
- Shields DC, Kirke PN, Mills JL, et al.
The "thermolabile" variant of methylenetetrahydrofolate
reductase and neural
tube defects: An evaluation of genetic risk and the relative importance
of the genotypes of the embryo and the
mother. Am J Hum Genet 1999;64:1045-55.
- Shields DC, Ramsbottom D, Donoghue C, et
al. Association between historically high frequencies of neural tube
defects and the human T homologue of mouse T (Brachyury). Am J Med
Genet 2000;92:206-11.
- Huttley GA, Wilson SR. Testing for concordant
equilibria between population samples. Genetics 2000;156:2127-35.
- Khamis HJ, Hinkelmann K. Log-linear-model
analysis of the association between disease and genotype. Biometrics
1984;40:177-88.
- Yang Q, Khoury MJ, Sun F, Flanders WD.
Case-only design to measure gene-gene interaction. Epidemiology
1999;10:167-70.
- Albert PS. Limitations of the case-only
design for identifying gene-environment interactions. Am J Epidemiol
2001;154:687-93.
- Garte S, Gaspari L, Alexandrie AK, et al.
Metabolic gene polymorphism frequencies in control populations. Cancer
Epidemiol Biomarkers Prev 2001;10:1239-48.
- Ritchie MD, Hahn LW, Roodi N, et al. Multifactor-dimensionality
reduction reveals high-order interactions among
estrogen-metabolism genes in sporadic breast cancer. Am J Hum Genet
2001;69:138-47.