Draft Genetic Test Review: Cystic Fibrosis

ANALYTIC VALIDITY

Question 8: Is the test qualitative or quantitative?

In prenatal screening for cystic fibrosis, the aim is to identify couples in which both the mother and her partner have identifiable cystic fibrosis mutations. Their offspring have a 1 in 4 risk of having cystic fibrosis, and definitive diagnostic testing is available. The DNA test results are qualitative (e.g., a specific mutation is reported as present or absent).

Question 9: How often is the test positive when a mutation is present?

Question 10: How often is the test negative when a mutation is not present?
Definitions

Analytic specificity is the proportion of negative test results when no detectable mutation is present. Analytic specificity can also be expressed in terms of the analytic false positive rate: the proportion of positive test results when no detectable mutation is present (1 - analytic specificity). Another way of expressing analytic specificity is the true negatives divided by the sum of the true negatives and false positives.
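In the usual 2x2-table notation (TP, FP, TN and FN for true/false positive and negative results), these definitions can be written as follows; the analytic sensitivity formula is not stated in this passage, but is the standard counterpart of the detection rate discussed below:

    \text{analytic specificity} = \frac{TN}{TN + FP}, \qquad
    \text{analytic false positive rate} = \frac{FP}{TN + FP} = 1 - \text{analytic specificity}

    \text{analytic sensitivity (detection rate)} = \frac{TP}{TP + FN}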
False positive results could be due to technical errors in the
analytic phase (e.g., errors in placement, contamination, expired
reagents, or non-specific reactions) or to administrative/clerical
errors in the pre-analytic or post-analytic phases (e.g., mislabeling of
samples, wrong interpretation of correct results, or copying results
incorrectly). Wrong
mutations are a third type of error, along with false negative and
false positive results. These
occur when a mutation is present but is incorrectly identified as a different mutation. For purposes of this review, wrong mutations will be
considered false positive results, since there is an opportunity for
correcting them by confirmatory testing.
Wrong mutations occurring in the pre-analytic, analytic or
post-analytic phases are all included in the analysis.

Optimal source(s) of data

Several features of the ACMG/CAP proficiency testing data complicate their use for estimating analytic performance:
· over-representation of ‘difficult’ samples, due to the educational nature of the proficiency testing program
· mixing of screening and diagnostic exercises
· few challenges which do not contain a detectable mutation
· reporting of summary results in ways that do not allow a straightforward computation of analytic sensitivity and specificity
· an important proportion of laboratories participating in the ACMG/CAP program are from outside the United States
· the artificial nature of sample preparation, shipping and handling to ensure stability
· some participating laboratories are involved with research or manufacturing rather than clinical activities

One additional consideration might be that laboratories perform differently when testing proficiency testing samples than when testing clinical samples on a routine basis. This difference might take the form of poorer performance, because the sample is handled outside of the laboratory routine. Alternatively, performance might be better, because extra attention might be paid to obtaining a reliable result.

Future analyses should be aimed at providing reliable method- and, possibly, mutation-specific analytic performance estimates. One approach for collecting such data might include the following steps:
· An independent body would develop a standard set of samples, most of which would be randomly selected from the general population. Included in the standard set, however, would also be additional, less common genotypes (e.g., rarer heterozygotes, homozygotes and compound heterozygotes). Sub-cloned samples are inadequate for this use. The group that collects and administers these samples and the subsequent analyses could be under the auspices of the FDA, ACMG, or CAP, or be a non-profit institution such as the Coriell Institute for Medical Research (Camden, NJ). This effort would need grant support to begin the process.
· The sample set would then be available for method validation. Correct genotypes would be arrived at by consensus or, if disagreements emerged, by a reference method (e.g., sequencing). The current validation practice of having a laboratory (or manufacturer) run a series of samples with unknown genotype (as is often the case for computing specificity) is inadequate, because there has been no comparison to a ‘gold standard’ (e.g., sequencing). For example, how can a laboratory running an unknown sample determine whether a positive finding is a true or a false positive, or whether a negative finding is a true or false negative?
· Ideally, this blinded sample set would be available to
manufacturers as part of the pre-market approval process, with the
understanding that multiple laboratories using these commercial reagents
would be asked by the manufacturer to analyze portions of the sample set
independently. This initial
assay validation process is distinct from assay control samples that are
discussed later (Question 11). Appropriate
sample size for determining analytic specificity can be derived by
choosing an acceptable target specificity and an acceptable lower limit
that should be excluded in the 95 percent confidence interval. The higher the specificity chosen and the tighter the
confidence interval, the larger the sample size that will be
necessary to provide a definitive answer.
For example, if a laboratory chose a target specificity of 98
percent and wanted to rule out a specificity of 90 percent, it would
need to correctly identify at least 49 of 50 known negative samples
(estimated using the binomial distribution).
On the other hand, a target specificity of 99.5 percent and a
desire to rule out a specificity of 98 percent would require correctly
identifying at least 398 of 400 known negative samples.
The determination of even higher analytic specificity with
tighter confidence intervals may not be economically feasible for an
individual laboratory. However,
this could be attained by a consortium of laboratories using the same
methodology, or by a manufacturer that forms a consortium of
laboratories using its reagents. Appropriate sample
size for determining the analytic sensitivity (detection rate) could be
derived using similar analyses. If
a laboratory chose a target sensitivity of 95 percent and wanted to rule
out a sensitivity of 80 percent, it would need to correctly identify at
least 38 of 40 chromosomes with known mutations.
A higher sensitivity estimate of 98 percent that rules out a rate
of 95 percent would require the correct identification of at least 196
of 200 chromosomes with known mutations.
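These worked examples can be checked with a short calculation. The sketch below assumes scipy is available, and assumes the criterion is an exact one-sided binomial (Clopper-Pearson) lower confidence bound, which is the convention that reproduces the counts quoted above; the report does not state the exact convention used, and the counts come from the text, not from survey data.

    # Check of the sample-size examples above: does the exact (Clopper-Pearson)
    # one-sided 95 percent lower confidence bound for the observed proportion
    # exclude the value to be ruled out?
    from scipy.stats import beta

    def exact_lower_bound(correct, total, alpha=0.05):
        """One-sided exact lower confidence bound for a binomial proportion:
        the alpha quantile of a Beta(correct, total - correct + 1) distribution."""
        return beta.ppf(alpha, correct, total - correct + 1)

    examples = [
        ("specificity 98% vs. rule-out 90%", 49, 50, 0.90),
        ("specificity 99.5% vs. rule-out 98%", 398, 400, 0.98),
        ("sensitivity 95% vs. rule-out 80%", 38, 40, 0.80),
        ("sensitivity 98% vs. rule-out 95%", 196, 200, 0.95),
    ]
    for label, correct, total, rule_out in examples:
        lower = exact_lower_bound(correct, total)
        print(f"{label}: {correct}/{total} correct, lower bound {lower:.3f}, "
              f"excludes {rule_out:.2f}: {lower > rule_out}")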
If mutation-specific detection rates are desired, each would need
the same number of challenges. Again,
however, this may not be feasible for individual laboratories but may be
possible for a consortium or manufacturer, especially for the more
common mutations. The
analytic performance (analytic sensitivity and specificity) could then
be determined for each methodology, along with an estimate of
between-laboratory, within-method variability.
Further, estimates could be made for specific racial/ethnic
groups, based on the mutation-specific performance and the frequency of
each mutation within that group. Overall,
the analytic performance for laboratories in the United States could be
estimated, given the mix of methodologies for established screening
laboratories. All of these
analyses could be done using a 2x2 table, and all rates could be
accompanied by 95 percent confidence intervals (CI).
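As an illustration of that 2x2-table calculation, the sketch below uses hypothetical counts (not data from the ACMG/CAP surveys) and two-sided exact (Clopper-Pearson) 95 percent confidence intervals; scipy is assumed to be available.

    # Hypothetical 2x2 table, counted by chromosome as described above.
    from scipy.stats import beta

    def exact_ci(successes, total, alpha=0.05):
        """Two-sided exact (Clopper-Pearson) confidence interval for a proportion."""
        lower = 0.0 if successes == 0 else beta.ppf(alpha / 2, successes, total - successes + 1)
        upper = 1.0 if successes == total else beta.ppf(1 - alpha / 2, successes + 1, total - successes)
        return lower, upper

    tp, fn = 190, 4   # chromosomes with a detectable mutation (hypothetical)
    fp, tn = 3, 590   # chromosomes with no detectable mutation (hypothetical)

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    print("analytic sensitivity %.3f (95%% CI %.3f-%.3f)" % ((sensitivity,) + exact_ci(tp, tp + fn)))
    print("analytic specificity %.3f (95%% CI %.3f-%.3f)" % ((specificity,) + exact_ci(tn, tn + fp)))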
Published method comparisons focus on technical errors in the
analytic phase and usually do not deal with the pre- and post-analytic
phases of the laboratory testing process.

The ACMG/CAP external proficiency testing scheme

The present analysis, which utilizes the ACMG/CAP data, initially examines
the rates of these three types of errors independently, by chromosome
(i.e., the results on one chromosome are counted separately from the
results reported for the other).

Gap in Knowledge: How should the finding of a wrong mutation influence computation of the analytic performance?

The
relationship between the third type of error (wrong mutation) and
analytic performance has not yet been formally addressed.
In this document, a wrong mutation will be considered an
incorrect result, since this type of error could cause harm.
For example, diagnostic testing in the fetus might target the
mutations reported in the couple and not identify the correct mutation
in the fetus. Also, family
members would not receive correct information.
Further, a wrong mutation finding will be treated as a false
positive in this document. Confirmatory
testing of positive results will provide the opportunity to correct this
type of error.

Table 2-1. CFTR Mutation Testing: Results of the ACMG/CAP MGL Survey

Table 2-2 makes use of
the ACMG/CAP MGL Survey data (Appendix A) to compute a preliminary
estimate of analytic sensitivity and specificity.
The apparent improvement in performance over time may be real, or
due to differences in the types of challenges.
For example, no wild/wild mutation challenges were included prior
to 2000, while 8 of 12 challenges since then were wild/wild.
It is not possible, because of the small numbers, to stratify the
results by methodology or to provide separate estimates of performance
for most of the mutations tested.

Complicating factors in interpreting these results
An additional aim of these external challenges was education.
For that reason, it may not be appropriate to use these data to
determine analytic performance without taking into account the design of
these exercises. For
example, 14 percent (3/21) of the challenges required that participating
laboratories distinguish between the delI507 and delF508 mutations.
All of these challenges occurred in the first three years of the
survey. The delI507
mutation occurs in less than 1 in 2500 non-Hispanic Caucasians tested (1
percent of 1/25). This rare
and difficult laboratory circumstance is emphasized because of the
educational and laboratory-improvement focus of the ACMG/CAP MGL Survey.
An additional complicating feature arises because it is not
always clear whether some ‘false negatives’ might be due to
laboratories not testing for the mutation.
The present analysis attempts to take this into account (Appendix
A). The opportunity for a
laboratory to identify a wrong mutation is considerably greater in
proficiency testing exercises than in practice, due to the high
frequency of mutations among the challenge samples. For that reason, the rate of wrong mutations in proficiency
testing needs to be adjusted downward in order to simulate performance
in routine clinical practice.

A more reliable approach to estimating analytic sensitivity and specificity

It is
possible to recompute the previous analysis using only challenges that
do not involve delI507. Separate
estimates can then be computed for the four challenges involving delI507. These two stratified estimates of analytic performance are
shown in Table 2-3, along with the summary estimate from Table 2-2.
The analytic specificity for identifying the delI507 mutation is
poorer than for the other mutations.
The sensitivity is actually better, since some mutation was
reported in all instances where a delI507 mutation was present.
A better estimate of overall performance that would be expected
in the real world is found when challenges involving the delI507
mutation are not counted (the bolded row in Table 2-3).

Table 2-4
shows the analytic performance estimates by year for challenges without
delI507. No trend is
evident for improvement in analytic sensitivity, and the overall rate of
97.9 percent appears reasonable. The
lower and upper confidence limits could be taken to model the most
pessimistic (96.8 percent) and optimistic (98.7 percent) estimates of
analytic sensitivity. A
standardized mutation panel is now becoming widely adopted, as a result
of ACMG recommendations (Grody WW, 2001).
As a result, manufacturers are now marketing reagents (under the
rule for Analyte Specific Reagents – ASR) that have been subjected to
good manufacturing processes. Analytic
performance may improve as a consequence.
The present analysis establishes a ‘baseline’ estimate of
analytic sensitivity and specificity, against which to assess that
possibility. Analytic
specificity is more difficult to interpret.
Thirteen of 15 errors occurred during one distribution (1997-B).
Some of these might be explained by sample mix-up, but at least
half appear not to be due to this cause.
The European Concerted Action on Cystic Fibrosis reported that
commercial kits were found to have problems identifying G551D and R553X.
The majority of errors in the 1997 ACMG/CAP survey occurred when
challenging these two mutations. (A more reliable estimate of analytic specificity is provided in the next section.)

Gap in Knowledge: Method- and mutation-specific analytic performance estimates
Tables 2-2 through 2-4 present
the best available data for estimating analytic performance. These analyses should not be interpreted as being complete or
robust. For example, the
problems identified by the delI507/delF508 challenges are
method-specific, but no attempt is made in this report to analyze
laboratory performance by specific method.
The results here are for the mix of methodologies presently being
used in the United States and, as such, represent the average laboratory
performance a clinician might expect when ordering such testing.
To generate more reliable analytic performance estimates, large
numbers of specimens with known genotypes will need to be run using
specific methodologies. For
example, Gasparini et al. (1999) used the PCR/OLA methodology to identify 114 newborns
with a mutation; all of these were subsequently confirmed by DNA
sequencing. Although this
rules out false positives, it does not provide an estimate of analytic
sensitivity, since only a small random subset of negative results was
similarly sequenced and the possibility of false negative results
exists. Until more refined
performance estimates are available, the existing information is useful
in estimating clinical performance.
Gap in Knowledge: Analytic performance estimates
are available for only a small number of mutations. Only a small
number of mutations (10) has been subjected to external proficiency
testing (delF508, delI507, G542X, 621+1G>T, G85E, W1282X, G551D,
R553X, 1717-1G>T, and R117H). The
majority of the mutations in the recommended panel have not been
subjected to external proficiency testing.
This is an important consideration because performance may vary
according to laboratory methodology.
Gap in Knowledge:
Analytic performance and mutation panel size. It is possible
that analytic performance will differ, depending on the numbers of
mutations tested, even when the same methodology is employed. Panels utilizing a higher number of mutations might be more
robust because of automation or, conversely, the larger number of
analytic steps might be more prone to errors.
Sensitivity and specificity by person rather than by chromosome

External proficiency testing in Europe

Interpretation of the results

This survey also attempted to determine the cause of errors,
including sample contamination and clerical errors. In general, laboratories would have been able to correct
their false positive results, if their policy had been to reanalyze
samples with positive results. This
indicates that the original sample was neither contaminated nor
incorrectly labeled. Clerical
errors/reporting mistakes/incorrect interpretations were estimated to be
responsible for 90 percent of the errors.
The error rate was not associated with the numbers of samples
processed by the laboratory.

Comparing error rates for DNA-based cystic fibrosis testing with biochemical testing for Down syndrome

A similar proficiency testing program (Survey FP) for maternal serum Down syndrome markers serves as one source for comparing error rates in non-DNA testing. In that survey (jointly sponsored by the Foundation for Blood Research and CAP), participating laboratories are asked to measure three biochemical markers, to combine these measurements with a pre-assigned maternal age, and then to calculate a Down syndrome risk. Five challenges are distributed three times each year. The proportion of laboratories with one or more outlying Down syndrome risk estimates on a given distribution is routinely reported to all participants each year (FBR/CAP FP Survey Participant Summary Report, 2000, FP-C). This proportion remained relatively constant between 1998 and 2000, at about 5 percent. Assuming that a laboratory will have only one (or two) of the five risks classified as an outlier, the actual error rate per sample distributed is closer to 1 or 2 percent. This is similar to the error rate for the ACMG/CAP MGL Survey found in Table 2-1. This analysis is limited to data prior to 2001, since a problem with sample preparation was identified in 2001 and corrected in 2002.
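As a rough reading of that last assumption (this arithmetic is an illustration, not a calculation given in the survey report), if about 5 percent of laboratories are flagged per distribution of five samples and each flagged laboratory has k = 1 or 2 outlying results:

    \text{error rate per sample} \approx \frac{0.05 \times k}{5} = 0.01 \text{ to } 0.02 \quad (k = 1 \text{ or } 2)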
Appendix A. Data used to calculate analytic sensitivity and specificity

Table 2-7. Computations for the ACMG/CAP Proficiency Testing Surveys

Response and commentary of the CAP/ACMG Biochemical and Molecular Genetics Resource Committee
Updated on August 13, 2004