Centers for Disease Control and Prevention
Office of Genomics and Disease Prevention

Draft Genetic Test Review

Cystic Fibrosis
Analytic Validity


ANALYTIC VALIDITY 

Question   8:  Is the test qualitative or quantitative?  
Question   9:  How often is a test positive when a mutation is present?  
Question 10:  How often is the test negative when a mutation is not present?
Question 11:  Is an internal QC program defined and externally monitored?
Question 12:  Have repeated measurements been made on specimens?
Question 13:  What is the within- and between-laboratory precision?
Question 14:  If appropriate, how is confirmatory testing performed to resolve false positives in a timely manner?
Question 15:  What range of patient specimens has been tested?
Question 16:  How often does the test fail to give a useable result?
Question 17:  How similar are results obtained in multiple laboratories using the same, or different, technology?


ANALYTIC VALIDITY 

Question 8:  Is the test qualitative or quantitative? 

In prenatal screening for cystic fibrosis, the aim is to identify couples in which both the mother and her partner have identifiable cystic fibrosis mutations.  Offspring of such couples have a 1 in 4 risk of having cystic fibrosis, and definitive diagnostic testing is available.  The DNA test results are qualitative (e.g., a specific mutation is reported as present or absent).
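The 1-in-4 risk follows directly from Mendelian segregation of the two parental alleles; a minimal sketch in Python (illustrative only, with arbitrary allele labels):

```python
from itertools import product

# Each carrier parent has one normal (N) and one mutant (m) CFTR allele,
# and transmits either allele with equal probability.
carrier = ["N", "m"]

# Enumerate the four equally likely offspring genotypes.
offspring = list(product(carrier, carrier))
affected = [g for g in offspring if g == ("m", "m")]

print(len(affected), "of", len(offspring))  # 1 of 4 genotypes is affected
```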


ANALYTIC VALIDITY 

Question 9:  How often is a test positive when a mutation is present?

Question 10:  How often is the test negative when a mutation is not present? 

Summary

External proficiency testing schemes are the only major reliable source currently available for computing analytic sensitivity and specificity.  The following caveats should be kept in mind, however, when examining these estimates.  First, external proficiency testing schemes are designed to be educational.  For that reason, ‘difficult’ samples are over-represented.  Also, laboratories from outside the U.S. are included, and both research and clinical laboratories participate.  In spite of these shortcomings, this source of data can be useful in establishing a baseline of performance for laboratories.

Based on data from the American College of Medical Genetics and the College of American Pathologists (ACMG/CAP) Molecular Genetics Survey Set MGL:

  • The analytic sensitivity is 97.9% (95 percent CI 96.9 to 98.7%), after removing challenges involving delI507
  • The analytic specificity is 99.4% (95 percent CI 98.7 to 99.8%), after removing challenges involving delI507 and adjusting for the rate of wrong mutations
  • The analytic sensitivity and specificity were essentially constant between 1996 and 2001

Based on data collected by the European Concerted Action on Cystic Fibrosis:

  • The overall raw error rate is 2.8% (95 percent CI 2.4 to 3.4%), consistent with raw error rates in the ACMG/CAP MGL Survey (3.0%, 95 percent CI 2.4 to 3.9%)
  • Although all errors were reported (raw error rate), the reports did not distinguish between the type of error (e.g., false negative or false positive).  For that reason, analytic sensitivity and specificity could not be determined
  • Over three years of the program, performance steadily improved
  • Over three years of the program, only 48 percent of laboratories made no errors
  • Most (about 90 percent) of the errors occurred during the analytic phase of testing

Definitions
Analytic performance is summarized by the sensitivity and specificity of the detection system.  Analytic sensitivity is the proportion of positive test results, when a detectable mutation is present (i.e., the test is designed to detect that specific mutation).  The analytic sensitivity may also be called the analytic detection rate.  Another way of expressing analytic sensitivity would be the true positives divided by the sum of the true positives and false negatives.  False negative results could be due to technical errors in the analytic phase (e.g., sample placement, contamination, expired reagents and cross-reactivity) or to administrative/clerical errors in the pre-analytic or post-analytic phases (e.g., incorrect interpretation of correct analytic result, sample mislabeling and incorrectly copying a correct result).

Analytic specificity is the proportion of negative test results when no detectable mutation is present.  Analytic specificity can also be expressed in terms of the analytic false positive rate.  This would be the proportion of positive test results when no detectable mutations are present (1-analytic specificity).  Another way of expressing analytic specificity would be the true negatives divided by the sum of the true negatives and false positives.  False positive results could be due to technical errors in the analytic phase (e.g., errors in placement, contamination, expired reagents, or non-specific reactions) or to administrative/clerical errors in the pre-analytic or post-analytic phases (e.g., mislabeling of samples, wrong interpretation of correct results, or copying results incorrectly). 
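The two definitions above reduce to simple ratios over a 2x2 table of results; a minimal sketch with illustrative counts (the numbers below are examples, not survey data):

```python
def analytic_sensitivity(tp, fn):
    """Proportion of positive results when a detectable mutation is present:
    true positives / (true positives + false negatives)."""
    return tp / (tp + fn)

def analytic_specificity(tn, fp):
    """Proportion of negative results when no detectable mutation is present:
    true negatives / (true negatives + false positives)."""
    return tn / (tn + fp)

# Illustrative counts only: 98 true positives, 2 false negatives,
# 495 true negatives, 5 false positives.
sens = analytic_sensitivity(tp=98, fn=2)   # 0.98
spec = analytic_specificity(tn=495, fp=5)  # 0.99
false_positive_rate = 1 - spec             # 1 - analytic specificity
```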

Wrong mutations are a third type of error, along with false negative and false positive results.  These occur when a mutation is present, but is incorrectly identified.  For purposes of this review, wrong mutations will be considered false positive results, since there is an opportunity for correcting them by confirmatory testing.  Wrong mutations occurring in the pre-analytic, analytic or post-analytic phases are all included in the analysis. 

Optimal source(s) of data
Few data sources exist for estimating analytic validity.  Published reports of method comparisons and screening experiences provide limited information on only a few testing methodologies.  The data are derived from a small number of laboratories and the “true” genotypes of the tested samples are often uncertain (e.g., not confirmed by another methodology, laboratory consensus or sequencing).  External proficiency testing programs (e.g., ACMG/CAP molecular Surveys and the European Concerted Action for Cystic Fibrosis) provide a source of data that have several advantages.  They include a large proportion of clinical testing laboratories that represent the range of methodologies presently being used.  In addition, the samples distributed have confirmed genotypes.  However, basing analytic performance estimates on external proficiency testing also has drawbacks, including:

  • over-representation of ‘difficult’ samples, due to the educational nature of the proficiency testing program
  • mixing of screening and diagnostic exercises
  • few challenges which do not contain a detectable mutation
  • reporting of summary results in ways that do not allow a straightforward computation of analytic sensitivity and specificity
  • a substantial proportion of laboratories participating in the ACMG/CAP program are from outside the United States
  • the artificial nature of sample preparation, shipping and handling to ensure stability
  • some participating laboratories are involved with research or manufacturing rather than clinical activities

One additional consideration is that laboratories may perform differently when testing proficiency testing samples than when testing clinical samples on a routine basis.  Performance might be poorer because the sample is handled outside of the laboratory routine.  Alternatively, performance might be better because extra attention is paid to obtaining a reliable result.  Future analyses should be aimed at providing reliable method- and, possibly, mutation-specific analytic performance estimates.  One approach for collecting such data might include the following steps:

  • An independent body would develop a standard set of samples, most of which would be randomly selected from the general population.  Included in the standard set, however, would also be additional, less common genotypes (e.g., rarer heterozygotes, homozygotes and compound heterozygotes).  Sub-cloned samples are inadequate for this use.  The group that collects and administers these samples and the subsequent analyses could be under the auspices of the FDA, ACMG, or CAP, or be a non-profit institution such as Coriell Institute for Medical Research (Camden, NJ).  This effort would need grant support to begin the process.

  • The sample set would then be available for method validation.  Correct genotypes would be arrived at by consensus, or, if disagreements emerged, by a reference method (e.g., sequencing).  The current validation practice of having a laboratory (or manufacturer) run a series of samples with unknown genotype (as is often the case for computing specificity) is inadequate, because there has been no comparison to a ‘gold standard’ (e.g., sequencing).  For example, how can a laboratory running an unknown sample determine whether a positive finding is a true, or a false, positive or, whether a negative finding is a true, or false, negative?

  • Ideally, this blinded sample set would be available to manufacturers as part of the pre-market approval process, with the understanding that multiple laboratories using these commercial reagents would be asked by the manufacturer to analyze portions of the sample set independently.  This initial assay validation process is distinct from assay control samples that are discussed later (Question 11). 

Appropriate sample size for determining analytic specificity can be derived by choosing an acceptable target specificity and an acceptable lower limit that should be excluded in the 95 percent confidence interval.  The higher the specificity chosen and the tighter the confidence interval, the larger is the sample size that will be necessary to provide a definitive answer.  For example, if a laboratory chose a target specificity of 98 percent and wanted to rule out a specificity of 90 percent, it would need to correctly identify at least 49 of 50 known negative samples (estimated using the binomial distribution).  On the other hand, a target specificity of 99.5 percent and a desire to rule out a specificity of 98 percent would require correctly identifying at least 398 of 400 known negative samples.  The determination of even higher analytic specificity with tighter confidence intervals may not be economically feasible for an individual laboratory.  However, this could be attained by a consortium of laboratories using the same methodology, or by a manufacturer that forms a consortium of laboratories using its reagents.  

Appropriate sample size for determining the analytic sensitivity (detection rate) could be derived using similar analyses.  If a laboratory chose a target sensitivity of 95 percent and wanted to rule out a sensitivity of 80 percent, it would need to correctly identify at least 38 of 40 chromosomes with known mutations.  A higher sensitivity estimate of 98 percent that rules out a rate of 95 percent would require the correct identification of at least 196 of 200 chromosomes with known mutations.  If mutation-specific detection rates are desired, each would need the same number of challenges.  Again, however, this may not be feasible for individual laboratories but may be possible for a consortium or manufacturer, especially for the more common mutations. 
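The sample-size figures above can be checked with an exact one-sided binomial tail computation; a sketch follows (using a one-sided 0.05 level is an assumption, consistent with excluding the lower rate from a 95 percent confidence interval):

```python
from math import comb

def rules_out(k, n, p0, alpha=0.05):
    """True if observing k or more correct results out of n would be
    unlikely (one-sided tail probability < alpha) if the true rate were
    only p0 -- i.e., the data exclude p0 as a plausible lower bound."""
    tail = sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))
    return tail < alpha

# Specificity examples from the text:
print(rules_out(49, 50, 0.90))    # target 98%, rules out 90% -> True
print(rules_out(398, 400, 0.98))  # target 99.5%, rules out 98% -> True

# Sensitivity examples from the text:
print(rules_out(38, 40, 0.80))    # target 95%, rules out 80% -> True
print(rules_out(196, 200, 0.95))  # target 98%, rules out 95% -> True
```

Note that 48 of 50 correct would not rule out a 90 percent rate at this level, which is why at least 49 of 50 are required.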

The analytic performance (analytic sensitivity and specificity) could then be determined for each methodology, along with an estimate of between-laboratory, within-method variability.  Further, estimates could be made for specific racial/ethnic groups, based on the mutation-specific performance and the frequency of each mutation within that group.  Overall, the analytic performance for laboratories in the United States could be estimated, given the mix of methodologies for established screening laboratories.  All of these analyses could be done using a 2x2 table, and all rates could be accompanied by 95 percent confidence intervals (CI).  Published method comparisons focus on technical errors in the analytic phase and usually do not deal with the pre- and post-analytic phases of the laboratory testing process.  

The ACMG/CAP external proficiency testing scheme
Background and definitions  As part of ACMG/CAP external proficiency testing in the United States, purified DNA from established cell lines (derived from human cells with known mutations; http://locus.umdmj.edu/nigms/qc/dnaqc.html) is distributed to enrolled laboratories.  The majority of these laboratories are likely to be providing clinical services, but reagent manufacturers and research laboratories also participate.  In late 2001, there were 45 participants reporting cystic fibrosis results.  A false positive result occurs when the laboratory reports finding a mutation in the sample, when none is present.  A false negative result occurs when a laboratory reports no mutation, but a mutation for which it tests is, in fact, present in the sample.  A third type of error occurs when the laboratory accurately identifies that a mutation is present, but it is not the correct mutation (e.g., a laboratory that is able to separately identify delF508 and delI507 reports finding delF508 when only the delI507 mutation is present).  All three types of errors are included in the analysis and encompass all three phases of testing. 

The present analysis, which utilizes the ACMG/CAP data, initially examines the rates of these three types of errors independently, by chromosome (e.g., the results on one chromosome are counted separately from the results reported for the other).  

Gap in Knowledge: How should the finding of a wrong mutation influence computation of the analytic performance?  The relationship between the third type of error (wrong mutation) and analytic performance has not yet been formally addressed.  In this document, a wrong mutation will be considered an incorrect result, since this type of error could cause harm.  For example, diagnostic testing in the fetus might target the mutations reported in the couple and not identify the correct mutation in the fetus.  Also, family members would not receive correct information.  Further, a wrong mutation finding will be treated as a false positive in this document.  Confirmatory testing of positive results will provide the opportunity to correct this type of error.

Error rates for the ACMG/CAP external proficiency testing scheme  Table 2-1 shows the number of alleles tested and the results from the ACMG/CAP Molecular Genetics Laboratory (MGL) Survey from 1996 to 2001.  Overall, 3.0% (95 percent CI 2.4 to 3.9%) of the alleles were incorrectly identified.  For all data between 1996 and 2001, 2,131 of 2,198 chromosomes (97.0 percent) were correctly identified (95 percent CI 96.1 to 97.6%).  Appendix A contains a complete listing of the sample challenges, the responses along with the type of error (e.g., false positive), and any other adjustments made during the analysis (e.g., laboratory did not test for a mutation included in the challenge).  More errors (56) occurred between 1996 and 1998 than between 1999 and 2001 (11).  However, the composition of challenges in the earlier time period explains much of this excess and is taken into account in analyses that are presented later in this section.
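The quoted interval for the overall correct rate is consistent with a 95 percent Wilson score interval for 2,131 correct results out of 2,198; a sketch of that computation:

```python
from math import sqrt

def wilson_ci(correct, total, z=1.96):
    """Wilson score confidence interval for a proportion (z=1.96 for 95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half

low, high = wilson_ci(2131, 2198)
print(f"{2131/2198:.1%} (95% CI {low:.1%} to {high:.1%})")
# 97.0% (95% CI 96.1% to 97.6%)
```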

Table 2-1.  CFTR Mutation Testing:  Results of the ACMG/CAP MGL Survey

Year, # of Labs, Alleles Tested, Correct N (%), Incorrect N (%), Type of Incorrect Result

Table 2-2 makes use of the ACMG/CAP MGL Survey data (Appendix A) to compute a preliminary estimate of analytic sensitivity and specificity.  The apparent improvement in performance over time may be real, or due to differences in the types of challenges.  For example, no wild/wild mutation challenges were included prior to 2000, while 8 of 12 challenges since then were wild/wild.  It is not possible, because of the small numbers, to stratify the results by methodology or to provide separate estimates of performance for most of the mutations tested. 

Table 2-2.  Analytic Performance for Identifying All Cystic Fibrosis Mutations According to Data from the ACMG/CAP Molecular Genetics Survey

Year, Analytic sensitivity (%), (95% CI), Analytic specificity (%), (95% CI)

Complicating factors in interpreting these results  An additional aim of these external challenges was education.  For that reason, it may not be appropriate to use these data to determine analytic performance without taking into account the design of these exercises.  For example, 14 percent (3/21) of the challenges required that participating laboratories distinguish between the delI507 and delF508 mutations.  All of these challenges occurred in the first three years of the survey.  The delI507 mutation occurs in less than 1 in 2500 non-Hispanic Caucasians tested (1 percent of 1/25).  This rare and difficult laboratory circumstance is emphasized because of the educational and laboratory-improvement focus of the ACMG/CAP MGL Survey.  An additional complicating feature arises because it is not always clear whether some ‘false negatives’ might be due to laboratories not testing for the mutation.  The present analysis attempts to take this into account (Appendix A).  The opportunity for a laboratory to identify a wrong mutation is considerably greater in proficiency testing exercises than in practice, due to the high frequency of mutations.  For that reason, the rate of wrong mutations in proficiency testing needs to be adjusted downward in order to simulate performance in routine clinical practice. 

A more reliable approach to estimating analytic sensitivity and specificity  It is possible to recompute the previous analysis using only challenges that do not involve delI507.  Separate estimates can then be computed for the four challenges involving delI507.  These two stratified estimates of analytic performance are shown in Table 2-3, along with the summary estimate from Table 2-2.  The analytic specificity for identifying the delI507 mutation is poorer than for the other mutations.  The sensitivity is actually better, since some mutation was reported in all instances where a delI507 mutation was present.  A better estimate of overall performance that would be expected in the real world is found when challenges involving the delI507 mutation are not counted (the bolded row in Table 2-3).  

Table 2-3.  Analytic Performance With and Without delI507 Mutation Challenges Based on the ACMG/CAP Molecular Genetics Survey Data

Mutation Group, Challenges, Sensitivity (%), Specificity (%)

1 95 percent CI
2 A more reliable estimate of analytic specificity is provided later in this section. 

Table 2-4 shows the analytic performance estimates by year for challenges without delI507.  No trend is evident for improvement in analytic sensitivity, and the overall rate of 97.9 percent appears reasonable.  The upper and lower confidence intervals could be taken to model the most pessimistic (96.8 percent) and optimistic (98.7 percent) estimates of analytic sensitivity.  A standardized mutation panel is now becoming widely adopted, as a result of ACMG recommendations (Grody WW, 2001).  As a result, manufacturers are now marketing reagents (under the rule for Analyte Specific Reagents – ASR) that have been subjected to good manufacturing processes.  Analytic performance may improve as a consequence.  The present analysis establishes a ‘baseline’ estimate of analytic sensitivity and specificity, against which to assess that possibility. 

Analytic specificity is more difficult to interpret.  Thirteen of 15 errors occurred during one distribution (1997-B).  Some of these might be explained by sample mix-up, but at least half appear not to be due to this cause.  The European Concerted Action on Cystic Fibrosis reported that commercial kits were found to have problems identifying G551D and R553X.  The majority of errors in the 1997 ACMG/CAP survey occurred when challenging these two mutations.  

Table 2-4.  Analytic Performance for Cystic Fibrosis Mutations According to Data from the ACMG/CAP Molecular Genetics Survey (Excluding delI507 Mutation Challenges)

Year, Analytic sensitivity (%), (95% CI), Analytic specificity (%), (95% CI)

1 A more reliable estimate of analytic specificity is provided later in this section

A final estimate for analytic specificity  As stated earlier, false positives (1-specificity) in this analysis comprise two types of errors: false positive results and wrong mutations.  Finding a ‘false positive’ can occur whenever a detectable mutation is not present; a common situation in screening.  The finding of a ‘wrong mutation’ can only occur when a mutation is present; a relatively uncommon situation in screening.  However, it is common in proficiency testing samples.  There have been a total of 949 mutation challenges and 922 wild challenges (after ignoring all delI507 samples).  Thus, a mutation being tested for is present in about 50 percent of the chromosomes.  Conversely, only about 1.8 percent of chromosomes in the general pregnancy population will have a mutation identified (1/25 non-Hispanic Caucasians are carriers and about 90 percent of the mutations on the mutated chromosome can be detected).  For this reason, the rate of wrong mutations must be ‘discounted’ by a factor of about 28 (50/1.8).  Thus, although Table 2-1 shows a ratio of 10 false positive results to 34 wrong mutations, the expected ratio in the general population would be more like 10 false positives to 1 or 2 wrong mutations (34/28).  After samples have been removed that included delI507 and after the rate of ‘wrong mutation’ in the general population has been taken into account, the revised estimate of analytic specificity is 99.4% (95 percent CI 98.7 to 99.8%).
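The discounting arithmetic can be sketched as follows (rates are per chromosome; the 1/2 factor reflects that a carrier's mutation lies on one of that person's two chromosomes):

```python
# Fraction of proficiency-testing chromosomes carrying a tested-for mutation:
pt_rate = 949 / (949 + 922)                  # about 0.50

# Fraction of chromosomes in the general pregnancy population with an
# identifiable mutation: 1/25 carrier frequency, one mutated chromosome
# per carrier, about 90 percent of mutations detectable.
population_rate = (1 / 25) * (1 / 2) * 0.90  # 0.018, i.e., 1.8 percent

discount = pt_rate / population_rate         # about 28
adjusted_wrong_mutations = 34 / discount     # about 1.2

print(round(discount), round(adjusted_wrong_mutations, 1))  # 28 1.2
```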

Gap in Knowledge: Method- and mutation-specific analytic performance estimates

Tables 2-2 through 2-4 present the best available data for estimating analytic performance.  These analyses should not be interpreted as being complete or robust.  For example, the problems identified by the delI507/delF508 challenges are method-specific, but no attempt is made in this report to analyze laboratory performance by specific method.  The results here are for the mix of methodologies presently being used in the United States and, as such, represent the average laboratory performance a clinician might expect when ordering such testing.  To generate more reliable analytic performance estimates, large numbers of specimens with known genotypes will need to be run using specific methodologies.  For example, Gasparini et al. (1999) used the PCR/OLA methodology to identify 114 newborns with a mutation; all of these were subsequently confirmed by DNA sequencing.  Although this rules out false positives, it does not provide an estimate of analytic sensitivity, since only a small random subset of negative results was similarly sequenced and the possibility of false negative results exists.  Until more refined performance estimates are available, the existing information is useful in estimating clinical performance. 

 

Gap in Knowledge: Analytic performance estimates are available for only a small number of mutations.  Only a small number of mutations (10) has been subjected to external proficiency testing (delF508, delI507, G542X, 621+1G>T, G85E, W1282X, G551D, R553X, 1717-1G>T, and R117H).  The majority of the mutations in the recommended panel have not been subjected to external proficiency testing.  This is an important consideration because performance may vary according to laboratory methodology. 

 

Gap in Knowledge:  Analytic performance and mutation panel size.  It is possible that analytic performance will differ, depending on the numbers of mutations tested, even when the same methodology is employed.  Panels utilizing a higher number of mutations might be more robust because of automation or, conversely, the larger number of analytic steps might be more prone to errors.

Sensitivity and specificity by person rather than by chromosome
It is possible to compute analytic sensitivity and specificity according to whether a person's genotype has been correctly classified, rather than whether an individual chromosome has been correctly classified.  That is, the genotype is correct or incorrect when detectable mutations are present (analytic sensitivity) or the genotype is correct or incorrect when no detectable mutations are present (analytic specificity).  Table 2-5 shows the results of this analytic approach, stratified by the year that proficiency testing results were obtained.  All three samples containing a delI507 mutation have been removed from the analysis.  According to these data, the overall estimate for analytic sensitivity is 95.9% (95 percent CI 93.3 to 97.1%).  This is lower than shown in Table 2-4 (97.9 percent), where the analysis is by chromosome rather than by person.  When the analysis is performed by person, wrong mutations are included in the computation of analytic sensitivity.  Once the eight instances of wrong mutations are accounted for, analytic sensitivity is corrected upward to 97.2 percent.  This estimate is now similar to that found when the analysis was by chromosome.  Table 2-5 also shows an analytic specificity of 99.7% (95 percent CI 98.4 to 99.9%), consistent with that found in Table 2-4 (99.4 percent). 

Table 2-5.  Analytic Sensitivity and Specificity based on the ACMG/CAP MGL Survey, Classified According to Whether a Person's Genotype is Correctly Identified

Detectable mutation present, Correct N (%), Incorrect N (%), Totals

External proficiency testing in Europe 
Results of the proficiency testing survey conducted by the European Concerted Action for Cystic Fibrosis.  Table 2-6 shows the results of that study.  Because that study’s report did not distinguish between false positive, false negative and incorrect mutations, it is not possible to compute an analytic sensitivity or specificity.  However, the overall rate of 2.8 percent incorrectly classified chromosomes (95 percent CI 2.4 to 3.4%) is similar to the overall 3.0 percent error rate found in the ACMG/CAP survey reported earlier in this section.  This study also reported that 48 percent of 114 participants had correct responses for all challenges.  Another 39 percent committed one error, while 2 percent failed all challenges.  

Interpretation of the results.  This survey also attempted to determine the cause of errors, including sample contamination and clerical errors.  In general, laboratories would have been able to correct their false positive results, if their policy had been to reanalyze samples with positive results.  This indicates that the original sample was neither contaminated nor incorrectly labeled.  Clerical errors/reporting mistakes/incorrect interpretations were estimated to be responsible for 90 percent of the errors.  The error rate was not associated with the numbers of samples processed by the laboratory.   

Table 2-6.  Survey Results from the European Concerted Action for Cystic Fibrosis, According to Whether the Chromosome was Correctly Classified

Year, Alleles Tested, Correct N (%), Incorrect N (%)

Comparing error rates for DNA-based cystic fibrosis testing with biochemical testing for Down syndrome

A similar proficiency testing program (Survey FP) for maternal serum Down syndrome markers serves as one source for comparing error rates in non-DNA testing.  In that survey (jointly sponsored by the Foundation for Blood Research and CAP), participating laboratories are asked to measure three biochemical markers, to combine these measurements with a pre-assigned maternal age, and then calculate a Down syndrome risk.  Five challenges are distributed, three times each year.  The proportion of laboratories with one or more outlying Down syndrome risk estimates on a given distribution is routinely reported to all participants each year (FBR/CAP FP Survey Participant Summary Report, 2000, FP-C).  This proportion has remained relatively constant between 1998 and 2000 at about 5 percent.  Assuming that the laboratory will have only one (or two) of the five risks classified as being an outlier, the actual error rate per sample distributed is closer to 1 or 2 percent.  This is similar to the error rate for the ACMG/CAP MGL survey found in Table 2-1.  This analysis is limited to data prior to 2001, since a problem with sample preparation was identified in 2001 and corrected in 2002.
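The step from a roughly 5 percent per-distribution outlier rate to a 1 to 2 percent per-sample rate can be sketched under an independence assumption (an assumption made here for illustration, not stated in the survey report):

```python
# If about 5 percent of laboratories report at least one outlying risk per
# five-sample distribution, and outliers occur independently across samples,
# the per-sample error rate p satisfies 1 - (1 - p)**5 = 0.05.
p = 1 - (1 - 0.05) ** (1 / 5)
print(f"{p:.1%}")  # about 1% per sample distributed
```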

References

Appendix A.  Data used to calculate analytic sensitivity and specificity 

Table 2-7.  Computations for the ACMG/CAP Proficiency Testing Surveys

Response and commentary of the CAP/ACMG Biochemical and Molecular Genetics Resource Committee

 

Updated on August 13, 2004