Center for Devices and Radiological Health, U.S. Food and Drug Administration

Meeting Summary
FDA CADx Open Public Workshop

Parklawn Building Conference Room D
5600 Fishers Lane, Rockville, MD
January 26, 1996

9:00
Opening Remarks
Elizabeth Jacobson, PhD
Deputy Director for Science, Center for Devices and Radiological Health (CDRH)

This is the computer aided diagnosis workshop, and we are basically here today to hear from you. Today marks the beginning of FDA's effort to write guidance in this area, and the first step is that we want to hear from you - particularly your ideas on the relevant characteristics of these devices and how they should be evaluated. We'll have a number of speakers this morning and then plenty of time for discussion this afternoon. We're going to start off with a couple of introductory talks about device regulation and general software policy, followed by several speakers from the Center who'll be talking on CADx, and then we'll get to the main part of the program. Obviously we can't cover everything today, so later on as ideas occur to you, please submit them to us in writing.

9:05
Overview of Medical Device Regulation
Robert Chissler
Director, Programs Operations Staff, Office of Device Evaluation (ODE), CDRH

Medical devices are regulated under the authority of the medical device amendments to the Federal Food, Drug, and Cosmetic Act of 1938 [21 Code of Federal Regulations (CFR)]. The 1938 Act required devices to be safe but placed the burden of proof to remove unsafe products on the government. The 1976 Medical Device Amendments and 1990 Safe Medical Devices Act established a comprehensive scheme of regulation. They defined the term "device", provided for classification of all medical devices into three classes, required device manufacturer registration and listing of products, and set up procedures for clinical investigations (Investigational Device Exemption -- IDE), premarket notification (510(k)), and premarket approval (PMA). In addition they included the basic prohibition on misbranding and adulteration, required adherence to good manufacturing practices (GMP), and provided for post market surveillance (of selected devices). The device definition was written with such generality as to include a wide range of products--including computer software--within its scope: "...an instrument, apparatus, implement, machine, contrivance, implant, in vitro reagent, or other similar or related article, including any component, part, or accessory, which is - (1) recognized in the official National Formulary, or the United States Pharmacopeia, or any supplement to them, (2) intended for use in the diagnosis of disease or other conditions, or in the cure, mitigation, treatment, or prevention of disease, in man or other animals, or (3) intended to affect the structure or any function of the body of man or other animals, and [which is not a drug]..." [Section 201(h)]. Further information concerning these regulations can be obtained through the Center's Division of Small Manufacturers Assistance (DSMA) at 800-638-2041.

9:20
Software Policy
Harvey Rudolph, PhD
Acting Deputy Director, Office of Science and Technology (OST), CDRH

An FDA software policy is under development at this time. No such policy currently exists, although there has been a "draft" policy for several years. With regard to this policy, the first question normally asked is how software can be a medical device anyway. The previous speaker gave the answer to that question, indicating that much software is a device since it is a component or accessory to a device, and other software would be covered (probably as a "contrivance") under the very broad device definition. The second question with regard to software policy is why such a policy is needed. A policy is needed because any device is subject to all of the requirements of the Federal Food, Drug, and Cosmetic Act as amended, including registration and listing, GMPs, and premarket review, unless specifically exempted by regulation. If we rigorously applied these provisions to all software medical devices, it would represent a tremendous burden on both the Agency and on the medical community. The software policy is risk/exemption based. We are trying to assess the risks of medical software devices, decide on appropriate exemptions, and write classification regulations to implement the exemptions. Thus, our first task is to define criteria for assessing the impact of product failure on the patient and apply them rationally to known products. Here are some of the criteria which seem reasonable: (1) seriousness of the disease to be diagnosed or treated, (2) time frame for use of the information, (3) concordance with accepted medical practice, (4) format of data and its presentation, (5) individualized vs. aggregate patient care recommendations, and (6) clarity of the algorithm. We plan to hold a public workshop to discuss the policy.
A Federal Register notice announcing the workshop and laying out some of the details of such a policy is under preparation and should be published in a few months [Registrants at this workshop will be informed of details of the Software Policy meeting]. Finally, how do CADx devices fit into this picture? CADx systems are either accessories or have been determined to have significant impact on patient care and thus need to be regulated via premarket review. Thus, they are at the high impact/risk end of the medical device software spectrum. The question is not how they should be regulated, but what sort of information is necessary to make good decisions on clearing these products.

9:30
Workshop Goals and Definition of CADx Devices
David G. Brown, PhD
Chief Scientist, Division of Electronics and Computer Science (DECS), OST

The Center has begun receiving premarket approval submissions for devices with CADx features. These are devices which use modern data analysis techniques to carry out some portion of the decision making process previously provided by the physician or other health care professionals. Thus, CADx refers to computer-aided-diagnosis devices, decision support products, and not to computer-aided diagnostic-devices, computerized devices used to provide basic input to the diagnostic process (e.g., CT or MRI systems). Examples of CADx devices include image analysis products used to identify potential abnormalities, ECG analysis programs, and in-vitro diagnostic test devices which flag "out-of-bounds" test results and/or provide some more sophisticated data synthesis. The Center believes that in order to perform intelligent and timely reviews of these devices with consistency across product lines, it is appropriate at this time to develop reviewer guidance for them. To that end the Center has established a CADx Working Group, composed of reviewers and other technical professionals from all CDRH components. Today's public workshop is an effort to obtain input from the public at the initial stage of this project. In particular, we pose two questions to you: (1) How should CADx devices be categorized?--i.e., What device attributes are relevant to the degree of regulatory oversight exercised by the Center over a particular CADx device? And (2) What evaluation methodologies are appropriate to the assessment of the performance of these devices? We look forward to your comments today and to any written comments which you are warmly invited to contribute to us in the future.

9:40
Cleared Devices - Clinical Laboratory Devices
Max Robinowitz, MD
Division of Clinical Laboratory Devices, ODE

In 1995, the FDA approved the first two computer-assisted devices for evaluation of Papanicolaou (Pap) smear slides. These devices are limited to rescreening of Pap smear slides that have been previously screened by manual microscopy and were diagnosed as negative (within normal limits, WNL). As yet, no computer-assisted devices have been approved for primary screening of Pap smear slides. FDA regulates computer-assisted Pap smear readers as in-vitro diagnostic medical devices. FDA's premarket approvals for the NeoPath, Inc. AutoPap 300 QC Automatic Pap Rescreener System and the Neuromedical Systems, Inc. PAPNET Testing System were based on the evaluation by FDA staff and a panel of consultants of the design of each device, and the manufacturers' pre-clinical and clinical testing data that demonstrated the effectiveness and safety of the devices for their intended use and intended populations for use. The approved intended uses and indications for use are published in the FDA-approved package insert for each device.

9:50
Cleared Devices - Computer Interpreted ECGs
Mark Massi, M.Eng., MS
Division of Cardiovascular, Respiratory, and Neurological Devices, ODE

The purpose of this talk is to briefly discuss evaluations of diagnostic cardiovascular devices, and to point out areas of concern. Electrocardiographs with computer interpretation are devices that acquire the diagnostic-quality electrocardiogram (ECG), extract various measurements or features from the signal, and apply those features to some deterministic or probabilistic decision-making algorithm to arrive at an interpretation. If the algorithm uses patient information and traditional ECG measurements, and the output is overread by an appropriate physician, then we would be less concerned with algorithm performance. Since the devices are used by the general clinical community, however, we routinely request performance statistics for each possible interpretation. In rare cases, when we determined that stand-alone software packages are not accessories to other classified devices, we also exempted the devices from premarket notification. In addition to devices that mimic clinical decision making, the Division also conducts reviews of devices with advanced digital signal processing of the ECG. Heart rate variability analysis is illustrative of how further processing of the data elicits further consideration of these submissions. These issues involve not only clinical application of the information but also providing the user with an understanding of how the data were generated. If the measurements have some basis in traditional electrocardiography, and a reasonable approach to validation is taken to document the reliability of the information, then we are likely to continue to clear devices for market, with restricted labeling, until such time when a manufacturer is able to provide clinical data to support specific diagnostic indications. This strategy may not completely alleviate our concerns for potentially unreliable data that cannot be verified by the user, and the impact of the data on clinical decision making and, ultimately, on patient safety.

Presentations from the Public

10:00
Defining the CADx Domain
Harry Burke, MD, PhD
New York Medical College

Computer-aided Diagnostic (CADx) devices will be an indispensable part of the future practice of clinical medicine. Computer-aided diagnostic devices must be distinguished from computer-based diagnostic devices. A diagnosis is a prediction. An important implication of CADx's as prediction devices is that CADx software is assessed in terms of its accuracy, not its efficacy. There are at least three prediction methods: statistical, expert systems, and empirical formulae. Each method requires a somewhat different evaluation approach. Prediction methods can be either general methods applied to medical problems, or unique-to-the-medical-problem methods. All three methods can be applied to: (1) the generation and analysis of diagnostic test information (including laboratory tests such as a SMAC or genetic screening, functional tests such as an ECG, and radiographic tests such as CT, MRI, mammogram) and (2) the integration of diagnostic information. Three device categories can be defined: devices marketed to the public, devices marketed to physicians involving peer-reviewed statistical methods, and devices marketed to physicians that have not been peer-reviewed or devices that involve expert system methods. The creation of CADx guidelines is currently being performed by an internal FDA CADx Working Group. In order to obtain (i) a non-regulatory perspective and (ii) additional CADx expertise, non-FDA CADx experts (who are not currently associated with a CADx device) should be invited to join the CADx Working Group. The view that the FDA is an obstacle to innovation in medicine may no longer be correct. 
In the CADx domain it may be that rather than trying to protect everyone from everything, the FDA is adopting the view that its job is to make sure that companies that wish to market CADx devices to physicians provide: (i) the FDA with sufficient information so that it can determine that the device meets its functional and accuracy claims and (ii) physicians with sufficient information so that they can determine whether the device will be medically useful in their specific clinical situations.

10:10
Expert Consultants and Potential Conflicts of Interest
Dorothy L. Rosenthal, MD, FIAC
Johns Hopkins Medical Institution

I propose that experts with potential conflicts be included in the review process. Their potential conflict status should be considered when the panel makes the final advisory decision; however, their educated and experienced opinions should be utilized to the fullest. Too much is at stake to lose the expertise of highly qualified individuals merely because they have perhaps in the past represented industrial developers with a financial interest. This applies to all developers and any consultants who are working toward a common goal. The American public will have confidence in FDA scientists and their consultants if they adhere to principles of scientific integrity including full disclosure, and all will benefit from these devices, especially the patients for whom they are designed.

10:20
QEEG as an Adjunct to Psychiatric Practice
Duane Shuttlesworth, PhD
NxLink, Ltd.

Quantitative analyses of the electroencephalogram (EEG) have been available for over 50 years. During the past several decades, researchers at New York University Medical Center's Brain Research Laboratories have developed the neurometric method of analysis of the EEG. Digitized EEG recordings are subjected to a Fast Fourier Transform to extract information on power, frequency, and phase. These measures are log-transformed to approximate gaussianity, age-regressed to account for variations in EEG variable distribution as a function of age, and compared to an extensive normative database to derive Z-score estimates of deviation from normal. The Z-score matrix provided by the 1200+ extracted variables is subjected to a multivariate analysis that corrects for intercorrelations between and within measures to provide accurate estimates of the difference between patient Z-score values and those of the normal population. Discriminant analyses are used to identify variables that contribute to differentiating the patient from normal (normal vs. abnormal comparison), and correlate the profile with that of various empirically defined clinical groups. The likelihood that the profile matches profiles of groups consisting of individuals with known disorders is stated in probabilistic terms. Test sensitivity and specificity are evaluated by using ROC curves. Statistical tables summarize the results of the analysis. Data is further transformed into topographic maps that visually depict the extent of the deviation of the patient from the normal reference group. The neurometric method is based on widely accepted statistical procedures, and has been replicated in a variety of laboratories around the world. Neurometrics provides an empirical test of brain function and structure, and is useful as a diagnostic aid for patient evaluation, treatment planning, and treatment monitoring, enhancing the quality of patient care in psychiatry, neurology, and related disciplines.
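
The normalization chain described above (log transform, age regression, comparison to a norm) can be illustrated with a minimal sketch. It is not the neurometric method itself: the function name, the linear age model, and every numeric value below are hypothetical assumptions chosen only to show the arithmetic of a Z-score.

```python
import math

def neurometric_z(power, age, intercept, slope, sd):
    """Z-score of one log-transformed EEG power value against an age-regressed norm.

    intercept, slope and sd describe a HYPOTHETICAL normative model:
    expected log10(power) = intercept + slope * age, with residual spread sd.
    """
    if power <= 0 or sd <= 0:
        raise ValueError("power and sd must be positive")
    log_power = math.log10(power)        # log transform to approximate gaussianity
    expected = intercept + slope * age   # age-regressed normative mean
    return (log_power - expected) / sd   # deviation from normal, in SD units

# Made-up patient and norm values; the result is about 1.5 SD above the norm.
z = neurometric_z(power=100.0, age=30, intercept=1.4, slope=0.01, sd=0.2)
```

A real system would repeat this for each extracted variable and then correct for intercorrelations, which this sketch does not attempt.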

10:45
Evaluation of Clinical Performance of a Classification Device
John W. Kennedy
MedStat Consulting Group

With the rapid expansion in the speed and capabilities of computer software and hardware, we are now beginning to have the capabilities of simulating some of the human decision processes involved in differential diagnosis and image pattern recognition. While certainly issues of validation and hazard analysis of software systems intended for human diagnosis are a significant part of the evaluation process, the fundamental part of assessing effectiveness of such devices will remain the clinical evaluation of the accuracy of the classification process the device claims to perform. The behavior of the binomial distribution, and the associated issues of statistical rigor in experimental design, will be absolutely critical in understanding the procedures for performance evaluation of computer-assisted diagnosis devices. The binomial distribution dictates how devices which classify patients (images, samples, assays etc.) into one of two categories (normal/abnormal) will behave, and this behavior can often be counter-intuitive. Evaluating such a device requires particularly careful attention to the standard clinical design issues of poolability, cross-over evaluations on the same samples/patients with and without the device in place in the diagnostic process, statement of all assumptions involved in testing, statement of the correct hypotheses, collection of correctly random and unbiased samples from the specified target population, and separation of performance measures into those independent of prevalence assumptions and those which explicitly or implicitly depend on prevalence (such as predictive value). Most important of all, perhaps, is the need to include the entire range of difficulties of classification into the evaluation process. None of these factors can be ignored in designing or reviewing submissions on such classification devices.
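
One counter-intuitive consequence of binomial behavior is how wide the uncertainty on an observed sensitivity remains when few abnormal cases are tested. The sketch below uses the Wilson score interval, which is one standard choice and not necessarily the speaker's; the case counts are hypothetical.

```python
import math
from statistics import NormalDist

def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion such as sensitivity."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # two-sided normal quantile
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Same observed sensitivity (90%) from hypothetical studies of different size.
small = wilson_interval(18, 20)      # 20 abnormal cases
large = wilson_interval(180, 200)    # 200 abnormal cases
```

With these made-up counts the smaller study's interval is roughly three times as wide, even though both report the same point estimate.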

10:50
Methods for Validating Cytological Screener Performance
Carl Youngmann, PhD, RAC
NeoPath, Inc.

NeoPath, Inc. has been engaged for six years in the development of an automated cytological screener for the analysis of Pap smears. Last year, the NeoPath AutoPap 300 QC System was granted PMA approval following the methods presented below. It is NeoPath's position that the basic clinical testing of a cytological screening device must include well-controlled, scientifically valid studies to establish a quantitative baseline of device performance. This baseline provides a means for FDA reviewers to assess the initial safety and efficacy of a device as well as evaluate device enhancements and future devices. For either primary screening or QC rescreening of Pap smears in accord with the Bethesda system of cervical cytology four interlocking studies are needed: prospective intended use, historical and current sensitivity, multi-run precision-reproducibility, and historical consistency.

11:00
Evaluation Criteria for Cytological Screening Devices
Laurie J. Mango, MD
Neuromedical Systems, Inc.

The incidence of and mortality from invasive cervical cancer has been increasing in the United States since 1986 in the purportedly well screened population of white women under 50 years of age. This disturbing trend is thought to be attributable, at least in part, to the spread of the Human Papilloma virus (HPV), which has now reached near epidemic proportions in young women throughout the world. Thus, the factors contributing to the development of cervical cancer are apparently so widespread that more women are developing this preventable cancer despite screening. In addition, there are estimated to be over 50 million cervical smear tests performed each year in the United States. Therefore, it is paramount that FDA assure that any new automated device to be used as a substitute for conventional microscopic screening be very rigorously tested to assure that even rare cytopathology or unusual presentations of abnormalities are detected, as even "rare" cases can affect tens of thousands of American women at a national level. There are three primary degrees of freedom that must be considered when assuring that all presentations have been sampled and included in the clinical trial: (1) Diagnostic variations (all categories of The Bethesda System, including various types of adenocarcinomas); (2) Patient variations (prevalence of abnormal cells, size of abnormal cells, smear patterns - must include various patient demographics); (3) Laboratory variations (staining color and intensity, coverslip bubbles, artifacts - must include a wide variety of laboratories). The clinical trial should simulate the device's intended use as closely as possible. In addition, in terms of establishing standards for comparison with conventional screening, bias should be minimized by utilizing historic screening records and applying exhaustive microscopic searching and automated rescreeners to ensure that no significant abnormality is missed by the substitutive test. 
In conclusion, given the public health threat represented by the HPV epidemic, the rise in incidence in some populations and the potential for prevention of cervical cancer, the objective of automated cervical smear screening should be increasing the accuracy of the test, and not serving as a labor substitute at the expense of sensitivity.

11:10
PAP Smear Automation--Clinical Trial Design
David J. Zahniser, PhD
Cytyc Corporation

Reasonable standards have already been established for clinical trials for medical devices within the FDA and the medical community. CADx Pap smear medical devices are essentially no different from other devices, and careful trial design should be followed. In general, there are four issues that need to be addressed. First, the device must be tested in its intended use. This means that levels of disease prevalence used in the trial should reflect prevalence in routine use. Too high a level of disease in a trial can affect vigilance of the participants. Second, a reference standard must be established. Essentially, we want to compare the discriminatory level of a Pap smear screening device to that of humans. Using standards, one can evaluate sensitivity and specificity, preferably using an analytical method such as Receiver Operator Characteristic curves. Such performance standards and comparisons are necessary for educating potential users in how a Pap smear device may work in their laboratories. It will also help users to compare the device to other alternatives; for example it may be desirable to compare the accuracy and cost effectiveness of a double screening by humans to the combined use of humans and a machine. Third, given the subjective nature of cytology, and the difficulty with borderline diagnoses, a method of adjudicating the difference between the reference and the CADx result must be developed. There are many methods that can be applied to the Pap smear, and any one of these should prove acceptable. They include the use of an independent pathologist, a panel review, biopsy, Human Papilloma Virus testing, and patient follow-up. Finally, in a trial, vigilance must be controlled since taking part in any trial elevates one's attention and performance. This demands the use of a two-armed clinical trial so that both the CADx arm and the standard (e.g. human) arm have elevated levels of vigilance. 
One should also consider how vigilance might be raised, or even lowered, in actual use when using a CADx system. In summary, careful clinical trial design is critical for evaluating Pap smear methodology and the results of trials should be presented in such a way that the potential users of a system will be able to comprehend potential performance in their own laboratories.
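
As a minimal sketch of the second point, sensitivity and specificity for each arm of a two-armed trial can be computed from counts scored against the adjudicated reference standard. The function name and all counts below are hypothetical and are not taken from any actual trial.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from counts scored against a reference standard."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference-positive and one reference-negative case")
    sensitivity = tp / (tp + fn)   # fraction of reference-abnormal slides that were flagged
    specificity = tn / (tn + fp)   # fraction of reference-normal slides that were passed
    return sensitivity, specificity

# Hypothetical results for the two arms, read on the same 1,000 slides
# (50 abnormal and 950 normal by the reference standard).
device_arm = sensitivity_specificity(tp=45, fn=5, tn=900, fp=50)
manual_arm = sensitivity_specificity(tp=40, fn=10, tn=930, fp=20)
```

In this invented example the device arm trades some specificity for sensitivity, which is the kind of comparison a prospective user would need to see reported for both arms.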

11:20
The Method of Dual Gaussians for Characterization of CADx Devices
Stanley Lapidus
Tufts University School of Medicine

The main points I would like to leave with you are: (1) Discriminating power is the underlying measure of performance of a CADx device. (2) As a minimum, a single measurement of both sensitivity AND specificity is necessary to establish discriminating power. This would also allow an ROC analysis to be conducted. A well-designed study should also generate sensitivity AND specificity results for a human reviewing the same material without the CADx device. This would yield two-armed results and should form the basis for the product's evaluation. (3) If a CADx device, by itself, has greater discriminating power than a human, then approval should be forthcoming. (4) If a CADx device does not, by itself, have greater discriminating power than a human, then this information should be made clear in the labeling. Without this caution clearly in the labeling, users WILL mistakenly assume such a device is better than a human--after all the FDA approved it. (5) A device with lower discriminating power may still provide benefit if it is cheaper than a human, and is used for back-up purposes only. The FDA should make information available to allow a user to calculate cost-effectiveness. This information is either the underlying discriminating power or the sensitivity/specificity results.
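
One way to see why a single sensitivity/specificity pair can establish discriminating power is to assume the normal and abnormal populations are two Gaussians with equal variance. The summary does not spell out the speaker's exact model, so the equal-variance assumption, the function names, and the example operating points are all assumptions of this sketch.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and SD 1

def discriminating_power(sensitivity, specificity):
    """Separation d' (in SD units) between two equal-variance Gaussians.

    Both inputs must lie strictly between 0 and 1, otherwise inv_cdf raises.
    """
    return _STD_NORMAL.inv_cdf(sensitivity) + _STD_NORMAL.inv_cdf(specificity)

def roc_area(d_prime):
    """Area under the ROC curve implied by d' under the same model."""
    return _STD_NORMAL.cdf(d_prime / sqrt(2))

def sensitivity_at(d_prime, specificity):
    """Sensitivity at another operating point on the same ROC curve."""
    return _STD_NORMAL.cdf(d_prime - _STD_NORMAL.inv_cdf(specificity))

# Hypothetical device and human readers measured on the same material.
device_d = discriminating_power(0.80, 0.90)
human_d = discriminating_power(0.70, 0.95)
```

Under this assumption, two readers with different thresholds can be compared on one scale, and `sensitivity_at` traces the rest of the ROC curve from the single measured point.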

11:30
Innovations in PAP Smear Testing: Device Assessment from the Clinical Laboratory Perspective
Paul Krieger, MD, MIAC
Corning Clinical Laboratories

As one of the nation's leading providers of cervical cytology testing services, Corning Clinical Laboratories has been working with each of the developers of new technologies that promise to improve Pap testing accuracy. There is a risk, however, that vigorous marketing by these developers and media exposure will create pressure to adopt these new technologies before problems inherent in their use are fully resolved and before complete data regarding their efficacy is available. In particular, our concerns include (1) likely loss of positive predictive value of abnormal results during a months- or years-long pathologist and cytotechnologist "learning curve" (the two recently FDA-approved devices, PapNet and AutoPap, achieve higher sensitivity by "flagging" cells or slides for manual re-review, leading possibly to negative cases being misinterpreted as abnormal because of device-created biases), (2) possible loss of situation awareness among those who read Pap smear slides and who may develop complacency and decreased detection rates, (3) the risk that new standards of care will be created "by default" in the face of a dearth of clinical outcome studies, and (4) a difficult post-FDA-approval period marked by unclear regulations, a lack of reporting format standards and inconsistent reimbursement policies. Perhaps public health interests plus the very near approach by some of these new technologies to actual diagnostic processes warrant an FDA paradigm shift, namely to require technology assessment that goes beyond the normal purview. For example, the FDA could require post-approval market surveillance that includes rigorous training requirements and measurement of training outcomes, and post-approval clinical specificity studies. Other agencies and professional organizations could play a more active role than they have been, as well. 
These concerns notwithstanding, our company believes the FDA-approved technologies plus others still in development promise to significantly improve the accuracy of cervical cytology testing.

11:55
The PAP Test: A Framework for Assessing its Effectiveness
David Garner, PhD
Oncometrics Imaging Corp.

The primary goal of the Pap screening test is to eliminate death and suffering that result from invasive cancer of the cervix, at an acceptable cost. This is accomplished by identifying and treating pre-invasive cancerous lesions. Other uses of the Pap test, such as the detection of ovarian cancer, sexually transmitted diseases, etc., are, at best, secondary goals. The sensitivity of the Pap test to the detection of STDs and other conditions is poor. The performance of the existing Pap test should be well understood by those who assess a computer assisted Pap test. The conventional Pap test leads to treatment of very many women for conditions which, if left untreated, would never develop into invasive cancer. For every woman with a truly pre-cancerous lesion, at least 30 and possibly more than 50 receive treatment. Computer assisted diagnostic Pap screeners should be assessed in the context of an objective assessment of the current conventional system. For an imperfect test, accuracy is most completely characterized by the ROC (receiver-operator characteristic) curve. The test accuracy describes the ability of the test to separate overlapping true positive and true negative populations. The notion of positive predictive value combines test accuracy with disease incidence. For the Pap test process to achieve a positive predictive value of only 10% would require test accuracy corresponding to a separation of the true positive and true negative populations by 5-6 standard deviations. This problem is fundamental to the current Pap test. It does not result from the occasional failure of a screener to detect a "needle in a haystack;" it results from the facts that few "haystacks with needles" have the potential to become invasive cervical cancer, and we can't tell which ones they are with the current approach.
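
The dependence of positive predictive value on how common the target condition is follows from Bayes' rule. The sketch below does not attempt to reproduce the 5-6 standard deviation figure, which depends on incidence assumptions not given in this summary; the sensitivity, specificity, and prevalence values are hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive test result is a true positive (Bayes' rule)."""
    true_pos = sensitivity * prevalence                # diseased and flagged
    false_pos = (1 - specificity) * (1 - prevalence)   # healthy but flagged
    if true_pos + false_pos == 0:
        raise ValueError("no positive results are possible with these inputs")
    return true_pos / (true_pos + false_pos)

# The same hypothetical test (95% sensitive, 95% specific) in two populations.
ppv_rare = positive_predictive_value(0.95, 0.95, 0.001)   # condition in 1 of 1,000
ppv_common = positive_predictive_value(0.95, 0.95, 0.10)  # condition in 1 of 10
```

With these made-up inputs the predictive value is under 2% when the condition is rare and about 68% when it is common, which is why a test that looks accurate can still lead to treatment of many women who would never have developed invasive cancer.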

12:05
Evaluating the Quality and Utility of Digital Images
Richard Olshen, PhD
Stanford University School of Medicine

My presentation will focus on evaluating the quality and utility of digital images. I will summarize some of the principles developed in ongoing collaborations with Stanford colleagues Robert M. Gray, PhD, Professor of Electrical Engineering; Debra Ikeda, MD, Section Chief of Breast Imaging; and many others. Our research involves compression and enhancement of digital medical images and the applications of these technologies to computer-aided diagnosis. We study CT images of the lung and mediastinum, MR chest images taken for the purpose of measuring major vessels in the chest, and many aspects of mammography. However, when computational interventions affect what radiologists see, it is imperative that these interventions be evaluated by carefully designed clinical experiments. Experimental protocols should simulate ordinary clinical practice to the extent possible. A nearly full range of examples should be included. Findings should be reportable using the American College of Radiology Standardized Lexicon. Statistical analyses should be based upon assumptions that are faithful to the clinical scenario and tasks. The numbers of studies and radiologists should be sufficient to ensure satisfactory size and power for the principal statistical tests of interest. "Gold standards" must be defined clearly and be consistent with experimental hypotheses. Sources of bias should be recognized and minimized. To the extent possible, I will deal with all these issues in my 10 minutes, not least some statistical techniques that we feel are particularly relevant here.

12:15
Computer-Aided Diagnosis (CAD) in Mammography
Robert Schmidt, MD
The University of Chicago

Mammography has become the standard for detection of early, more curable breast cancer. Increasing numbers of mammograms are being performed, and reading screening mammograms is a repetitive task that requires high attention to minute detail. While mammography is the best method for early breast cancer detection, radiologists interpreting the mammograms are fallible, and an estimated 30% of breast cancers are present but missed on mammograms. A second human observer can detect up to 15% more cancer, but having a second reader is time-consuming and probably impractical, being done in only about 5% of practices in the US. This type of problem is one that lends itself to automation, through computer-aided diagnosis. The detection process of flagging potential abnormalities for the radiologist can be accomplished by CAD, using digitized mammograms. CAD can be defined as a diagnosis made by a radiologist using computer output to improve his or her decision, with a goal of making radiographic interpretation easier and more accurate. Work over the last decade has developed mammographic CAD programs to a level where about 85% of breast cancers are detected by the computer, at a reasonably low false positive rate of 1 or 2 per image. The detection programs work for both calcifications and masses, the two prime signs of breast cancer on mammogram. At the University of Chicago, we have been running CAD in our clinical mammography area for over a year, on more than 5,000 mammograms. Analysis of the first 1,149 patients shows that CAD performed as expected, identifying 86% of the screening-detected cancers. We have also been greatly encouraged by our studies showing that retrospective CAD can correctly identify approximately 50% of lesions clinically missed by radiologists (observation errors). 
The current level of development is appropriate for clinical introduction, acting as a second reader to aid the radiologist, who retains the final decision on whether or not potential areas on the mammogram are suspicious enough to warrant further work up. I believe that introduction of technology of this type is inevitable, as the results to date have been very promising. Radiologist access to CAD in the clinical setting should act to significantly improve patient care.

12:25
Deconstructing CADx: Hierarchical Considerations in Examining Safety and Effectiveness
Roger H. Schneider
Consultant

Many different types of systems are categorized as "computer aided diagnosis (CADx)." The essential characteristics of these systems that have relevance for assessments of safety and effectiveness can be divided into three groups: (1) type of system design, of which three are identified here; (2) type of information base incorporated in the system, of which three are also identified, these three not having a one-to-one correspondence with the three types of system design; and (3) type, or level of certainty, of system output, of which four are identified, again without any direct correspondence to the preceding six classes. Within each of the above three groups, a hierarchy of the types is identified, each level having more serious implications for evaluation of safety and effectiveness than the preceding. Thus, there is, hypothetically at least, the possibility of 36 (3 x 3 x 4) distinct combinations of the levels of the three groups of essential characteristics within the set of CADx systems, each having a different convolution of the challenges and opportunities for assessment of safety and effectiveness represented within the groups. This suggests that CADx systems cannot be thought of as a single type of entity and that a single regulatory policy cannot be successful for all. Rather, a regulatory scheme is required that recognizes each of the identified subgroups and establishes policies for them which account for the various ways in which they may be combined. Some systems will require extensive clinical testing. Others may be fully evaluated through engineering testing alone.
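
As an illustration only, the count of 36 combinations can be reproduced by enumerating the three groups. The abstract gives only the number of types in each group (three, three, and four), not their names, so the sketch below uses placeholder level numbers, with 1 assumed to be the least demanding level in each hierarchy.

```python
from itertools import product

# Placeholder level numbers: the abstract states how many types exist in
# each group but does not name them, so these indices are assumptions.
SYSTEM_DESIGN_LEVELS = range(1, 4)      # three types of system design
INFORMATION_BASE_LEVELS = range(1, 4)   # three types of information base
OUTPUT_CERTAINTY_LEVELS = range(1, 5)   # four types (levels of certainty) of output

# Every (design, information base, output) triple is one hypothetical
# CADx subgroup that a regulatory scheme would need to account for.
combinations = list(
    product(SYSTEM_DESIGN_LEVELS, INFORMATION_BASE_LEVELS, OUTPUT_CERTAINTY_LEVELS)
)

print(len(combinations))
```

Each tuple identifies one subgroup; how the levels within a tuple should be weighed against one another is not specified in the abstract and is left open here.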

12:35
Lunch Break
1:30
Public Discussion of CADx Categories and Evaluation Methods
Moderators:
Mary P. Anderson, ScD
Chief, Medical Imaging and Computer Applications Branch (MICAB), DECS, OST
David G. Brown, PhD
Chief Scientist, DECS

1. Richard Eaton, NEMA, presented a list of questions from his organization: (i) General Questions/Issues: How do computer-aided devices differ from computer-controlled devices? Are there additional requirements for 510(k) applications? What are the 510(k) and postmarket surveillance requirements which will be associated with these types of devices? Are there different levels of concerns for each of these types of devices? Will recalls be required if there is a "glitch" in the software? How do user errors influence regulation of these devices? (ii) Issues pertaining to transmission of data over a line: We have concerns over what happens when data is sent over a line: How is validation handled when data is sent over a line, as opposed to on-site validation? What about patient confidentiality issues? (iii) Issues relating to a device "acting as a physician:" Will there need to be a duplicative diagnosis done by a physician if the device itself "is acting as a physician" and thus renders a diagnosis? (iv) Sufficiency of electronic signatures: Is there an "equivalent" for the doctor's signature, an "electronic OK" which is needed before the diagnosis data can be transmitted across the lines? (v) Effect of favorable FDA approval of class III device upon manufacturer's product liability exposure, and use of favorable decision as a defense: If FDA determines that a computer-aided device is a class III device, and thus a PMA would be required, would an FDA approval of an application serve to reduce the manufacturer's product liability exposure, such that FDA's approval could be used at least as a partial defense to an action against a manufacturer?

2. Several participants suggested that they would like an explanation of how a CADx decision was reached. This would better allow the physician to judge the reliability of the CADx output, which would otherwise be just emerging from a "black box." Others, however, indicated that this is overly simplistic. For any difficult problem, it is very hard to provide a simple explanation of the CADx system "reasoning." It was further suggested that what the user of a CADx system really needed was an indication of the consistency and accuracy of the diagnostic information, not a description of how the decision was reached.

3. Numerous speakers cited the usefulness and validity of receiver operating characteristic (ROC) analysis. The ROC curve is a plot of the variation of the true positive fraction as a function of the false positive fraction (sensitivity vs. one minus specificity). The ROC curve is obtained by varying the threshold criterion for deciding between positive and negative diagnoses from more conservative to less conservative. It therefore includes information on all system operating points (sensitivity/specificity pairs) and is independent of disease prevalence. A particular benefit of the method is that it allows the separation of technology assessment from practice-of-medicine issues. One participant was concerned with Gaussian assumptions (not fundamental to ROC theory but made by many ROC analysis programs). Discussion also ensued concerning the variability of human observer performance and the difficulty this causes for the evaluation of a machine "observer." The complication that diagnostic tasks are not typically binary (as required for conventional ROC analysis) but have multiple possible outcomes (diagnoses) was also raised.

4. The suggestion was made that the evaluation of commercial CADx devices should be similar to the scientific peer review process. The machine algorithm and representative data should be available for outside professionals to carry out disinterested confirmation of the manufacturer's results. The fear of compromising trade secret or other proprietary information seemed to temper the enthusiasm of commercial participants for this suggestion.

5. Questions were raised concerning the availability of guidance on other software matters. It was noted that in addition to the overall software policy, there are Center groups considering policy with regard to commercial off-the-shelf (COTS) software and developing design control guidance as part of the GMP revision efforts. All of these efforts will be soliciting public comment.

6. It was noted that CADx algorithms may be very sensitive to the particular sensors used in obtaining training data. Great care must be exercised in determining the range of input sensors for which the device functions accurately. Furthermore, it was noted that often in the evaluation of CADx devices there is a commingling of the training and testing sets. This must be avoided in order to obtain an unbiased performance estimate.

7. Compression was mentioned as a source of performance degradation for CADx devices. When large data sets are needed, (lossy) compression may be required. Its effect must be examined carefully.

8. One participant noted that a liberal interpretation of the medical device definition would result in clinical guidelines being considered as medical devices. Despite their ubiquitous presence in the field, very few have been properly validated.

9. The presentation of CADx results and the labeling of CADx devices in terms of probabilities was discussed. This was felt by many participants to be desirable; however, it was suggested that the clinician "user" population was not sufficiently sophisticated to understand data presented in that way.

10. The problem of CADx false positives was raised. CADx "attention getting" systems typically point to many areas where no abnormality exists. This was felt to be a natural attribute of these systems and an aspect which should be addressed through user training and experience. As long as these systems are only "aiding" in the diagnosis, they should not be held to the same standards as a device actually making the diagnosis.
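
The ROC construction described in item 3 can be sketched as follows. This is a minimal empirical version, assuming each case has a numeric suspicion score (higher meaning more suspicious) and a binary truth label; the function names and the example data are hypothetical. It makes no Gaussian assumption, since it simply sweeps the threshold over the observed scores.

```python
def roc_points(scores, labels):
    """Return (false positive fraction, true positive fraction) pairs
    obtained by sweeping the decision threshold from strict to lenient.

    scores: higher means more suspicious; labels: 1 = diseased, 0 = normal.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # strictest threshold: no case is called positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under the empirical ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores for four diseased and four normal cases.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
auc = area_under_curve(points)

# Tripling the normal cases changes prevalence but not the curve.
repeated = [(s, y) for s, y in zip(scores, labels) for _ in range(3 if y == 0 else 1)]
points_low_prevalence = roc_points([s for s, _ in repeated], [y for _, y in repeated])

print(points)
print(auc)
```

The second call illustrates the independence from disease prevalence noted in item 3: both fractions are computed within the diseased and normal groups separately, so changing the mix of the two groups leaves the operating points where they were.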
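
One common way to avoid the commingling of training and testing sets raised in item 6 is to split at the patient level, so that no patient contributes images to both sets. The sketch below is an assumption about how this might be done, not a method presented at the workshop; the `patient_id` field, the 30% test fraction, and the example records are all hypothetical.

```python
import random


def split_by_patient(records, test_fraction=0.3, seed=0):
    """Split records so that every patient falls entirely in one set.

    records: list of dicts carrying a (hypothetical) 'patient_id' key.
    """
    patient_ids = sorted({r["patient_id"] for r in records})
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])
    train = [r for r in records if r["patient_id"] not in test_ids]
    test = [r for r in records if r["patient_id"] in test_ids]
    return train, test


# Hypothetical data set: 10 patients with 4 images each.
records = [{"patient_id": p, "image": f"p{p}_view{v}"}
           for p in range(10) for v in range(4)]
train, test = split_by_patient(records)

train_patients = {r["patient_id"] for r in train}
test_patients = {r["patient_id"] for r in test}
print(sorted(test_patients))
```

Splitting by image rather than by patient would allow views of the same patient to appear on both sides, which is one form of the commingling described above. The same grouping idea could be applied to the input sensor if performance on unseen sensors is the question of interest.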

4:00
Closing Remarks and Adjournment
Mary P. Anderson, ScD
Chief, MICAB

Thank you for participating in the computer-aided diagnosis device workshop. We have heard today a few ideas on the categories of CADx devices and even more on evaluation methods, especially the use of the receiver operating characteristic curve. Further written comments are solicited and may be faxed to us at (301) 443-9101. This input will aid us in preparing reviewer guidance for the premarket clearance of these devices. As a reminder to the speakers, please get me a copy of your overheads and, if possible, your talk. I will submit these to Dockets Management for docket number 95N-0363, where they will be available to the public. In addition, if the speakers will provide me with a brief summary of their talks, we will compile a meeting summary which we will mail to all persons who have registered for this workshop. The summary will also be available on the World Wide Web at the URL http://www.fda.gov.

(March 6, 1996)
