Scientific Data Documentation
AIDS: Public Information Data (1991)

ABSTRACT

Summary

Public health surveillance represents an ongoing and regular collection, analysis, interpretation, and application of health data for disease prevention and control. AIDS surveillance, like other national surveillance efforts, depends on health-care providers and the state and local health departments and, thus, requires a balance between information needs and practical limitations. AIDS surveillance in the United States has achieved a high degree of completeness relative to other notifiable diseases. In addition, the surveillance system has been modified as understanding of AIDS and HIV infection has grown. Users of the AIDS Public Information Data Set should be familiar with the characteristics of public-health surveillance in general as well as with the evolution of AIDS surveillance.

General Information

The AIDS Public Information Data Set is created twice a year by the Division of HIV/AIDS, National Center for Infectious Diseases, Centers for Disease Control (CDC). It consists of a data file containing 44 variables extracted from CDC's national AIDS surveillance data base and a documentation file containing cross tabulations of 8 of these variables. The documentation file contains one set of tables for the entire United States, one set for each state, and one set for each Metropolitan Statistical Area (MSA) with 500,000 or more population.

This data set is distributed using software called SETS, developed by CDC's National Center for Health Statistics. SETS is menu-driven and allows you to create cross tabulations without using a statistical software package such as SAS or dBASE. It also incorporates the metropolitan area and state tables previously distributed on microfiche. Users who want to continue using these data with a statistical or data management package must first load SETS, and then use the SETS export feature to create an ASCII data file.
See Appendix A or the on-line documentation for more information.

This manual describes the data set. It is divided into three sections and two appendices. On-line help screens provide additional documentation for SETS.

Section 1, AIDS Surveillance in the United States, describes the data collection process and the effect changes in this process may have on data analysis and interpretation. The section reviews the source of AIDS surveillance data and describes which patients are included in the Centers for Disease Control (CDC) definition for AIDS. It also discusses reporting delays and reporting completeness.

Section 2, Data File Variables and Coding Schemes, lists the variables included on the data file and describes each variable's coding scheme.

Section 3, MSA and State Tables, describes the frequency tables and cross tabulations included on the documentation file.

Appendix A, Loading the SETS Software, describes how to load and run SETS on your computer. It also suggests computer hardware and software you can use to analyze the data.

Appendix B, Metropolitan Statistical Areas, lists the MSAs included in the data set.

Assurance of Confidentiality

The data and documentation files on the enclosed diskettes contain information abstracted from acquired immunodeficiency syndrome (AIDS) case reports received by CDC. These data have been reported voluntarily to CDC by state and local health departments, and are protected under the Assurance of Confidentiality (Sections 306 and 308(d) of the Public Health Service Act, 42 U.S.C. 242k and 242m(d)), which prevents disclosure of any information that could be used to directly or indirectly identify patients or establishments.
The statistical data contained in the AIDS Public Information Data Set are being released for public use in accordance with the Assurance and do not identify patients directly, nor do they contain information that can identify patients indirectly.

BACKGROUND

In 1981, after early reports of Pneumocystis carinii pneumonia, Kaposi's sarcoma, and other opportunistic infections in young homosexual men in Los Angeles, New York City, and San Francisco, CDC began surveillance for a newly recognized constellation of diseases, now termed the acquired immunodeficiency syndrome (AIDS). CDC developed a surveillance case definition for this syndrome and initially received case reports directly from health-care providers and state and local health departments. As the epidemic became more widespread, state and local health departments began to assume the responsibility for AIDS surveillance, and by 1985 all states had regulations requiring physicians and other health-care providers to report AIDS cases directly to state or local health departments. These health departments then share the reports with CDC, which produces the national AIDS surveillance data set.

The goals of AIDS surveillance have been to monitor both trends in AIDS cases and the scope of severe morbidity due to infection with the human immunodeficiency virus (HIV). Advances in the understanding of the epidemiology and manifestations of HIV infection and changing diagnostic practices, however, present multiple challenges to those analyzing and interpreting the AIDS surveillance data.
The following are a few examples:

- A wide variety of persons are at risk for AIDS, including homosexual or bisexual men, intravenous drug users, transfusion or tissue transplant recipients, heterosexual partners of infected persons (including persons born in "Pattern-II" countries - certain Caribbean and central African countries where heterosexual transmission predominates), children born to infected mothers, and persons with mucous membrane or percutaneous exposure to blood or body fluids of infected persons (e.g., health-care workers). Because homosexual/bisexual males comprise such a large proportion of the total number of AIDS cases, trends in this subgroup will overshadow those in other groups unless the data are examined separately. Analysis of data, without regard to specific subgroups, may conceal information or lead to misinterpretation of the data.

- The etiologic agent of AIDS, HIV, has been identified, and diagnostic tests for infection with this virus have been developed. As a result, the surveillance of AIDS, initially dependent on the presence of certain indicator diseases specific for the infection, has been expanded to include additional diseases (perhaps less specific for HIV infection) in the presence of laboratory evidence for infection. Introduction of these diagnoses has affected trends in AIDS cases.

- Diagnostic practices have changed over time and vary geographically. AIDS is now a common diagnosis in many hospitals and clinics, and definitive diagnostic tests for manifestations of HIV infection (e.g., Pneumocystis carinii pneumonia or esophageal candidiasis) may not be done. HIV testing is not performed on all patients. Geographic variations in diagnostic practices and changes over time could markedly affect trends in AIDS surveillance.

DESCRIPTION OF POPULATION

Source of AIDS Surveillance Data

CDC maintains national surveillance of AIDS through the receipt of AIDS case reports submitted by individual state and local health departments.
Health departments either submit the case report forms directly or they report cases electronically through a CDC-developed microcomputer system. All 50 states, the District of Columbia, and U.S. territories and possessions (including Puerto Rico, the Virgin Islands, Guam, and certain Pacific islands) report AIDS cases to CDC. Although state and local health departments share AIDS surveillance data with CDC, the responsibility and authority for AIDS surveillance rests with the individual health departments.

As with any reportable disease, the completeness of AIDS reporting reflects the aggressiveness with which these health departments solicit case reports. Health departments may depend on health-care providers to know and comply with reporting requirements. Alternatively, health departments may regularly contact and interact with health-care facilities or individual providers to stimulate disease reporting. CDC has developed guidelines to assist health departments in stimulating AIDS case reporting and has encouraged them to take an active rather than passive approach to AIDS surveillance. Through surveillance cooperative agreements supported by CDC, health departments are encouraged to identify health-care facilities that serve AIDS patients and work closely with these facilities to promote reporting. They are also encouraged to send newsletters to health-care providers, to attend professional organization meetings, and to use existing alternative data sources to identify AIDS cases, including death certificates, laboratory reports, and tuberculosis and tumor registries. States vary widely in the structure and organization of their surveillance systems and, therefore, in the completeness of their case reporting.

Case Definition

AIDS surveillance does not encompass all manifestations of infection with HIV, but only severe, life-threatening diseases highly specific for the infection, as delineated in the CDC AIDS case definition.
Before HIV was identified as the etiologic agent for AIDS, CDC defined a case of AIDS as a disease, at least moderately indicative of a defect in cell-mediated immunity, occurring in a person with no known cause for diminished resistance to the disease. Such diseases included Pneumocystis carinii pneumonia, Kaposi's sarcoma, and many other serious opportunistic infections (see American Journal of Medicine, March 1984, pages 493-500).

With identification of HIV as the causative agent for AIDS and the availability of laboratory tests to detect HIV antibody, the case definition was expanded to reflect an increased understanding of HIV infection. The case definition was revised in 1985 (see CDC's Morbidity and Mortality Weekly Report, June 28, 1985, pages 373-375) and again in 1987 (see Morbidity and Mortality Weekly Report, August 14, 1987, Supplement, pages 3S-15S). These revisions applied to persons with laboratory evidence for HIV infection. Among diseases added in 1985 were disseminated histoplasmosis, chronic isosporiasis, and certain non-Hodgkin's lymphomas. Among those added in 1987 were extrapulmonary tuberculosis, HIV encephalopathy, and HIV wasting syndrome. In children, recurrent, serious bacterial infections were also added. In addition, the 1987 revision allowed certain indicator diseases to be diagnosed on a presumptive rather than confirmed basis.

While the reported incidence of AIDS increased only 3 to 4 percent as a result of the 1985 revision, roughly one fourth of all cases that were both diagnosed and reported in the year following the 1987 revision met only the additional criteria included in the 1987 revision. Furthermore, the proportion of cases meeting only the new criteria was higher in Hispanics and non-Hispanic blacks than in non-Hispanic whites, higher in heterosexual intravenous drug users, and lower in men who have sex with men.
Due to the large number of cases meeting only the revised case definition and to the inconsistent use of the revised case definition in different populations, analyses of trends in AIDS cases must take these revisions into account.

VARIABLES AND THEIR CATEGORIES

Data File Variables and Coding Schemes

The data file included in the AIDS Public Information Data Set contains one line of data for each AIDS case reported to CDC. Each line contains 62 columns. The columns contain 44 variables extracted from CDC's national AIDS data set.

Column  Variable  Description
1       age       Age group at diagnosis of the first AIDS-indicator opportunistic disease
2       sexclass  Sexual classification of patient
3       race      Race of patient
4       msa       Region of residence
5-8     dxdate    Month of diagnosis of first AIDS-indicator opportunistic disease
9-12    repdate   Date when CDC first received information about the case
13      death     Vital status of the patient
14-17   deathqtr  Quarter of death for patients reported dead
18-19   ptgroup   Patient grouping by mode of exposure to HIV
20      nir       No Identified Risk. Status of investigations for patients reported without known risk of exposure to HIV
21      multrisk  Indicates if patient had more than one risk of exposure to HIV
22      birth     Country of birth
23      categ     Indicates which of the CDC AIDS case definition revisions the patient meets
24      bact      Bacterial infections, multiple or recurrent (including Salmonella septicemia). Applicable in pediatric cases only.
25      burkl     Lymphoma, Burkitt's (or equivalent term)
26      candesop  Candidiasis, esophageal
27      candlung  Candidiasis, bronchi, trachea, or lungs
28      cmv       Cytomegalovirus disease (other than in liver, spleen, or nodes); onset at > 1 month of age
29      cmvret    Cytomegalovirus retinitis (with loss of vision)
30      cocci     Coccidioidomycosis, disseminated or extrapulmonary
31      cryptoco  Cryptococcosis, extrapulmonary
32      cryptosp  Cryptosporidiosis, chronic intestinal
33      dementia  HIV encephalopathy
34      histo     Histoplasmosis, disseminated or extrapulmonary
35      HS        Herpes simplex: chronic ulcer(s) (> 1 month duration); or bronchitis, pneumonitis, or esophagitis
36      ibl       Lymphoma, immunoblastic (or equivalent term)
37      iso       Isosporiasis, chronic intestinal (> 1 month duration)
38      KS        Kaposi's sarcoma
39      lip       Lymphoid interstitial pneumonia and/or pulmonary lymphoid hyperplasia. Applicable in pediatric cases only.
40      mavium    Mycobacterium avium complex or M. kansasii, disseminated or extrapulmonary
41      myco      Mycobacterium, of other species or unidentified species, disseminated or extrapulmonary
42      pc        Pneumocystis carinii pneumonia
43      plb       Lymphoma, primary in brain
44      pml       Progressive multifocal leukoencephalopathy
45      sals      Salmonella septicemia. Applicable in adult cases only.
46      tb        M. tuberculosis, disseminated or extrapulmonary
47      tp        Toxoplasmosis of brain, onset at > 1 month of age
48      wasting   Wasting syndrome due to HIV
49      s_bi      Sex with a bisexual man (women only)
50      s_iv      Sex with an IV drug user
51      s_other   Sex with a person with hemophilia, a person born in a Pattern-II country, or a transfusion recipient
52      s_hiv     Sex with a person known to be infected with HIV or to have AIDS
53-56   deathrep  Date when death was reported to CDC
57-62   adjwgt    Reporting delay adjustment weight

Each of these variables is coded numerically. For example, column 13 contains either "0" or "1". These numbers represent the variable death.
The number "0" in this column indicates that CDC has not received a death notification for this case. A value of "1" indicates that CDC has been notified that this patient died. The codes used in the AIDS Public Information Data Set are printed below.

Age (column 1)

This variable contains the patient's age when he or she was first diagnosed with an AIDS-indicator disease.

0 = Less than 1 year old
1 = 1 to 12 years old
2 = 13 to 19 years old
3 = 20 to 24 years old
4 = 25 to 29 years old
5 = 30 to 34 years old
6 = 35 to 39 years old
7 = 40 to 44 years old
8 = 45 to 49 years old
9 = 50 years old or older

Sexclass (column 2)

Adult/adolescent males are classified according to their sexual orientation.

1 = Adult/adolescent homosexual male
2 = Adult/adolescent bisexual male
3 = Adult/adolescent heterosexual male or pediatric male
4 = Female (both adult/adolescent and pediatric)

Race (column 3)

1 = White (not Hispanic)
2 = Black (not Hispanic)
3 = Hispanic
9 = Asian/Pacific Islander, American Indian/Alaskan Native, or unknown

MSA (column 4)

Region of residence is identified for adult/adolescent patients who live in MSAs with more than 1 million population, according to the 1990 census. Residence is defined as place of residence at onset of illness suggestive of AIDS. The MSA variable is coded as:

0 = Not in an MSA (population less than 50,000)
1 = Northeast: Bergen-Passaic, N.J.; Boston, Mass.; Hartford, Conn.; Nassau-Suffolk, N.Y.; New York, N.Y.; Newark, N.J.; or Rochester, N.Y.
2 = Central: Chicago, Ill.; Cincinnati, Ohio; Cleveland, Ohio; Columbus, Ohio; Denver, Colo.; Detroit, Mich.; Indianapolis, Ind.; Kansas City, Mo.; Milwaukee, Wis.; Minneapolis-Saint Paul, Minn.; or Saint Louis, Mo.
3 = West: Anaheim, Calif.; Los Angeles, Calif.; Oakland, Calif.; Phoenix, Ariz.; Portland, Oreg.; Riverside-San Bernardino, Calif.; Sacramento, Calif.; Salt Lake City, Utah; San Diego, Calif.; San Francisco, Calif.; San Jose, Calif.; or Seattle, Wash.
4 = South: Atlanta, Ga.; Charlotte, N.C.; Dallas, Tex.; Fort Lauderdale, Fla.; Fort Worth, Tex.; Houston, Tex.; Miami, Fla.; New Orleans, La.; Orlando, Fla.; San Antonio, Tex.; San Juan, P.R.; or Tampa, Fla.
5 = Mid-Atlantic: Baltimore, Md.; Norfolk, Va.; Philadelphia, Pa.; Pittsburgh, Pa.; or Washington, D.C.
9 = In an MSA with population less than 1 million, but greater than 50,000

Dxdate (columns 5 through 8)

This variable contains the year and month in which the first AIDS-indicator disease was diagnosed. Columns 5 and 6 contain the year; columns 7 and 8 contain the month. Cases diagnosed before 1982 are coded as "8199".

Repdate (columns 9 through 12)

This variable contains the year and month in which CDC received the case report. Columns 9 and 10 contain the year; columns 11 and 12 contain the month. Cases reported during 1981 are coded as "8199".

Death (column 13)

0 = CDC has not received a death notification for this case
1 = CDC has been notified that this patient died

Deathqtr (columns 14 through 17)

For patients whose death has been reported to CDC, this variable contains the year and quarter of death. Columns 14 and 15 contain the year; columns 16 and 17 contain the quarter. For example, the value "8803" indicates that the patient died in July, August, or September 1988. Patients who are known to have died, but whose date of death is unknown, are coded as "9999".

Ptgroup (columns 18 and 19)

For surveillance purposes, AIDS patients are grouped into a hierarchy of exposure categories. Persons with more than one reported mode of exposure to HIV are counted in the exposure category listed first in the hierarchy, except for persons with a history of both homosexual/bisexual contact and intravenous drug use. They are counted in a separate category. Persons with multiple reported modes of exposure are indicated in the variable multrisk.

"Pattern II" is a term adopted by the World Health Organization, and refers to countries with a distinctive pattern of HIV transmission. It is observed in areas of central, eastern, and southern Africa and in some Caribbean countries. In these countries, most of the reported cases occur in heterosexuals; the male to female ratio is approximately 1 to 1; and perinatal transmission is more common than in other areas. Intravenous drug use and homosexual transmission either do not occur or occur at low levels.

"Other/undetermined" cases are in persons with no reported history of exposure to HIV through any of the routes listed in the hierarchy of exposure categories. Undetermined cases include persons who are currently under investigation by local health department officials; persons whose exposure history is incomplete because of death, refusal to be interviewed, or loss to follow-up; and persons who were interviewed or for whom other follow-up information was available and no exposure mode was identified.

Adult/adolescent exposure categories

1 = Male homosexual/bisexual contact
2 = Intravenous (IV) drug use (female and heterosexual male)
3 = Male homosexual/bisexual contact and IV drug use
4 = Hemophilia/coagulation disorder
5 = Heterosexual contact with a person with, or at increased risk for, HIV infection
6 = Born in Pattern-II country
7 = Receipt of transfusion of blood, blood components, or tissue
8 = Other/undetermined

Pediatric exposure categories

9 = Hemophilia/coagulation disorder
10 = Mother with, or at risk for, HIV infection
11 = Receipt of transfusion of blood, blood components, or tissue
12 = Other/undetermined

NIR (column 20)

NIR (no identified risk) is coded only for patients whose mode of exposure to HIV is coded as undetermined in ptgroup.
1 = Patient currently under investigation
2 = Patient died, refused interview, or is lost to follow-up
3 = Patient investigation complete but no mode of exposure was identified

Multrisk (column 21)

Multrisk is coded only for adult/adolescent patients (13 years old or older) and indicates if the patient has risk(s) of exposure to HIV other than the one indicated by ptgroup.

0 = Patient's only mode of exposure to HIV is that indicated by ptgroup
1 = Patient has additional risk(s) of exposure

Birth (column 22)

1 = Patient was born in the United States or its dependencies and possessions, or place of birth was not specified
2 = Patient was born in Pattern-II country
3 = Patient was born in a foreign country which is not Pattern II

Categ (column 23)

This variable reflects changes made over time to the CDC surveillance definition for AIDS. Only cases meeting the current (1987) surveillance definition are included in this data set. Categ indicates whether the patient also met the pre-1985 or 1985 surveillance definition, and whether the diagnosis, if it meets only the 1987 definition, was definitive or presumptive. Cases that meet more than one of these surveillance definitions are classified into the definition category listed first. For more information about the 1987 definition, see Morbidity and Mortality Weekly Report, August 14, 1987, Supplement, pages 3S-15S.

1 = Case meets the pre-1985 surveillance definition
2 = Case meets the 1985 surveillance definition
3 = Case meets the 1987 surveillance definition and was diagnosed definitively
4 = Case meets the 1987 surveillance definition and was diagnosed presumptively

AIDS-indicator opportunistic diseases (columns 24 through 48)

Columns 24 through 48 contain information about each of the AIDS-indicator diseases listed on the AIDS confidential case report form.
Each of these variables is one character long and is coded as follows:

0 = AIDS-indicator opportunistic disease was not diagnosed
1 = AIDS-indicator opportunistic disease was diagnosed definitively
2 = AIDS-indicator opportunistic disease was diagnosed presumptively

Heterosexual risk information (columns 49 through 52)

These variables (s_bi, s_iv, s_other, and s_hiv) contain additional risk information for patients infected heterosexually. All 4 variables are coded as follows:

0 = no
1 = yes
9 = missing/unknown

The variable s_bi is coded only for women (for men, the variable contains a blank). All 4 variables contain "9" (missing/unknown) for patients with hemophilia, regardless of whether the risk information is in fact unknown. This restriction is necessary in order to comply with the Assurance of Confidentiality described earlier in this manual. Of the 1,535 AIDS cases reported through June 1991 among adults/adolescents with hemophilia, fewer than 3 percent also reported heterosexual contact with a person at increased risk for AIDS or HIV infection.

Deathrep (columns 53 through 56)

For patients whose death has been reported to CDC, this variable contains the year and quarter when CDC received the report. Columns 53 and 54 contain the year; columns 55 and 56 contain the quarter. For example, the value "8803" indicates that the patient's death was reported to CDC in July, August, or September 1988. CDC began collecting this variable in October 1987. Deaths reported to CDC before October 1987 are coded as "8799".

Adjwgt (columns 57 through 62)

This variable contains an adjustment weight which, when used as a weighting variable in a frequency tabulation, produces tabulations of AIDS cases that are adjusted for delays in case reporting (see Delay in Reporting for a discussion of delays in reporting). The weights are based on estimated reporting delay distributions that take into account exposure, geographic, and demographic variations in case reporting.
The adjustment weights and the resulting tabulations are not reliable for cases diagnosed during the most recent 3 to 6 months. Please note that this variable must not be used for tabulations involving dates of report to CDC (repdate), the living status of a patient (death), the date of death (deathqtr), or the date when a death was reported to CDC (deathrep). It is reasonable to use this variable for tabulations involving any other variable in the data set, including date of diagnosis.

METHODS

Case report form

Separate case report forms are used for pediatric patients (those less than 13 years of age at the time of diagnosis) and adult/adolescent patients (those 13 years of age or older at the time of diagnosis). Although the forms are very similar, the pediatric form includes risk factor information for the child's mother. These forms are completed by the health-care provider or by the AIDS surveillance staff in the local or state health department. Names are retained by the state or local health department and are converted to an alpha-numeric code called "soundex" for use by CDC. CDC does not receive names of persons with AIDS. Because more than one state may report an individual case, CDC screens reported cases by soundex code and date of birth to cull duplicate reports.

The variables available on the AIDS data set are listed in section 2. However, a few deserve special comment.

- Living status. Patients survive for a variable amount of time following the diagnosis of AIDS. Because death usually occurs after the initial report to CDC, case reports may not be updated to reflect the change in living status. As a result, reporting of death among AIDS patients may be incomplete.

- Exposure category. Some patients may have more than one mode of exposure to HIV. For surveillance purposes, AIDS cases are counted only once in a hierarchy of exposure categories (see the Ptgroup description in section 2). This hierarchy is based on the most likely source of HIV infection.
  Persons with more than one reported mode of exposure are counted in the category that appears first in the hierarchy, except for persons with a history of both male homosexual/bisexual contact and intravenous drug use. They are counted in a separate category.

- Diseases indicative of AIDS. Patients may develop additional diseases indicative of AIDS after their initial AIDS diagnosis. The case report form may not be updated to reflect additional diseases. Therefore, proportions of patients with the various AIDS-indicator diseases should be considered minimal estimates.

- Date of diagnosis. CDC collects only one diagnosis date per patient, i.e., the date when he or she was initially diagnosed with an AIDS-indicator disease. Patients who develop additional diseases do not receive additional diagnosis dates. Therefore, for patients with multiple AIDS-indicator diseases, you cannot determine which disease occurred first.

Special Case Investigations

Certain AIDS cases receive special follow-up by state and local health departments. Investigations are frequently performed after the initial case report to CDC. Case updates are incorporated into the data set as they become available to CDC.

- No identified risk (NIR) patients. NIR patients are those reported without any recognized mode of exposure to HIV. Approximately 3 percent of cases are NIR patients at any one time. However, when additional information can be obtained for these patients, approximately 75 percent are reclassified into a known exposure category. For those not reclassified, the demographic profile is more similar to that of other persons with AIDS than to the general U.S. population.

- Health-care workers. Ninety-five percent of health-care workers with AIDS are classified into a known exposure category. Of the health-care workers with an undetermined mode of exposure to HIV, less than one third cannot be reclassified after investigation.
Delay in Reporting

The timeliness of AIDS case reporting to CDC depends on several factors. These include the volume of cases reported from a state or locality and the availability of staff to complete case report forms. In many instances, initial case report forms are incomplete and require additional follow-up by state and local health department staff, including reviews of other record systems and contact with health-care providers. About 55 percent of all cases are reported to CDC within 3 months of the date of diagnosis, but about 20 percent are reported more than a year after diagnosis. Delays vary widely among exposure, geographic, racial/ethnic, and age categories. They are substantially longer for pediatric cases and for transfusion-associated cases in adults.

Due to the delay, the number of cases diagnosed during any period often exceeds the number reported during that period. This is particularly important in examining trends over time, since many cases in recent periods of time will not yet be reported. To account for delays in the reporting of cases, a variable called adjwgt has been added to the data set. This variable may be used to weight each case on the data set and obtain adjusted case counts. For example, summing adjwgt over all cases would estimate the number of cases diagnosed through the time period covered by the data set.

Early Reporting Dates

Before 1990, CDC occasionally received reports on patients before they met the CDC AIDS case definition. If such patients were later diagnosed with AIDS, the diagnosis date on their record (indicating when the patient first met the CDC definition) would be after the report date (when CDC first received information about the patient). Such records should be excluded from certain analyses. CDC's AIDS surveillance data base no longer receives reports on patients who do not meet the AIDS case definition.
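The two adjustments described above (weighting by adjwgt and setting aside records reported before diagnosis) can be sketched in Python for users who export the data to an ASCII file. This is a minimal illustration under stated assumptions, not part of SETS: it assumes the exported file keeps the 62-column layout listed in section 2, that adjwgt is stored as text Python's float() can read, and that all two-digit years fall in the 1900s. The function names and the fake_record helper are hypothetical and exist only for this example.

```python
from collections import defaultdict

def parse_record(line):
    """Slice the fields used here out of one 62-column record."""
    return {
        "dxdate": line[4:8],           # columns 5-8, year and month of diagnosis
        "repdate": line[8:12],         # columns 9-12, year and month of report
        "adjwgt": float(line[56:62]),  # columns 57-62; numeric format is an assumption
    }

def year_month(code):
    """Split a YYMM code into (year, month); month 99 (as in "8199") becomes None."""
    year = 1900 + int(code[:2])        # assumes every year falls in the 1900s
    month = int(code[2:])
    return year, (None if month == 99 else month)

def reported_before_diagnosis(rec):
    """True when the diagnosis month falls after the month CDC first heard of the case."""
    dx_year, dx_month = year_month(rec["dxdate"])
    rep_year, rep_month = year_month(rec["repdate"])
    if dx_month is None or rep_month is None:
        return False                   # "8199" codes carry no month, so they are not flagged
    return (dx_year, dx_month) > (rep_year, rep_month)

def adjusted_counts_by_dx_year(lines):
    """Sum adjwgt by year of diagnosis; "8199" (diagnosed before 1982) is grouped under 1981."""
    totals = defaultdict(float)
    for line in lines:
        rec = parse_record(line)
        totals[year_month(rec["dxdate"])[0]] += rec["adjwgt"]
    return dict(totals)

def fake_record(dxdate, repdate, adjwgt):
    """Hypothetical helper: build a made-up 62-column line for demonstration only."""
    return "0000" + dxdate + repdate + "0" * 44 + f"{adjwgt:6.3f}"

# Made-up records, not real surveillance data.
sample = [
    fake_record("9001", "9003", 1.0),
    fake_record("9006", "9004", 1.25),   # reported two months before diagnosis
    fake_record("8199", "8199", 1.0),
]
early = [line for line in sample if reported_before_diagnosis(parse_record(line))]
print(len(early), adjusted_counts_by_dx_year(sample))
```

The weighted sum is grouped by year of diagnosis only, because adjwgt must not be used in tabulations involving repdate, death, deathqtr, or deathrep.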
Follow-up of Reported AIDS Cases

AIDS case records maintained at CDC contain all information reported to date from state or local health departments. As patients progress through their illness, additional diseases and conditions may be reported, or the patient's vital status may change. However, not all health departments have the resources to routinely follow up patients for additional information, including vital status. For this reason and because many patients move out of the reporting health department's jurisdiction, CDC records do not always contain all current information for each patient.

Non-reporting and Evaluation of AIDS Surveillance

Cases of AIDS may not be reported to CDC for a variety of reasons. The diagnostic tests needed to confirm the AIDS diagnosis may not be performed, or physicians and hospital personnel may fail to report cases to the health department. Further, some patients with HIV disease may be ill or die from diseases or conditions not included in the current AIDS surveillance definition.

Both CDC and state and local health departments have commissioned a variety of studies to evaluate the completeness of AIDS surveillance. Most evaluation projects have used alternate data resources, such as death certificates, hospital discharge records, and laboratory records. Individual records from these alternate sources have then been matched against records in AIDS surveillance data bases. Evaluation studies have varied in size and scope (e.g., varying numbers of ICD-9 codes from death certificates or computerized discharge records), geographic area covered, detection of both inpatient and outpatient cases, and time frames. In general, estimates from these studies suggest that the completeness of reporting varies considerably, from 56 to 100 percent.
High-prevalence areas for AIDS appear to have more complete reporting than low-prevalence areas.

TECHNICAL INFORMATION

Memory/Storage Requirements and Mediums

The AIDS Public Information Data Set contains large quantities of data and requires significant computer resources for analysis. You need a 386-based MS-DOS microcomputer with at least 30 megabytes of disk storage and a high-density diskette drive. The new SETS interface allows you to display simple statistics without additional software such as SAS, SPSS, BMDP, or PRODAS. More complex analysis, however, will still require additional software.

To transfer the data to another software package or to a mainframe computer for analysis, you must first load SETS, then use the Export option to extract the records and variables you wish to analyze. The Export option will create an ASCII data file, which can then be processed by other software, or transferred to your mainframe using software designed for this purpose. Examples of transfer programs include Crosstalk and Hayes SmartCom.

Loading SETS

The AIDS Public Information Data Set consists of over 10 diskettes. To install it onto your computer, insert diskette #1 in drive A and type the following DOS commands:

    a: <ENTER>
    install <ENTER>

The first command changes the current drive to A, and the second command begins the installation process. Please note that the first diskette is also the last diskette, i.e., you will need to process it at the beginning and the end of the installation procedure. You will need at least 30 minutes to install this software. Once you have installed SETS, type the command

    sets <ENTER>

to run the program. SETS is a menu-driven program which can be mastered with minimum effort.

Getting Help

You can access help from SETS in two ways: by pressing the <F1> key at any time while you are running SETS, or by selecting the Browse feature. Once you select Browse, select Documentation, then SETS Manual.
SETS Features

From the SETS main menu, you can select the following options:

BROWSE - to browse the documentation, MSA and state tables, and the main data file. Browsing the main data file allows you to display the variable names and value labels contained in the data file.

TABLE - to create and display cross tabulations of any of the variables in the data set. Tabulations are displayed in a spreadsheet format which can be saved and then loaded into Lotus software.

SUBSET - to specify which variables and records should be included in tabulations or in exported files.

DEFAULTS - to adjust the settings for export drives and directories, and for the autosave feature and screen colors.

Creating Tables

After you install the SETS program, you can create a table by following these steps:

1. From the SETS directory, type sets <ENTER> to run the program.
2. When the program appears, press <ENTER> until the main menu appears.
3. Use the arrow keys to highlight "Table" and press <ENTER>.
4. At "Display" press <ENTER>.
5. At the screen "What would you like to do," press the <ENTER> key to select "Create a new record subset."
6. Type "all" to select all records and press <ENTER>.
7. When the spreadsheet appears, press the <F2> key (edit).
8. Press the <F6> key for table expression assistance.
9. To create a table of SEXCLASS by RACE, begin by using the arrow keys to highlight the variable SEXCLASS, then press <ENTER> to select this variable.
10. Use the arrow keys to highlight the variable RACE, then press <ENTER> to select it.
11. Press the <F10> key to accept these two variables.
12. The spreadsheet will reappear with the text "Edit: SEXCLASS, RACE" displayed at the bottom of the screen.
13. Type the text "/labels" and press <ENTER>. Do not type the quotation marks.

The SETS program will create the table. This process can take half an hour or longer, depending on the speed of your machine. Additional detail on how to create tables is provided in the on-line documentation.
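
For readers who export records from SETS and tabulate them elsewhere, the following Python sketch illustrates what a two-way cross tabulation such as SEXCLASS by RACE computes. It is not how SETS itself works. The record layout (a list of dictionaries keyed by variable name), the function name cross_tabulate, and the sample values are all assumptions made for illustration; in particular, the SEXCLASS values shown are placeholders, not documented codes.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def cross_tabulate(records: Iterable[Mapping[str, str]],
                   row_var: str,
                   col_var: str) -> Dict[str, Dict[str, int]]:
    """Count records for every (row value, column value) combination.

    Returns a nested dictionary: table[row_value][col_value] -> count.
    Combinations that never occur are filled in with 0.
    """
    counts = Counter((rec[row_var], rec[col_var]) for rec in records)
    row_values = sorted({row for row, _ in counts})
    col_values = sorted({col for _, col in counts})
    return {
        row: {col: counts.get((row, col), 0) for col in col_values}
        for row in row_values
    }


# Made-up records, standing in for rows read from an exported ASCII file.
sample_records = [
    {"SEXCLASS": "A", "RACE": "1"},
    {"SEXCLASS": "A", "RACE": "1"},
    {"SEXCLASS": "A", "RACE": "2"},
    {"SEXCLASS": "B", "RACE": "2"},
]

table = cross_tabulate(sample_records, "SEXCLASS", "RACE")
```

Here table["A"]["1"] holds the number of sample records with SEXCLASS "A" and RACE "1". A nested dictionary was chosen only because it mirrors the row-by-column layout of the spreadsheet display; any other structure would serve equally well.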
MSA and State Tables

The microfiche contain frequency tables and cross tabulations of 8 variables extracted from CDC's national AIDS surveillance data set. They contain one set of tables for the entire United States, one set for each state, and one set for each MSA. The variables are:

Variable   Description
age        Age group at diagnosis of the first AIDS-indicator disease
categ      Indicates which revisions of the CDC AIDS case definition the patient meets
dth_hyr    Half-year of death for patients reported dead
dx_hyr     Half-year of diagnosis of first AIDS-indicator disease
ent_hyr    Half-year in which CDC first received information about the case
ptgrp      Patient grouping by mode of exposure to HIV
race       Race of patient
sex        Sex of patient

The values used for these variables are printed below.

Age

This variable contains the patient's age when he or she was first diagnosed with an AIDS-indicator disease. Ages printed on the microfiche are grouped as follows:

0 - 1
1 - 12
13 - 19
20 - 29
30 - 39
40 - 49
50 +

Categ

This variable reflects revisions made to the CDC surveillance definition for AIDS. Only cases meeting the current (1987) surveillance definition are included on the microfiche. Categ indicates whether the patient also meets the pre-1985 or 1985 surveillance definition, and whether the diagnosis, if it meets only the 1987 definition, was definitive or presumptive. Cases that meet more than one of these surveillance definitions are classified into the definition category listed first. For more information about the 1987 definition, see Morbidity and Mortality Weekly Report, August 14, 1987, Supplement, pages 3S-15S.

1 = Case meets the pre-1985 surveillance definition
2 = Case meets the 1985 surveillance definition
3 = Case meets the 1987 surveillance definition and was diagnosed definitively
4 = Case meets the 1987 surveillance definition and was diagnosed presumptively

Dth_hyr

For patients whose death has been reported to CDC, this variable contains the half-year of death.
The first two numbers indicate the year; the second two indicate the first or second half of that year. For example, the value "8802" indicates that the patient died in the second half of 1988. Patients whose death has been reported to CDC, but whose date of death is unknown, are coded as "9999".

Dx_hyr

This variable contains the half-year in which the first AIDS-indicator disease was diagnosed. The first two numbers indicate the year; the second two indicate the first or second half of that year.

Ent_hyr

This variable contains the half-year in which CDC received the case report. The first two numbers indicate the year; the second two indicate the first or second half of that year.

Ptgrp

For surveillance purposes, AIDS patients are grouped into a hierarchy of exposure categories. Persons with more than one reported mode of exposure to HIV are counted in the exposure category listed first in the hierarchy, except for persons with a history of both homosexual/bisexual contact and intravenous drug use, who are counted in a separate category.

"Pattern II" is a term adopted by the World Health Organization, and refers to countries with a distinctive pattern of HIV transmission. It is observed in areas of central, eastern, and southern Africa and in some Caribbean countries. In these countries, most of the reported cases occur in heterosexuals; the male to female ratio is approximately 1 to 1; and perinatal transmission is more common than in other areas. Intravenous drug use and homosexual transmission either do not occur or occur at low levels.

"Other/undetermined" cases are in persons with no reported history of exposure to HIV through any of the routes listed in the hierarchy of exposure categories.
Undetermined cases include persons who are currently under investigation by local health department officials; persons whose exposure history is incomplete because of death, refusal to be interviewed, or loss to follow-up; and persons who were interviewed or for whom other follow-up information was available and no exposure mode was identified.

01 = Male homosexual/bisexual contact
02 = Intravenous (IV) drug use (female and heterosexual male)
03 = Male homosexual/bisexual contact and IV drug use
04 = Hemophilia/coagulation disorder
05 = Heterosexual contact with a person with, or at increased risk for, HIV infection
06 = Born in Pattern-II country
07 = Receipt of transfusion of blood, blood components, or tissue
08 = Adult/adolescent other/undetermined
09 = Pediatric hemophilia/coagulation disorder
10 = Mother with, or at risk for, HIV infection
11 = Pediatric receipt of transfusion of blood, blood components, or tissue
12 = Pediatric undetermined

Race

1 = White (not Hispanic)
2 = Black (not Hispanic)
3 = Hispanic
4 = Asian/Pacific Islander
5 = American Indian/Alaskan Native
9 = Unknown

Sex

1 = Male
2 = Female

Locating Individual Tables

In accordance with CDC guidelines on protecting confidentiality and with an agreement made with state and local health departments for release of these data, entries whose value is 5 or less are not included in the tables. Only MSAs with 500,000 or more population (according to 1991 census estimates) are included on the microfiche. The AIDS Public Information Data Set contains frequency tables of 8 variables, and every possible 2-way cross tabulation of those variables for each state, each MSA with 500,000 or more population, and for the entire United States. Tables for the entire United States also contain cross tabulations of 2 additional variables, STATE and MSA. To access these tables, select the Browse feature on the SETS menu, then select "Documentation."
A menu will appear which divides the country into 9 geographic regions: New England, North Atlantic, Mid-Atlantic, South Atlantic, Mid-West, Great Plains, South Central, Mountain, and Pacific. For example, to access data for New York City, first select the North Atlantic region. SETS will display a list of all states and MSAs in that region, including New York City. To view the tables for any state or MSA in that region, select the name of the state or MSA. SETS will then display the first table for the state or MSA you have selected. It will first display the 1-way frequency tables, 1 table per screen, then the 2-way cross tabulations. Tables are displayed alphabetically, beginning with AGE and progressing to RACE and SEX.

SETS allows you to search for individual table entries within each state or MSA file. Press the <F6> key to begin the search. It will also allow you to display or print a particular page in the file. SETS contains on-line documentation that describes the search process in more detail.

SETS also allows global searches, i.e., you can search for tables in any of the state or MSA files included in the data set, not just those contained in the current state or MSA file. For example, if you are displaying data for New York City, and want to compare them to data from Los Angeles, you can use the global search function to search for the entry "Los Angeles." SETS would then locate the first table in the Los Angeles file. To begin a global search, press the <F9> key.

SAMPLE TABLE(s) OF INFORMATION

MSA Codes

Definitions for MSAs are issued by the Office of Management and Budget (OMB) to be used in presentation of statistics by agencies of the federal government. The metropolitan areas used on the AIDS Public Information Data Set are the MSAs for all areas except the 6 New England states. For these states, the New England County Metropolitan Areas (NECMA, also defined by OMB) are used.
Metropolitan areas are named for a central city in the MSA or NECMA and may include several counties and cross state boundaries.

Code   Metropolitan area
80     Akron, Ohio
160    Albany-Schenectady, N.Y.
200    Albuquerque, N.M.
240    Allentown, Pa.
360    Anaheim, Calif.
520    Atlanta, Ga.
640    Austin, Tex.
680    Bakersfield, Calif.
720    Baltimore, Md.
760    Baton Rouge, La.
875    Bergen-Passaic, N.J.
1000   Birmingham, Ala.
1123   Boston, Mass.
1163   Bridgeport, Conn.
1280   Buffalo, N.Y.
1440   Charleston, S.C.
1520   Charlotte, N.C.
1600   Chicago, Ill.
1640   Cincinnati, Ohio
1680   Cleveland, Ohio
1840   Columbus, Ohio
1920   Dallas, Tex.
2000   Dayton, Ohio
2080   Denver, Colo.
2160   Detroit, Mich.
2320   El Paso, Tex.
2680   Fort Lauderdale, Fla.
2800   Fort Worth, Tex.
2840   Fresno, Calif.
2960   Gary, Ind.
3000   Grand Rapids, Mich.
3120   Greensboro, N.C.
3160   Greenville, S.C.
3240   Harrisburg, Pa.
3283   Hartford, Conn.
3320   Honolulu, Hawaii
3360   Houston, Tex.
3480   Indianapolis, Ind.
3600   Jacksonville, Fla.
3640   Jersey City, N.J.
3760   Kansas City, Mo.
3840   Knoxville, Tenn.
4120   Las Vegas, Nev.
4400   Little Rock, Ark.
4480   Los Angeles, Calif.
4520   Louisville, Ky.
4920   Memphis, Tenn.
5000   Miami, Fla.
5015   Middlesex, N.J.
5080   Milwaukee, Wis.
5120   Minneapolis-Saint Paul, Minn.
5190   Monmouth-Ocean City, N.J.
5360   Nashville, Tenn.
5380   Nassau-Suffolk, N.Y.
5483   New Haven, Conn.
5560   New Orleans, La.
5600   New York, N.Y.
5640   Newark, N.J.
5720   Norfolk, Va.
5775   Oakland, Calif.
5880   Oklahoma City, Okla.
5920   Omaha, Nebr.
5960   Orlando, Fla.
6000   Oxnard-Ventura, Calif.
6160   Philadelphia, Pa.
6200   Phoenix, Ariz.
6280   Pittsburgh, Pa.
6440   Portland, Oreg.
6483   Providence, R.I.
6640   Raleigh-Durham, N.C.
6760   Richmond, Va.
6780   Riverside-San Bernardino, Calif.
6840   Rochester, N.Y.
6920   Sacramento, Calif.
7040   Saint Louis, Mo.
7160   Salt Lake City, Utah
7240   San Antonio, Tex.
7320   San Diego, Calif.
7360   San Francisco, Calif.
7400   San Jose, Calif.
7440   San Juan, P.R.
7560   Scranton, Pa.
7600   Seattle, Wash.
8003   Springfield, Mass.
8160   Syracuse, N.Y.
8200   Tacoma, Wash.
8280   Tampa-Saint Petersburg, Fla.
8400   Toledo, Ohio
8520   Tucson, Ariz.
8560   Tulsa, Okla.
8840   Washington, D.C.
8960   West Palm Beach, Fla.
9160   Wilmington, Del.
9243   Worcester, Mass.
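
Two of the coding rules described earlier in this documentation lend themselves to a short worked example: the half-year coding used by Dth_hyr, Dx_hyr, and Ent_hyr, and the rule that table entries of 5 or less are withheld. The Python sketch below shows one way a user working with exported data might apply them. The function names are hypothetical, the use of None to mark an unknown date or a withheld entry is an arbitrary choice, and the code assumes that every two-digit year falls in the 1900s, which the documentation implies but does not state.

```python
from typing import Dict, Optional, Tuple

UNKNOWN_DATE = "9999"   # documented code: death reported, date unknown
SUPPRESSION_LIMIT = 5   # entries of 5 or less are not shown in the tables


def decode_half_year(code: str) -> Optional[Tuple[int, int]]:
    """Decode a four-digit half-year code into (year, half).

    Returns None for the unknown-date code "9999".
    """
    if code == UNKNOWN_DATE:
        return None
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"expected a four-digit code, got {code!r}")
    year = 1900 + int(code[:2])  # assumption: all years are 19xx
    half = int(code[2:])         # 1 = first half, 2 = second half
    if half not in (1, 2):
        raise ValueError(f"half must be 01 or 02, got {code[2:]!r}")
    return year, half


def suppress_small_counts(counts: Dict[str, int]) -> Dict[str, Optional[int]]:
    """Replace every count of 5 or less with None (a hypothetical marker)."""
    return {
        key: (n if n > SUPPRESSION_LIMIT else None)
        for key, n in counts.items()
    }
```

With these definitions, decode_half_year("8802") returns (1988, 2), matching the example given for Dth_hyr. Raising an error on any value other than the documented forms is a deliberately cautious choice, since the documentation does not say how other values should be read.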