Centers for Disease Control and Prevention
Scientific Data Documentation
AIDS:  Public Information Data (1991)

ABSTRACT

Summary

 Public health surveillance represents an ongoing and regular collection,
 analysis, interpretation, and application of health data for disease
 prevention and control. AIDS surveillance, like other national surveillance
 efforts, depends on health-care providers and the state and local health
 departments and, thus, requires a balance between information needs and
 practical limitations. AIDS surveillance in the United States has achieved a
 high degree of completeness relative to other notifiable diseases. In
 addition, the surveillance system has been modified as understanding of AIDS
 and HIV infection has grown. Users of the AIDS Public Information Data Set
 should be familiar with the characteristics of public-health surveillance in
 general as well as with the evolution of AIDS surveillance.

General Information

 The AIDS Public Information Data Set is created twice a year by the Division
 of HIV/AIDS, National Center for Infectious Diseases, Centers for Disease
 Control (CDC) and consists of a data file containing 44 variables extracted
 from CDC's national AIDS surveillance data base and a documentation file
 which contains cross tabulations of 8 of these variables.  The documentation
 file contains one set of tables for the entire United States, one set for
 each state, and one set for each Metropolitan Statistical Area (MSA) with
 500,000 or more population.

 This data set is distributed using software called SETS, developed by CDC's
 National Center for Health Statistics.  SETS is menu-driven and allows you
 to create cross tabulations without using a statistical software package
 such as SAS or dBASE.  It also incorporates the metropolitan area and state
 tables previously distributed on microfiche.  Users who want to continue
 using these data with a statistical or data management package must first
 load SETS, and then use the SETS export feature to create an ASCII data
 file.  See Appendix A or the online documentation for more information.

 This manual describes the data set. It is divided into three sections and
 two appendices. On-line help screens provide additional documentation for
 SETS.

 Section 1, AIDS Surveillance in the United States, describes the data
 collection process and the effect changes in this process may have on data
 analysis and interpretation. The section reviews the source of AIDS
 surveillance data and describes which patients are included in the Centers
 for Disease Control (CDC) definition for AIDS. It also discusses reporting
 delays and reporting completeness.

 Section 2, Data File Variables and Coding Schemes, lists the variables
 included on the data file and describes each variable's coding scheme.

 Section 3, MSA and State Tables, describes the frequency tables and cross
 tabulations included on the documentation file.

 Appendix A: Loading the SETS Software, describes how to load and run SETS on
 your computer.  It also suggests computer hardware and software you can use
 to analyze the data.

 Appendix B: Metropolitan Statistical Areas lists the MSAs included in the
 data set.

Assurance of Confidentiality

 The data and documentation files on the enclosed diskettes contain
 information abstracted from acquired immunodeficiency syndrome (AIDS) case
 reports received by CDC. These data have been reported voluntarily to CDC by
 state and local health departments, and are protected under the Assurance of
 Confidentiality (Sections 306 and 308(d) of the Public Health Service Act,
 42 U.S.C. 242k and 242m(d)), which prevents disclosure of any information
 that could be used to directly or indirectly identify patients or
 establishments. The statistical data contained in the AIDS Public Information
 Data Set are being released for public use in accordance with the Assurance
 and do not identify patients directly, nor do they contain information that
 can identify patients indirectly.
BACKGROUND

 In 1981, after early reports of Pneumocystis carinii pneumonia, Kaposi's
 sarcoma, and other opportunistic infections in young homosexual men in
 Los Angeles, New York City, and San Francisco, CDC began surveillance for a
 newly recognized constellation of diseases, now termed the acquired
 immunodeficiency syndrome (AIDS). CDC developed a surveillance case definition
 for this syndrome and initially received case reports directly from health-
 care providers and state and local health departments. As the epidemic became
 more widespread, state and local health departments began to assume the
 responsibility for AIDS surveillance, and by 1985 all states had regulations
 requiring physicians and other health-care providers to report AIDS cases
 directly to state or local health departments. These health departments then
 share the reports with CDC, which produces the national AIDS surveillance
 data set.

 The goals of AIDS surveillance have been to monitor both trends in AIDS cases
 and the scope of severe morbidity due to infection with the human
 immunodeficiency virus (HIV). Advances in the understanding of the
 epidemiology and manifestations of HIV infection and changing diagnostic
 practices, however, present multiple challenges to those analyzing and
 interpreting the AIDS surveillance data. The following are a few examples:

 - A wide variety of persons are at risk for AIDS, including homosexual or
   bisexual men, intravenous drug users, transfusion or tissue transplant
   recipients, heterosexual partners of infected persons (including persons
   born in "Pattern-II" countries - certain Caribbean and central African
   countries where heterosexual transmission predominates), children born to
   infected mothers, and persons with mucous membrane or percutaneous exposure
   to blood or body fluids of infected persons (e.g., health-care workers).
   Because homosexual/bisexual males comprise such a large proportion of the
   total number of AIDS cases, trends in this subgroup will overshadow those
   in other groups unless the data are examined separately. Analysis of data,
   without regard to specific subgroups, may conceal information or lead to
   misinterpretation of the data.

 - The etiologic agent of AIDS, HIV, has been identified, and diagnostic tests
   for infection with this virus have been developed. As a result, the
   surveillance of AIDS, initially dependent on the presence of certain
   indicator diseases specific for the infection, has been expanded to include
   additional diseases (perhaps less specific for HIV infection) in the
   presence of laboratory evidence for infection. Introduction of these
   diagnoses has affected trends in AIDS cases.

 - Diagnostic practices have changed over time and vary geographically. AIDS is
   now a common diagnosis in many hospitals and clinics, and definitive
   diagnostic tests for manifestations of HIV infection (e.g., Pneumocystis
   carinii pneumonia or esophageal candidiasis) may not be done. HIV testing
   is not performed on all patients. Geographic variations in diagnostic
   practices and changes over time could markedly affect trends in AIDS
   surveillance.
DESCRIPTION OF POPULATION

Source of AIDS Surveillance Data

 CDC maintains national surveillance of AIDS through the receipt of AIDS case
 reports submitted by individual state and local health departments. Health
 departments either submit the case report forms directly or they report cases
 electronically through a CDC-developed microcomputer system. All 50 states,
 the District of Columbia, and U.S. territories and possessions (including
 Puerto Rico, the Virgin Islands, Guam, and certain Pacific islands) report
 AIDS cases to CDC.

 Although state and local health departments share AIDS surveillance data with
 CDC, the responsibility and authority for AIDS surveillance rests with the
 individual health departments. Like any reportable disease, the completeness
 of AIDS reporting reflects the aggressiveness with which these health
 departments solicit case reports. Health departments may depend on health-
 care providers to know and comply with reporting requirements. Alternatively,
 health departments may regularly contact and interact with health-care
 facilities or individual providers to stimulate disease reporting.

 CDC has developed guidelines to assist health departments in stimulating AIDS
 case reporting and has encouraged them to take an active rather than passive
 approach to AIDS surveillance. Through surveillance cooperative agreements
 supported by CDC, health departments are encouraged to identify health-care
 facilities that serve AIDS patients and work closely with these facilities to
 promote reporting. They are also encouraged to send newsletters to health-
 care providers and attend professional organization meetings, and to use
 existing alternative data sources to identify AIDS cases, including death
 certificates, laboratory reports, and tuberculosis and tumor registries.
 States vary widely in the structure and organization of their surveillance
 systems and, therefore, in the completeness of their case reporting.

Case Definition

 AIDS surveillance does not encompass all manifestations of infection with HIV,
 but only severe, life-threatening diseases highly specific for the infection,
 as delineated in the CDC AIDS case definition. Before HIV was identified as
 the etiologic agent for AIDS, CDC defined a case of AIDS as a disease, at
 least moderately indicative of a defect in cell-mediated immunity, occurring
 in a person with no known cause for diminished resistance to the disease.
 Such diseases included Pneumocystis carinii pneumonia, Kaposi's sarcoma, and
 many other serious opportunistic infections (see American Journal of Medicine,
 March 1984, pages 493-500). With identification of HIV as the causative agent
 for AIDS and the availability of laboratory tests to detect HIV antibody, the
 case definition was expanded to reflect an increased understanding of HIV
 infection. The case definition was revised in 1985 (see CDC's Morbidity and
 Mortality Weekly Report, June 28, 1985, pages 373-375) and again in 1987 (see
 Morbidity and Mortality Weekly Report, August 14, 1987, Supplement, pages
 3S-15S). These revisions applied to persons with laboratory evidence for HIV
 infection. Among diseases added in 1985 were disseminated histoplasmosis,
 chronic isosporiasis, and certain non-Hodgkin's lymphomas. Among those added
 in 1987 were extrapulmonary tuberculosis, HIV encephalopathy, and HIV wasting
 syndrome. In children, recurrent, serious bacterial infections were also
 added. In addition, the 1987 revision allowed certain indicator diseases to
 be diagnosed on a presumptive rather than confirmed basis.

 While the reported incidence of AIDS increased only 3 to 4 percent as a
 result of the 1985 revision, roughly one fourth of all cases that were both
 diagnosed and reported in the year following the 1987 revision met only the
 additional criteria included in the 1987 revision. Furthermore, the
 proportion of cases meeting only the new criteria was higher in Hispanics
 and non-Hispanic blacks than in non-Hispanic whites, higher in heterosexual
 intravenous drug users, and lower in men who have sex with men. Due to the
 large number of cases meeting only the revised case definition and to the
 inconsistent use of the revised case definition in different populations,
 analyses of trends in AIDS cases must take these revisions into account.
VARIABLES AND THEIR CATEGORIES

Data File Variables and Coding Schemes

 The data file included in the AIDS Public Information Data Set contains one
 line of data for each AIDS case reported to CDC.  Each line contains 62
 columns.  The columns contain 44 variables extracted from CDC's national AIDS
 data set.

 Column    Variable   Description

     1     age        Age group at diagnosis of the first AIDS-indicator
                      opportunistic disease
     2     sexclass   Sexual classification of patient
     3     race       Race of patient
     4     msa        Region of residence
   5-8     dxdate     Month of diagnosis of first AIDS-indicative
                      opportunistic disease
  9-12     repdate    Date when CDC first received information about the case
    13     death      Vital status of the patient
 14-17     deathqtr   Quarter of death for patients reported dead
 18-19     ptgroup    Patient grouping by mode of exposure to HIV
    20     nir        No Identified Risk. Status of investigations for patients
                      reported without known risk of exposure to HIV
    21     multrisk   Indicates if patient had more than one risk of exposure
                      to HIV
    22     birth      Country of birth
    23     categ      Indicates which of the CDC AIDS case revisions the
                      patient meets
    24     bact       Bacterial infections, multiple or recurrent (including
                      Salmonella septicemia). Applicable in pediatric cases
                      only.
    25     burkl      Lymphoma, Burkitt's (or equivalent term)
    26     candesop   Candidiasis, esophageal
    27     candlung   Candidiasis, bronchi, trachea, or lungs
    28     cmv        Cytomegalovirus disease (other than in liver, spleen, or
                      nodes); onset at > 1 month of age
    29     cmvret     Cytomegalovirus retinitis (with loss of vision)
    30     cocci      Coccidioidomycosis, disseminated or extrapulmonary
    31     cryptoco   Cryptococcosis, extrapulmonary
    32     cryptosp   Cryptosporidiosis, chronic intestinal
    33     dementia   HIV encephalopathy
    34     histo      Histoplasmosis, disseminated or extrapulmonary
    35     HS         Herpes simplex: chronic ulcer(s) (>1 month duration);
                      or bronchitis, pneumonitis, or esophagitis
    36     ibl        Lymphoma, immunoblastic (or equivalent term)
    37     iso        Isosporiasis, chronic intestinal (> 1 month duration)
    38     KS         Kaposi's sarcoma
    39     lip        Lymphoid interstitial pneumonia and/or pulmonary
                      lymphoid hyperplasia. Applicable in pediatric cases only.
    40     mavium     Mycobacterium avium complex or M.kansasii, disseminated
                      or extrapulmonary
    41     myco       Mycobacterium, of other species or unidentified species,
                      disseminated or extrapulmonary
    42     pc         Pneumocystis carinii pneumonia
    43     plb        Lymphoma, primary in brain
    44     pml        Progressive multifocal leukoencephalopathy
    45     sals       Salmonella septicemia. Applicable in adult cases only.
    46     tb         M.tuberculosis, disseminated or extrapulmonary
    47     tp         Toxoplasmosis of brain, onset at > 1 month of age
    48     wasting    Wasting syndrome due to HIV
    49     s_bi       Sex with a bisexual man (women only)
    50     s_iv       Sex with an IV drug user
    51     s_other    Sex with a person with hemophilia, a person born in a
                      Pattern-II country, or a transfusion recipient
    52     s_hiv      Sex with a person known to be infected with HIV or to
                      have AIDS
 53-56     deathrep   Date when death was reported to CDC
 57-62     adjwgt     Reporting delay adjustment weight

   Each of these variables is coded numerically. For example, column 13
   contains either "0" or "1". These numbers represent the variable death. The
   number "0" in this column indicates that CDC has not received a death
   notification for this case. A value of "1" indicates that CDC has been
   notified that this patient died. The codes used in the AIDS Public
   Information Data Set are printed below.
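
   The layout above can be read with a short fixed-width parser. The following
   Python sketch is illustrative only: the names DISEASES, RISKS, LAYOUT, and
   parse_record are hypothetical, the SAMPLE line is made up, and the code
   assumes the exported ASCII file holds exactly one 62-column line per case.
   Fields are kept as raw strings because this manual does not state how
   multi-column values such as ptgroup or adjwgt are padded.

```python
# Variable names for columns 24-48, in the order given in the table above.
DISEASES = [
    "bact", "burkl", "candesop", "candlung", "cmv", "cmvret", "cocci",
    "cryptoco", "cryptosp", "dementia", "histo", "HS", "ibl", "iso", "KS",
    "lip", "mavium", "myco", "pc", "plb", "pml", "sals", "tb", "tp", "wasting",
]
# Variable names for columns 49-52.
RISKS = ["s_bi", "s_iv", "s_other", "s_hiv"]


def build_layout():
    """Return (name, first_column, last_column) triples, 1-based and inclusive."""
    layout = [
        ("age", 1, 1), ("sexclass", 2, 2), ("race", 3, 3), ("msa", 4, 4),
        ("dxdate", 5, 8), ("repdate", 9, 12), ("death", 13, 13),
        ("deathqtr", 14, 17), ("ptgroup", 18, 19), ("nir", 20, 20),
        ("multrisk", 21, 21), ("birth", 22, 22), ("categ", 23, 23),
    ]
    layout += [(name, 24 + i, 24 + i) for i, name in enumerate(DISEASES)]
    layout += [(name, 49 + i, 49 + i) for i, name in enumerate(RISKS)]
    layout += [("deathrep", 53, 56), ("adjwgt", 57, 62)]
    return layout


LAYOUT = build_layout()


def parse_record(line):
    """Split one data line into a dict of raw (string) field values."""
    line = line.rstrip("\r\n").ljust(62)  # pad short lines so slices stay aligned
    return {name: line[first - 1:last] for name, first, last in LAYOUT}


# A made-up line for illustration; it is not taken from the real data file.
SAMPLE = (
    "4113"      # age, sexclass, race, msa
    "8806"      # dxdate
    "8809"      # repdate
    "1"         # death
    "8902"      # deathqtr
    "01"        # ptgroup (zero padding is an assumption)
    " "         # nir (blank when not applicable is an assumption)
    "0"         # multrisk
    "1"         # birth
    "1"         # categ
    + "0" * 18 + "1" + "0" * 6  # columns 24-48; only pc (column 42) is set
    + " 000"    # s_bi blank for men, then s_iv, s_other, s_hiv
    + "8903"    # deathrep
    + "1.0250"  # adjwgt (decimal text format is an assumption)
)

record = parse_record(SAMPLE)
```

   Keeping the layout as data rather than hard-coded slices makes it easy to
   compare against the column table when a later release changes the file.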

Age (column 1)

   This variable contains the patient's age when he or she was first diagnosed
   with an AIDS-indicator disease.

   0 = Less than 1 year old
   1 = 1 to 12 years old
   2 = 13 to 19 years old
   3 = 20 to 24 years old
   4 = 25 to 29 years old
   5 = 30 to 34 years old
   6 = 35 to 39 years old
   7 = 40 to 44 years old
   8 = 45 to 49 years old
   9 = 50 years old or older

Sexclass (column 2)

 Adult/adolescent males are classified according to their sexual orientation.

 1 = Adult/adolescent homosexual male
 2 = Adult/adolescent bisexual male
 3 = Adult/adolescent heterosexual male or pediatric male
 4 = Female (both adult/adolescent and pediatric)

Race (column 3)

 1 = White (not Hispanic)
 2 = Black (not Hispanic)
 3 = Hispanic
 9 = Asian/Pacific Islander, American Indian/Alaskan Native, or unknown

MSA (column 4)

 Region of residence is identified for adult/adolescent patients who live in
 MSAs with more than 1 million population, according to the 1990 census.
 Residence is defined as place of residence at onset of illness suggestive of
 AIDS. The MSA variable is coded as:
   0 = Not in an MSA, Population less than 50,000.
   1 = Northeast
     Bergen-Passaic, N.J.; Boston, Mass.; Hartford, Conn.; Nassau-Suffolk,
     N.Y.; New York, N.Y.; Newark, N.J.; or Rochester, N.Y.
   2 = Central
     Chicago, Ill.; Cincinnati, Ohio; Cleveland, Ohio; Columbus, Ohio; Denver,
     Colo.; Detroit, Mich.; Indianapolis, Ind.; Kansas City, Mo.; Milwaukee,
     Wis.; Minneapolis-Saint Paul, Minn.; or Saint Louis, Mo.
   3 = West
     Anaheim, Calif.; Los Angeles, Calif.; Oakland, Calif.; Phoenix, Ariz.;
     Portland, Oreg.; Riverside-San Bernardino, Calif.; Sacramento, Calif.;
     Salt Lake City, Utah; San Diego, Calif.; San Francisco, Calif.; San Jose,
     Calif.; or Seattle, Wash.
   4 = South
     Atlanta, Ga.; Charlotte, N.C.; Dallas, Tex.; Fort Lauderdale, Fla.; Fort
     Worth, Tex.; Houston, Tex.; Miami, Fla.; New Orleans, La.; Orlando, Fla.;
     San Antonio, Tex.; San Juan, P.R.; or Tampa, Fla.
   5 = Mid-Atlantic
     Baltimore, Md.; Norfolk, Va.; Philadelphia, Pa.; Pittsburgh, Pa.; or
     Washington, D.C.
   9 = In an MSA with population less than 1 million, but greater than 50,000.

Dxdate (columns 5 through 8)

 This variable contains the year and month in which the first AIDS-indicator
 disease was diagnosed. Columns 5 and 6 contain the year; columns 7 and 8
 contain the month. Cases diagnosed before 1982 are coded as "8199".

Repdate (columns 9 through 12)

 This variable contains the year and month in which CDC received the case
 report. Columns 9 and 10 contain the year; columns 11 and 12 contain the
 month. Cases reported during 1981 are coded as "8199".
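
 The year-and-month coding used by dxdate and repdate can be decoded as in the
 sketch below. The function name decode_year_month and the EARLY constant are
 hypothetical, and the code assumes every two-digit year belongs to the 1900s,
 which holds for a 1991 release but is not stated in this manual.

```python
EARLY = "8199"  # sentinel described above for the earliest cases


def decode_year_month(code):
    """Decode a 4-character YYMM field.

    Returns (year, month) for an ordinary value, the EARLY sentinel for
    "8199", and None for blank or malformed input.
    """
    code = code.strip()
    if code == EARLY:
        return EARLY
    if len(code) != 4 or not code.isdigit():
        return None
    year = 1900 + int(code[:2])  # assumption: all years fall in the 1900s
    month = int(code[2:])
    if not 1 <= month <= 12:
        return None
    return (year, month)
```

 Returning the sentinel unchanged, rather than a made-up month, keeps early
 cases from being mistaken for cases with a known month.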

Death (column 13)

 0 = CDC has not received a death notification for this case
 1 = CDC has been notified that this patient died

Deathqtr (columns 14 through 17)

 For patients whose death has been reported to CDC, this variable contains the
 year and quarter of death. Columns 14 and 15 contain the year; columns 16
 and 17 contain the quarter. For example, the value "8803" indicates that the
 patient died in July, August, or September, 1988. Patients who are known to
 have died, but whose date of death is unknown are coded as "9999."
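
 A minimal Python sketch of the deathqtr coding follows. The names
 decode_deathqtr, QUARTER_MONTHS, and UNKNOWN_DATE are hypothetical. The
 sketch assumes that years belong to the 1900s and that the field is blank
 when no death has been reported; neither point is stated in this manual.

```python
QUARTER_MONTHS = {
    1: ("January", "February", "March"),
    2: ("April", "May", "June"),
    3: ("July", "August", "September"),
    4: ("October", "November", "December"),
}
UNKNOWN_DATE = "9999"  # patient known to have died, date of death unknown


def decode_deathqtr(code):
    """Decode a 4-character year-and-quarter field.

    Returns (year, quarter), the UNKNOWN_DATE sentinel, or None when the
    field is blank or malformed.
    """
    code = code.strip()
    if code == UNKNOWN_DATE:
        return UNKNOWN_DATE
    if len(code) != 4 or not code.isdigit():
        return None
    year = 1900 + int(code[:2])  # assumption: all years fall in the 1900s
    quarter = int(code[2:])
    if quarter not in QUARTER_MONTHS:
        return None
    return (year, quarter)
```

 For the example value in the text, decode_deathqtr("8803") gives (1988, 3),
 and QUARTER_MONTHS[3] lists July, August, and September.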

Ptgroup (columns 18 and 19)

 For surveillance purposes, AIDS patients are grouped into a hierarchy of
 exposure categories. Persons with more than one reported mode of exposure to
 HIV are counted in the exposure category listed first in the hierarchy,
 except for persons with a history of both homosexual/bisexual contact and
 intravenous drug use. They are counted in a separate category. Persons with
 multiple reported modes of exposure are indicated in the variable multrisk.

 "Pattern II" is a term adopted by the World Health Organization, and refers
 to countries with a distinctive pattern of HIV transmission. It is observed
 in areas of central, eastern, and southern Africa and in some Caribbean
 countries. In these countries, most of the reported cases occur in
 heterosexuals; the male to female ratio is approximately 1 to 1; and
 perinatal transmission is more common than in other areas. Intravenous drug
 use and homosexual transmission either do not occur or occur at low levels.

 "Other/undetermined" cases are in persons with no reported history of exposure
 to HIV through any of the routes listed in the hierarchy of exposure
 categories. Undetermined cases include persons who are currently under
 investigation by local health department officials; persons whose exposure
 history is incomplete because of death, refusal to be interviewed, or loss
 to follow-up; and persons who were interviewed or for whom other follow-up
 information was available and no exposure mode was identified.

Adult/adolescent exposure categories

 1 = Male homosexual/bisexual contact
 2 = Intravenous (IV) drug use (female and heterosexual male)
 3 = Male homosexual/bisexual contact and IV drug use
 4 = Hemophilia/coagulation disorder
 5 = Heterosexual contact with a person with, or at increased risk for,
     HIV infection
 6 = Born in Pattern-II country
 7 = Receipt of transfusion of blood, blood components, or tissue
 8 = Other/undetermined

Pediatric exposure categories

  9 = Hemophilia/coagulation disorder
 10 = Mother with, or at risk for, HIV infection
 11 = Receipt of transfusion of blood, blood components, or tissue
 12 = Other/undetermined
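
 The hierarchy rule for adult/adolescent cases can be sketched as follows.
 This is an illustration of the rule as described in this section, not CDC's
 actual classification program. The risk labels ("msm", "ivdu", and so on)
 and the function name adult_ptgroup are hypothetical; the public data set
 already carries the assigned ptgroup code.

```python
# Adult/adolescent categories in hierarchy order, excluding the combined
# category 3 and the residual category 8, which are handled separately.
ADULT_HIERARCHY = [
    (1, "msm"),             # male homosexual/bisexual contact
    (2, "ivdu"),            # intravenous drug use
    (4, "hemophilia"),      # hemophilia/coagulation disorder
    (5, "heterosexual"),    # heterosexual contact with a person at risk
    (6, "pattern2_birth"),  # born in a Pattern-II country
    (7, "transfusion"),     # receipt of blood, blood components, or tissue
]


def adult_ptgroup(risks):
    """Return the ptgroup code for a collection of reported risk labels."""
    risks = set(risks)
    # Exception to the hierarchy: both exposures form their own category.
    if {"msm", "ivdu"} <= risks:
        return 3
    # Otherwise the first matching category in the hierarchy wins.
    for code, label in ADULT_HIERARCHY:
        if label in risks:
            return code
    return 8  # other/undetermined
```

 For example, a patient reported with both IV drug use and a transfusion is
 counted under category 2, because IV drug use appears first in the hierarchy.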

NIR (column 20)

 NIR (no identified risk) is coded only for patients whose mode of exposure to
 HIV is coded as undetermined in ptgroup.

 1 = Patient currently under investigation
 2 = Patient died, refused interview, or is lost to follow-up
 3 = Patient investigation complete but no mode of exposure was identified

Multrisk (column 21)

 Multrisk is coded only for adult/adolescent patients (13 years old or older)
 and indicates if the patient has risk(s) of exposure to HIV other than the
 one indicated by ptgroup.

 0 = Patient's only mode of exposure to HIV is that indicated by ptgroup
 1 = Patient has additional risk(s) of exposure

Birth (column 22)

 1 = Patient was born in the United States or its dependencies and possessions,
     or place of birth was not specified
 2 = Patient was born in Pattern-II country
 3 = Patient was born in a foreign country which is not Pattern II

Categ (column 23)

 This variable reflects changes made over time to the CDC surveillance
 definition for AIDS. Only cases meeting the current (1987) surveillance
 definition are included in this data set. Categ indicates whether the patient
 also met the pre-1985 or 1985 surveillance definition, and whether the
 diagnosis, if it meets only the 1987 definition, was definitive or presumptive.
 Cases that meet more than one of these surveillance definitions are classified
 into the definition category listed first. For more information about the
 1987 definition, see Morbidity and Mortality Weekly Report, August 14, 1987,
 Supplement, pages 3S-15S.

 1 = Case meets the pre-1985 surveillance definition
 2 = Case meets the 1985 surveillance definition
 3 = Case meets the 1987 surveillance definition and was diagnosed definitively
 4 = Case meets the 1987 surveillance definition and was diagnosed
     presumptively

AIDS-indicator opportunistic diseases (columns 24 through 48)

 Columns 24 through 48 contain information about each of the AIDS-indicator
 diseases listed on the AIDS confidential case report form. Each of these
 variables is one character long and is coded as follows:

 0 = AIDS-indicator opportunistic disease was not diagnosed
 1 = AIDS-indicator opportunistic disease was diagnosed definitively
 2 = AIDS-indicator opportunistic disease was diagnosed presumptively

Heterosexual risk information (columns 49 through 52)

 These variables (s_bi, s_iv, s_other, and s_hiv) contain additional risk
 information for patients infected heterosexually. All 4 variables are coded
 as follows:

 0  = no
 1  = yes
 9  = missing/unknown

 The variable s_bi is coded only for women (for men, the variable contains a
 blank). All 4 variables contain "9" (missing/unknown) for patients with
 hemophilia, regardless of whether the risk information is in fact unknown.
 This restriction is necessary in order to comply with the Assurance of
 Confidentiality on page 5. Of the 1,535 AIDS cases reported through June
 1991 among adults/adolescents with hemophilia, less than 3 percent also
 reported heterosexual contact with a person at increased risk for AIDS or
 HIV infection.

Deathrep (columns 53 through 56)

 For patients whose death has been reported to CDC, this variable contains the
 year and quarter when CDC received the report. Columns 53 and 54 contain the
 year; columns 55 and 56 contain the quarter. For example, the value "8803"
 indicates that the patient's death was reported to CDC in July, August, or
 September, 1988.

 CDC began collecting this variable in October 1987. Deaths reported to CDC
 before October 1987 are coded as "8799".

Adjwgt (columns 57 through 62)

 This variable contains an adjustment weight which, when used as a weighting
 variable in a frequency tabulation, produces tabulations of AIDS cases that
 are adjusted for delays in case reporting (see page 10 for a discussion of
 delays in reporting). The weights are based on estimated reporting delay
 distributions that take into account exposure, geographic, and demographic
 variations in case reporting. The adjustment weights and the resulting
 tabulations are not reliable for cases diagnosed during the most recent 3 to
 6 months. Please note, this variable must not be used for tabulations
 involving dates of report to CDC (repdate), the living status of a patient
 (death), the date of death (deathqtr), or the date when a death was reported
 to CDC (deathrep).  It is reasonable to use this variable for tabulations
 involving any other variable in the data set, including date of diagnosis.
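
 A delay-adjusted frequency tabulation amounts to summing adjwgt within each
 category instead of counting records. The Python sketch below shows one way
 to do this on exported records. The names adjusted_counts and UNWEIGHTABLE
 are hypothetical, records are assumed to be dictionaries keyed by variable
 name, and adjwgt is assumed to be stored as text that float() can read.

```python
from collections import defaultdict

# Variables that this manual says must not be tabulated with adjwgt.
UNWEIGHTABLE = {"repdate", "death", "deathqtr", "deathrep"}


def adjusted_counts(records, by):
    """Sum adjwgt within each value of the variable `by`."""
    if by in UNWEIGHTABLE:
        raise ValueError(f"adjwgt must not be used for tabulations by {by}")
    totals = defaultdict(float)
    for rec in records:
        totals[rec[by]] += float(rec["adjwgt"])
    return dict(totals)


# Made-up records for illustration only.
example = [
    {"race": "1", "adjwgt": "1.00"},
    {"race": "1", "adjwgt": "1.25"},
    {"race": "2", "adjwgt": "1.50"},
]
by_race = adjusted_counts(example, "race")
```

 Rejecting the four excluded variables up front turns the warning in the text
 into an error rather than a silently misleading table.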
METHODS

Case report form

 Separate case report forms are used for pediatric patients (those less than
 13 years of age at the time of diagnosis) and adult/adolescent patients
 (those 13 years of age or older at the time of diagnosis). Although the forms
 are very similar, the pediatric form includes risk factor information for the
 child's mother. These forms are completed by the health-care provider or by
 the AIDS surveillance staff in the local or state health department.

 Names are retained by the state or local health department and are converted
 to an alpha-numeric code called "soundex" for use by CDC. CDC does not
 receive names of persons with AIDS. Because more than one state may report an
 individual case, CDC screens reported cases by soundex code and date of
 birth to cull duplicate reports.
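
 The screening step can be pictured as grouping reports on the pair (soundex
 code, date of birth) and keeping one report per pair. The sketch below is an
 assumption about how such a screen might look, not a description of CDC's
 procedure. The field names "soundex", "dob", and "state" are hypothetical,
 and neither soundex codes nor dates of birth appear in the public data set.

```python
def cull_duplicates(reports):
    """Keep the first report seen for each (soundex, date of birth) pair."""
    seen = set()
    unique = []
    for report in reports:
        key = (report["soundex"], report["dob"])
        if key in seen:
            continue  # same code and birth date already reported
        seen.add(key)
        unique.append(report)
    return unique


# Made-up reports: the first and third share a soundex code and birth date.
reports = [
    {"soundex": "S530", "dob": "1955-03-02", "state": "NY"},
    {"soundex": "J525", "dob": "1960-11-17", "state": "NJ"},
    {"soundex": "S530", "dob": "1955-03-02", "state": "NJ"},
]
screened = cull_duplicates(reports)
```

 A match on both fields only suggests a duplicate, since different people can
 share a soundex code and a birth date, so a real screen would likely review
 candidate matches rather than drop them automatically.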

 The variables available on the AIDS data set are listed in section 2.
 However, a few deserve special comment.

 - Living status. Patients survive for a variable amount of time following the
   diagnosis of AIDS.  Because death usually occurs after the initial report
   to CDC, case reports may not be updated to reflect the change in living
   status.  As a result, reporting of death among AIDS patients may be
   incomplete.

 - Exposure category. Some patients may have more than one mode of exposure to
   HIV. For surveillance purposes, AIDS cases are counted only once in a
   hierarchy of exposure categories (see section 2, pages 16 and 17). This
   hierarchy is based on the most likely source of HIV infection. Persons with
   more than one reported mode of exposure are counted in the category that
   appears first in the hierarchy, except for persons with a history of both
   male homosexual/bisexual contact and intravenous drug use. They are counted
   in a separate category.

 - Diseases indicative of AIDS. Patients may develop additional diseases
   indicative of AIDS after their initial AIDS diagnosis. The case report form
   may not be updated to reflect additional diseases. Therefore, proportions
   of patients with the various AIDS-indicator diseases should be considered
   minimal estimates.

 - Date of diagnosis. CDC collects only one diagnosis date per patient, i.e.,
   the date when he or she was initially diagnosed with an AIDS-indicator
   disease. Patients who develop additional diseases do not receive additional
   diagnosis dates. Therefore, for patients with multiple AIDS-indicator
   diseases, you cannot determine which disease occurred first.

Special Case Investigations

 Certain AIDS cases receive special follow-up by state and local health
 departments. Investigations are frequently performed after the initial case
 report to CDC. Case updates are incorporated into the data set as they
 are available to CDC.

 - No identified risk (NIR) patients. NIR patients are those reported without
   any recognized mode of exposure to HIV. Approximately 3 percent of cases
   are NIR patients at any one time. However, when additional information can
   be obtained for these patients, approximately 75 percent are reclassified
   into a known exposure category. For those not reclassified, the demographic
   profile is more similar to that of other persons with AIDS than to the
   general U.S. population.

 - Health-care workers. Ninety-five percent of health-care workers with AIDS
   are classified into a known exposure category. Of the health-care workers
   with an undetermined mode of exposure to HIV, less than one third cannot be
   reclassified after investigation.

Delay in Reporting

 The timeliness of AIDS case reporting to CDC depends on several factors.
 These include the volume of cases reported from a state or locality and the
 availability of staff to complete case report forms.  In many instances,
 initial case report forms are incomplete and require additional follow-up by
 state and local health department staff, including reviews of other record
 systems and contact with health-care providers.

 About 55 percent of all cases are reported to CDC within 3 months of the
 date of diagnosis, but about 20 percent are reported more than a year after
 diagnosis.  Delays vary widely among exposure, geographic, racial/ethnic, and
 age categories.  They are substantially longer for pediatric cases and for
 transfusion-associated cases in adults.   Due to the delay, the number of
 cases diagnosed during any period often exceeds the number reported during
 that period.  This is particularly important in examining trends over time,
 since many cases in recent periods of time will not yet be reported.
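
 Reporting delay for a single case is the number of months between dxdate and
 repdate. The following Python sketch computes it from the two coded fields.
 The function names are hypothetical, and the code assumes both fields use the
 two-digit-year, two-digit-month coding described in section 2, with all years
 in the same century.

```python
def delay_in_months(dxdate, repdate):
    """Months from diagnosis to report, or None if either date is unusable."""
    for code in (dxdate, repdate):
        # Skip blank or malformed values and the "8199" sentinel, which
        # carries no month.
        if len(code) != 4 or not code.isdigit() or code.endswith("99"):
            return None
    diagnosed = int(dxdate[:2]) * 12 + int(dxdate[2:])
    reported = int(repdate[:2]) * 12 + int(repdate[2:])
    return reported - diagnosed


def share_within(delays, months):
    """Fraction of known, non-negative delays that are at most `months`."""
    known = [d for d in delays if d is not None and d >= 0]
    if not known:
        return None
    return sum(1 for d in known if d <= months) / len(known)
```

 Because the fields carry no day of the month, a delay of 0 means only that
 diagnosis and report fell in the same calendar month.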

 To account for delays in the reporting of cases, a variable called adjwgt has
 been added to the data set.  This variable may be used to weight each case on
 the data set and obtain adjusted case counts.  For example, summing adjwgt
 for cases would estimate the number of cases diagnosed through the time
 period covered by the data set.

Early Reporting Dates

 Before 1990, CDC occasionally received reports on patients before they met
 the CDC AIDS case definition.  If such patients were later diagnosed with
 AIDS, the diagnosis date on their record (indicating when the patient first
 met the CDC definition) would be after the report date (when CDC first
 received information about the patient).  Such records should be excluded
 from certain analyses. CDC's AIDS surveillance data base no longer receives
 reports on patients who do not meet the AIDS case definition.
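
 Records of this kind can be found by comparing dxdate with repdate. The
 Python sketch below is one hedged way to do so. The function names are
 hypothetical, records are assumed to be dictionaries keyed by variable name,
 and the comparison relies on both fields being four-digit year-and-month
 codes within the same century, so that string order matches date order.

```python
def is_yymm(code):
    """True for a 4-digit code such as "8806" or the sentinel "8199"."""
    return len(code) == 4 and code.isdigit()


def reported_before_diagnosis(rec):
    """True when the report date precedes the recorded diagnosis date."""
    dx, rp = rec["dxdate"], rec["repdate"]
    if not (is_yymm(dx) and is_yymm(rp)):
        return False  # cannot tell; leave the record in place
    return dx > rp


def exclude_early_reports(records):
    """Drop records whose diagnosis date falls after the report date."""
    return [rec for rec in records if not reported_before_diagnosis(rec)]
```

 Whether to exclude these records depends on the analysis; the text above says
 only that they should be left out of certain analyses.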

Follow-up of Reported AIDS Cases

 AIDS case records maintained at CDC contain all information reported to date
 from state or local health departments. As patients progress through their
 illness, additional diseases and conditions may be reported, or the patient's
 vital status may change. However, not all health departments have the
 resources to routinely follow up patients for additional information,
 including vital status. For this reason and because many patients move out of
 the reporting health department's jurisdiction, CDC records do not always
 contain all current information for each patient.

Non-reporting and Evaluation of AIDS Surveillance

 Cases of AIDS may not be reported to CDC for a variety of reasons. The
 diagnostic tests needed to confirm the AIDS diagnosis may not be performed,
 or physicians and hospital personnel may fail to report cases to the health
 department. Further, some patients with HIV disease may be ill or die from
 diseases or conditions not included in the current AIDS surveillance
 definition.

 Both CDC and state and local health departments have commissioned a variety
 of studies to evaluate the completeness of AIDS surveillance. Most evaluation
 projects have used alternate data resources, such as death certificates,
 hospital discharge records, and laboratory records. Individual records from
 these alternate sources have then been matched against records in AIDS
 surveillance data bases. Evaluation studies have varied in size and scope
 (e.g., varying numbers of ICD-9 codes from death certificates or computerized
 discharge records), geographic area covered, detection of both inpatient and
 outpatient cases, and time frames. In general, estimates from these studies
 suggest that the completeness of reporting varies considerably, from 56 to
 100 percent. High-prevalence areas for AIDS appear to have more complete
 reporting than low-prevalence areas.

TECHNICAL INFORMATION

Memory/Storage Requirements and Mediums

 The AIDS Public Information Data Set contains large quantities of data and
 requires significant computer resources for analysis. You need a 386-based
 MS-DOS microcomputer with at least 30 megabytes of disk storage and a
 high-density diskette drive. The new SETS interface allows you to display
 simple statistics without additional software such as SAS, SPSS, BMDP, or
 PRODAS.  More complex analysis, however, will still require additional
 software.

 To transfer the data to another software package or to a mainframe computer
 for analysis, you must first load SETS, then use the Export option to extract
 the records and variables you wish to analyze.  The Export option will create
 an ASCII data file, which can then be processed by other software, or
 transferred to your mainframe using software designed for this purpose.
 Examples of transfer programs include Crosstalk and Hayes SmartCom.

Loading SETS

 The AIDS Public Information Data Set consists of over 10 diskettes.  To
 install it onto your computer, insert diskette #1 in drive A and type the
 following DOS commands:
 a:      <ENTER>
 install <ENTER>

 The first command changes the current drive to A, and the second command
 begins the installation process. Please note that the first diskette is also
 the last diskette, i.e., you will need to process it at the beginning and the
 end of the installation procedure.  You will need at least 30 minutes to
 install this software.

 Once you have installed SETS, type the command
 sets    <ENTER>
 to run the program.  SETS is a menu-driven program which can be mastered with
 minimum effort.

Getting Help
 You can access help from SETS in two ways: by pressing the <F1> key at any
 time you are running SETS, or by selecting the Browse feature.  Once you
 select Browse, select Documentation, then SETS Manual.

SETS Features
 From the SETS main menu, you can select the following options:

 BROWSE-to browse the documentation, MSA and state tables, and the main data
 file.  Browsing the main data file allows you to display the variable names
 and value labels contained in the data file.

 TABLE-to create and display cross tabulations of any of the variables in the
 data set.  Tabulations are displayed in a spreadsheet format which can be
 saved and loaded into Lotus software.

 SUBSET-to specify which variables and records should be included in
 tabulations or in exported files.

 DEFAULTS-to adjust the setting for export drives and directories, and for the
 autosave feature and screen colors.

Creating Tables
 After you install the SETS program, you can create a table by following these
 steps:
 1. From the SETS directory, type sets <ENTER>, to run the program.
 2. When the program appears, press <ENTER> until the main menu appears.
 3. Use the arrow keys to highlight "Table" and press <ENTER>.
 4. At "Display" press <ENTER>.
 5. At the screen, "What would you like to do," press the <ENTER> key to
    select, "Create a new record subset."
 6. Type "all" to select all records and press <ENTER>.
 7. When the spreadsheet appears, press the <F2> key, edit.
 8. Press the <F6> key for table expression assistance.
 9. To create a table of SEXCLASS by RACE, begin by using the arrow keys to
    highlight the variable SEXCLASS, then press <ENTER> to select this
    variable.
 10. Use the arrow keys to highlight the variable RACE, then press <ENTER>
    to select it.
 11. Press the <F10> key to accept these two variables.
 12. The spreadsheet will reappear with the text, "Edit: SEXCLASS, RACE"
    displayed at the bottom of the screen.
 13. Type the text, "/labels" and press <ENTER>.  Do not type the quotation
    marks.
 The SETS program will create the table.  This process can take half an hour
 or longer, depending on the speed of your machine.  Additional detail on how
 to create tables is provided in the on-line documentation.

MSA and State Tables

 The microfiche contain frequency tables and cross tabulations of 8 variables
 extracted from CDC's national AIDS surveillance data set. They contain one
 set of tables for the entire United States, one set for each state, and one
 set for each MSA. The variables are:

 Variable   Description

 age        Age group at diagnosis of the first AIDS-indicator disease
 categ      Indicates which of the CDC AIDS case revisions the patient meets
 dth_hyr    Half-year of death for patients reported dead
 dx_hyr     Half-year of diagnosis of first AIDS-indicator disease
 ent_hyr    Half-year in which CDC first received information about the case
 ptgrp      Patient grouping by mode of exposure to HIV.
 race       Race of patient
 sex        Sex of patient

 The values used for these variables are printed below.

Age

 This variable contains the patient's age when he or she was first diagnosed
 with an AIDS-indicator disease. Ages printed on the microfiche are grouped
 as follows:

  0 -  1
  1 - 12
 13 - 19
 20 - 29
 30 - 39
 40 - 49
 50 +

Categ

 This variable reflects revisions made to the CDC surveillance definition for
 AIDS. Only cases meeting the current (1987) surveillance definition are
 included on the microfiche. Categ indicates whether the patient also meets
 the pre-1985 or 1985 surveillance definition, and whether the diagnosis, if
 it meets only the 1987 definition, was definitive or presumptive. Cases that
 meet more than one of these surveillance definitions are classified into the
 definition category listed first. For more information about the 1987
 definition, see Morbidity and Mortality Weekly Report, August 14, 1987,
 Supplement, pages 3S-15S.

 1 = Case meets the pre-1985 surveillance definition
 2 = Case meets the 1985 surveillance definition
 3 = Case meets the 1987 surveillance definition and was diagnosed definitively
 4 = Case meets the 1987 surveillance definition and was diagnosed
     presumptively

Dth_hyr

 For patients whose death has been reported to CDC, this variable contains
 the half-year of death. The first two digits indicate the year; the second
 two indicate the first or second half of that year. For example, the value
 "8802" indicates that the patient died in the second half of 1988. Patients
 whose death has been reported to CDC, but whose date of death is unknown, are
 coded as "9999".
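
 The half-year coding can be decoded with a small helper such as the one
 below. This is a sketch, not part of SETS. The function name is
 hypothetical, and two points are assumptions not stated in this
 documentation: that two-digit years fall in the 1900s, and that "01" marks
 the first half of the year (by analogy with "02" for the second half).

```python
def decode_half_year(code):
    """Return (year, half) for a 4-character half-year code, or None for "9999"."""
    code = str(code).strip()
    if code == "9999":
        # Death reported to CDC, but the date of death is unknown.
        return None
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"not a half-year code: {code!r}")
    year = 1900 + int(code[:2])   # assumption: all years are in the 1900s
    half = int(code[2:])          # assumption: 1 = first half, 2 = second half
    if half not in (1, 2):
        raise ValueError(f"unexpected half-year value in {code!r}")
    return year, half

example = decode_half_year("8802")
```

 For the value "8802" cited in the text, the helper returns the pair
 (1988, 2), i.e., the second half of 1988.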

Dx_hyr

 This variable contains the half-year in which the first AIDS-indicator
 disease was diagnosed. The first two digits indicate the year; the second
 two indicate the first or second half of that year.

Ent_hyr

 This variable contains the half-year in which CDC received the case report.
 The first two digits indicate the year; the second two indicate the first
 or second half of that year.

Ptgrp

 For surveillance purposes, AIDS patients are grouped into a hierarchy of
 exposure categories. Persons with more than one reported mode of exposure to
 HIV are counted in the exposure category listed first in the hierarchy,
 except for persons with a history of both homosexual/bisexual contact and
 intravenous drug use. They are counted in a separate category.

 "Pattern II" is a term adopted by the World Health Organization, and refers
 to countries with a distinctive pattern of HIV transmission. It is observed
 in areas of central, eastern, and southern Africa and in some Caribbean
 countries. In these countries, most of the reported cases occur in
 heterosexuals; the male to female ratio is approximately 1 to 1; and perinatal
 transmission is more common than in other areas. Intravenous drug use and
 homosexual transmission either do not occur or occur at low levels.

 "Other/undetermined" cases are in persons with no reported history of exposure
 to HIV through any of the routes listed in the hierarchy of exposure
 categories. Undetermined cases include persons who are currently under
 investigation by local health department officials; persons whose exposure
 history is incomplete because of death, refusal to be interviewed, or loss
 to follow-up; and persons who were interviewed or for whom other follow-up
 information was available and no exposure mode was identified.

 01 = Male homosexual/bisexual contact
 02 = Intravenous (IV) drug use (female and heterosexual male)
 03 = Male homosexual/bisexual contact and IV drug use
 04 = Hemophilia/coagulation disorder
 05 = Heterosexual contact with a person with, or at increased risk for, HIV
      infection
 06 = Born in Pattern-II country
 07 = Receipt of transfusion of blood, blood components, or tissue
 08 = Adult/adolescent other/undetermined
 09 = Pediatric hemophilia/coagulation disorder
 10 = Mother with, or at risk for, HIV infection
 11 = Pediatric receipt of transfusion of blood, blood components, or tissue
 12 = Pediatric undetermined
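
 The hierarchy rule for adult/adolescent cases can be illustrated as follows.
 This is a sketch only. The mode names and the function are hypothetical, and
 it assumes that the order of codes 01 through 07 above is the order of the
 hierarchy, which this documentation does not state explicitly. Pediatric
 categories (09 through 12) are not handled.

```python
# Adult/adolescent exposure modes in assumed hierarchy order, mapped to ptgrp codes.
# The mode names are hypothetical labels chosen for this example.
ADULT_HIERARCHY = [
    ("homosexual_bisexual_contact", "01"),
    ("iv_drug_use", "02"),
    ("hemophilia", "04"),
    ("heterosexual_contact", "05"),
    ("pattern_ii_birth", "06"),
    ("transfusion", "07"),
]

def assign_adult_ptgrp(reported_modes):
    """Pick one ptgrp code for an adult/adolescent case with the given exposure modes."""
    modes = set(reported_modes)
    # Exception to the hierarchy: this combination has its own category.
    if {"homosexual_bisexual_contact", "iv_drug_use"} <= modes:
        return "03"
    # Otherwise the case is counted in the first category listed in the hierarchy.
    for mode, code in ADULT_HIERARCHY:
        if mode in modes:
            return code
    # No reported exposure through any listed route.
    return "08"

example = assign_adult_ptgrp({"iv_drug_use", "transfusion"})
```

 In this sketch, a case with both intravenous drug use and a transfusion is
 counted once, under code 02, because that category comes first in the
 assumed order.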

Race

 1 = White (not Hispanic)
 2 = Black (not Hispanic)
 3 = Hispanic
 4 = Asian/Pacific Islander
 5 = American Indian/Alaskan Native
 9 = Unknown

Sex

 1 = Male
 2 = Female


Locating Individual Tables

 In accordance with CDC guidelines on protecting confidentiality and with an
 agreement made with state and local health departments for release of these
 data, entries whose value is 5 or less are not included in the tables. Only
 MSAs with 500,000 or more population (according to 1991 census estimates)
 are included on the microfiche.
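
 Users who build their own tables from exported data may wish to apply the
 same rule before sharing results. The following is a minimal sketch, not the
 procedure CDC uses: the function name and the table layout (a dictionary
 keyed by cell) are assumptions, and the counts are invented.

```python
SUPPRESSION_THRESHOLD = 5  # entries of 5 or less are not shown

def suppress_small_cells(table, threshold=SUPPRESSION_THRESHOLD):
    """Return a copy of the table without cells whose count is at or below the threshold."""
    return {cell: count for cell, count in table.items() if count > threshold}

# Hypothetical sex-by-race cell counts, keyed by (sex, race) code pairs.
table = {("1", "1"): 120, ("1", "2"): 5, ("2", "1"): 6, ("2", "2"): 0}
published = suppress_small_cells(table)
```

 Only the cells with counts of 120 and 6 remain in this example; the cells
 with counts of 5 and 0 are dropped.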

 The AIDS Public Information Data Set contains frequency tables of 8
 variables, and every possible 2-way cross tabulation of those variables
 for each state, each MSA with 500,000 or more population, and for the entire
 United States.  Tables for the entire United States also contain cross
 tabulations of 2 additional variables, STATE and MSA.
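
 To give a sense of how many tables each state or MSA file holds, the
 combinations can be enumerated as below. This sketch assumes that a cross
 tabulation of A by B and one of B by A count as the same table; under that
 assumption the 8 variables yield 8 frequency tables and 28 two-way cross
 tabulations per geographic area.

```python
from itertools import combinations

# The 8 tabulated variables named in this documentation.
VARIABLES = ["age", "categ", "dth_hyr", "dx_hyr", "ent_hyr", "ptgrp", "race", "sex"]

one_way = sorted(VARIABLES)
# Unordered pairs: assumes A-by-B and B-by-A are the same table.
two_way = list(combinations(sorted(VARIABLES), 2))

table_count = len(one_way) + len(two_way)
```

 The first pair generated is age by categ and the last is race by sex.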

 To access these tables, select the Browse feature on the SETS menu, then
 select "Documentation."  A menu will appear which divides the country into
 9 geographic regions: New England, North Atlantic, Mid-Atlantic, South
 Atlantic, Mid-West, Great Plains, South Central, Mountain, and Pacific.  For
 example, to access data for New York City, first select the North Atlantic
 region.  SETS will display a list of all states and MSAs in that region,
 including New York City.  To view the tables for any state or MSA in that
 region, select the name of the state or MSA.

 SETS will then display the first table for the state or MSA you have
 selected.  It will first display the 1-way frequency tables, 1 table per
 screen, then the 2-way cross tabulations.  Tables are displayed
 alphabetically, beginning with AGE and progressing to RACE and SEX.

 SETS allows you to search for individual table entries within each state
 or MSA file.  Press the <F6> key to begin the search.  It will also allow you
 to display or print a particular page in the file.  SETS contains on-line
 documentation that describes the search process in more detail.

 SETS also allows global searches, i.e., you can search for tables in any of
 the state or MSA files included in the data set, not just those contained in
 the current state or MSA file.  For example, if you are displaying data for
 New York City, and want to compare them to data from Los Angeles, you can use
 the global search function to search for the entry "Los Angeles." SETS would
 then locate the first table in the Los Angeles file. To begin a global
 search, press the <F9> key.


SAMPLE TABLE(s) OF INFORMATION

MSA Codes

 Definitions for MSAs are issued by the Office of Management and Budget (OMB)
 to be used in presentation of statistics by agencies of the federal
 government. The metropolitan areas used on the AIDS Public Information Data
 Set are the MSAs for all areas except the 6 New England states. For these
 states, the New England County Metropolitan Areas (NECMA, also defined by OMB)
 are used. Metropolitan areas are named for a central city in the MSA or NECMA
 and may include several counties and cross state boundaries.

  Code    Metropolitan area

    80    Akron, Ohio
   160    Albany-Schenectady, N.Y.
   200    Albuquerque, N.M.
   240    Allentown, Pa.
   360    Anaheim, Calif.
   520    Atlanta, Ga.
   640    Austin, Tex.
   680    Bakersfield, Calif.
   720    Baltimore, Md.
   760    Baton Rouge, La.
   875    Bergen-Passaic, N.J.
  1000    Birmingham, Ala.
  1123    Boston, Mass.
  1163    Bridgeport, Conn.
  1280    Buffalo, N.Y.
  1440    Charleston, S.C.
  1520    Charlotte, N.C.
  1600    Chicago, Ill.
  1640    Cincinnati, Ohio
  1680    Cleveland, Ohio
  1840    Columbus, Ohio
  1920    Dallas, Tex.
  2000    Dayton, Ohio
  2080    Denver, Colo.
  2160    Detroit, Mich.
  2320    El Paso, Tex.
  2680    Fort Lauderdale, Fla.
  2800    Fort Worth, Tex.
  2840    Fresno, Calif.
  2960    Gary, Ind.
  3000    Grand Rapids, Mich.
  3120    Greensboro, N.C.
  3160    Greenville, S.C.
  3240    Harrisburg, Pa.
  3283    Hartford, Conn.
  3320    Honolulu, Hawaii
  3360    Houston, Tex.
  3480    Indianapolis, Ind.
  3600    Jacksonville, Fla.
  3640    Jersey City, N.J.
  3760    Kansas City, Mo.
  3840    Knoxville, Tenn.
  4120    Las Vegas, Nev.
  4400    Little Rock, Ark.
  4480    Los Angeles, Calif.
  4520    Louisville, Ky.
  4920    Memphis, Tenn.
  5000    Miami, Fla.
  5015    Middlesex, N.J.
  5080    Milwaukee, Wis.
  5120    Minneapolis-Saint Paul, Minn.
  5190    Monmouth-Ocean City, N.J.
  5360    Nashville, Tenn.
  5380    Nassau-Suffolk, N.Y.
  5483    New Haven, Conn.
  5560    New Orleans, La.
  5600    New York, N.Y.
  5640    Newark, N.J.
  5720    Norfolk, Va.
  5775    Oakland, Calif.
  5880    Oklahoma City, Okla.
  5920    Omaha, Nebr.
  5960    Orlando, Fla.
  6000    Oxnard-Ventura, Calif.
  6160    Philadelphia, Pa.
  6200    Phoenix, Ariz.
  6280    Pittsburgh, Pa.
  6440    Portland, Oreg.
  6483    Providence, R.I.
  6640    Raleigh-Durham, N.C.
  6760    Richmond, Va.
  6780    Riverside-San Bernardino, Calif.
  6840    Rochester, N.Y.
  6920    Sacramento, Calif.
  7040    Saint Louis, Mo.
  7160    Salt Lake City, Utah
  7240    San Antonio, Tex.
  7320    San Diego, Calif.
  7360    San Francisco, Calif.
  7400    San Jose, Calif.
  7440    San Juan, P.R.
  7560    Scranton, Pa.
  7600    Seattle, Wash.
  8003    Springfield, Mass.
  8160    Syracuse, N.Y.
  8200    Tacoma, Wash.
  8280    Tampa-Saint Petersburg, Fla.
  8400    Toledo, Ohio
  8520    Tucson, Ariz.
  8560    Tulsa, Okla.
  8840    Washington, D.C.
  8960    West Palm Beach, Fla.
  9160    Wilmington, Del.
  9243    Worcester, Mass.





This page last reviewed: Thursday, June 05, 2003

Department of Health and Human Services
Centers for Disease Control and Prevention
Epidemiology Program Office
Division of Public Health Surveillance and Informatics