Survey Methodology: Survey of Doctorate Recipients

1. Overview

a. Purpose

The Survey of Doctorate Recipients (SDR)[1] is designed to provide demographic and career history information about individuals with doctoral degrees. Its results are vital for educational planners within the Federal Government and in academia. Employers in all sectors (education, industry, and government) also use the results to understand and predict trends in employment opportunities and salaries for doctorate holders in science and engineering (S&E) fields, and to evaluate the effectiveness of equal opportunity efforts. The National Science Foundation (NSF) also relies on the results for internal planning, since most NSF grants go to individuals with doctoral degrees. The survey is designed to complement the other surveys of scientists and engineers conducted by NSF's Division of Science Resources Studies (SRS), so that together they provide a comprehensive picture of the number and characteristics of individuals with training and/or employment in science and engineering in the United States. This combined system is known as the Scientists and Engineers Statistical Data System (SESTAT).

b. Respondents

This survey is completed by individuals with research doctorates in science and engineering from U.S. institutions.

c. Key variables

2. Survey Design

a. Target population and sample frame

The population of the 2001 survey consisted of all individuals under the age of 76 who had received a research doctorate in science or engineering from a U.S. institution and were residing in the United States on April 15, 2001. The sample frame used to identify these individuals was the Doctorate Records File, maintained by the National Science Foundation. The primary source of information for the frame is the Survey of Earned Doctorates (SED).[2] For individuals who received a degree before 1957, when the SED started, information was taken from a register of highly qualified scientists and engineers that the National Academy of Sciences had assembled from a variety of sources, including university and college catalogues of doctorate-granting institutions, Federal laboratories, selected industrial organizations, and American Men & Women of Science.

b. Sample design

This is a longitudinal survey: each time it is conducted, recent recipients of research doctorates are added and individuals over age 75 are dropped. The following variables were used for sample stratification in the 2001 survey: field of degree, sex, race/ethnic identification, disability status, and place of birth (U.S.-born versus foreign-born).
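As a rough illustration of the two mechanics described above (dropping panel members who have aged out, and grouping cases into sampling cells by cross-classifying the stratification variables), the following Python sketch may help. The `Panelist` record, its field names, and the assumption that `new_recipients` holds an already-selected group of recent recipients are all hypothetical; they are not taken from the actual SDR files.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical field names for the five 2001 stratification variables.
STRATIFIERS = ("field_of_degree", "sex", "race_ethnicity", "disability", "foreign_born")


@dataclass
class Panelist:
    person_id: str
    age: int
    field_of_degree: str
    sex: str
    race_ethnicity: str
    disability: bool
    foreign_born: bool


def update_panel(panel, new_recipients, max_age=75):
    """Drop members older than max_age and append the newly added recipients."""
    retained = [p for p in panel if p.age <= max_age]
    return retained + list(new_recipients)


def stratum(p):
    """Sampling cell = cross-classification of the stratification variables."""
    return tuple(getattr(p, name) for name in STRATIFIERS)


def stratum_sizes(panel):
    """Count panel members falling in each sampling cell."""
    return Counter(stratum(p) for p in panel)
```

Because every combination of the five variables forms its own cell, the number of cells grows multiplicatively, which is one reason small cells can end up with few respondents.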

A total of 40,000 individuals with research doctoral degrees in S&E were included in the 2001 survey.[3]

c. Data collection techniques

Initial data collection in 2001 was by mail. Procedures included a prenotification letter, an initial mailing of the questionnaire, a reminder postcard, and up to two follow-up mailings.

Nonrespondents to the mail questionnaire were followed up using computer-assisted telephone interviewing (CATI). The instrument used in the phone follow-up was modified from the mail instrument to avoid difficulties in administering some questions by phone, especially those (such as field of degree and field of occupation) that require respondents to select from an extensive list of possible responses.

Both the mail and phone instruments were designed to be as similar as possible to the instruments used in the other SESTAT surveys in order to facilitate combining results. A few questions in the SDR, however, obtain information of special interest for the population with doctorates. For example, the SDR contains information on faculty and tenure status not included in the other SESTAT surveys.

The 2001 survey collected information referring to the week of April 15, 2001 (the reference week). Data collection took place between May and October 2001.

d. Estimation techniques

Each usable response is weighted by the product of two factors: (1) the inverse of the sampling rate used for initial sample selection, and (2) a nonresponse adjustment factor for its sampling cell, equal to the number of sample cases in the cell divided by the number of usable responses in the cell. If the nonresponse adjustment factor exceeds a prespecified maximum ratio, collapsing procedures are used, i.e., the cell is combined with other cells that have similar characteristics on the stratification variables. If collapsing does not keep the weights within an acceptable range, the nonresponse adjustment factor is capped at the maximum allowable ratio.
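The weighting steps above can be sketched in Python as follows. This is a simplified illustration, not the production SDR procedure: the record layout, the `collapse_to` mapping (a stand-in for "other cells with similar characteristics"), and the `max_factor` cap are hypothetical. Cells whose adjustment factor exceeds the cap are pooled with the other cells in their collapse group, and any pooled factor that is still too large is capped.

```python
from collections import Counter


def cell_factors(cases):
    """Nonresponse factor per cell: sample cases / usable responses."""
    sampled = Counter(c["cell"] for c in cases)
    usable = Counter(c["cell"] for c in cases if c["usable"])
    return {
        cell: (n / usable[cell]) if usable[cell] else float("inf")
        for cell, n in sampled.items()
    }


def final_weights(cases, collapse_to, max_factor):
    """Return {case id: final weight} for usable responses.

    cases: dicts with 'id', 'cell', 'sampling_rate', 'usable' (bool).
    collapse_to: hypothetical map from a cell to a broader group of cells
        with similar stratification characteristics (group labels are
        assumed not to clash with cell labels).
    max_factor: hypothetical cap on the nonresponse adjustment factor.
    """
    factors = cell_factors(cases)
    # Collapse groups that contain at least one cell over the cap.
    bad_groups = {
        collapse_to[cell]
        for cell, f in factors.items()
        if f > max_factor and cell in collapse_to
    }

    def label(cell):
        # Cells in a flagged group are pooled; all other cells stay as they are.
        group = collapse_to.get(cell)
        return group if group in bad_groups else cell

    pooled = [dict(c, cell=label(c["cell"])) for c in cases]
    pooled_factors = cell_factors(pooled)

    weights = {}
    for c in pooled:
        if c["usable"]:
            base = 1.0 / c["sampling_rate"]                   # (1) inverse sampling rate
            adj = min(pooled_factors[c["cell"]], max_factor)  # (2) capped adjustment
            weights[c["id"]] = base * adj
    return weights
```

For example, in a cell sampled at a rate of 0.5 with 4 sample cases and 2 usable responses, each usable response receives a weight of (1 / 0.5) × (4 / 2) = 4.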

In 2001, both logical and hot deck imputation techniques were used to compensate for item nonresponse.
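The text does not say which hot-deck variant was used, so the sketch below assumes a sequential hot deck: records are processed in file order, and a missing value is filled with the most recent value reported by a donor in the same imputation class. The logical-imputation rule, the field names, and the classing variables are hypothetical. Logical edits are applied first because they draw on the respondent's own answers; the hot deck then fills what remains.

```python
def logical_impute(record):
    """Hypothetical logical edit: a respondent who reports an employer sector
    but leaves employment status blank is treated as employed."""
    if record.get("employed") is None and record.get("sector") is not None:
        record["employed"] = True
    return record


def sequential_hot_deck(records, item, class_vars):
    """Fill missing values of `item` from the most recent donor in the same class.

    Records are processed in file order. A record with no earlier donor in
    its class is left missing. Imputed values are flagged.
    """
    last_value = {}  # imputation class -> most recent reported value
    for rec in records:
        key = tuple(rec[v] for v in class_vars)
        if rec.get(item) is None:
            if key in last_value:
                rec[item] = last_value[key]
                rec[f"{item}_imputed"] = True
        else:
            last_value[key] = rec[item]
    return records
```

A random hot deck (drawing a donor at random within the class) would be an equally plausible reading of the text; the sequential form is used here only because it is deterministic and easy to trace.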

3. Survey Quality Measures

a. Sampling variability

The sample size is sufficiently large that estimates based on the total sample should be subject to no more than moderate sampling error. However, sampling error can be quite substantial in estimating the characteristics of small subgroups of the population. For example, the coefficient of variation in 1991 for the percentage of women among those with a primary work activity of development or design was approximately 10 percent.
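The effect of subgroup size on sampling error can be seen from the coefficient of variation (CV), the standard error of an estimate divided by the estimate itself. The sketch below assumes a simple-random-sampling variance for a proportion, optionally inflated by an assumed design effect; it only approximates the variances produced by the SDR's stratified, weighted design, and the inputs are hypothetical rather than SDR figures.

```python
import math


def cv_of_proportion(p, n, design_effect=1.0):
    """Coefficient of variation (SE / estimate) of an estimated proportion.

    Uses the simple-random-sampling variance p(1 - p)/n, multiplied by an
    assumed design effect.
    """
    if not 0 < p < 1 or n <= 0:
        raise ValueError("need 0 < p < 1 and n > 0")
    standard_error = math.sqrt(design_effect * p * (1 - p) / n)
    return standard_error / p


# Same hypothetical proportion (20 percent), two very different group sizes.
print(round(cv_of_proportion(0.20, 40_000), 3))  # large group: 0.01
print(round(cv_of_proportion(0.20, 400), 3))     # small subgroup: 0.1
```

Because the CV shrinks only with the square root of n, a subgroup 100 times smaller has a CV 10 times larger.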

b. Coverage

As discussed in the Education section, coverage of the Survey of Earned Doctorates is believed to be excellent. Because the SED is the source of the sample frame for most of the SDR sample, the SDR benefits from this high coverage. For doctorates awarded before 1957, when the SED began, the sample frame was compiled from a variety of sources.

Although this component of the sample frame was likely more subject to coverage problems than the frame for later cohorts, pre-1957 doctorates constituted less than 1 percent of the 2001 target population.

c. Nonresponse

(1) Unit nonresponse - The response rate for the 2001 survey was 82.6 percent. Although this is a relatively high response rate, nonresponse error remains a possible concern for this survey. To reduce its impact, results are adjusted for nonresponse through statistical weighting. A nonresponse study performed in 1989, when the survey response rate was 55 percent, indicated that some potential sources of bias in the survey were not fully corrected by the nonresponse adjustment.[4] Most important, because individuals living outside the United States are relatively hard to locate, and unlocated sample members are compensated for through the nonresponse adjustment, the survey tends to slightly overestimate the size of the U.S. population of scientists and engineers. In 1989 we estimated that the overestimate was approximately 4 percent. A similar study in 1979 indicated an overestimate of approximately 6 percent.

Because of their relatively high visibility, faculty members, especially those with tenure, were also easier to locate than individuals employed in industry, resulting in a slight overestimate of the former group and an underestimate of the latter. We estimate that the percentage employed in academia was overestimated by about 5 percentage points and that the percentage employed in industry was underestimated by approximately 3 percentage points.

Another variable with nonnegligible nonresponse bias, according to the 1989 nonresponse bias study, was Federal support status. The data indicated that the percentage receiving Federal support was overestimated by approximately 5 percentage points. Presumably, individuals who receive support from the Federal Government are relatively likely to respond to a governmental survey.
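The mechanism behind these overestimates can be made concrete with a standard identity. If p_r and p_m are the proportions with some characteristic among respondents and nonrespondents, and RR is the response rate, the full-sample proportion is RR * p_r + (1 - RR) * p_m, so an unadjusted respondent-only estimate is off by (1 - RR) * (p_r - p_m). The Python sketch below evaluates this at the 55 and 82.6 percent response rates mentioned above, using hypothetical proportions. Cell-level weighting removes such bias only to the extent that the sampling cells separate groups with different response propensities.

```python
def respondent_only_bias(p_respondents, p_nonrespondents, response_rate):
    """Bias of an unadjusted respondent-only proportion relative to the full sample.

    Full-sample value: p = RR * p_r + (1 - RR) * p_m, hence
    p_r - p = (1 - RR) * (p_r - p_m).
    """
    return (1 - response_rate) * (p_respondents - p_nonrespondents)


# Hypothetical inputs: 50% of respondents vs. 40% of nonrespondents in academia.
for rr in (0.55, 0.826):
    bias_points = 100 * respondent_only_bias(0.50, 0.40, rr)
    print(f"response rate {rr:.1%}: overestimate of {bias_points:.1f} percentage points")
```

The same gap between respondents and nonrespondents produces a smaller bias at a higher response rate, which is why raising response rates was a central part of the 1990s redesign.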

Considerable care was taken in designing the 1990s surveys to reduce the nonresponse bias observed in the 1980s surveys. Measures included extensive follow-up procedures, which dramatically increased response rates, and special attention to reaching difficult-to-locate sample members.[5]

(2) Item nonresponse - In 2001, item nonresponse rates for key items (employment status, sector of employment, field of occupation, and primary work activity) ranged from 0.0 to 0.3 percent. Some of the remaining variables had considerably higher nonresponse rates. For example, salary and earned income, which are particularly sensitive, had item nonresponse rates of 5.4 and 6.2 percent, respectively. Personal demographic items such as marital status, citizenship, and race/ethnicity had rates ranging from 0.5 to 3.5 percent.
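An item nonresponse rate is the share of cases that should have answered an item but did not. A minimal sketch, assuming missing answers are stored as `None` and that the denominator is restricted to cases eligible for the item (the `eligible` predicate, the skip rule, and the field names are hypothetical):

```python
def item_nonresponse_rate(records, item, eligible=lambda r: True):
    """Percent of eligible records with no reported value (None) for `item`.

    `eligible` restricts the denominator to cases that were asked the item.
    Returns None when no record is eligible.
    """
    pool = [r for r in records if eligible(r)]
    if not pool:
        return None
    missing = sum(1 for r in pool if r.get(item) is None)
    return 100.0 * missing / len(pool)


respondents = [
    {"employed": True, "salary": 72000},
    {"employed": True, "salary": None},
    {"employed": False, "salary": None},  # not asked: outside the denominator
    {"employed": True, "salary": 81000},
]
rate = item_nonresponse_rate(respondents, "salary", eligible=lambda r: r["employed"])
print(f"salary item nonresponse: {rate:.1f}%")  # 1 of 3 eligible cases missing
```

Counting the unemployed case in the denominator would change the result, so the eligibility rule matters when comparing rates across items.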

d. Measurement

Several of the key variables in this survey are difficult to measure and thus are relatively prone to measurement error. For example, individuals do not always know the precise definitions of occupations that are used by experts in the field and may thus select occupational fields that are technically incorrect.

As in any multimode survey, the measurement errors associated with the different modes are likely to differ somewhat. This possible source of measurement error is especially troublesome because the propensity to respond by one mode rather than the other is likely to be associated with variables of interest in the survey. To the extent that certain types of individuals are relatively likely to respond by one mode rather than another, the multimode approach may have introduced some systematic biases into the data. SRS and the Census Bureau have designed a special study to investigate the extent of this bias for the National Survey of College Graduates (NSCG). Because of the similarities between the SDR and the NSCG, we expect the results to provide insights about the SDR.[6]

4. Trend Data

There have been a number of changes in the definition of the surveyed population over time. For example, before 1991 the survey included some individuals who had received doctoral degrees in fields outside S&E or had received their degrees from non-U.S. universities. Because coverage of these individuals had declined over time, they were dropped from the 1991 survey. The survey improvements made in 1993 were substantial enough that SRS staff believe trend analyses comparing data from the 1990s surveys with data from earlier surveys should be performed very cautiously, if at all. Individuals who wish to explore such analyses are encouraged to discuss this issue with the survey project officer listed below.

5. Availability of Data

a. Publications

b. Electronic access

Data from this survey are available on the SRS Web site and on SESTAT. Access to restricted data for researchers interested in analyzing microdata can be arranged through a licensing agreement.

c. Contact for more information

Kelly H. Kang
Senior Program Analyst
Human Resources Statistics Program
Division of Science Resources Studies
National Science Foundation
4201 Wilson Boulevard, Suite 965
Arlington, VA 22230
(703) 292-7796
via e-mail at kkang@nsf.gov
