Design and Methodology

The topics below describe the design and methodology of the SESTAT surveys.


Overview

SESTAT is a database of the employment, education, and demographic characteristics of the nation's scientists and engineers. The data are collected from three surveys, sponsored every two years since 1993 by the National Science Foundation (NSF): the National Survey of College Graduates (NSCG), the National Survey of Recent College Graduates (NSRCG), and the Survey of Doctorate Recipients (SDR).

Only the NSRCG and SDR were conducted in 2001.

Information from these surveys has been integrated into the SESTAT database, available on the World Wide Web. The SESTAT database allows for analyses of different components of the science and engineering workforce. When accessing SESTAT through the Web, the user may select the integrated database for 1993, 1995, 1997, or 1999. In addition, data from the individual surveys may be accessed for special analytical purposes. Access to some data may be restricted due to confidentiality considerations.

See Target Population and Coverage below for details on the overall SESTAT target population and specific population coverage of each component survey.


Target Population and Coverage

The target population for a data system is the specific population about which information is desired. The SESTAT target population includes residents of the United States with at least a bachelor's degree who, as of the survey reference period, were either trained in science and engineering (S&E) or working in an S&E occupation.

More specifically, the U.S. resident (1) had to have at least one bachelor's or a higher degree in an S&E field as of June 30 of the previous year, or (2) had to have at least a bachelor's degree in a non-S&E field and had to be working in an S&E occupation as of the survey reference week (the week of April 15, 1993; April 15, 1995; April 15, 1997; April 15, 1999; or April 15, 2001).

Coverage problems are found in most surveys, and analysts should understand the shortcomings of the data and how they might affect the planned analysis. For example, SESTAT does not cover degrees earned between the preceding July 1 and the reference date of the survey (April 15, 19xx). Accordingly, analysis of the first few months of an individual's career is limited.

Groups Not Covered in 1993

Within the coverage defined above, some bachelor's- and master's-level people were not included in the 1993 surveys. Groups not covered include:

Doctorate-level, S&E-trained people not surveyed in 1993 were predominantly U.S. residents who received an S&E doctorate after June 1992 or earned that degree at a foreign institution, who had no degree of any type in any field as of April 1, 1990, and:

Groups Not Covered in 1995

Similar groups were not surveyed, and therefore not included in SESTAT, in 1995. The following bachelor's- and master's-level people were not included:

Doctorate-level, S&E-trained people not surveyed in 1995 were predominantly U.S. residents who either received an S&E doctorate after June 1994 or earned that degree at a foreign institution, who had no degree of any kind in any field as of April 1, 1990, and:

Groups Not Covered in 1997

Like the earlier years, some groups were not surveyed, and therefore not included in SESTAT, in 1997. The following bachelor's- and master's-level people were not included:

Doctorate-level, S&E-trained people not surveyed in 1997 were predominantly U.S. residents who either received an S&E doctorate after June 1996 or earned that degree at a foreign institution, who had no other degree of any type in any field as of April 1, 1990, and:

Groups Not Covered in 1999

Like the earlier years, some groups were not surveyed, and therefore not included in SESTAT, in 1999. The following bachelor's- and master's-level people were not included:

Doctorate-level, S&E-trained people not surveyed in 1999 were predominantly U.S. residents who either received an S&E doctorate after June 1998 or earned that degree at a foreign institution, who had no other degree of any type in any field as of April 1, 1990, and:

Groups Not Covered in 2001

As in earlier years, some groups were not surveyed, and therefore not included in SESTAT, in 2001. As noted above, the National Survey of College Graduates was not conducted in 2001; therefore, only information on those included in the NSRCG and SDR samples is available for 2001. Information on individuals not covered by the 2001 NSRCG or 2001 SDR is not available.

Multiple Coverage (Survey Population Overlap)

Some scientists and engineers had multiple chances of selection because they were linked to the sampling frames for more than one SESTAT component survey. This frame characteristic is referred to as multiplicity. For example, a U.S. resident who received a bachelor's degree before 1990, went on to complete a master's degree in statistics in June 1990, and then earned a doctorate degree in June 1992 could have been selected for all three 1993 surveys. See "Weighting Strategy" for a discussion of how the SESTAT weights compensate for these multiple chances of selection. 


Component Surveys

The 1993, 1995, 1997, 1999, and 2001 SESTAT data come from three component surveys: the National Survey of College Graduates (NSCG), the National Survey of Recent College Graduates (NSRCG), and the Survey of Doctorate Recipients (SDR). A description of each of these component surveys follows. Some groups of people in the desired target population for SESTAT were not included in the target population of any of the three component surveys. See Target Population and Coverage above for a description of the difference between the desired target population and the surveyed population.

National Survey of College Graduates (NSCG)

The National Survey of College Graduates (NSCG) primarily covers experienced scientists and engineers with a bachelor's or master's degree. Some smaller groups are also included, such as those with an S&E Ph.D. earned from a foreign institution or those working in S&E occupations but with no S&E degree. The NSCG accounts for the largest segment of SESTAT's target population -- about 80 to 90 percent. The survey, conducted by the U.S. Bureau of the Census, is derived from the 1990 Decennial Census Long Form sample.

The 1993 NSCG was a special baseline survey that included all those who had earned a bachelor's degree or higher prior to April 1, 1990, whether in science or engineering or not. It covered a much larger target population than the usual NSCG -- over 30 million college graduates, rather than the usual 10 to 12 million scientists and engineers. The sample for this survey was drawn from 1990 Census Long Form respondents -- those residing in the United States on April 1, 1990, or residing abroad as U.S. military personnel.

The 1995 NSCG target population covered the more usual S&E portion of the population. The sample was selected from 1993 NSCG respondents and 1993 National Survey of Recent College Graduates (NSRCG) respondents (see Sample Designs). Adding the NSRCG group -- hereafter referred to as the 1993 NSRCG Panel -- allowed the 1995 NSCG coverage to be more consistent with the 1993 NSCG coverage, that is, those who earned their degree more than two years earlier. Specifically, the 1993 NSRCG Panel portion added to the 1995 NSCG consisted of those who earned an S&E degree between April 1, 1990, and June 30, 1992. The 1995 NSCG was conducted by the U.S. Census Bureau.

The 1997 NSCG target population followed the same pattern as the 1995 NSCG. That is, the 1997 NSCG sample was selected from the 1995 NSCG sample (including the 1993 NSCG and the 1993 NSRCG Panel) and the 1995 National Survey of Recent College Graduates (1995 NSRCG Panel) sample. Different portions of the 1997 NSCG sample were conducted by two separate survey organizations. The U.S. Census Bureau administered the original 1993 NSCG portion of the 1997 NSCG sample; Westat, Inc. administered the 1993 NSRCG Panel and the 1995 NSRCG Panel portions. In this year, slightly different versions of the NSCG questionnaire were used by the two organizations. The 1993 and 1995 NSRCG Panel samples, being more recent graduates than the original NSCG sample, received one of two NSCG questionnaires, each with several additional questions about plans for further education. Westat used two NSCG questionnaires because, unlike the portion at the Census Bureau, the NSRCG Panel sample also included nonrespondents to the 1995 NSCG. Both of these questionnaires are referred to as the NSRCG Follow-up (Version A and Version B).

The 1999 NSCG sample was selected from the 1997 NSCG sample (including the 1993 NSCG and the 1993 and 1995 NSRCG Panels) and the 1997 National Survey of Recent College Graduates sample (1997 NSRCG Panel). Again, different portions of the 1999 NSCG were conducted by two separate survey organizations. This time, the U.S. Census Bureau administered the 1993 NSCG and the 1993 NSRCG Panel portions; Westat, Inc. administered the 1995 NSRCG Panel and the 1997 NSRCG Panel portions. As in 1997, slightly different versions of the NSCG questionnaire were used. The NSRCG Panel samples (more recent graduates) surveyed by Westat received one of two NSCG questionnaires, each with several additional questions about plans for further education. Two questionnaires were used because, unlike the portions at the Census Bureau, Westat's NSRCG Panel samples also included nonrespondents to the 1997 NSRCG. Both of these questionnaires are referred to as the NSRCG Follow-up (Version A and Version B).

The National Survey of College Graduates was not conducted in 2001.

Survey of Doctorate Recipients

The general scope of the Survey of Doctorate Recipients covers individuals who received a doctorate in an S&E field from a U.S. institution. Doctoral level professional degrees such as those awarded in medicine, law, or education are not included.

The 1993 Survey of Doctorate Recipients (SDR) covers the portion of SESTAT's target population that received doctoral degrees in an S&E field from a U.S. educational institution between January 1, 1942 and June 30, 1992. The 1995 SDR includes those who received doctoral degrees in an S&E field from a U.S. educational institution between January 1, 1942 and June 30, 1994. The 1997 SDR includes those who received such degrees between January 1, 1942 and June 30, 1996; the 1999 SDR, between January 1, 1942 and June 30, 1998; and the 2001 SDR, between January 1, 1942 and June 30, 2000.

The 1993 and 1995 SDR were conducted by the National Research Council (NRC), which also maintained the Doctorate Records File (DRF), a historical database of U.S. doctorate recipients used in constructing the SDR's sampling frame. The 1997 SDR was conducted by the National Opinion Research Center (NORC). The 1999 and 2001 SDR were conducted by the U.S. Census Bureau. NORC has maintained the DRF since 1997.

National Survey of Recent College Graduates

In general, the National Survey of Recent College Graduates covers those who received an S&E degree from a U.S. institution in the two academic years prior to the survey reference date. Specifically, the 1993 National Survey of Recent College Graduates (NSRCG) covers the portion of SESTAT's target population that received bachelor's and master's degrees in an S&E field from a U.S. educational institution between April 1, 1990 and June 30, 1992. The Institute for Social Research (ISR) of Temple University selected the samples of educational institutions and recent graduates for the 1993 NSRCG. Westat, Inc., conducted the survey.

The 1995 NSRCG covers those who received bachelor's or master's degrees in an S&E field from a U.S. educational institution between July 1, 1992 and June 30, 1994. Westat, Inc., selected the sample and conducted the survey.

The 1997 NSRCG covers those who received bachelor's or master's degrees in an S&E field from a U.S. educational institution between July 1, 1994 and June 30, 1996. Westat, Inc., selected the sample and conducted the survey.

The 1999 NSRCG covers those who received bachelor's or master's degrees in an S&E field from a U.S. educational institution between July 1, 1996 and June 30, 1998. Westat, Inc., selected the sample and conducted the survey.

The 2001 NSRCG covers those who received bachelor's or master's degrees in an S&E field from a U.S. educational institution between July 1, 1998 and June 30, 2000. Westat, Inc., selected the sample and conducted the survey.

Once individuals have entered the SESTAT system through the NSRCG, a subsample is followed as part of the NSCG (see the section on the National Survey of College Graduates above).


Sample Designs

Probability sampling was used for the SESTAT component surveys to create a defensible basis for generalizing from the combined samples to the SESTAT target population. Selecting a probability sample means establishing a frame through which members of the target population can be identified -- either directly or via linkage to other units (e.g., individuals to housing units). Because scientists and engineers constitute only a small percentage of the U.S. population, it would have been cost prohibitive to survey the entire nation to identify members of the target population who could be interviewed. Instead, a multiple-frame sampling approach to surveying U.S. scientists and engineers was used (see "Component Surveys").

While the SESTAT surveys have somewhat different sample designs -- due to differing information available from the sampling frames -- they share common goals. The samples are designed to enhance the reliability of the estimates through oversampling. Oversampling stratification takes into consideration the field and level of S&E degree and demographic characteristics. Increased sample is allocated to women, underrepresented minorities, the disabled, and individuals in the early part of their careers.

Sample Design: 1993 National Survey of College Graduates (NSCG)

The sampling frame for the 1993 National Survey of College Graduates (NSCG) was constructed from the 1990 Decennial Census Long Form sample. Sampling was restricted to Long Form sampled individuals with at least a bachelor's degree who, as of April 1, 1990, were age 72 or younger. A total of 4,728,000 Long Form sampled individuals met these criteria; 214,643 were selected for the NSCG sample.

The sample design was a two-phase, stratified random sample of individuals with at least a bachelor's degree. Phase 1 consisted of sampling from the Long Form using a stratified systematic sample. Phase 2 consisted of subsampling the Long Form cases using a stratified design with probability-proportional-to-size systematic selection within strata. The Long Form sampling weight was used as the size measure in selection to come as close as possible to a self-weighting sample within Phase 2 strata.
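
To make the probability-proportional-to-size (PPS) systematic selection concrete, the short Python sketch below selects a fixed number of cases from one stratum using cumulated size measures and a random start. It is a generic illustration, not the Census Bureau's production procedure; the record layout and the "long_form_weight" field used as the size measure are hypothetical stand-ins for the details described above.

    import random

    def pps_systematic_sample(units, n, size_key="long_form_weight", seed=None):
        """Select n units with probability proportional to size, using a random
        start and a fixed skip interval on the cumulated size scale."""
        rng = random.Random(seed)
        total = sum(u[size_key] for u in units)
        interval = total / n                  # skip interval on the size scale
        start = rng.uniform(0, interval)      # single random start
        hits = [start + k * interval for k in range(n)]

        sample, cum, i = [], 0.0, 0
        for u in units:
            cum += u[size_key]
            # take every hit point that falls within this unit's cumulative range
            while i < n and hits[i] <= cum:
                sample.append(u)
                i += 1
        return sample

    # A tiny hypothetical Phase 2 stratum, with the Phase 1 (Long Form) sampling
    # weight serving as the measure of size.
    stratum = [{"id": k, "long_form_weight": w}
               for k, w in enumerate([12.0, 48.0, 25.0, 90.0, 30.0, 55.0])]
    print([u["id"] for u in pps_systematic_sample(stratum, n=3, seed=1)])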

Phase 2 strata were defined according to demographic characteristics, highest degree achieved, occupation, and sex. The maximum sampling rate was 3.00 percent, but most strata were sampled at rates of between 2.03 and 2.82 percent. Successively lower rates were used for each of the following groups: whites with bachelor's or master's degrees and employed in a science and engineering (S&E) occupation; nonwhites with bachelor's or master's degrees and employed in a non-S&E occupation; non-foreign-born doctorate recipients; and whites with bachelor's or master's degrees and employed in a non-S&E occupation.

The unweighted response rate for the 1993 NSCG was 78 percent, yielding 148,932 interviews with individuals who had at least a bachelor's degree and identifying an additional 19,224 cases not eligible for interview (e.g., those who were deceased, over 75, not an S&E, no longer living in the U.S.). Interview data were then used to determine whether the respondents fit into SESTAT's target population of scientists and engineers -- a total of 74,693 of the survey respondents fit the description and were incorporated into the SESTAT integrated database.

Sample Design: 1993 National Survey of Recent College Graduates (NSRCG)

The 1993 National Survey of Recent College Graduates (NSRCG) used a two-stage sample design. Educational institutions were sampled in the first stage, and bachelor's and master's graduates were sampled from within these institutions for the second stage. The Integrated Postsecondary Education Data System (IPEDS) was used to construct the sampling frame for educational institutions.

IPEDS is a system of surveys sponsored by the National Center for Education Statistics to collect data from all U.S. educational institutions whose primary purpose is postsecondary education. The frame for the NSRCG was restricted to IPEDS data records associated with four-year U.S. colleges and universities offering bachelor's or master's degrees in one or more S&E fields. Of these institutions, 196 had such large numbers of the nation's S&E graduates that they were selected with certainty.

From the remaining institutions, 79 were selected using systematic probability-proportional-to-size sampling after the file was sorted by ethnicity, region, public/private status, and presence of agricultural courses. The measures of size were devised to account for the rareness of certain fields of study and for the incidence of Hispanic, African-American, and foreign students.

Each sampled institution was asked to provide a roster of students receiving a bachelor's or master's degree in an S&E field between April 1, 1990 and June 30, 1992. From the 273 selected institutions, 25,785 students were selected using stratified sampling. Sampling rates ranged from 1 in 144 (for those receiving bachelor's degrees in psychology, or degrees in nonspecified fields) to 1 in 2 (for those receiving bachelor's and master's degrees in materials engineering). Of the 25,785 selected students, a total of 19,426 eligible scientists and engineers responded to the 1993 NSRCG; 2,670 sample members were deemed ineligible.

Sample Design: 1993 Survey of Doctorate Recipients (SDR)

The Survey of Doctorate Recipients (SDR) is a longitudinal survey of doctorate recipients. Samples of new cohorts are added to the base sample every two years. The sampling frame for the SDR is constructed from the Doctorate Records File (DRF), a historical database derived from the Survey of Earned Doctorates, an ongoing census of all U.S. doctorate recipients since 1942.

The SDR frame is restricted to two groups: (1) S&E doctorate holders under 76 years of age who are U.S. citizens and (2) non-U.S. citizens who plan to remain in the U.S. after they receive their degree. For the 1993 SDR, the sampling frame contained 568,726 individuals, 49,228 of whom were sampled.

A two-phase sample design has been used for the SDR since 1991. Before then, the SDR design was a highly stratified, simple random sample of doctoral scientists and engineers. Strata were defined on the basis of frame information and a "cohort" variable associated with the year the doctorate was received.

Beginning in 1991, the number of strata was reduced, primarily by collapsing over the pre-1991 cohorts, and new stratification variables were introduced to facilitate oversampling of the disabled and certain minority groups. Also at that time, a new 1991 cohort sample was selected using the Phase 1 stratum definitions and sampling rates. This new cohort was added to the older cohort samples to create the Phase 1 sample for the 1991 SDR and subsequent years. This Phase 1 sample was then restratified using the newer stratum definitions. Because minority and disability information was not known for older cohorts, a combination of frame and survey responses was used to assign members of the older cohorts to Phase 2 strata. These Phase 2 sample cases were then subsampled in 1991 (and to a lesser extent in 1993) to yield the desired sample allocations for each stratum. For the 1993 SDR, the sample for the new cohort (1992-93 graduates) was selected as an independent supplement to the older cohort sample. The new cohort sample was selected using stratified simple random sampling.

The sampling rates and stratum definitions were comparable to those of the Phase 2 older cohort sample. The overall 1993 sampling rate was 8.8 percent, but rates for individual sampling strata ranged from 4.5 percent to 66.7 percent. Strata sampled at 66.7 percent included Native American female doctorate recipients in the earth/ocean/atmospheric sciences and handicapped female doctorate recipients in electrical/electronics/communications engineering. Strata with the lowest sampling rates were white males with doctorates in economics or other social sciences. A total of 39,495 eligible scientists and engineers responded to the 1993 SDR.

Sample Design: 1995 National Survey of College Graduates (NSCG)

Subsamples for the 1995 NSCG were drawn from a frame consisting of the combined samples of eligible respondents to the 1993 NSCG and the 1993 National Survey of Recent College Graduates (NSRCG).

Cases that overlapped surveys were removed from the 1995 NSCG frame according to a "unique linkage rule." Those 1993 NSCG cases who had a chance of being selected for the 1993 NSRCG or 1993 Survey of Doctorate Recipients (SDR) were removed from the frame; 1993 NSRCG cases that had a chance of being selected for the 1993 SDR were also removed from the frame; and finally, 1993 NSCG or 1993 NSRCG cases known to have a chance of being selected for the 1995 NSRCG or the 1995 SDR were removed from the frame.

The frame was stratified by demographic group, highest S&E degree, highest S&E major, and sex. A sample of 62,004 individuals for the mail survey was selected using probability-proportional-to-size sampling within these strata, with the 1993 analysis weight used as the size measure. A total of 403 of these cases were deemed ineligible, resulting in an initial sample size of 61,891.

There were 41,522 eligible respondents to the mail survey. Nonrespondents were subsampled for computer-assisted telephone interview (CATI) or computer-assisted personal interview (CAPI) follow-up, again using stratified probability-proportional-to-size sampling. Across all data collection modes, a total of 53,448 eligible scientists and engineers responded to the 1995 NSCG.

Sample Design: 1995 National Survey of Recent College Graduates (NSRCG)

The 1995 design for the NSRCG was similar to the 1993 design. In the two-stage sampling approach, educational institutions were again sampled in the first stage using probability-proportional-to-size sampling, but a composite size measure, designed to facilitate oversampling of rare domains, was used for the first time in 1995. The 1991-1992 Integrated Postsecondary Education Data System (IPEDS) was used to construct the sampling frame for institutions. The rules for including institutions were the same as the 1993 rules.

One hundred and two institutions were so large that they were selected with certainty, and 173 institutions were then sampled from the remaining (noncertainty) portion of the frame after stratifying by region, public-versus-private status, and percentage of S&E degrees. Each sampled institution was asked to provide a roster of students receiving a bachelor's or master's degree in an S&E field between July 1, 1992 and June 30, 1994.

From the 266 responding institutions, 23,771 students were selected using stratified sampling. Strata were defined according to the year the degree was received, major field, degree status, and Native American status. Initial nonrespondents and those who had to be traced were subsampled to yield the desired sample size of 21,000 cases. A total of 16,338 eligible scientists and engineers responded to the 1995 NSRCG; 1,630 sample members were deemed ineligible.

Sample Design: 1995 Survey of Doctorate Recipients (SDR)

The sample design for the 1995 SDR was much like that of the 1993 SDR, with some exceptions. In 1995, a sample of new cohorts -- those earning doctorate degrees at U.S. institutions between July 1, 1992 and June 30, 1994 -- was added, and the previous sample of doctorate recipients (degrees received January 1, 1942 to June 30, 1992) was subsampled. The combined sample was about the same size as the 1993 sample. New and old cohorts were sampled at similar rates within strata defined by demographic group, field of study, and sex.

Probability-proportional-to-size sampling was used to select each stratum sample. The sampling weight was used as the size measure for old cohorts and a value of "1" was used as the size measure for the new cohort population. An initial sample of 49,829 cases was selected for the mail survey, 31,243 of which responded. Nonrespondents were subsampled for CATI follow-up, again using stratified, proportional-to-size sampling procedures. A total of 11,327 mail nonrespondents were followed up by CATI. Across all modes of data collection, 35,370 eligible doctorate recipients completed interviews.

Sample Design: 1997 National Survey of College Graduates (NSCG)

The 1997 NSCG sample was drawn from a frame consisting of eligible respondents to the 1995 National Survey of College Graduates (1993 NSCG and 1993 NSRCG Panel) and the 1995 National Survey of Recent College Graduates (1995 NSRCG Panel). Two survey contractors, the Census Bureau and Westat, Inc., administered this survey.

The Census portion of the 1997 NSCG included 45,877 individuals who were initially sent the mail survey. Mail nonrespondents were sent to computer-assisted telephone interview (CATI) or computer-assisted personal interview (CAPI) follow-up. A total of 33,435 cases were deemed complete mail interviews (respondents less ineligibles and noninterviews). The remaining complete interviews were obtained by CATI (7,067) and CAPI (2,004), for a total of 42,506 eligible respondents.

The Westat portion of the 1997 NSCG had a total sample size of 15,048, all of which were sent to CATI. Of these, 12,307 were eligible completes and 485 were ineligible, giving an unweighted response rate of 85% for the 1997 cycle (eligible completes as a percent of eligible sample). About 4 percent of the completed Panel surveys were received by mail.

NSCG cases that overlapped the other SESTAT surveys were removed from the 1997 NSCG frame according to a "unique linkage rule." This meant that those 1995 NSCG cases who had a chance of being selected for the 1993, 1995, or 1997 NSRCG or SDR were removed from the frame, as were individuals now over 75 years of age.

Sample Design: 1997 National Survey of Recent College Graduates (NSRCG)

The 1997 design followed the design of the earlier NSRCG surveys. In the two-stage sampling approach, educational institutions were again sampled in the first stage using probability-proportional-to-size sampling. The institution sample was the same as used for the 1995 cycle, with 102 certainty selections and 173 selected with probability proportional to size. Of the 275 institutions, 1 was ineligible and 274 responded (100% response rate).

Each sampled institution was asked to provide a roster of students receiving a bachelor's or master's degree in an S&E field between July 1, 1994 and June 30, 1996. A total of 14,057 graduates were sampled. Of these, 10,452 were eligible and completed the survey and 1,032 were ineligible. The unweighted response rate was 82%; this was both the graduate-level response rate and the overall response rate because the institution response rate was 100%. Of the completed surveys, about 3 percent were received by mail.

Sample Design: 1997 Survey of Doctorate Recipients (SDR)

The sample design for the 1997 SDR was much like that of the 1993 and 1995 SDR. In 1997, an oversample of new cohorts -- those earning doctorate degrees at U.S. institutions between July 1, 1994 and June 30, 1996 -- was added, and the previous sample of doctorate recipients (degrees received January 1, 1942 to June 30, 1994) was subsampled. The combined sample was 55,367. New and old cohorts were stratified by demographic group, field of study, and sex.

Probability-proportional-to-size sampling was used to select each stratum sample. For strata consisting of rare groups, cases were selected with certainty to maintain sufficient sample size for analysis. An initial sample of 55,367 cases was selected for the mail survey, of which 38,309 responded. Of these cases, 35,667 were deemed complete interviews, with the remainder either permanently or temporarily out of scope. Nonrespondents were subsampled for CATI follow-up based on the assignment of permanent random numbers (PRNs) to all cases in the sample; subsampling was performed only on eligible pending cases. A total of 15,809 mail nonrespondents were followed up by CATI. CATI data collection generated 8,285 complete interviews. Across all modes of data collection, 35,667 eligible doctorate recipients completed interviews.

Sample Design: 1999 National Survey of College Graduates (NSCG)

The 1999 NSCG sample was drawn from a frame consisting of eligible respondents to the 1997 National Survey of College Graduates (1993 NSCG, 1993 NSRCG Panel, and 1995 NSRCG Panel) and the 1997 National Survey of Recent College Graduates (1997 NSRCG Panel). Two survey contractors, the Census Bureau and Westat, Inc., administered this survey.

The Census portion included 39,989 individuals who were initially sent the mail survey. Mail nonrespondents were sent to computer-assisted telephone interview (CATI) or computer-assisted personal interview (CAPI) follow-up. A total of 28,275 cases were deemed complete mail interviews (respondents less ineligibles and noninterviews). The remaining complete interviews were obtained by CATI (7,273), for a total of 35,548 eligible respondents, giving an unweighted response rate of 90%.

The Westat portion had a total sample size of 14,527, all of which were sent to CATI. Of these, 11,397 were eligible completes and 357 were ineligible, giving an unweighted response rate of 81% for the 1999 cycle (eligible completes as a percent of eligible sample). About 4 percent of the completed Panel surveys at Westat were received by mail, and another 9% were completed on the Web.

NSCG cases that overlapped the other SESTAT surveys were removed from the 1999 NSCG frame according to a "unique linkage rule." This meant that those 1999 NSCG cases who had a chance of being selected for the 1993, 1995, 1997, or 1999 NSRCG or SDR were removed from the frame, as were individuals now over 75 years of age.

Sample Design: 1999 National Survey of Recent College Graduates (NSRCG)

The 1999 design followed the design of the earlier NSRCG surveys. In the two-stage sampling approach, educational institutions were again sampled in the first stage using probability-proportional-to-size sampling. The institution sample was the same as used for the 1997 cycle, with 102 certainty selections and 173 selected with probability proportional to size. Four institutions were added to the sample with certainty to compensate for the undercoverage problem. Of the 279 institutions, 1 was ineligible, 1 declined to provide a graduate list, and 277 responded (99.6% unweighted response rate).

Each sampled institution was asked to provide a roster of students receiving a bachelor's or master's degree in an S&E field between July 1, 1996 and June 30, 1998. A total of 13,918 graduates were sampled. Of these, 9,984 were eligible and completed the survey and 987 were ineligible. The unweighted graduate response rate was 78.8%. The overall response rate was 78.5%, the product of the unweighted institutional response rate of 99.6% and the unweighted graduate rate of 78.8%. Of the completed surveys, about 5 percent were received by mail.

Sample Design: 1999 Survey of Doctorate Recipients (SDR)

The 1999 SDR sample design divides the 1999 SDR sampling frame cases into three mutually exclusive groups: the old cohort, the nearly new cohort, and the new cohort. These groups were defined by the doctoral degree academic years. Frame cases with doctoral degrees earned prior to July 1, 1992 were included in the old cohort, cases with doctoral degrees earned between July 1, 1992 and June 30, 1996 were included in the nearly new cohort, and cases earning a doctoral degree between July 1, 1996 and June 30, 1998 were included in the new cohort.

The total 1999 SDR sample size was 40,000; 4,000 of the total sample was allocated to the new cohort to ensure that the sampling rate of the new cohort was at least 15 percent higher than that of the old cohort. The remaining 36,000 sample cases were then divided so that the nearly new cohort would have a 10 percent higher sample allocation than the old cohort.

The 1999 SDR used a stratified design, where strata were defined by demographic group, degree field, and sex; the strata were formed by the multiway cross of these variables. Sample cases were then allocated to each stratum following a seven-step process. For strata where the allocated sample size was equal to the frame size, all cases were selected. For all other strata, sample cases were selected using probability-proportional-to-size (PPS) selection separately for each cohort group, with the sampling weights as the size measure.

From an initial sample of 40,000 cases, 27,269 responded by mail. Of these cases, 26,216 were deemed complete interviews, with the remainder either permanently or temporarily out of scope. Nonrespondents in the mail phase -- 14,407 -- were followed up by CATI. CATI data collection generated 5,102 complete interviews. Across all modes of data collection, 31,318 eligible doctorate recipients completed interviews.

Sample Design: 2001 National Survey of Recent College Graduates (NSRCG)

The 2001 design followed the design of the earlier NSRCG surveys. In the two-stage sampling approach, educational institutions were again sampled in the first stage using probability-proportional-to-size sampling. In addition to the same institution sample used for the 1999 cycle (106 certainty selections and 173 selected with probability proportional to size), one institution was added to the sample with certainty to address undercoverage. Of the 280 institutions, 2 were ineligible, 2 declined to provide graduate lists, and 276 responded (99.3% response rate).

Each sampled institution was asked to provide a roster of students receiving a bachelor's or master's degree in an S&E field between July 1, 1998 and June 30, 2000. A total of 13,516 graduates were sampled. Of these, 9,887 were eligible and completed the survey and 937 were ineligible. The unweighted graduate response rate was 80.1%. The overall response rate was 79.5%, the product of the unweighted institutional response rate of 99.3% and the unweighted graduate rate of 80.1%. Of the completed surveys, about 7 percent were received by mail.

Sample Design: 2001 Survey of Doctorate Recipients (SDR)

The 2001 SDR sample design divides the 2001 SDR sampling frame cases into three mutually exclusive groups: the old cohort, the nearly new cohort, and the new cohort. These groups were defined by the doctoral degree academic years. Frame cases with doctoral degrees earned prior to July 1, 1994 were included in the old cohort, cases with doctoral degrees earned between July 1, 1994 and June 30, 1998 were included in the nearly new cohort, and cases earning a doctoral degree between July 1, 1998 and June 30, 2000 were included in the new cohort.

The total 2001 SDR sample size was 40,001; 4,000 of the total sample was allocated to the new cohort to ensure that the sampling rate of the new cohort was at least 15 percent higher than that of the old cohort. The remaining 36,001 sample cases were then divided so that the nearly new cohort would have a 10 percent higher sample allocation than the old cohort.

The 2001 SDR used a stratified design, where strata were defined by demographic group, degree field, and sex; the strata were formed by the multiway cross of these variables. Sample cases were then allocated to each stratum following a seven-step process. For strata where the allocated sample size was equal to the frame size, all cases were selected. For all other strata, sample cases were selected using probability-proportional-to-size (PPS) selection separately for each cohort group, with the sampling weights as the size measure.

From an initial sample of 40,001 cases, 26,702 responded by mail. Of these cases, 25,814 were deemed complete interviews, with the remainder either permanently or temporarily out of scope. Nonrespondents in the mail phase -- 13,086 -- were followed up by CATI. CATI data collection generated 5,552 complete interviews. Across all modes of data collection, 31,366 eligible doctorate recipients completed interviews.


Data Collection

The Survey Questionnaires

The questionnaires in each of the component surveys were largely the same -- roughly 90 percent of the questions were identical. The remaining questions were survey-specific; that is, they collected information relevant only to that survey's population. Each year, the NSCG and SDR surveys used a mixed-mode approach, beginning with a self-administered mail questionnaire. These questionnaires were carefully designed to be as "mode-neutral" as possible to ensure that the mode (self-administered, telephone, or in-person) did not influence a person's responses.

The draft 1993, 1995 and 1997 mail questionnaires were pretested in focus groups. Questionnaires were distributed at the start of the focus group, and the participants were asked to complete the questionnaire as if it had just arrived in the mail. Once the participants had completed their questionnaire, the focus group moderator used a retrospective "Think Aloud" approach to probe for any problems the participants experienced while completing the questionnaire.

See SESTAT Variable Crosswalks and Survey Instruments to view facsimiles of the survey questionnaires.

Mode of Administration/Response Rates

Mode of administration refers to how a survey is conducted -- by mail, by telephone, or in person. The National Survey of College Graduates (NSCG) and Survey of Doctorate Recipients (SDR) are mixed-mode surveys, while the National Survey of Recent College Graduates (NSRCG) is primarily conducted as a telephone survey. More specifically:


Editing Guidelines and Procedures

The three SESTAT surveys were conducted by three different survey data collection contractors, depending on the year. As a consequence, NSF developed standardized guidelines so that all contractors used the same editing procedures for their respective surveys. Certain data were deemed "critical data elements" that needed to be complete and consistent before a data record was considered complete.

After status determination and data entry, general editing was performed and the "best coding" and "other, specify" coding procedures were completed. The editing rules included (1) valid code range edits, (2) skip error edits, (3) mark-one edits for questions with more than one response marked, and (4) consistency edits. Procedures were developed for general editing rules such as distinguishing between questions that had "refused," "don't know," or "blank" responses; for rounding rules for decimals or fractions; for missing data on questions with a series of "yes/no" responses; for number of employees; for coding primary and secondary work activities, and the most and second-most important reason for working outside the field of highest degree; and for the most important reason for attending training.
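
As a simple illustration of these four edit types, the Python sketch below checks one hypothetical record. The variable names, valid ranges, and skip logic are invented for the example and are not the actual SESTAT editing specifications.

    # Hypothetical record and edit checks, for illustration only.
    record = {
        "employed": 1,           # 1 = yes, 2 = no
        "hours_worked": 45,      # should be answered only if employed == 1
        "salary": -5000,         # outside the valid code range
        "work_activity": ["teaching", "management"],   # a "mark one" item
    }

    def edit_record(rec):
        problems = []
        # (1) Valid code range edit: salary must be non-negative.
        if rec["salary"] < 0:
            problems.append("salary outside valid range")
        # (2) Skip error edit: hours_worked should be blank when not employed.
        if rec["employed"] == 2 and rec.get("hours_worked") is not None:
            problems.append("skip error: hours_worked reported by a nonworker")
        # (3) Mark-one edit: more than one response marked on a mark-one item.
        if len(rec["work_activity"]) > 1:
            problems.append("mark-one error: multiple work activities marked")
        # (4) Consistency edit: an employed respondent should report positive hours.
        if rec["employed"] == 1 and (rec.get("hours_worked") or 0) <= 0:
            problems.append("inconsistency: employed but no hours reported")
        return problems

    print(edit_record(record))   # flags the range and mark-one problems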

Occupation and Education "Best" Coding

The purpose of "best coding" procedures for occupation and degrees was to correct respondent reporting problems. Typical respondent errors included such things not making a code entry or not reviewing the entire list before making a selection. Recurring problems were found with respondents distinguishing managerial and teaching occupations

Special coding procedures were developed to increase the data quality and comparability for occupation and education codes. In most cases, the respondents self-selected their occupation and education categories from job and education code lists at the end of the questionnaire. The remainder were chosen by CATI respondents through a series of questions that began with the broad categories and narrowed the selection to the specific category.

The special coding procedures focused on correcting respondent reporting errors. However, an important "best coding" rule was that the coder should not change the respondent-chosen code unless there was clear evidence that the respondent's choice was incorrect.

During "best coding" the coder reviewed a variety of respondent-provided information and used standardized references and procedures.

The "best code" for occupation was determined by reviewing factors such as

 The "best code" for a degree held by the respondent was determined by using one of two "flow charts." One flow chart outlined the procedures for dealing with verbatim responses that list one major field of study. The other flow chart outlined the coding procedures for a response that gave more than one field of study. These flow charts standardized the coding procedures and gave special procedures for situations involving exact verbatim matches; handling single, broad, or nonspecific field matches; and rules for assigning the most specific NSF education code. "Best codes" for education were assigned after determining whether the respondent selected a code that was too general, transposed the code numbers, or wrote the numbers incorrectly. Education codes were not "best-coded" in three cases: if the respondent-selected code was more specific than the respondent verbatim and both verbatim and code are in the same field; if the verbatim response was more specific than the self-selected code, and both were in the same field; or if the verbatim response and the selected code fell under the same broad educational category. Only when it was evident that the self-code was incorrect was a "best code" assigned.

"Other, Specify" Coding

The purpose of editing "other, specify" responses was to identify responses that belonged in specific existing categories. This procedure is called "back-coding." "Other, specify" responses often fell into one of the following categories: an existing response category; legitimate "other" response; or not a legitimate response (i.e. does not answer the question). Other responses that fell into the first category were back-coded.


Missing Data Imputation

A completed interview was defined as a questionnaire in which all designated "critical" questions, such as degrees received and occupation, were answered. When possible, telephone follow-up was used to obtain answers to critical items for otherwise complete questionnaires. (See "Editing Guidelines and Procedures" for further details.)

Except for items with verbatim responses, missing data for noncritical items were replaced, or "imputed." Imputation was not begun until all logical editing was completed. The specific procedure used to impute missing data is known as sequential hot deck imputation. Hot deck imputation replaces a missing value for a particular data item with an existing response from another individual's data record (the "donor") who is considered "similar" to the individual whose record has the missing value (the "recipient"). In sequential hot deck imputation, the donor record is typically the nearest record with an existing response that is similar to the recipient.

To ensure that adjacent data records were similar, the records for each component survey were grouped into imputation classes on the basis of variables thought to be strongly or even uniquely associated with the data item to be imputed. A donor record was selected only from those records that belonged to the same imputation class as the recipient record.

Before imputation, data records within each imputation class were also sorted by variables thought to be associated with both the answer for the data item and the propensity for nonresponse to that item. Serpentine sorting was used because it ensured that adjacent data records were as similar as possible. In serpentine sorting, the sort order is reversed as boundaries are crossed for higher-level sort variables.
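
The Python sketch below combines these steps -- imputation classes, a serpentine sort, and a sequential (nearest preceding donor) hot deck -- under simplifying assumptions. The class and sort variables shown are hypothetical stand-ins for the actual SESTAT specifications.

    def serpentine_sort(records, keys, reverse=False):
        """Sort by the given keys, reversing the direction of the lower-level keys
        each time a higher-level boundary is crossed, so that adjacent records
        stay as similar as possible (a simplified sketch)."""
        if not keys:
            return list(records)
        first, rest = keys[0], keys[1:]
        out, direction = [], reverse
        for value in sorted({r[first] for r in records}, reverse=reverse):
            block = [r for r in records if r[first] == value]
            out.extend(serpentine_sort(block, rest, reverse=direction))
            direction = not direction          # flip direction at each boundary
        return out

    def sequential_hot_deck(records, item, class_vars, sort_vars):
        """Within each imputation class, sort serpentine-style and fill each
        missing value of `item` from the most recently seen reported value."""
        classes = {}
        for r in records:
            classes.setdefault(tuple(r[v] for v in class_vars), []).append(r)
        for members in classes.values():
            donor_value = None
            for r in serpentine_sort(members, sort_vars):
                if r[item] is not None:
                    donor_value = r[item]      # this record becomes the donor
                elif donor_value is not None:
                    r[item] = donor_value      # impute from the nearest donor

    # Hypothetical usage: impute missing salary within classes defined by degree
    # level, sorting by field and age so that neighboring records are similar.
    data = [
        {"degree": "BS", "field": 1, "age": 30, "salary": 52000},
        {"degree": "BS", "field": 1, "age": 31, "salary": None},
        {"degree": "BS", "field": 2, "age": 40, "salary": 61000},
    ]
    sequential_hot_deck(data, item="salary", class_vars=["degree"], sort_vars=["field", "age"])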


Weighting Strategy

Unbiased survey estimates depend on estimation procedures that incorporate the selection probabilities for each sampling unit. Selection probabilities for the SESTAT surveys vary greatly from unit to unit because of the extensive oversampling used to facilitate analyses of smaller populations and less common fields of study. Nonresponse and undercoverage can also bias estimates with respect to the population of interest. In the SESTAT data, some of these idiosyncrasies associated with survey data analysis were removed by constructing sampling weights for each survey that reflect differential selection probabilities and by adjusting these weights to compensate for nonresponse and undercoverage in each survey.

Sampling weights were defined as the reciprocal of the probability of selection for each sampled unit, and the weights were adjusted using weighting class or poststratification adjustment procedures. The final adjusted sampling weights become the analysis weights, which have been added to each individual's record in the survey database (as "Z_WEIGHTING_FACTOR_SURVEY"). These weights should be used only in making estimates for the individual surveys.

In the 1993 National Survey of College Graduates (NSCG), poststratification adjustment was used to force the sampling weights for survey respondents to agree with the 1990 Decennial Census Long Form sample estimates. In the 1993 National Survey of Recent College Graduates (NSRCG), the sampling weights were adjusted for nonresponse within weighting classes, and a ratio adjustment was also made to reflect known proportions in the population. In the 1993 Survey of Doctorate Recipients (SDR), the sampling weights were adjusted for nonresponse within weighting classes. Similar procedures were followed in developing analysis weights for 1995, 1997, 1999, and 2001.
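
A minimal Python sketch of the first two steps -- a base weight equal to the reciprocal of the selection probability, followed by a weighting-class nonresponse adjustment -- is shown below. The record layout and class definitions are hypothetical, and the actual SESTAT adjustments (including poststratification and ratio adjustments) were more elaborate.

    from collections import defaultdict

    def base_weights(cases):
        # Base weight = 1 / probability of selection for each sampled case.
        for c in cases:
            c["weight"] = 1.0 / c["selection_prob"]

    def nonresponse_adjust(cases, class_var="weight_class"):
        """Inflate respondent weights so each weighting class still represents
        the full weighted total of eligible sampled cases in that class."""
        sampled = defaultdict(float)      # weighted total of all sampled cases
        responded = defaultdict(float)    # weighted total of respondents only
        for c in cases:
            sampled[c[class_var]] += c["weight"]
            if c["responded"]:
                responded[c[class_var]] += c["weight"]
        for c in cases:
            if c["responded"]:
                c["weight"] *= sampled[c[class_var]] / responded[c[class_var]]
            else:
                c["weight"] = 0.0         # nonrespondents carry no analysis weight

    # Two sampled cases in the same weighting class, one of them a nonrespondent:
    cases = [
        {"selection_prob": 0.02, "weight_class": "A", "responded": True},
        {"selection_prob": 0.02, "weight_class": "A", "responded": False},
    ]
    base_weights(cases)
    nonresponse_adjust(cases)
    print([c["weight"] for c in cases])   # [100.0, 0.0]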

The analysis weights varied substantially across and within the component surveys, ranging from 1 to 436 for SESTAT as a whole in 1993, 1 to 734 in 1995, 1 to 884 in 1997, and 1 to 878 in 1999. The median weights were 59, 71, 79, and 85 for 1993, 1995, 1997, and 1999, respectively. The larger weight variation in the later years resulted from the subsampling of mail nonrespondents for CATI/CAPI follow-up.

Each survey database was designed to be combined with the other two surveys to capture the advantages of a larger sample size and greater coverage of the target population. However, combining the three databases meant addressing the issue of cross-survey multiplicity. Scientists and engineers in SESTAT could belong to the surveyed population of more than one component survey, depending on their degrees and when they received them. For instance, a person with a bachelor's degree at the time of the 1990 Census who went on to complete a master's degree in 1991 could be selected for both the 1993 NSCG and the 1993 NSRCG.

The following unique-linkage rule was devised to remove these multiple selection opportunities: each member of SESTAT's target population is uniquely linked to one and only one component survey, and that individual is included in SESTAT only when he or she is selected for the linked survey.

As a result, each person had only one chance of being selected into the combined SESTAT database. Cases with multiple selection opportunities were first linked to the SDR and then to the NSRCG if the case was not also linked to the SDR. Sampled individuals for each component survey were examined to determine which other component surveys (if any) they could have been selected for. In the NSCG, sampled individuals who also had a chance of being selected for the NSRCG or the SDR in that year were assigned zero as their SESTAT analysis weight. Similarly, sampled individuals in the NSRCG who also had a chance of being selected for the SDR in that year were assigned zero as their SESTAT analysis weight. The component survey's analysis weight for all other cases was brought over as the SESTAT analysis weight. The SESTAT weight on the database (called "Z_WEIGHTING_FACTOR") should be used when analyzing SESTAT data derived from the three component surveys.
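
The weight assignment implied by the unique-linkage rule can be sketched as follows. The field names are hypothetical; the linkage priority (SDR first, then NSRCG, then NSCG) and the zero-weight treatment of multiply-linked cases follow the description above.

    def sestat_weight(case):
        """Carry over the survey analysis weight as Z_WEIGHTING_FACTOR only if
        the case was sampled in the survey it is uniquely linked to."""
        if case["eligible_for_sdr"]:
            linked_survey = "SDR"
        elif case["eligible_for_nsrcg"]:
            linked_survey = "NSRCG"
        else:
            linked_survey = "NSCG"
        return case["analysis_weight"] if case["sampled_in"] == linked_survey else 0.0

    # An NSCG-sampled person who also had a chance of NSRCG selection is linked
    # to the NSRCG, so this NSCG record receives a zero SESTAT weight.
    person = {"sampled_in": "NSCG", "eligible_for_sdr": False,
              "eligible_for_nsrcg": True, "analysis_weight": 85.0}
    print(sestat_weight(person))   # 0.0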


