Help - Statistical Terminology - NSF, Division of Science Resources Studies

Bias

A tendency to underestimate or overestimate a population value of interest.

Census

An enumeration of the total population of interest. Since no sample is selected from the population, there is no sampling error. However, nonsampling errors are still possible in a census.

Coefficient of variation of sample estimates (C.V.)

The ratio of the standard error of an estimate to the value of the estimate, usually expressed as a percentage. It measures the imprecision in survey estimates introduced by sampling. A coefficient of variation of 1 percent indicates that an estimate would vary only slightly due to sampling error, while a coefficient of variation of 50 percent means that the estimate is very imprecise. The most common way to reduce the coefficient of variation is to increase the sample size, which is typically expensive.
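
The calculation can be sketched in a few lines of Python. This is only an illustration: the function name and the dollar figures are hypothetical, and the result is expressed as a percentage to match the 1 percent and 50 percent examples above.

```python
def coefficient_of_variation(estimate, standard_error):
    """Return the C.V. of an estimate as a percentage (hypothetical helper)."""
    if estimate == 0:
        raise ValueError("C.V. is undefined when the estimate is zero")
    return 100.0 * standard_error / abs(estimate)


# Illustrative numbers only, not taken from any survey.
cv_precise = coefficient_of_variation(30_000, 300)       # standard error is 1% of the estimate
cv_imprecise = coefficient_of_variation(30_000, 15_000)  # standard error is 50% of the estimate
print(cv_precise, cv_imprecise)
```

Under simple random sampling the standard error shrinks roughly with the square root of the sample size, so halving the C.V. takes about four times as many cases.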

Cooperation rate

The percentage of in-scope individuals (or organizations) who complete a survey after being contacted. The denominator for the cooperation rate excludes individuals (or organizations) whom one has tried unsuccessfully to contact. Thus, the cooperation rate for a survey will be higher than its response rate unless all selected individuals (or organizations) are contacted, in which case the two rates are equal.
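
A small sketch of the difference between the two rates, assuming the simple unweighted definitions given in this glossary. The function name, parameter names, and counts are hypothetical.

```python
def cooperation_and_response_rates(completed, contacted_in_scope, selected_in_scope):
    """Return (cooperation rate, response rate) as percentages.

    The cooperation rate divides by the in-scope cases that were contacted;
    the response rate divides by all in-scope cases that were selected.
    """
    cooperation = 100.0 * completed / contacted_in_scope
    response = 100.0 * completed / selected_in_scope
    return cooperation, response


# Illustrative counts: 1,000 in-scope cases selected, 800 reached, 600 completed.
cooperation_rate, response_rate = cooperation_and_response_rates(
    completed=600, contacted_in_scope=800, selected_in_scope=1000
)
print(cooperation_rate, response_rate)
```

Because the 200 cases that were never reached drop out of the first denominator only, the cooperation rate comes out higher than the response rate.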

Coverage

The extent of correspondence between the target population and the sampling frame. Ideally, all members of the target population are included in the sampling frame. However, this is infrequently the case for major surveys. Coverage is rarely estimable in precise terms; however, survey designers are usually aware of the likely reasons for undercoverage and can often estimate the extent of the problem. In addition to the problem of undercoverage (missing population members), sampling frames can suffer from overcoverage, i.e., the inclusion of units that do not belong on the sampling frame and/or the listing of a given unit more than once. These problems are usually correctable. Duplicate listings are either deleted prior to sample selection or are corrected for by appropriate statistical adjustments. Listings that are not in-scope according to the survey definition are typically deleted during data collection or analysis and corresponding statistical adjustments are made to estimate the likely extent of out-of-scope cases among the survey nonrespondents.

Estimation procedures

Procedures followed in making population estimates from the survey responses.

Imputation

The process by which one estimates missing values for items that a survey respondent failed to provide.
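
One of the simplest imputation techniques is mean imputation, in which each missing value is replaced by the mean of the reported values. The sketch below is only an illustration of the idea and is not a description of the method used in any particular survey; the function name and data are hypothetical, and `None` is assumed to mark a missing item.

```python
import statistics


def impute_mean(values):
    """Replace missing items (None) with the mean of the reported items."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no reported values")
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


# Illustrative item responses; two respondents skipped the item.
reported = [10, None, 30, None, 20]
completed = impute_mean(reported)
print(completed)
```

Mean imputation preserves the mean of the reported values but understates their variability, which is one reason more elaborate techniques are often preferred.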

Item nonresponse

The failure of a respondent to answer a particular item on the survey. When item nonresponse is high and respondents and nonrespondents differ substantially, item nonresponse can be a serious threat to the accuracy of the estimates. Imputation techniques can be used to reduce the impact of this problem, but the extent to which they are effective is difficult to determine.

Measurement error

The extent to which there are discrepancies between survey results and the true value of what the survey researcher is attempting to measure. There are several possible sources of error here. Respondents may report inaccurate information because they do not have the required information, because they are careless, or because they do not understand the question asked. Alternatively, respondents may provide accurate information, but errors are introduced in the data processing stage through keypunching, coding, or programming mistakes. Since it is often not possible to determine the "true value" of what one is trying to measure, precise estimates of measurement error are usually not possible. However, techniques exist for obtaining some information about the likely extent of measurement error. For example, information reported by individuals may be compared with appropriate institutional records on the individual.

Microdata

Nonaggregated data about the units sampled. For surveys of individuals, microdata contain records for each individual interviewed; for surveys of organizations, the microdata contain records for each organization.

Multimodal survey

A survey in which more than one data collection mode was used, e.g., a mix of mail and phone data collection. This approach is often used in large surveys because mail data collection is cheaper than phone data collection, but mail response rates are typically too low to meet desired levels, so mail nonrespondents are then surveyed by phone. The major problem with this approach is that answers may differ by mode of data collection. This can potentially lead to incorrect inferences about the associations among variables.

Out-of-scope

Sampling units that are not part of the population of interest. For example, in the National Survey of Recent College Graduates, only individuals who received a bachelor's or master's degree within a specified time frame are of interest. If an educational institution provided the name of an individual who failed to graduate, the individual would be considered out-of-scope for the survey. Information on this individual would not be included in the final estimates from the survey.

Population

The individuals or organizations of interest in a given survey. In sample surveys one makes inferences about the population from the sample selected.

Probability proportional to size (pps)

A sampling technique in which the probability of a unit's being selected is proportional to a measure of its size. For example, if the measure of size is expenditures, organizations with high expenditures are selected with higher probability than organizations with low expenditures.
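
A minimal sketch of the idea for a single draw, assuming expenditures as the measure of size. The organization names and amounts are hypothetical, and real pps designs that select several units without replacement need more careful procedures than this.

```python
import random


def pps_selection_probabilities(sizes):
    """Return each unit's probability of selection in one draw,
    proportional to its measure of size."""
    total = sum(sizes.values())
    return {unit: size / total for unit, size in sizes.items()}


# Hypothetical organizations and expenditures.
expenditures = {"org_a": 500, "org_b": 300, "org_c": 200}
probs = pps_selection_probabilities(expenditures)

# Draw one unit with those probabilities (seeded so the run is repeatable).
rng = random.Random(0)
chosen = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
print(probs, chosen)
```

Here the organization with half of the total expenditures has a one-in-two chance of selection in a single draw.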

Respondent

The individual or organization providing the information requested in the survey. The type of respondent influences what type of information can be obtained, e.g., individuals completing a degree may provide different information about the degree than a representative of the academic institution granting the degree would provide.

Response rate

The percentage of sample members who provided information in response to being surveyed. Care in interpreting response rates is necessary, because there is no single, uniformly accepted measure of response rate. One common measure, used extensively in demographic surveys, is the percentage of in-scope sample members who responded to the survey. In surveys that focus on estimating expenditures, the response rate is often calculated as the percentage of the total expenditures represented by responding sample members. This measure is often referred to as a weighted response rate (though weighting may also be used to adjust for different probabilities of sample selection).
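
The two measures described above can give different answers for the same sample, as the following sketch shows. The record layout, field names, and figures are hypothetical, and restricting both denominators to in-scope units is an assumption of this example.

```python
def response_rates(units):
    """Return (unweighted, expenditure-weighted) response rates as percentages.

    Each unit is a dict with the hypothetical keys
    'expenditures', 'responded', and 'in_scope'.
    """
    in_scope = [u for u in units if u["in_scope"]]
    respondents = [u for u in in_scope if u["responded"]]
    unweighted = 100.0 * len(respondents) / len(in_scope)
    weighted = (
        100.0
        * sum(u["expenditures"] for u in respondents)
        / sum(u["expenditures"] for u in in_scope)
    )
    return unweighted, weighted


# Illustrative sample: the out-of-scope unit is excluded from both rates.
sample_units = [
    {"expenditures": 500, "responded": True, "in_scope": True},
    {"expenditures": 300, "responded": False, "in_scope": True},
    {"expenditures": 200, "responded": True, "in_scope": True},
    {"expenditures": 100, "responded": False, "in_scope": False},
]
unweighted_rate, weighted_rate = response_rates(sample_units)
print(round(unweighted_rate, 1), round(weighted_rate, 1))
```

Two of the three in-scope units responded, but together they account for 700 of the 1,000 in in-scope expenditures, so the weighted rate is higher than the unweighted one.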

Sample

The individuals or organizations selected to represent the population.

Sample design

The procedures used in selecting the sample. These procedures can be as simple as randomly selecting a certain percentage of the cases. However, more complex designs are frequently used in order to obtain reliable information about a particular group(s) of interest and/or to minimize the cost of obtaining the information desired.

Sample frame

Those individuals or organizations from which one selects the actual sample for the survey; also called the sampling frame. Ideally, the sample frame is the same as the target population. In reality, however, there are often differences.

Scope of survey

The population to which the researcher plans to generalize his or her results. The scope of the survey may be limited by both theoretical and practical considerations. For example, while it may be of theoretical interest to obtain information on the characteristics of institutionalized individuals, practical difficulties often lead researchers to declare such individuals out-of-scope for a survey. Out-of-scope cases may be eliminated at the time of sample frame construction or during data collection or data processing.

Standard error

A commonly used measure of how precisely one can estimate a population value from a given sample. For large sample surveys, a reasonable interpretation of the standard error is that approximately 68 percent of the time the sample estimate will be within one standard error of the population value. For example, if one estimates that the mean income for individuals within a specified group is $30,000 with a standard error of $5,000, one would be right approximately 68 percent of the time in assuming that the true (or population) mean income for the group is between $25,000 and $35,000.
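
The income example can be reproduced with a short sketch. The function names are hypothetical, and the formula for the standard error of a mean assumes a simple random sample; complex sample designs require different variance estimators.

```python
import math
import statistics


def standard_error_of_mean(values):
    """Standard error of the sample mean, assuming simple random sampling."""
    return statistics.stdev(values) / math.sqrt(len(values))


def one_standard_error_interval(estimate, standard_error):
    """Range expected to cover the population value about 68 percent
    of the time in large samples."""
    return estimate - standard_error, estimate + standard_error


# The example from the definition: $30,000 with a standard error of $5,000.
interval = one_standard_error_interval(30_000, 5_000)

# A standard error computed from a small set of made-up values.
se = standard_error_of_mean([10, 20, 30, 40, 50])
print(interval, se)
```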

Subsample

A sample selected from a sample frame that is itself a sample of a larger population. Often the original sample is used to identify individuals or organizations of interest or is used to sort units into groups to be sampled at different rates.

Stratification

A sampling technique in which sampling is done separately for separate parts of the population. Stratification is often used to ensure that one has an adequate number of sampling units with relatively rare characteristics (e.g., stratification may be done on race/ethnic status if one wishes to make comparisons among racial/ethnic groups).
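
A minimal sketch of stratified selection, in which a separate simple random sample is drawn within each stratum so that a rare group can be sampled at a higher rate. The function, stratum labels, and sample sizes are all hypothetical.

```python
import random


def stratified_sample(units, stratum_of, sizes, seed=0):
    """Draw a separate simple random sample within each stratum.

    units      -- iterable of sampling units
    stratum_of -- function mapping a unit to its stratum label
    sizes      -- dict of stratum label -> number of units to select
    """
    rng = random.Random(seed)
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {
        name: rng.sample(members, min(sizes.get(name, 0), len(members)))
        for name, members in strata.items()
    }


# Made-up population: 90 units in a common group, 10 in a rare group.
population = [(f"p{i}", "common") for i in range(90)]
population += [(f"p{i}", "rare") for i in range(90, 100)]

# Sample the rare group at a much higher rate than the common group.
strat_sample = stratified_sample(
    population, lambda unit: unit[1], {"common": 10, "rare": 5}
)
print({name: len(chosen) for name, chosen in strat_sample.items()})
```

Because the strata are sampled at different rates, estimates for the whole population would need weights that reflect each unit's probability of selection.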

Target population

Those individuals or organizations about whom one wishes to make inferences on the basis of the survey results.

Two-stage sample

A sample selected in two steps. In one common type of two-stage sample, the first stage consists of a sample of organizations of interest and the second stage consists of individuals within organizations.
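
The common type of two-stage sample described above can be sketched as follows, using simple random selection at both stages. The function, institution names, and sample sizes are hypothetical; actual designs often select the first-stage units with probability proportional to size instead.

```python
import random


def two_stage_sample(members_by_org, n_orgs, n_per_org, seed=0):
    """Stage 1: select organizations. Stage 2: select individuals
    within each selected organization."""
    rng = random.Random(seed)
    chosen_orgs = rng.sample(sorted(members_by_org), n_orgs)
    return {
        org: rng.sample(members_by_org[org], min(n_per_org, len(members_by_org[org])))
        for org in chosen_orgs
    }


# Made-up frame: four institutions with ten individuals each.
frame = {
    f"univ_{letter}": [f"{letter}{i}" for i in range(10)]
    for letter in "abcd"
}
two_stage = two_stage_sample(frame, n_orgs=2, n_per_org=3)
print(two_stage)
```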

Unit nonresponse

The failure of an individual or organization to respond to the survey. When unit nonresponse is high and respondents and nonrespondents differ substantially, unit nonresponse can be a serious threat to the accuracy of a survey. There are statistical techniques that can be used to reduce the impact of this problem, but all rest on assumptions about the characteristics of missing units that are difficult to evaluate without expensive additional data collection.