Survey Methodology:
Survey of Industrial Research and Development

1. Overview

a. Purpose

The Survey of Industrial Research and Development is the primary source of information on R&D performed by industry within the fifty United States and the District of Columbia. The results of the survey are used to assess trends in R&D expenditures. Government agencies, corporations, and research organizations use the data to investigate productivity determinants, formulate tax policy, and compare individual company performance with industry averages. Individual researchers in industry and academia use the data to investigate a variety of topics and to prepare professional papers, dissertations, and books. The usefulness of the information collected in this survey is enhanced by the linkage of the data file to the Census Bureau's Longitudinal Establishment Data file, which contains information on the outputs and inputs of companies' manufacturing plants. Further, total R&D expenditure statistics are used by the Bureau of Economic Analysis for inclusion in its System of National Accounts and Foreign Direct Investment programs. Prior to 2001, completion of four items on the questionnaire was mandated by law: sales, total number of employees, total R&D, and Federally funded R&D. Beginning in 2001, response to the item that asks for the distribution of total R&D by state also was required.

b. Respondents

The Survey of Industrial Research and Development is an annual sample survey that intends to include or represent all for-profit R&D-performing companies, either publicly or privately held. The survey is completed by representatives at manufacturing and nonmanufacturing companies known to conduct R&D and by representatives from samples of companies in both sectors that may conduct R&D. A company is defined as one or more establishments under common ownership or control. In some cases representatives at the establishment level return forms, so that more than one form per company is returned. In these cases, company data are aggregated during processing.

c. Key variables

2. Survey Design

a. Target population and sample frame

The target population consists of all industrial companies that perform R&D in the United States. The Standard Statistical Establishment List (SSEL), a Bureau of the Census compilation that contains information on more than 3 million establishments with paid employees, is the source from which the frame used to select the survey sample is created. For companies with more than one establishment, data were summed to the company level. The frame from which the survey sample is drawn includes all for-profit companies classified in nonfarm industries. For surveys prior to 1992, the frame was limited to companies above certain size criteria based on number of employees. These criteria varied by industry. Some industries were excluded from the frame because it was believed that they contributed little or no R&D activity to the final survey estimates. For the 1992 sample, new industries were added to the frame, and the size criteria were lowered considerably and applied uniformly to firms in all industries. As a result, nearly 2 million enterprises with 5 or more employees are given a chance of selection for the annual samples. For comparison, the frame for the 1987 sample included 154,000 companies of specified sizes and industries.

b. Sample design

Prior to the 1999 survey, each firm was assigned a single Standard Industrial Classification (SIC) code based on the activity of the establishment having the highest dollar value of payroll. This assignment was done on a hierarchical basis. The enterprise was first assigned to the economic division (manufacturing or nonmanufacturing) based on the aggregated payroll of its establishments, then to the 2-digit SIC code with the highest payroll within the assigned division, then to the 3-digit SIC code with the highest payroll within the assigned 2-digit industry. Beginning with the 1999 survey, data were summed to the company level, and each company then was assigned a single North American Industry Classification System (NAICS) code based on payroll. The method used followed the hierarchical structure of the NAICS. The company was first assigned to the economic sector, defined by a 2-digit NAICS code representing manufacturing, mining, trade, etc., that accounted for the highest percentage of its aggregated payroll. Then the company was assigned to a subsector, defined by a 3-digit NAICS code, that accounted for the highest percentage of its payroll within the economic sector. Finally, the company was assigned a 4-digit NAICS code within the subsector, again based on the highest percentage of its aggregated payroll. Assignment below the 4-digit level was not done because of the concentration of R&D in relatively few industries and disclosure concerns.
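
The hierarchical NAICS assignment described above can be sketched roughly as follows. This is a minimal illustration, not the Census Bureau's actual procedure: the function name, the input format (a list of code and payroll pairs), and the tie-breaking rule are all assumptions. It also treats every 2-digit prefix as its own sector, a simplification, since NAICS manufacturing spans the prefixes 31 through 33.

```python
from collections import defaultdict


def assign_naics(establishments):
    """Assign one 4-digit NAICS code to a company by hierarchical payroll share.

    `establishments` is a list of (naics_code, payroll) pairs, one per
    establishment, where each code is a string of at least 4 digits.
    (Hypothetical input format, chosen for illustration.)
    """
    prefix = ""
    for digits in (2, 3, 4):  # sector, then subsector, then 4-digit industry
        payroll_by_code = defaultdict(float)
        for code, payroll in establishments:
            # Only establishments inside the level chosen so far compete.
            if code.startswith(prefix):
                payroll_by_code[code[:digits]] += payroll
        # Highest aggregated payroll wins; ties go to the lowest code
        # (an assumption, the source does not say how ties are handled).
        prefix = max(sorted(payroll_by_code), key=payroll_by_code.get)
    return prefix


company = [("334111", 500.0), ("334413", 300.0),
           ("325412", 700.0), ("541710", 600.0)]
print(assign_naics(company))  # 3341
```

In this made-up example the single largest establishment is coded 325412, yet the company is assigned 3341 because prefix 33 has the largest aggregated payroll at the first step. The hierarchy can therefore give a different answer than simply taking the code of the biggest establishment.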

Sampling Strata. A fundamental change initiated in 1995 and repeated for the 1996-1998 samples was the redefinition of the sampling strata. For the survey years 1992-94, 165 sampling strata were established, each stratum corresponding to one or more 3-digit-level SIC codes. The objective was to select sufficient representation of industries to determine whether alternative or expanded publication levels were warranted. Starting with the 1995 survey, 40 publication levels were defined (25 manufacturing and 15 nonmanufacturing) and the sampling strata based on 3-digit SIC codes were set to correspond to the publication level industry aggregations. Beginning with the 1999 survey and the conversion to NAICS, 29 manufacturing and 20 nonmanufacturing strata were defined corresponding to the 4-digit industries and groups of industries for which statistics were developed and published.

Certainty Companies. Before 1994, companies with 1,000 or more employees had been selected with certainty, but it was observed that the level of spending varied considerably and that many of these companies reported no R&D expenditures each year. For these reasons, it was determined that these companies should be given chances of selection based upon the size of their R&D spending if they were in the previous survey or upon an estimated R&D value if they were not. Consequently, the size criteria were dropped for surveys after 1994. The criterion based on the estimated amount of R&D spending for identifying companies selected for the survey with certainty was set at $1 million for 1995. With a fixed total sample size, there was concern that the representation of the very large noncertainty universe by a smaller sample each year would be inadequate. To limit the growth occurring each year in the number of certainty cases within the total sample, the certainty criterion was raised for the 1996 survey to $5 million in total R&D expenditures and has remained the same for subsequent surveys.

Frame Partitioning. Partitioning of the frame for noncertainty companies into large and small companies was first introduced in 1994 because of concern arising from a study of 1992 survey results which showed that a disproportionate number of small companies was being selected for the sample, and often assigned very large weights. These small companies seldom reported R&D activity. This disproportion was a result of the minimum probability rule used as part of the independent probability proportionate to size (pps) sampling procedure employed exclusively prior to 1994. This rule increased the probabilities of selection for several hundred thousand smaller companies. For the 1994 and subsequent surveys, simple random sampling (srs) was applied to the small company partition causing the smaller companies to be sampled more efficiently than with independent pps sampling since there was little variability in their size. The large company partition continued to be sampled using independent pps sampling until 1998 when fixed sample size pps sampling was begun. For the 1994 and 1995 surveys, total company payroll was the basis for partitioning the noncertainty frame. For each industry grouping, the largest companies representing the top 90 percent of the total payroll for the industry grouping were included in the pps frame. The balance, the smaller companies comprising the remaining 10 percent of payroll for the industry grouping, was included in the srs frame. Beginning with the 1996 survey, total company employment became the basis for partitioning the frame. The total company employment levels defining the partitions were based on the relative contribution to total R&D expenditures of companies in different employment size groups in both the manufacturing and nonmanufacturing sectors. In the manufacturing sector, all companies with total employment of 50 or more were included in the large company partition. 
In the nonmanufacturing sector, all companies with total employment of 15 or more were included in the large company partition. Companies in the respective sectors with employment below these values were included in the small company partition. In the 2000 survey, about 632,000 companies were in the large company partition and 1.3 million were in the small company partition. In the 2001 survey, there were about 650,000 companies and approximately 1.3 million companies, respectively.
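
As a rough sketch, the certainty criterion and the employment-based partitioning used from 1996 onward could be expressed as below. The thresholds come from the text; the function and constant names are hypothetical, and treating a company at exactly $5 million as a certainty case is an assumption.

```python
# Thresholds taken from the text (1996 and later surveys).
CERTAINTY_RD_DOLLARS = 5_000_000
LARGE_EMPLOYMENT = {"manufacturing": 50, "nonmanufacturing": 15}


def frame_partition(sector, employment, rd_dollars):
    """Classify a frame company as 'certainty', 'large', or 'small'.

    `rd_dollars` is the reported or estimated total R&D spending.
    Whether the $5 million boundary is inclusive is an assumption.
    """
    if rd_dollars >= CERTAINTY_RD_DOLLARS:
        return "certainty"
    if employment >= LARGE_EMPLOYMENT[sector]:
        return "large"  # sampled with pps
    return "small"      # sampled with srs


print(frame_partition("manufacturing", 40, 0))      # small
print(frame_partition("nonmanufacturing", 40, 0))   # large
print(frame_partition("manufacturing", 40, 6e6))    # certainty
```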

"Zero" industries. One final modification in the frame development for 1996, which was repeated for the 1997 and 1998 surveys, was the designation of "zero" industries in the large company partition. Zero industries were those three-digit SIC industries having no R&D expenditures reported in survey years 1992-98. These industries remained within the scope of the survey, but only a limited sample was drawn from them because it was unlikely that these industries conducted R&D. Simple random sampling was used to control the number of companies selected from these industries. For the 1999 through 2001 surveys, no zero industries were defined because of the conversion to NAICS. For the next several cycles of the survey, NAICS industries will be evaluated to ascertain if any of them should be designated "zero" industries.

Sample Selection. In 1996 a significant revision in the procedure for selecting samples from the partitions led to a change in the development and presentation of estimates. The revised procedure was repeated for subsequent surveys. In 1995 the sample of companies from the large company partition was selected using probability proportionate to size sampling in each of the 40 strata. Likewise, the simple random sampling of the small company partition was done for each of the 40 strata. However, beginning in 1996, the number of strata established for the small company partition was reduced to two. One stratum consisted of small companies classified in manufacturing industries and the second stratum consisted of small companies classified in nonmanufacturing industries. Simple random sampling continued as the selection method for these two strata. The purpose of selecting the small company panel from these two strata was to reduce the variability in industry estimates largely attributed to the random year-to-year selection of small companies by industry and the high sampling weights that sometimes occurred. As a consequence of this change, estimates for industry groups within manufacturing and nonmanufacturing were not made from these two strata. The statistics for the detailed industry groups were based only on the sample from the large company partition. Estimates from the small company partition were included in statistics for total manufacturing, total nonmanufacturing, and all industries. For completeness, in the affected tables for 1996-1998 the estimates also were added to the categories "other manufacturing" and "other nonmanufacturing." For 1999 and 2000, the estimates were published separately in the "small manufacturing companies" and "small nonmanufacturing companies" categories. For 2001, the sampling of the small companies was again limited to the two strata. 
However, because of the growth of R&D activity observed in these strata, it was decided to code the reporting companies into their appropriate industries and include them in the industry estimates as was done prior to 1996. As a result, the "small manufacturing companies" and "small nonmanufacturing companies" categories were eliminated. In future sampling operations, it is probable that the small companies will again be partitioned into and sampled by the industry strata so as to ensure their representation in all industries. Increased year-to-year variability of the industry estimates is to be expected.
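
To illustrate the two selection methods, the following generic sketch contrasts independent pps sampling with simple random sampling within one stratum. It is a textbook-style illustration under stated assumptions, not the Census Bureau's procedure: the function names, the `min_prob` parameter (standing in for the minimum probability rule mentioned earlier), and the use of inverse selection probabilities as sampling weights are all assumptions.

```python
import random


def pps_probabilities(sizes, expected_n, min_prob=0.0):
    """Selection probabilities proportional to size, capped at 1.

    `min_prob` imitates a minimum probability rule: it raises the
    chance of selection for very small units.
    """
    total = sum(sizes)
    return [min(1.0, max(min_prob, expected_n * s / total)) for s in sizes]


def independent_pps_sample(sizes, expected_n, min_prob=0.0, seed=0):
    """Select each unit independently; return (index, weight) pairs."""
    rng = random.Random(seed)
    probs = pps_probabilities(sizes, expected_n, min_prob)
    return [(i, 1.0 / p) for i, p in enumerate(probs)
            if p > 0 and rng.random() < p]


def srs_sample(n_units, n, seed=0):
    """Simple random sample of n units; every unit gets the same weight."""
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(n_units), n))
    return [(i, n_units / n) for i in chosen]


sizes = [1000, 10, 10]
print(pps_probabilities(sizes, expected_n=1))
print(pps_probabilities(sizes, expected_n=1, min_prob=0.05))
```

With a minimum probability in force, the two small units are selected more often than their size alone would justify. Applied to a frame containing several hundred thousand small companies, that is the disproportion the text describes. Under srs every selected unit carries the same weight, which suits units that differ little in size.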

c. Data collection techniques

The survey is conducted by the Census Bureau in accord with an interagency agreement with the National Science Foundation's Division of Science Resources Statistics (SRS). Survey instruments are mailed to company representatives in March each year with a request that they be completed by May 15. Five mail follow-ups are conducted. Phone follow-up is used with the 300 largest nonresponding R&D performers (as determined by expenditures reported in previous surveys) that have not filed a request for extension. Two questionnaires are used each year to collect data for the survey. Known large R&D performers are sent a detailed questionnaire, Form RD-1. The Form RD-1 requests data on sales or receipts, total employment, employment of scientists and engineers, expenditures for R&D performed within the company with Federal funds and with company and other funds, character of work (basic research, applied research, and development), company-sponsored R&D expenditures in foreign countries, company-funded R&D performed by other organizations, Federally funded R&D by funding agency, R&D costs by type of expense, domestic R&D expenditures by State, energy-related R&D, and foreign R&D by country. Because companies receiving the Form RD-1 have participated in previous surveys, computer-imprinted data reported by the company for the previous year are supplied for reference. Companies are encouraged to revise or update these imprinted data if they have more current information; however, prior-year statistics that had been previously published are revised only if large disparities are reported. Small R&D performers and firms included in the sample for the first time are sent Form RD-1A. This form collects the same information as Form RD-1 except for five items: Federal R&D support to the firm by funding agency, R&D costs by type of expense, domestic R&D expenditures by State, energy-related R&D, and foreign R&D by country. It also includes a screening item that allows respondents to indicate they do not perform R&D. No prior-year information is made available since the majority of the companies that receive the Form RD-1A have not been surveyed previously.

d. Estimation techniques

For various reasons, many firms chose to return the survey questionnaires with one or more blank items. For some firms, internal accounting systems and procedures may not have allowed quantification of specific expenditures. Others may have refused to answer any voluntary questions as a matter of company policy. When respondents did not provide the requested information, estimates for the missing data were made using imputation algorithms. In general, the imputation algorithms computed values for missing items by applying the average percentage change for the target item in the nonresponding firm's industry to the item's prior-year value for that firm, reported or imputed. This approach, with minor variation, was used for most items.
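
The general imputation approach described above might look like the sketch below. It assumes that "average percentage change" means the unweighted mean of firm-level changes among responding firms in the same industry. The source does not specify this, and a ratio of industry totals would be another plausible reading. The function name and the carry-forward fallback are likewise assumptions.

```python
def impute_item(prior_value, industry_pairs):
    """Impute a missing item from the firm's prior-year value.

    `industry_pairs` holds (prior_year, current_year) values of the same
    item for responding firms in the nonrespondent's industry.
    """
    changes = [(cur - prev) / prev for prev, cur in industry_pairs if prev > 0]
    if not changes:
        # Fallback (an assumption): carry the prior-year value forward.
        return prior_value
    avg_change = sum(changes) / len(changes)
    return prior_value * (1 + avg_change)


# Two responding firms grew the item by 10 and 20 percent, so a firm
# whose prior-year value was 200 is imputed at roughly 230.
print(round(impute_item(200.0, [(100.0, 110.0), (50.0, 60.0)]), 2))
```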

3. Survey Quality Measures

a. Sampling variability

The sample is designed to produce coefficients of variation of 2 percent for industries designated as "high priority" industries and 5 percent for other industries. The designation of "high priority" is assigned when prior surveys have identified an industry as one in which there is a large amount of R&D expenditures.
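
For reference, a coefficient of variation is the standard error of an estimate divided by the estimate itself. The short sketch below checks an industry estimate against the design targets quoted above. The function names and the example figures are hypothetical.

```python
def coefficient_of_variation(estimate, standard_error):
    """Relative standard error of an estimate, as a fraction."""
    return standard_error / estimate


def meets_design_target(estimate, standard_error, high_priority):
    """Compare the CV with the 2 percent or 5 percent design target."""
    target = 0.02 if high_priority else 0.05
    return coefficient_of_variation(estimate, standard_error) <= target


# A made-up estimate of 1,000 with a standard error of 30 has a CV of
# 3 percent: acceptable for other industries, not for high priority ones.
print(meets_design_target(1000.0, 30.0, high_priority=False))  # True
print(meets_design_target(1000.0, 30.0, high_priority=True))   # False
```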

b. Coverage

Coverage error constitutes a possible source of error for the survey because the SSEL is undoubtedly missing some in-scope companies, especially relatively new ones. It should be noted that coverage errors for surveys prior to 1992 were more likely, because not all companies on the SSEL were subject to selection. The Census Bureau continually strengthens and updates the SSEL so that coverage error is minimized.

c. Nonresponse

(1) Unit nonresponse - Of the companies surveyed for 2001, 16.8 percent did not respond. Studies of nonresponding companies are conducted periodically to improve response rates in future surveys. Overall, the magnitude of unit nonresponse bias is manageable because even if no response can be elicited from a company, other sources of information about the company are used to estimate its R&D data.

(2) Item nonresponse - Companies are encouraged to estimate information when actual data are unavailable. Even so, item nonresponse rates for key data elements in the survey can be high. When estimates are not reported and cannot be elicited by following up with the respondent, complex, comprehensive imputation techniques developed over the survey's long history are used to minimize the effects of item nonresponse. Imputation rates for the key source of funding elements for 2001 ranged from 1.1 percent to 68.2 percent.

d. Measurement

Variations in respondent interpretations of the definitions of R&D activities and variations in accounting procedures are of particular concern. Specifically, some companies have difficulty separating basic research from applied research, locating geographically where R&D is performed, and reporting the cost of energy R&D. The sophistication and comprehensiveness of a company's accounting system often depend on its size and activities and its willingness to accommodate Government-sponsored surveys. Work was conducted in the mid-1990s, using cognitive lab approaches, to evaluate ways in which the form could be modified to ease reporting difficulties and reduce measurement error. Recommendations resulting from that work continue to be incorporated into the survey questionnaires. Other ongoing efforts to minimize measurement error include questionnaire pre-testing, improvement of questionnaire wording and format, inclusion of more cues and examples in the questionnaire instructions, consultations with respondents, post-survey evaluations, record check studies, and computer editing.

4. Trend Data

The statistics resulting from this survey are better indicators of changes in, rather than absolute levels of, R&D spending and personnel. Nevertheless, the statistics are often taken to be a continuous time series prepared using the same collection, processing, and tabulation methods. Such uniformity has not been the case. Since the survey was first fielded, improvements have been made to increase the reliability of the statistics and to make the survey results more useful. To that end, past practices have been changed and new procedures instituted. Preservation of the comparability of the statistics has, however, been an important consideration in making these improvements. Nonetheless, changes to survey definitions, the industry classification system, and the procedure used to assign industry codes to multi-establishment companies have had some, though not substantial, effects on the comparability of statistics. The aspect of the survey that had the greatest effect on comparability was the selection of samples at irregular intervals (i.e., 1967, 1971, 1976, 1981, 1987, and 1992) and the use of a subset or panel of the last sample drawn to develop statistics for intervening years. This practice introduced cyclical deterioration of the statistics. As compensation for this deterioration, periodic revisions were made to the statistics produced from the panels surveyed between sample years. Early in the survey's history, various methods were used to make these revisions. After 1976 and until the 1992 advent of annual sampling, a linking procedure called wedging was used. In wedging, the 2 sample years on each end of a series of estimates served as benchmarks in the algorithms used to adjust the estimates for the intervening years. To that end, the wedging algorithm does not change estimates from sample years and adjusts estimates from panel years, recognizing that deterioration of the panel is progressive over time. 
One of the primary reasons for deciding to select a new sample annually rather than at irregular intervals was to avoid applying global revision processes such as wedging. Consequently, the 1992 survey was intended to be the last one affected by the wedging procedure.
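
The text does not give the wedging formula, so the following is only a sketch of one way such a linking adjustment could work. It assumes that the later sample year has both a panel-based estimate and a new-sample estimate, and that the gap between them is spread linearly across the intervening panel years. The function name and both assumptions are illustrative and are not taken from the survey documentation.

```python
def wedge(panel_estimates, new_sample_estimate):
    """Adjust a run of panel-based estimates between two sample years.

    panel_estimates[0]  : estimate for the earlier sample year (benchmark).
    panel_estimates[-1] : panel-based estimate for the later sample year.
    new_sample_estimate : estimate for that later year from the new sample.

    The adjustment grows linearly with time since the earlier sample
    year (an assumption), reflecting progressive panel deterioration.
    """
    n = len(panel_estimates) - 1
    if n < 1:
        raise ValueError("need at least two years of estimates")
    ratio = new_sample_estimate / panel_estimates[-1]
    return [est * (1 + (ratio - 1) * k / n)
            for k, est in enumerate(panel_estimates)]


# The earlier benchmark year is unchanged; the last year is pulled to
# the new-sample level, and the years in between move part of the way.
print([round(x, 1) for x in wedge([100.0] * 5, 120.0)])
```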

5. Availability of Data

a. Publications

The data from this survey are published annually in SRS InfoBriefs, and in the series Research and Development in Industry, all available on the SRS web site (http://www.nsf.gov/sbe/srs/). Detailed historical statistics for 1953-1998 can be obtained from NSF's Industrial Research and Development Information System (IRIS) at http://www.nsf.gov/sbe/srs/iris/, an online interface to the Survey of Industrial Research and Development Historical Database (SIRDHD). The SIRDHD is a collection of more than 2,500 statistical tables containing all of the statistics produced and published from the 1953-1998 cycles of the annual Survey of Industrial Research and Development. Information from this survey is also included in Science and Engineering Indicators and in National Patterns of R&D Resources.

b. Electronic access

Data from this survey are available on the SRS Web site.

c. Contact for more information

Additional information about this survey can be obtained by contacting:

Raymond Wolfe
Economist
Research and Development Statistics Program
Division of Science Resources Statistics
National Science Foundation
4201 Wilson Boulevard, Suite 965
Arlington, VA 22230
(703) 292-7789
via e-mail at rwolfe@nsf.gov
Last Modified: Nov 02, 2004 Comments to srsweb@nsf.gov