III. Performance Measures

GPRA concepts. For purposes of annual performance reporting, GPRA seeks information about program outputs and outcomes. The immediately observable products of program activity, such as publications or graduates, are termed outputs. Intermediate and longer-term results for which the program is designed, such as producing knowledge or enabling improved health or national security, are referred to as outcomes.

Outputs and outcomes should be distinguished from inputs, such as researchers' knowledge and time, use of equipment, instruments, facilities, and supplies 1. Further, OMB guidance differentiates between outputs (e.g., graduates) and production activities (e.g., teaching). The text of GPRA does not specify a distinction between outputs and activities, but one purpose of the Act is to focus attention beyond an effort or activity in order to assess outputs and outcomes.

The OMB guidance for implementation of GPRA also discusses program impacts. These are the total long-run direct or indirect effects or consequences of the program. These effects may be intended or unintended, and may be positive or negative. Although information about impacts is useful for understanding the eventual effects of government programs, it is not required under the GPRA legislation.

In principle, it should be easy to distinguish among indicators for inputs, outputs, intermediate outcomes, and end outcomes. In practice, these four concepts represent a continuum along which indicators blend into one another. A simplified description of the process of training in science and engineering illustrates the point. High school graduates represent a final output from secondary schools and an input to colleges and universities (as well as an input to employers who hire high school graduates directly). Baccalaureate recipients represent a final output from colleges and universities when they move directly into science and engineering employment, and an intermediate output when they move on to graduate training. Individuals who complete doctoral or postdoctoral programs represent outputs of those programs and inputs that renew the human resource base of the scientific work force. Meanwhile, maintenance of a top-quality science and engineering work force, appropriately employed, represents an outcome that enables continued conduct of world-class science, training of the next generation of scientists and engineers, and deployment of scientific expertise throughout the many sectors of the economy to assure development and application of new knowledge and techniques. All of these, in combination with other factors, enable attainment of the over-arching national goals of improved health, environment, prosperity, national security, and quality of life.

Whether a particular product or result is an output, an intermediate outcome, or an end outcome depends upon the mission and goals of the agency that produced it and upon whether it is viewed from the perspective of that agency's individual plan or from the perspective of over-arching national goals. Program-specific information about goals, institutional setting, and overall context is needed before definitions of outputs, intermediate outcomes, and end outcomes can be tailored to program reporting and before indicators can be identified. Appendix C provides an example from the Department of Agriculture that illustrates how overall agency goals and specific program goals can be used to derive performance indicators.

Pre-existing measures. Because pre-existing measures of research results were developed primarily for other purposes, they have not yet been adapted for use in reporting at the agency level. Pre-existing measures capture only a subset of the spectrum of research outputs and outcomes. Unfortunately, they do not map neatly or cleanly onto GPRA concepts. Although there are many measures of potential applicability to the science enterprise, most track inputs or levels of research activity. Some could be used as a starting point for examination of output. A few could be considered to capture selected aspects of outcomes. Thus far, comprehensive efforts to determine impacts appear to be rare. Consequently, pre-existing measures can serve only as a starting point for agency thinking about how to design the most effective assessment methods. Some well-known pre-existing measures are discussed below.

Publication counts. Publication counts have been used in non-GPRA contexts as measures of the quantity of knowledge produced by a research program. Publication itself is a tangible indicator of the transfer of research findings to the public domain, and publication in a peer-reviewed journal is an indicator of a positive scientific evaluation of the information. Although publication counts provide useful information when combined with a larger, richer set of indicators and analyses, their use alone or without sufficient information about other aspects of performance and the circumstances of the research can produce an incomplete, if not inaccurate, picture. For example, differences in publication rates between scientific disciplines may reflect differences in propensity to publish, in the definition of the smallest publishable unit, and in patterns of collaboration rather than differences in productivity. Also, the mere introduction of the counting of publications as a performance indicator, depending on how it is done, can influence publication patterns or publication rates--setting up incentives that focus on the production of more articles, rather than on the discovery of new knowledge.

Patent counts. Counts of patents, new devices, computer programs, and other inventions do not say much about whether a program is conducting world-class science at the frontier of knowledge, but some mission agencies may use them to gain insight about connections between their program and the agency mission. If such counts are used in the assessment of a fundamental science program, they should be used in combination with other sources of information as part of a richer, more detailed assessment. Further, any use made of such counts should be undertaken only with full awareness of their limitations. In particular, patent counts and other indicators of inventive activity tend to be low for basic research programs. The statistical instability of small counts from year to year suggests that including measures of inventive activity among a few summary indicators in a short program report would be a risky strategy for a fundamental science program; however, it should be possible to handle the problem of high variability in small numbers by using, for example, a rolling three-year average, as illustrated below.
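A rolling average of this kind is simple to state. As an illustrative formulation (the notation is ours and is not drawn from the OMB guidance), if \(P_t\) denotes the count of patents or other inventions attributable to the program in year \(t\), the smoothed indicator reported for year \(t\) would be

\[
\bar{P}_t = \frac{P_{t-2} + P_{t-1} + P_t}{3}.
\]

Averaging over three years dampens the year-to-year swings inherent in small counts, at the cost of responding more slowly to genuine changes in inventive output.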

Citation counts. In the program evaluation literature, citation counts are sometimes described as an unobtrusive form of wide-scale scientific review. As with publication counts, their use and interpretation should be undertaken with certain caveats: (1) in a few cases, high numbers of citations may indicate a negative evaluation (e.g., the disputed cold fusion results); (2) possible citation "clubs" or "spirals" do not say much about the underlying science; (3) citation rates vary among fields; (4) in many fields, experimental work tends to be cited more frequently than theoretical work, and occasional methods papers achieve extremely high levels of perfunctory citation, with the consequence that citation counts, in general, may under-value advances in understanding and over-value sheer experimental activity; and (5) in fields for which the writing of a book is a major publication outlet (e.g., some social sciences), citation counts are an unfair assessment of value, since the Science Citation Index, on which such counts are based, only includes references from journals.

Since prior experience with program evaluation suggests that retrospective scientific review and citation counts provide complementary perspectives, evaluators generally advocate using the two together for detailed program evaluation. For example, a scientific review panel's judgments may be sharpened when it is required to evaluate and respond to literature-based data regarding the program being evaluated. A perception seems to be emerging that citation counts could be usefully combined with other descriptive information in summary reports of overall performance.

Contributions to other goals. A program may also contribute to other Federal goals, and such contributions are relevant aspects of program performance whether or not they are listed among specific program objectives. Such contributions can be included in program reports (and can be added to program objectives). Measures have been attempted for other aspects of research programs such as the development of human resources and physical infrastructure, the building of cross-disciplinary and cross-sectoral partnerships, or the numbers of undergraduates involved in a research program or in informal science education activities.

Output indicators for some activities might be available from published reports. Others could be collected from principal investigators at the completion of research projects and aggregated at the agency level. If data are collected from individual investigators and program managers, it should be made clear that such data will be aggregated and reported at the agency level. It should also be made clear that not all projects or programs need to contribute to all of an agency's goals. This makes good management sense, and communicating the point should help assure researchers that they have the flexibility that they need for creative work.

Some experimental efforts at the National Science Foundation to develop new sets of indicators are reported in Appendix C.

Rate-of-return and other measures developed by economists. Economists have developed a number of techniques intended to estimate the benefits of, or returns to, research. These generally involve efforts to link, directly or indirectly, the knowledge produced by research to the benefits eventually produced by use of the knowledge in practical applications. Basic approaches include efforts to (1) compute the benefits associated with the results of a research program or aggregation of programs, (2) compare the benefits of the research to the costs of conducting the research by constructing a benefit-cost ratio, and (3) compare benefits to costs by computing the implicit rate of return.
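In standard textbook form (the symbols here are illustrative and are not drawn from any particular agency study), let \(B_t\) denote the benefits attributed to the research in year \(t\), \(C_t\) the corresponding costs, and \(r\) a chosen discount rate. Approaches (2) and (3) then amount to computing the benefit-cost ratio

\[
\mathrm{BCR} = \frac{\sum_t B_t / (1+r)^t}{\sum_t C_t / (1+r)^t}
\]

and finding the implicit (internal) rate of return \(\rho\) that satisfies

\[
\sum_t \frac{B_t - C_t}{(1+\rho)^t} = 0.
\]

A ratio greater than one, or a \(\rho\) exceeding the return available on comparable investments, indicates that measured benefits exceed measured costs; for fundamental science the difficulty lies in attributing and valuing \(B_t\), not in the arithmetic.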

The findings of the "Assessment Process" (the Process is described in Appendix A) and of other sources (e.g., American Enterprise Institute et al. 1994) indicate that existing economic methods and data are sufficient to measure only a subset of important dimensions of the outcomes and impacts of fundamental science. Sufficiency varies among Federal programs--economic methods are perhaps best suited to assessing programs in some mission agencies and least suited to assessing programs not directly aimed at specific applications. When methods and data permit, economic techniques can be used to communicate the size and significance of the benefits of research. Two examples of the computation of economic benefits of research are given in Appendix C. One discusses the estimation of the cost savings flowing from biomedical research at the National Institutes of Health; the other, the economic impacts of research in metrology at the National Institute of Standards and Technology.

Economists have also developed substantive information about the determinants of the level and pattern of investments in research and the adoption and diffusion of new products and processes. However, the ever-changing pattern of innovative activities involves complexities that are not well understood and that the limitations of existing data preclude studying. In particular, what economists cannot now do is estimate (1) the benefit compared to the cost "at the margin" regarding the start of one more research program in comparison to something else, or (2) the benefit compared to the cost "at the margin" for an additional research program in one field or application as compared to another 2. Since economists require information about benefits and costs "at the margin" to make decisions about resource allocation, many suggest that existing economic methods and data do not provide useful criteria for allocating resources among potential areas for future research. Of course, to the extent that existing data permit computations of benefits or returns "on average" (rather than "at the margin"), economic methods can be used to gain retrospective insight about past performance.
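The distinction between average and marginal measures can be stated compactly. In the illustrative notation used above, existing data may support an average return such as

\[
\frac{\sum_t B_t}{\sum_t C_t}
\]

for a program or portfolio taken as a whole, whereas allocation decisions require the marginal quantity, roughly \(\partial B / \partial C\): the additional benefit expected from one more dollar (or one more program) devoted to a particular field relative to the alternatives. It is this marginal quantity that current methods and data do not supply.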

Future benefits. Decision makers and policy makers sometimes seek information about what can be expected in the future if investments are made in one line of research or another. There are no measures (in the conventional sense of the word) of what the future benefits of research will be, at least in part because the future pattern and course of research impacts cannot be known. The "Assessment Process" did not attempt to develop measures of future benefits. Nor did it attempt to develop methods for setting priorities for future spending 3.

Other approaches. Performance reports need not and should not rely on quantitative measurement alone. Annual performance reports might, for example, document progress toward enabling goals over a rolling historical period of, say, the last twenty years; they might present examples of outstanding or more typical research accomplishments; or they might build descriptive case studies of how the accretion of knowledge through research eventually leads to long-run applications which contribute to over-arching national goals.

Merit review with peer evaluations. The insufficiency of measures per se is one reason why merit review with peer evaluations of past performance provides important information for retrospective performance assessment. The focus of such assessments for responding to GPRA would be at the program level. Since agencies are just now developing their approaches for assessment under GPRA, it is not yet clear how the expert assessments would be structured. Individual assessment panels might focus on key agency programs or groups of related agency programs (covering each such program or group of related programs every five years or so).

It should be recalled that a program under GPRA is an activity or project listed in the Federal budget; however, GPRA gives agencies the option to aggregate or disaggregate activities for GPRA reporting, as long as doing so does not omit or minimize the significance of any major function or operation of the agency. In practice, the definition of a program for reporting under GPRA seems to be evolving to include a major function or operation of an agency or a major mission-directed goal that cuts across agency components or organizations.

The credibility and effectiveness of scientific review for retrospective assessment are critically dependent on how the review is organized and on the types of participants. A review panel clearly must have competencies that are a good match to the program content; it must have reviewers who are respected and objective (for example, not likely to be influenced by concerns that the panel's conclusions will influence future funding for their own work). Only then will a review be credible.

An assessment panel should consider whether research performance has been at the frontiers of scientific knowledge. In addition, program managers may seek expert assessment of the program's contributions to other enabling goals--for example, contributions to maintaining a high quality scientific work force appropriately employed or to ensuring that facilities and instrumentation are maintained to support work at the cutting edge.

Program managers may also seek expert insight about whether the program has made contributions to the knowledge base for specific mission goals as well as over-arching national goals. Intended users of the results of the program could provide information about the relevance or importance of the program's results. Their perspectives could be tapped by including them in the review panels. For "mission agencies," intended users might be those expected to apply the results of the science program (e.g., industry, agriculture, or users within the agency). For "non-mission agencies" such as the National Science Foundation, users might be researchers in areas for which the program's work is claimed to have impact. For agencies that support general knowledge development and scientific training, it might be appropriate to include stakeholders for the general pools of knowledge and talent to which the agency contributes.

To assure objective judgments from expert panel members, input should in principle be sought from researchers who were not among those supported by the program or involved in selecting projects funded by the program.

An example of the use of assessment panels at the National Institute of Standards and Technology is given in Appendix C.

International standing. Maintaining leadership across the frontiers of scientific knowledge is a critical element in our investment strategy for science. As noted above, for an individual agency, the evaluation criterion is whether the agency's research is conducted at the frontiers of scientific knowledge. For evaluation from an NSTC or national perspective, information is needed about the international standing of the United States. The findings of the "Assessment Process" indicate that, although some data and methods exist for international comparisons of a nation's research activity and some aspects of overall research output, the methods for international comparison are still in their infancy. Further work is needed to develop cost-effective strategies for assessing American standing on the world stage. An inter-agency group, such as the Committee on Fundamental Science, might consider how this can best be accomplished. We stress that leadership evaluation does not entail simplistic numerical ranking of national programs. Our national interest in leadership rests in having our research and educational programs perform at the cutting edge--sometimes in competition, but often in explicit collaboration, with scientists from other nations.


