In subsequent pages, this appendix presents the following examples:
United States Department of Agriculture (USDA): National Research
Initiative Competitive Grants Program, One of a Portfolio of
USDA-Funded Agricultural Research Programs
Department of Energy (DoE): Scientific Research Performance Criteria
and Measures
National Science Foundation (NSF): Science and Technology Centers
Program Pilot Project, Use of an Alternative GPRA Format
National Science Foundation (NSF): Facilities Pilot Project
National Science Foundation (NSF): Paradigm Shifts as Performance
Indicators
National Institutes of Health (NIH): Cost Savings Resulting from
Biomedical Research
National Institute of Standards and Technology (NIST): Economic
Impacts of Research in Metrology
National Institute of Standards and Technology (NIST): Assessment
Panels for Merit Review with Peer Evaluations
Research Round Table Discussion Paper: Developing and Presenting
Performance Measures for Research Programs
Attachment to Research Round Table Discussion Paper: Examples of
Unforeseen Research Outcomes

United States Department of Agriculture (USDA)
National Research Initiative Competitive Grants Program:
One of a Portfolio of USDA-Funded Agricultural Research Programs
The USDA is engaged in a comprehensive effort to develop goals
and assessment methods for its portfolio of research programs.
Outcomes such as environmentally sound and economical production
of food and fiber, globally competitive U.S. agricultural industries,
improved human health and well-being through better nutrition,
and sustained rural communities are the result of the total investment
in agricultural research and not just one program.
The example given below illustrates how the USDA is working to
address the challenge of assessing one of its portfolio of programs,
the National Research Initiative Competitive Grants Program (NRICGP).
It shows how planning for assessment begins with a statement of
the purpose of the program from which specific goals are derived
and how planning is undertaken in the context of broader goals
for agricultural research. Specific goals, in turn, provide the
basis for development of assessment methods.
The NRICGP has unique roles within the USDA to:
The NRICGP presents the USDA with the same challenge faced by
other mission agencies and the National Science Foundation, namely
how to assess fundamental research aimed at expanding knowledge
without a particular near-term application in mind. Yet the Department's
other, predominantly mission-linked research programs depend on
the foundational knowledge generated by the research supported by
this competitive grants program. In addition, grants awarded by
the NRICGP are usually only one source of support for the work
of the researcher(s); additional support comes from other USDA
research programs, other federal programs, and state appropriations
to land-grant universities.
The goals listed below are an attempt first to identify specifically
what is expected of the NRICGP based on the unique roles of this
program within the broader portfolio of USDA research programs,
and second to identify goals more generally of the broader portfolio
of USDA research programs for which major contributions are expected
from the NRICGP. The performance measures are examples based on
what specifically would be measured for the NRICGP.
The goals of the NRICGP are:
The goals of USDA-funded research for which the NRICGP plays
a major role are:
Performance measures for the NRICGP are:
USDA experience to date. While it is too soon to derive
general lessons from the USDA experience in development of specific
goals and performance measures for its competitive grants program,
this example may be useful to other mission agencies with a similar
broad and comprehensive array of programs.
Department of Energy
Scientific Research Performance Criteria and Measures
This example provides a set of criteria and measures that have
been developed by the Department of Energy (DoE) for performance
evaluation of basic research performed at the national laboratories
and at other government-owned, contractor-operated facilities.
The set illustrates the use of multiple indicators for setting
target levels of performance and assessing subsequent performance.
The major elements of the scientific research performance criteria
and measures at DoE are:
Sub-criteria that apply to all research programs are:
Sub-criteria that apply to multi-investigator, multi-disciplinary,
integrated research programs are:
Sub-criteria that apply to design and construction of research
facilities are:
DoE experience to date. In formulating these criteria and
measures, DoE encountered a cooperative and supportive response
from the internal and external science community. Consequently,
the Department is negotiating these criteria into new and renewal
contracts for the national laboratories and other government-owned,
contractor-operated facilities that perform basic research. The
facilities include single purpose laboratories such as high energy
physics accelerators, as well as the multi-program national laboratories.
In these contracts:
National Science Foundation
Science and Technology Centers Program Pilot Project:
Use of an Alternative GPRA Format
This example describes the GPRA pilot project
for the Science and Technology Centers (STC) Program of the National
Science Foundation (NSF). The example illustrates the use of an
alternative approach allowed under GPRA. The GPRA alternative
approach requires clear criteria for determining if a program
is "minimally effective" or "successful,"
but it does not require performance indicators that are quantifiable
and measurable.
The STC activity supports twenty-five university-based research
centers in a variety of scientific areas. The program supports
cutting-edge interdisciplinary research that requires the advantages
of larger scale and more stable funding provided by a center.
The activity also supports education, knowledge transfer from
academic researchers to industry, government agencies, and other
sectors of society, and knowledge transfer among academic institutions.
The center mode complements other modes of research support (awards
to individual investigators or small research groups, for example)
by providing higher levels of longer-term support for collaborative
activities that cross disciplinary and institutional barriers.
Designing performance indicators for this GPRA pilot project was
a collaborative effort between evaluation specialists and the program
staff who oversee awards to the STCs. The resulting informal advisory
group agreed that the STC Program goals were not just to support
research, knowledge transfer, and education activities. In addition,
the program intended to pursue these goals in ways that were distinct
from, and complementary to, the activities supported by grants
to individual investigators and small groups. The performance
indicators therefore had to reflect not only the quality and quantity
of impact but also the uniqueness of the program's contributions.
After a less than satisfactory attempt to create quantitative
performance measures in Fiscal Year 1994, the program decided
to create a more qualitative set of indicators. For its Fiscal
Year 1995 performance plan, the STC Program pilot project proposed
to use the alternative approach under GPRA. The law states:
If an agency, in consultation with the Director of the Office
of Management and Budget, determines that it is not feasible to
express performance goals for a particular program activity in
an objective, quantifiable and measurable form, the Director of
the Office of Management and Budget may authorize an alternative
form. Such alternative form shall--
include separate descriptions of (i) a minimally effective program
and (ii) a successful program, with sufficient precision and in
such terms that would allow for
an accurate, independent determination of whether the program
activity's performance meets the criteria of the description;
...
The STC performance plan articulated three goal areas for the
Program:
Goal 1: Address challenging and far-reaching interdisciplinary
research problems that require the greater resources and longer-term
support of a center; create new knowledge.
Goal 2: Transfer knowledge and technology to industry, government
agencies and laboratories, and other sectors of society through
partnerships and collaborative activities; transfer knowledge
to other academic institutions through exchange programs, workshops,
and other leadership activities.
Goal 3: Produce graduates at all levels with unique interdisciplinary
capabilities in science; increase participation by women and minorities;
improve science education and research training.
The informal GPRA advisory group then confronted the issue of
how to judge the success of the program based on the performance
of the centers supported in the program. There was no obvious
precedent for this sort of portfolio assessment, so the following
operational solution was proposed:
The STC activity would be considered minimally effective if:
80% of the Centers are successful in reaching at least one goal, and
50% of the Centers are successful in reaching two or more goals.
The STC activity would be considered successful if:
90% of the Centers are successful in reaching at least one goal, and
75% of the Centers are successful in reaching two or more goals, and
20% of the Centers are successful in reaching all three goals.
This proposal has several implications. First, it eliminates competition
among the centers to be "best," and it promotes cooperation.
Second, this proposal recognizes that some level of failure is
acceptable--even necessary--if the program and the centers are
being asked to take risks.
Unfortunately, the problem of defining "success" in
reaching a goal remained. The informal advisory group tried to
design definitions of success that would be credible but would
provide the centers with some flexibility. The solution was to
create two qualitative indicators for each goal area and to describe
"significant progress" and "outstanding progress"
for each indicator. A center would be viewed as reaching a goal
(for the three goal areas of research, knowledge transfer, and
education) if an expert panel considered the center to have made
significant progress on both indicators for that goal or outstanding
progress on one of the indicators. This would allow centers to
invest deeply in a single innovative effort in a particular goal
area or to spread their efforts among several initiatives.
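These decision rules are concrete enough to state in a few lines of
code. The sketch below is illustrative only: the rating labels and
example data are hypothetical, the "two goals" thresholds are read
as "two or more," and in practice the judgments come from expert
panels, not software.

    # Illustrative sketch of the STC decision rules described above.
    # Ratings and data are hypothetical; actual judgments are made by
    # expert panels reviewing each center.

    def reaches_goal(indicator_a, indicator_b):
        """A center reaches a goal area if a panel judges it to have made
        significant progress on both indicators or outstanding progress
        on at least one of them."""
        if "outstanding" in (indicator_a, indicator_b):
            return True
        return indicator_a == "significant" and indicator_b == "significant"

    def rate_program(centers):
        """Apply the portfolio thresholds. Each center is a list of three
        (indicator_a, indicator_b) rating pairs, one per goal area."""
        n = len(centers)
        goals = [sum(reaches_goal(a, b) for a, b in center) for center in centers]
        frac_one = sum(g >= 1 for g in goals) / n
        frac_two = sum(g >= 2 for g in goals) / n
        frac_three = sum(g >= 3 for g in goals) / n
        if frac_one >= 0.90 and frac_two >= 0.75 and frac_three >= 0.20:
            return "successful"
        if frac_one >= 0.80 and frac_two >= 0.50:
            return "minimally effective"
        return "below minimally effective"

    example_center = [("significant", "significant"),   # Goal 1
                      ("outstanding", "neither"),        # Goal 2
                      ("neither", "significant")]        # Goal 3
    print(rate_program([example_center] * 25))           # minimally effective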
An example of indicators for significant progress and outstanding
progress is shown below for Goal 1, interdisciplinary research:
Indicator 1a, prominence and impact of center in emerging research
areas, as perceived by non-center university investigators and
non-academic researchers, and as evidenced by patterns of research
productivity
Significant progress: Most of the center's research products (publications
and presentations at meetings) appear in the field's most respected
peer-reviewed vehicles (journals, professional society meetings)
or in vehicles influential among industry, governmental agencies,
and other knowledge transfer partners
(e.g., trade journals, policy forums, industrial conventions),
as judged by non-center specialists.
Outstanding progress: In addition to meeting the standard for
significant progress, several of the center's research products
are counted among the most influential contributions affecting
the current direction of the field, as judged by non-center specialists.
Indicator 1b, interdisciplinary nature and duration of Center-based
research
Significant progress: The Center's research agenda consists primarily
of ambitious research programs that are being pursued for at least
three years using perspectives, approaches, and techniques from
several disciplines, as judged by non-Center investigators in
the field.
Outstanding progress: The Center's research agenda consists primarily
of ambitious research programs that are being pursued for at least
three years using unique and pioneering combinations of perspectives,
approaches, and techniques from several disciplines, as judged
by non-Center investigators in the field.
Experience to date. The informal advisory group agreed
that implementing this performance plan was beyond the capacity
of the STC Program, at least initially. Data gathering, analysis,
and evaluation for the pilot project are currently being conducted
by an evaluation contractor. As a result, it is too soon to derive
general lessons from this pilot project.
National Science Foundation
Facilities Pilot Project
As part of the National Science Foundation's (NSF) involvement
with GPRA, we chose to assess the operational performance of our
national facilities.
(1) Assessment of National Facilities
The Foundation supports a number of user facilities, such
as telescopes and accelerators, in several different disciplinary
directorates. The facilities have a common purpose: as phrased
in the NSF strategic plan, "to enable the United States to
uphold a position of world leadership" in selected fields
of science. Each facility is planned in response to needs in a
specific field, and each one starts from a different technical
baseline.
(2) Description of the Method
In our first attempt at a performance plan, the participants
in the pilot project developed five generic goals for their facilities.
When facility directors were asked to produce whatever data
they thought useful in relation to these five goals, difficulties
arose (see below), but some interesting ideas emerged.
The pilot project leader then called the group together, along
with some representatives of other major facilities that NSF supports,
and the group developed a dozen generic performance indicators
under three broad performance concepts:
The key to the plan was to think about the performance measures
in terms of percentage change from a baseline. The baseline number
could be different for each facility, and even measured in different
metrics. To help standardize, the group had to invent a term,
"user units," to refer generically to entities like
beam time and observing hours. In the end, however, the percentage
change from each facility could be folded into a percent change
for the whole portfolio of facilities. The portfolio concept allows
for variation among the facilities in their indicators for any
particular year. When an individual facility experiences bad weather,
for example, its figures may droop; but that individual variation
will play only a small role in the Foundation-wide average. When
old or ineffective facilities are dropped and new facilities added
through the regular peer review process, the indicators for the
portfolio should improve.
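A minimal sketch of this calculation follows, with hypothetical
facilities, baselines, and figures. The document does not specify
how the per-facility changes are folded together; an unweighted
average is assumed here.

    # Illustrative portfolio calculation. Each facility reports a baseline
    # and a current value for an indicator in its own "user units"
    # (beam hours, observing nights, and so on); all values are hypothetical.
    facilities = {
        "accelerator": {"baseline": 5200.0, "current": 5550.0},  # beam hours
        "telescope":   {"baseline": 310.0,  "current": 295.0},   # observing nights
        "ship":        {"baseline": 240.0,  "current": 262.0},   # days at sea
    }

    # Percentage change from baseline is unitless, so facilities measured
    # in different metrics can be folded into one portfolio figure.
    changes = {name: 100.0 * (f["current"] - f["baseline"]) / f["baseline"]
               for name, f in facilities.items()}
    portfolio_change = sum(changes.values()) / len(changes)

    for name, pct in changes.items():
        print(f"{name}: {pct:+.1f}% from baseline")
    print(f"portfolio average: {portfolio_change:+.1f}%")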
Thinking in terms of the portfolio concept was a major challenge
for the group because NSF normally devotes so much attention to
the evaluation of individual facilities. The performance indicators
for the individual facilities would, of course, be available to
program managers and site visit teams, but, when used at that
level, they would be interpreted in context, with regard to the
performance expectations for that particular site. At the same
time, improving the generic, aggregate performance characteristics
of the portfolio became a goal for the Foundation as a whole.
(3) Lessons and Insights
Our first attempt at developing goals was only partially successful.
We found that national facility directors had difficulties in
translating these goals into practical performance measures and
indicators. The involvement of the facility directors as part
of a team approach was found to be far more effective, and resulted
in a "buy-in" from the groups involved.
NSF is continuing to develop these concepts, working with the
leaders of these facilities.
National Science Foundation
Paradigm Shifts as Performance Indicators
The National Science Foundation (NSF) is engaged in
a wide-ranging interactive process with staff to develop new performance
indicators for NSF programs. During this process, NSF staff have
suggested many creative ideas for further exploration. One of
these would use observed shifts in research paradigms to help
identify outcomes or impacts of research undertaken in the past.
Background for applying this approach to examining outcomes or
impacts of research in computer science is summarized in this
example.
Historically, the scientific method has been based on theory and
experimentation. A researcher would propose a theory to explain
some phenomenon, an experiment would be formulated to test the
theory, and observations of the experiment would be used to validate
the theory and/or propose modifications to the theory to account
for the observations.
With the advent of computing, the possibility of a new paradigm
emerged: computer simulations could be used in place of experiments,
with results of a simulation playing the role of observations,
leading to validation of theories or raising questions that required
modification to theories. In some cases the computation could
be used to complement experiments, in others it could actually
replace them. One of the earliest recognitions of the role of computation
in the scientific method occurred in the fluid dynamics community,
particularly aeronautics. For example, it is simply not possible
to run an experiment with a manned vehicle entering the earth's
atmosphere at hypersonic speeds in order to determine the effects
of heating. These "experiments" were done via computer
simulation and the result was that the space shuttle flew without
ever being tested in reentry.
There are innumerable other cases where computer simulation has
replaced experimentation, for a variety of reasons. As with
the space shuttle, it may simply not be possible to conduct the
appropriate experiment. A scientific example of this is the study
of the initial conditions required for the formation of galaxies;
one simply can't run the experiment. Another case occurs when
the experiment or the gathering of the data changes the environment,
thereby altering the experiment. For example, if one is trying
to determine the tensile strength of a pure material, the inclusion
of sensors to measure the buildup of stresses alters the material,
and thus the experiment. Finally, there are cases where the experiment
cannot or should not be done because it is life-threatening. An
example is nuclear testing where no full scale experiments have
been permitted for years and where a major computational program
has been proposed to support the simulations necessary to evaluate
the integrity of the nuclear stockpile.
There are many other potential areas where the paradigm shift
has not occurred for lack of either computer power or acceptance
on the part of the public. For example, drug design is still based
primarily on the experience of the scientist and on experimentation.
It is possible to do simulations of relatively small molecules
with the accuracy required to study "docking," the process
by which a drug binds with the existing biological structure,
but dealing with complex molecules such as proteins is still in
the future. Even further in the future is the use of simulations
to replace clinical testing to determine the effects of a drug.
Here, not only do we lack computational power, but also public
acceptance.
It should be clear that these paradigm shifts, where a field or
discipline accepts computer simulations as a partner in the scientific
method, are an indicator of the advancement of computing technology
and thus can be used as a performance indicator for computational
science. It should also be clear that it is very difficult to
predict in what fields and when such a shift will occur. Such
a prediction is predicated not only on a variety of technological
advancements, which are difficult enough to forecast, but also
on societal acceptance, be it from a relevant industry or the
public at large. Thus, paradigm shifts as performance indicators
are most effective if viewed by looking back to determine if and
when they happened rather than trying to look forward to predict
when they might occur. In this sense they are best suited to the
"alternative" form of performance assessment where they
could be used to help determine if an organization is exhibiting
"minimally effective" or "successful" performance.
It is relatively easy for a panel of experts to assess to what
degree computation is influencing and being used by a particular
field, and to determine when it is accepted by that field as an
equal component of the scientific method.
Such acceptance of a paradigm shift has taken place recently in
cosmology, where computation is now being used to test existing
theories and to help guide formulation of new ones. As identified
by a leading researcher in the field, several developments came
together to enable this shift. First, well-defined physical theories
that make testable predictions were developed. Second,
numerical algorithms that can accurately simulate the formation
of cosmological structures, such as galaxies and clusters of galaxies,
starting from primordial initial conditions were developed and
refined; these can now be combined into predictive numerical
codes. Third and most recently, computing power and memory have
reached a level where it is possible to model the universe in
full three dimensions, with time evolution, rather than in just
two dimensions. This last development is a crucial step generally
in a paradigm shift, as virtually all physical phenomena of interest
are three dimensional with some degree of time dependence.
NSF plans to explore the development of shifts in research paradigms
as a performance indicator. It is too soon to report general methodological
lessons at this time.
National Institutes of Health
Cost Savings Resulting from Biomedical Research
This example illustrates how the National Institutes
of Health (NIH) have used economic methods to compute savings
flowing from biomedical research. Over the years, NIH has transmitted
estimated benefits to Congress and the public through hearings,
publications, and other media.
The NIH publication, Cost Savings Resulting from NIH Research
Support (2nd edition), summarizes 34 case studies, including:
The estimated savings are based on the difference between estimated
direct plus indirect costs for a particular disease
before and after the medical innovation. Direct
costs include the costs of medical resources required
to provide health care in response to the illness or condition,
as well as nonmedical costs (custodial care, special diets, tutors,
transportation, special equipment, governmental and voluntary
community support programs) associated with the condition. Indirect
costs represent the productivity lost to society as a
result of premature mortality or lost work days due to morbidity.
In what is called the "human capital" approach, such
costs are valued in terms of lost earnings and expressed in terms
of dollars. Since the value of extra years of life or of reduced
pain and suffering due to the medical innovation is not
estimated, the estimates represent a conservative approach to
valuing the benefits of biomedical R&D.
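In outline, the estimated saving is simply the before-and-after
difference in total (direct plus indirect) costs. A short
illustration, with entirely hypothetical dollar figures:

    # Illustrative cost-savings arithmetic; the condition and all
    # dollar figures are hypothetical, not taken from an NIH case study.
    direct_before   = 4.0e9   # annual medical and nonmedical costs, pre-innovation
    indirect_before = 2.5e9   # annual lost earnings (mortality and morbidity)
    direct_after    = 2.8e9   # annual costs after the medical innovation
    indirect_after  = 1.1e9

    savings = (direct_before + indirect_before) - (direct_after + indirect_after)
    print(f"estimated annual savings: ${savings / 1e9:.1f} billion")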
The case studies reported in Cost Savings Resulting from NIH
Research Support do not include basic research. However, fundamental
science (whether it is basic or applied research) produces the
continuing insight and understanding about the mechanisms of life
and disease which are needed for the R&D and innovation that
lead to health care improvement. The linkages between fundamental
research and health care advances are complicated, long-term,
and impossible to allocate completely and clearly. Nevertheless,
the NIH calculations of savings flowing from biomedical research
provide insight about the size and importance of health care innovations
enabled by fundamental science.
In general, when the retrospective study period is long enough,
the appropriate data are available, and the trail of connections
between fundamental research and eventual long-run impacts is
sufficiently clear, then economic methods can help to illuminate
the sorts of contributions to over-arching national goals that
are enabled by fundamental research.
National Institute of Standards and Technology (NIST)
Economic Impacts of Research in Metrology
In the early 1980s, NIST began to examine rates of
return realized from investment in different types of technology
and in different stages of a technology's development by measuring
and quantifying the economic returns from the Institute's research
and services. The results, summarized below, provide another perspective
on the value that U.S. taxpayers realize from their investment
in NIST. Conducted by independent researchers under contract to
NIST, the studies have estimated the "social" rate of
return, which is the aggregate rate of return to all investment
in the technology generated by specific projects. The general
methodology used in studies of the economic impact of NIST laboratory
research is to compute the aggregate flow of benefits over time
and the aggregate costs for a particular investment. Then, the
present value of benefits and the present value of costs can be
computed in order to solve for the benefit-cost ratio. Or the
implicit internal rate of return can be computed from the costs
of the investment and the aggregate flow of benefits over time.
In these studies, the internal rate of return is computed in the
same way that rates of return are calculated from the time flows
of costs and benefits associated with a particular project or
investment in the business and financial communities. Estimated
internal rates of return are one indicator that corporations use
in making choices among alternative investment projects, including
R&D projects. Economists use the same approach when they estimate
the "private" rate of return to a particular firm on
its original investment and the "social" rate of return
on that investment--where the "social" rate of return
refers to benefits accruing to the firm and to all others, regardless
of whether they were involved in the original investment process.
The estimates summarized below are conservative because they are
based only on quantifiable benefits realized by companies and
consumers. Important qualitative benefits, such as enabling industry
standards or opening new avenues of research, are not included
in the estimates.
The first step in estimating costs and benefits for a particular
project is to conduct intensive interviews of the relevant NIST
manager(s) to determine the scope and nature of the project's
technical output and the stages of the economic process (R&D,
production, and marketing) where technical infrastructure produced
by the project would be absorbed. Next, the costs of the project
itself, the costs to industry of applying the project results
over time, and the economic benefits realized over time are estimated.
The NIST project cost and the net benefit series over time can
then be solved for the project's social rate of return.
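The underlying arithmetic can be sketched in a few lines. The cash
flows and discount rate below are hypothetical and are not drawn
from any NIST study:

    # Illustrative benefit-cost and internal-rate-of-return arithmetic.
    def present_value(flows, rate):
        """Discount a series of annual flows (year 0 first) at the given rate."""
        return sum(f / (1.0 + rate) ** t for t, f in enumerate(flows))

    def internal_rate_of_return(net_flows, lo=0.0, hi=10.0, tol=1e-6):
        """Find the rate at which net present value is zero, by bisection.
        Assumes an initial net outlay followed by positive net benefits."""
        while hi - lo > tol:
            mid = (lo + hi) / 2.0
            if present_value(net_flows, mid) > 0.0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    # Hypothetical flows, in millions of dollars, year 0 first.
    costs    = [10.0, 2.0, 2.0, 1.0, 1.0, 1.0]     # project plus industry adoption costs
    benefits = [0.0,  1.0, 6.0, 14.0, 20.0, 24.0]  # benefits realized over time
    net      = [b - c for b, c in zip(benefits, costs)]

    rate = 0.07  # assumed discount rate for the present-value calculation
    bcr  = present_value(benefits, rate) / present_value(costs, rate)
    irr  = internal_rate_of_return(net)
    print(f"benefit-cost ratio: {bcr:.1f} to 1")
    print(f"internal (social) rate of return: {100 * irr:.0f}%")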
For example, a study of the NIST project that developed the technical
basis for product-acceptance test methods in the optical fiber
industry began by identifying the fiber parameters for which more
accurate test methods were needed. Next, private sector firms
were surveyed to determine how the new test methods were being
used (e.g., to control production or to facilitate sale of the
product). The cost savings realized by the entire industry were
estimated from the data collected. These aggregate economic (social)
benefits were netted against costs. The net benefit stream and
the original investment cost were solved for the internal rate
of return. In addition to providing estimates of quantifiable
benefits, these studies have also documented qualitative benefits,
such as impacts on future R&D decisions.
Summary Results. The studies and their estimated rates
of return are:

Study area                                   Estimated rate of return
Conductivity of semiconductors                            63%
Wire bonding of semiconductor components                 140%
Electrical resistance of semiconductors                  181%
Electromigration in interconnects                        117%
Electromagnetic interference                             266%
Power and energy calibration services                    428%
Coordinate measuring machines                             97%
Optical fiber                                            423%
Spectral radiometry                                      145%
Real-time control system architecture                    149%
Integrated services digital network                      156%
Software conformance testing                              41%

The median rate of return for these twelve economic impact studies
is 147 percent. Across all project areas, the rate of return ranges
from 41 percent to 428 percent, indicating that significant benefits
flow back to the U.S. economy and society. These rates compare
favorably with those reported in studies of returns on other public
investments in technology and on private-sector R&D investments.
Key features of four of these studies are highlighted below.
Research on Power and Energy Measurement and Calibration Services.
Results of this study illustrate how the benefits of NIST
measurements fan out across industry and markets. The Institute
maintains the U.S. standard for the watthour; and it conducts
research to improve the measurement accuracy of the 2,000 standard
watthour meters used to calibrate the more than 2 million watthour
meters sold in the U.S. each year. In all, U.S. utilities monitor
more than 100 million watthour meters that record customers' power
usage, totaling more than 2,700 billion kilowatt hours annually
and generating industry revenues exceeding $180 billion. Through
its research, NIST has enabled a tenfold increase in the measurement
accuracy of watthour meters, reducing the uncertainty to 0.005
percent. The result has been an increase in the accuracy of customers'
bills, which translates into even greater assurance that consumers
are charged only for the power that they actually use. This improvement
and other benefits due to the traceability of meter measurements
to national standards have produced sizable returns to U.S. taxpayers.
A 1994 analysis estimates that total benefits exceed costs by
a ratio of 41 to 1. The estimated social rate of return was 428
percent.
Research on Optical Fiber Standards. NIST provided basic
measurement technology, as well as technical assistance in developing
industry standards, for the optical fiber industry in the 1980s.
The rate of return to NIST-conducted research in this area was
423 percent. Over the period studied, NIST provided technical
support to the industry as it promulgated 22 standards for this
complex area of technology. A 1991 economic impact study noted
that while the direct economic benefits to the industry were primarily
from substantial reductions in market transaction costs, an important
indirect effect of the NIST work was "a much faster rate
of growth for the optical fiber market and hence for the U.S.
optical fiber industry." The president of the Telecommunications
Industry Association said: "Without the NIST assistance and
leadership, the U.S. fiber optics industry would not be in the
competitive position it is today."
Research in Electromagnetic Compatibility/Interference Metrology.
The problem of interference among electronic and electrical devices
has grown enormously with the proliferation of these devices.
A 1991 study of the economic impact of NIST research in electromagnetic
compatibility/interference metrology conducted over the previous
decade found that organizations using NIST's research improved
research efficiency and reduced transaction costs. Based on these
cost-saving benefits, the study showed an estimated spillover
rate of return of more than 260 percent for this program.
Research on Methods to Prevent Failures of Integrated Circuits.
As semiconductor devices become increasingly dense (a state-of-the-art
microprocessor, for example, contains more than 3 million transistors),
design and manufacturing challenges also become increasingly severe.
NIST worked with the U.S. semiconductor industry in the 1980s
to develop improved methods to test for a specific problem--electromigration--that
causes the thin metal wires connecting integrated circuit components
to fail. Benefits to this industry, including reduced production
and transaction costs as well as improved research efficiencies,
led to an estimated rate of return of 117 percent for this NIST
project.
Fundamental Research. NIST also pursues fundamental research
to meet future industrial needs. Although we cannot estimate the
aggregate future benefits that will flow from the eventual applications
of knowledge produced by current research, we can
identify some probable areas of impact. For example, NIST has
a program of future-looking research in which it develops the
tools and fundamental understanding that will help it anticipate
and respond to measurement needs arising from advances in science
and technology and intensifying international competition. Examples
of recent research likely to have an economic impact are:
NIST-Industry Linkages. The NIST laboratories plan and
carry out their research in collaboration with industry. As a
result, the federal investment yields critically needed measurement
methods and other infrastructural technologies that open the way
to advances in research, improvements in processes and products,
efficiencies in the marketplace, and other benefits reaped by
companies and industries, and, through them, the economy.
National Institute of Standards and Technology (NIST)
Assessment Panels for Merit Review with Peer Evaluations
The seven technical Laboratories of NIST undergo annual
assessments by external panels convened by the National Research
Council. The panels consist of scientists, engineers and technical
managers from academia, industry and government. In 1995, the
panels had a total membership of 144 with about 25% from academia,
60% from industry and 15% from government. These assessment panels
visit the Laboratories twice a year, once as a group and once
on an individual basis, when they have an opportunity to interact
directly with the scientific staff. They produce written evaluations
of performance, missions, and short- and long-term goals for each
Laboratory as a whole and for each division. The written evaluation
is a detailed report with about 20-30 recommendations and questions
for the Laboratory to address before the panels reconvene.
Activities of the panels include: reviewing the technical programs
of NIST with respect to the needs of U.S. scientific and technological
communities; making reports to the NIST Director; and briefing
the statutory Visiting Committee on Advanced Technology and apprising
it of the balance and general effectiveness of the programs
of NIST. The panels also assist NIST in examining emerging technologies
expected to require research in metrology. The panels make recommendations
with regard to the following types of questions:
In Fiscal Year 1995, the evaluation panels focused on six specific
issues of concern:
This process for evaluating NIST programs has been used effectively
for 37 years. A major review of both the process and product is
being undertaken this year.
Research Round Table Discussion Paper
August 1, 1995
Developing and Presenting Performance Measures for Research
Programs
The implementation of the Government Performance
and Results Act (GPRA) of 1993 will provide many challenges to
managers across the Federal Government. Adapting GPRA requirements
to the Federal research environment will be especially challenging.[1]
Applied constructively, GPRA can have a positive effect on the
quality and innovativeness of our scientific endeavors. Conversely,
serious detrimental effects can occur if it is applied incorrectly.
Federal researchers and managers representing a cross-section
of departments and agencies have met over the last six months
in a Round Table forum to discuss the unique circumstances surrounding
the development of performance measures for research programs.
The following observations and model for research performance
measures are the result of these discussions.
Purpose:
This paper articulates an approach that Government research
organizations can use in applying the principles of GPRA to a
wide range of federally supported research activities.
Background:
In 1993, Congress passed and the President signed into law
P.L. 103-62, the Government Performance and Results Act (GPRA).
The intent of the statute was to increase the effectiveness, efficiency,
and accountability of the Federal Government.
GPRA requires each agency to develop a multiyear strategic plan,
to prepare annual performance plans setting measurable goals, and
to report annually on actual performance compared with those goals.
Observations from the Forum:
(1) The results of research program performance can be measured.
The indicators that can be used will vary between basic and applied
research programs.
(2) The Federal research community recognizes the importance and
desirability of measuring performance and results and reporting
them to the executive and legislative branches of government and
to the public. Such measures are also useful in the internal management
of these programs.
(3) Measures can be developed proactively by research organizations
in consultation with their customers and stakeholders. Careful
identification of the full range of customers, stakeholders, and
partners will aid the selection of appropriate performance measures.
(4) It is appropriate and in the public interest that the Federal
research community define how their achievements will be measured
and begin using the agreed-upon measures as soon as possible.
(5) The cause-effect relationships between research outputs and
their eventual outcomes are complex. Often, it is extremely difficult
to quantify these relationships empirically--even though obvious
logical relationships exist between the outputs and outcomes.
The difficulties arise from (a) the long time delays that often occur between research results
and their eventual impacts, (b) the fact that a specific outcome is usually the result of
many factors, not just a particular research program or project,
and (c) the fact that a single research output often has several outcomes,
often unforeseen, not a single unique outcome (see the attachment
for examples). Consequently, the cause-effect relationship between
research outputs and their resultant outcomes should be described
in terms of logical causality. Quantitative empirical demonstrations
should not be required and are, often, not even possible.
(6) As envisioned in GPRA, strategic planning is a prerequisite
to performance measure development. Performance measures should
be derived directly from a research program's goals and objectives.
They should measure the extent to which specific goals and/or
objectives are being accomplished.
(7) Performance measures should have value to the program measured.
In fact, measurements currently made for internal program management
will frequently provide key data for performance measures suitable
for GPRA.
A Performance Measurement Model for Research:
The following model, initially formulated by the Army Research
Laboratory and expanded by the government-wide Research Round
Table, describes an approach that addresses GPRA requirements
and improves the management of performance. It presents a method
of evaluation that is both equitable and informative about research
and development programs. It applies to all types of research,
which can be arrayed on a continuum extending from the most basic
research through specific applied research. Depending on where
a program falls on this continuum, certain types of evaluation
methods may be more pertinent than others. As research moves toward
the applied end of the continuum, more specific outcome measures
can be identified. No single measure can be used to assess the
success of research.
(1) Research can be evaluated using the following matrix. It arrays
dimensions of performance (relevance, productivity, quality) by
assessment methods (peer review, metrics, customer evaluation):
Assessment Method       Relevance   Productivity   Quality
Peer Review                XX            XX           XX
Metrics                    XX            XX           XX
Customer Evaluation        XX            XX           XX
XX to be entered as:
++ = Very Useful
+ = Somewhat Useful
o = Less Useful
(2) Definitions:
Relevance: The degree to which the program (or project) adds value
and is responsive, timely, and pertinent to customers' needs.
Productivity: The degree to which work yields useful results.
Quality: The degree to which work is considered to be scientifically excellent.
Peer Review: There are three types of peer review that address
different aspects of performance.
Prospective peer review generally addresses the relevance of proposed
research and can be used to ensure the relevance of the research
to the agency mission. Prospective review can also be an indicator
of the quality of the research hypothesis, especially in the context
of the competition for awards.
In-process peer review examines ongoing research. It can serve
as a quality check and a relevance check of projects and programs
while they are underway. It has particular usefulness for assessment
of the scientific quality and performance of intramural or Federal
laboratory research that may not have undergone peer review for
project selection.
Retrospective peer review generally addresses the scientific quality
of research that has been conducted.
Metrics: Standards of measurement that rely on counts of discrete
entities to infer levels of accomplishment; e.g. improved health
status, increased production, bibliometrics (publications and
references), or degrees awarded.
Customer Evaluation: Customers are any individuals who directly
or indirectly use the products of research. Customer evaluation
is the opinion of one or more customers about either (1) the extent
to which a research program directly benefits the customer or
(2) the extent to which the research is perceived as beneficial
to the public.
(3) The degree of usefulness of the information that each of the
three assessment methods provides with respect to the dimensions
of relevance, productivity, and quality depends on the particular
research work being conducted. For example, in basic research,
there may not be a specific customer identified since the purpose
of much of this work is to add to the body of knowledge in science.
In that case, customer evaluation would be very difficult to obtain.
In applied research, a specific customer is more likely to exist,
so customer information about relevance and productivity
is more useful. The table needs to be filled in (++, +, o) for
each particular research program being evaluated. Attached are
some examples of the usefulness of different types of measures
for specific basic and applied research programs.
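One way to picture the filled-in matrix is as a small table per
program. The ratings below are hypothetical examples in the spirit
of the discussion above (customer evaluation rated less useful for
basic research and more useful for applied research), not
prescriptions from the Round Table:

    # Illustrative filled-in matrices for two hypothetical programs.
    DIMENSIONS = ("relevance", "productivity", "quality")

    basic_research = {
        "peer review":         {"relevance": "+",  "productivity": "+",  "quality": "++"},
        "metrics":             {"relevance": "o",  "productivity": "+",  "quality": "+"},
        "customer evaluation": {"relevance": "o",  "productivity": "o",  "quality": "o"},
    }
    applied_research = {
        "peer review":         {"relevance": "+",  "productivity": "+",  "quality": "++"},
        "metrics":             {"relevance": "+",  "productivity": "++", "quality": "+"},
        "customer evaluation": {"relevance": "++", "productivity": "++", "quality": "+"},
    }

    def show(matrix):
        """Print one program's matrix in rows of method by dimension."""
        for method, ratings in matrix.items():
            cells = "   ".join(f"{ratings[d]:<2}" for d in DIMENSIONS)
            print(f"{method:<20} {cells}")

    show(basic_research)
    print()
    show(applied_research)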
(4) The assessment methods in the model--peer review, metrics,
and customer evaluation--will often be used together in a performance
evaluation process.
(5) Research outcomes are often not quantifiable. Therefore, research
measures should always be accompanied by narrative in order to
provide full opportunity for explanation, presentation of anecdotal
evidence of success, and discussion of the nature of non-metric
peer review and customer evaluation measures.
(6) Dogmatic tracking of metrics should be avoided when experience
shows they are not useful. Although it is important to be
consistent in the types of metrics and goals that are tracked,
flexibility in dropping or adding metrics can help in arriving
at the most useful set.
(7) Aggregation of measures of individual research projects to
the level of an overall program can be accomplished if consistent
peer review and customer evaluation protocols, as well as metrics,
are used across projects and time. The amount of aggregation needed
depends on the audience for the measure; i.e. high-level, external
reporting demands greater aggregation, while internal program
management needs little if any aggregation. In any case, the amount
of aggregation should relate to how one describes progress toward
achieving goals.
(8) The information from this model should be reported and be
understandable to lay as well as scientific audiences.
Conclusion:
The introduction and application of meaningful and accurate
performance measures into Federal agency research programs represents
both a significant opportunity and a challenge. Performance measures
can become a powerful tool to assist in the management of these
programs and to help meet the objectives of GPRA. Accordingly,
the Research Round Table offers its full support to the development
of such measures for use in reporting to the executive and legislative
branches of government and to the public, as well as for internal
management. At the same time, it is important to recognize the
complexity of the cause-effect relationship between research outputs
and their eventual outcomes. These complexities make it difficult
to establish quantifiable measures that consistently measure program
performance, and they create a potential for incorrect application
with a subsequent detrimental effect on the quality and innovativeness
of our scientific endeavors.
As a starting point for developing performance measures, the Research
Round Table offers the model outlined in this paper. This model
identifies dimensions of measurement and methods for obtaining
the necessary inputs. It stresses the value of both quantifiable
data and narrative statements. The Round Table participants recognize
the recommended approach as an evolving process for measuring
performance that takes advantage of experimentation and
innovation, encourages sharing of successful efforts, and allows
mistakes to be made and new directions taken.
Participant List:
HHS USDA NASA
AHCPR ARS DOJ
FDA/CBER CSREES DOT
FDA/CDER ERS DOEd
FDA/CDRH FS NIST
FDA/CFSAN NASS LOC
FDA/CVM NSA
FDA/NCTR Army
FDA/OPE ARL Other:
FDA/ORA COE Fed Focus
NIH/NIA MSI
NIH/NIAAA EPA NAPA
NIH/NIAMS DOE
NIH/NIDCD NRC
NIH/NCI DOI
NIH/NHLBI IO
NIH/OD NBS
NIH/OSPE NOAA
OASH USGS
OS USBM
[1] The Round Table discussed issues relating to both basic and applied research. It also
concluded that most, if not all, of its findings apply to classic "development" activities in the
research and development environment. But, since development activity was not fully
explored with respect to performance measurement, the paper addresses research alone.
Attachment to Research Round Table Discussion Paper
Examples of Unforeseen Research Outcomes
(1) Research done on rat brain tumors was later shown to have
an important role in understanding human breast cancer. In fact,
of the several genes now known to be involved in human breast cancer,
all but one were identified in the course of work on something
other than breast cancer.
(2) Fundamental agricultural research on Agrobacterium,
a common soil bacterium that causes crown gall disease in plants,
led to the discovery that the tumor-like growth occurs because
the bacterium transfers some of its genetic material (DNA) to
the host plant. This discovery led to a new genetic tool that
was instrumental in making bioengineering of improved crop plants possible.
(3) AIDS research also contributed to knowledge in other fields
including virology, immunology, microbiology, and molecular biology.
Research has led to a better understanding of the immune system,
new approaches to vaccine development, novel diagnostic techniques,
and new methods for evaluating drug treatments.
(4) The basic research conducted prior to the AIDS epidemic allowed
researchers to more quickly establish the link between the human
immunodeficiency virus and AIDS, develop a blood test for the
virus and develop treatments, such as AZT, for those suffering
from the disease.
(5) The Michelson-Morley experiments on the speed of light in
different directions provide a spectacular example of extremely
important unforeseen outcomes, leading as they did to Einstein's
formulation of the theory of relativity.
(6) The technology developed for recycling cobalt from scrap jet
engines using double membrane cells for electrorefining is now
to be used to upgrade the national cobalt stockpile, saving taxpayers
millions of dollars.
(7) The 1960s breakthrough of deciphering the genetic code has
led to the identification of genes linked to illnesses such as
breast and colon cancer, Huntington's and Alzheimer's disease,
and the inception of gene therapy treatments.