In subsequent pages, this appendix presents the following examples:
United States Department of Agriculture (USDA): National Research
Initiative Competitive Grants Program, One of a Portfolio of
USDA-Funded Agricultural Research Programs
Department of Energy (DoE): Scientific Research Performance Criteria
and Measures
National Science Foundation (NSF): Science and Technology Centers
Program Pilot Project, Use of an Alternative GPRA Format
National Science Foundation (NSF): Facilities Pilot Project
National Science Foundation (NSF): Paradigm Shifts as Performance
Indicators
National Institutes of Health (NIH): Cost Savings Resulting from
Biomedical Research
National Institute of Standards and Technology (NIST): Economic
Impacts of Research in Metrology
National Institute of Standards and Technology (NIST): Assessment
Panels for Merit Review with Peer Evaluations
Research Round Table Discussion Paper: Developing and Presenting
Performance Measures for Research Programs
Attachment to Research Round Table Discussion Paper: Examples of
Unforeseen Research Outcomes

United States Department of Agriculture (USDA)
National Research Initiative Competitive Grants Program:
One of a Portfolio of USDA-Funded Agricultural Research Programs
The USDA is engaged in a comprehensive effort to develop goals
and assessment methods for its portfolio of research programs.
Outcomes such as environmentally sound and economical production
of food and fiber, globally competitive U.S. agricultural industries,
improved human health and well-being through better nutrition,
and sustained rural communities are the result of the total investment
in agricultural research and not just one program.
The example given below illustrates how the USDA is working to
address the challenge of assessing one of its portfolio of programs,
the National Research Initiative Competitive Grants Program (NRICGP).
It shows how planning for assessment begins with a statement of
the purpose of the program from which specific goals are derived
and how planning is undertaken in the context of broader goals
for agricultural research. Specific goals, in turn, provide the
basis for development of assessment methods.
The NRICGP has unique roles within the USDA to:
The NRICGP presents the USDA with the same challenge faced by
other mission agencies and the National Science Foundation, namely
how to assess fundamental research aimed at expanding knowledge
without a particular near-term application in mind. Yet the Department's
other, predominantly mission-linked research programs depend on
the foundational knowledge generated by the research supported by
this competitive grants program. In addition, grants awarded by
the NRICGP are usually only one source of support for the work
of the researcher(s); additional support comes from other USDA
research programs, other federal programs, and state appropriations
to land-grant universities.
The goals listed below are an attempt first to identify specifically
what is expected of the NRICGP based on the unique roles of this
program within the broader portfolio of USDA research programs,
and second to identify goals more generally of the broader portfolio
of USDA research programs for which major contributions are expected
from the NRICGP. The performance measures are examples based on
what specifically would be measured for the NRICGP.
The goals of the NRICGP are:
The goals of USDA-funded research for which the NRICGP plays
a major role are:
Performance measures for the NRICGP are:
USDA experience to date. While it is too soon to derive
general lessons from the USDA experience in development of specific
goals and performance measures for its competitive grants program,
this example may be useful to other mission agencies with a similar
broad and comprehensive array of programs.
Department of Energy
Scientific Research Performance Criteria and Measures
This example provides a set of criteria and measures that have
been developed by the Department of Energy (DoE) for performance
evaluation of basic research performed at the national laboratories
and at other government-owned, contractor-operated facilities.
The set illustrates the use of multiple indicators for setting
target levels of performance and assessing subsequent performance.
The major elements of the scientific research performance criteria
and measures at DoE are:
Sub-criteria that apply to all research programs are:
Sub-criteria that apply to multi-investigator, multi-disciplinary,
integrated research programs are:
Sub-criteria that apply to design and construction of research
facilities are:
DoE experience to date. In formulating these criteria and
measures, DoE encountered a cooperative and supportive response
from the internal and external science community. Consequently,
the Department is negotiating these criteria into new and renewal
contracts for the national laboratories and other government-owned,
contractor-operated facilities that perform basic research. The
facilities include single purpose laboratories such as high energy
physics accelerators, as well as the multi-program national laboratories.
In these contracts:
National Science Foundation
Science and Technology Centers Program Pilot Project:
Use of an Alternative GPRA Format
This example describes the GPRA pilot project
for the Science and Technology Centers (STC) Program of the National
Science Foundation (NSF). The example illustrates the use of an
alternative approach allowed under GPRA. The GPRA alternative
approach requires clear criteria for determining if a program
is "minimally effective" or "successful,"
but it does not require performance indicators that are quantifiable
and measurable.
The STC activity supports twenty-five university-based research
centers in a variety of scientific areas. The program supports
cutting-edge interdisciplinary research that requires the advantages
of larger scale and more stable funding provided by a center.
The activity also supports education, knowledge transfer from
academic researchers to industry, government agencies, and other
sectors of society, and knowledge transfer among academic institutions.
The center mode complements other modes of research support (awards
to individual investigators or small research groups, for example)
by providing higher levels of longer-term support for collaborative
activities that cross disciplinary and institutional barriers.
Designing performance indicators for this GPRA pilot project was
a collaborative effort between evaluation specialists and the program
staff who oversee awards to the STCs. The resulting informal advisory
group agreed that the STC Program goals were not just to support
research, knowledge transfer, and education activities. In addition,
the program intended to pursue these goals in ways that were distinct
from, and complementary to, the activities supported by grants
to individual investigators and small groups. The performance
indicators therefore had to reflect not only the quality and quantity
of impact but also the uniqueness of the program's contributions.
After a less than satisfactory attempt to create quantitative
performance measures in Fiscal Year 1994, the program decided
to create a more qualitative set of indicators. For its Fiscal
Year 1995 performance plan, the STC Program pilot project proposed
to use the alternative approach under GPRA. The law states:
If an agency, in consultation with the Director of the Office
of Management and Budget, determines that it is not feasible to
express performance goals for a particular program activity in
an objective, quantifiable and measurable form, the Director of
the Office of Management and Budget may authorize an alternative
form. Such alternative form shall--
include separate descriptions of (i) a minimally effective program
and (ii) a successful program, with sufficient precision and in
such terms that would allow for
an accurate, independent determination of whether the program
activity's performance meets the criteria of the description;
...
The STC performance plan articulated three goal areas for the
Program:
Goal 1: Address challenging and far-reaching interdisciplinary
research problems that require the greater resources and longer-term
support of a center; create new knowledge.
Goal 2: Transfer knowledge and technology to industry, government
agencies and laboratories, and other sectors of society through
partnerships and collaborative activities; transfer knowledge
to other academic institutions through exchange programs, workshops,
and other leadership activities.
Goal 3: Produce graduates at all levels with unique interdisciplinary
capabilities in science; increase participation by women and minorities;
improve science education and research training.
The informal GPRA advisory group then confronted the issue of
how to judge the success of the program based on the performance
of the centers supported in the program. There was no obvious
precedent for this sort of portfolio assessment, so the following
operational solution was proposed:
The STC activity would be considered minimally effective if:
80% of the Centers are successful in reaching at least one goal, and
50% of the Centers are successful in reaching two or more goals.
The STC activity would be considered successful if:
90% of the Centers are successful in reaching at least one goal, and
75% of the Centers are successful in reaching two or more goals, and
20% of the Centers are successful in reaching all three goals.
This proposal has several implications. First, it eliminates competition
among the centers to be "best," and it promotes cooperation.
Second, this proposal recognizes that some level of failure is
acceptable--even necessary--if the program and the centers are
being asked to take risks.
Unfortunately, the problem of defining "success" in
reaching a goal remained. The informal advisory group tried to
design definitions of success that would be credible but would
provide the centers with some flexibility. The solution was to
create two qualitative indicators for each goal area and to describe
"significant progress" and "outstanding progress"
for each indicator. A center would be viewed as reaching a goal
(for the three goal areas of research, knowledge transfer, and
education) if an expert panel considered the center to have made
significant progress on both indicators for that goal or outstanding
progress on one of the indicators. This would allow centers to
invest deeply in a single innovative effort in a particular goal
area or to spread their efforts among several initiatives.
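These decision rules are concrete enough to state in a few lines of
code. The sketch below is illustrative only: the rating labels and
example data are hypothetical, the "two goals" thresholds are read
as "two or more," and in practice the judgments come from expert
panels, not software.

    # Illustrative sketch of the STC decision rules described above.
    # Ratings and data are hypothetical; actual judgments are made by
    # expert panels reviewing each center.

    def reaches_goal(indicator_a, indicator_b):
        """A center reaches a goal area if a panel judges it to have made
        significant progress on both indicators or outstanding progress
        on at least one of them."""
        if "outstanding" in (indicator_a, indicator_b):
            return True
        return indicator_a == "significant" and indicator_b == "significant"

    def rate_program(centers):
        """Apply the portfolio thresholds. Each center is a list of three
        (indicator_a, indicator_b) rating pairs, one per goal area."""
        n = len(centers)
        goals = [sum(reaches_goal(a, b) for a, b in center) for center in centers]
        frac_one = sum(g >= 1 for g in goals) / n
        frac_two = sum(g >= 2 for g in goals) / n
        frac_three = sum(g >= 3 for g in goals) / n
        if frac_one >= 0.90 and frac_two >= 0.75 and frac_three >= 0.20:
            return "successful"
        if frac_one >= 0.80 and frac_two >= 0.50:
            return "minimally effective"
        return "below minimally effective"

    example_center = [("significant", "significant"),   # Goal 1
                      ("outstanding", "neither"),        # Goal 2
                      ("neither", "significant")]        # Goal 3
    print(rate_program([example_center] * 25))           # minimally effective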
An example of indicators for significant progress and outstanding
progress is shown below for Goal 1, interdisciplinary research:
Indicator 1a, prominence and impact of center in emerging research
areas, as perceived by non-center university investigators and
non-academic researchers, and as evidenced by patterns of research
productivity
Significant progress: Most of the center's research products (publications
and presentations at meetings) appear in the field's most respected
peer-reviewed vehicles (journals, professional society meetings)
or in vehicles influential among industry, governmental agencies,
and other knowledge transfer partners
(e.g., trade journals, policy forums, industrial conventions),
as judged by non-center specialists.
Outstanding progress: In addition to meeting the standard for
significant progress, several of the center's research products
are counted among the most influential contributions affecting
the current direction of the field, as judged by non-center specialists.
Indicator 1b, interdisciplinary nature and duration of Center-based
research
Significant progress: The Center's research agenda consists primarily
of ambitious research programs that are being pursued for at least
three years using perspectives, approaches, and techniques from
several disciplines, as judged by non-Center investigators in
the field.
Outstanding progress: The Center's research agenda consists primarily
of ambitious research programs that are being pursued for at least
three years using unique and pioneering combinations of perspectives,
approaches, and techniques from several disciplines, as judged
by non-Center investigators in the field.
Experience to date. The informal advisory group agreed
that implementing this performance plan was beyond the capacity
of the STC Program, at least initially. Data gathering, analysis,
and evaluation for the pilot project are currently being conducted
by an evaluation contractor. As a result, it is too soon to derive
general lessons from this pilot project.
National Science Foundation
Facilities Pilot Project
As part of the National Science Foundation's (NSF) involvement
with GPRA, we chose to assess the operational performance of our
national facilities.
(1) Assessment of National Facilities
The Foundation supports a number of user facilities, such
as telescopes and accelerators, in several different disciplinary
directorates. The facilities have a common purpose: as phrased
in the NSF strategic plan, "to enable the United States to
uphold a position of world leadership" in selected fields
of science. Each facility is planned in response to needs in a
specific field, and each one starts from a different technical
baseline.
(2) Description of the Method
In our first attempt at a performance plan, the participants
in the pilot project developed five generic goals for their facilities.
When facility directors were asked to produce whatever data
they thought useful in relation to these five goals, difficulties
arose (see below), but some interesting ideas emerged.
The pilot project leader then called the group together, along
with some representatives of other major facilities that NSF supports,
and the group developed a dozen generic performance indicators
under three broad performance concepts:
The key to the plan was to think about the performance measures
in terms of percentage change from a baseline. The baseline number
could be different for each facility, and even measured in different
metrics. To help standardize, the group had to invent a term,
"user units," to refer generically to entities like
beam time and observing hours. In the end, however, the percentage
change from each facility could be folded into a percent change
for the whole portfolio of facilities. The portfolio concept allows
for variation among the facilities in their indicators for any
particular year. When an individual facility experiences bad weather,
for example, its figures may droop; but that individual variation
will play only a small role in the Foundation-wide average. When
old or ineffective facilities are dropped and new facilities added
through the regular peer review process, the indicators for the
portfolio should improve.
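A minimal sketch of this calculation follows, with hypothetical
facilities, baselines, and figures. The document does not specify
how the per-facility changes are folded together; an unweighted
average is assumed here.

    # Illustrative portfolio calculation. Each facility reports a baseline
    # and a current value for an indicator in its own "user units"
    # (beam hours, observing nights, and so on); all values are hypothetical.
    facilities = {
        "accelerator": {"baseline": 5200.0, "current": 5550.0},  # beam hours
        "telescope":   {"baseline": 310.0,  "current": 295.0},   # observing nights
        "ship":        {"baseline": 240.0,  "current": 262.0},   # days at sea
    }

    # Percentage change from baseline is unitless, so facilities measured
    # in different metrics can be folded into one portfolio figure.
    changes = {name: 100.0 * (f["current"] - f["baseline"]) / f["baseline"]
               for name, f in facilities.items()}
    portfolio_change = sum(changes.values()) / len(changes)

    for name, pct in changes.items():
        print(f"{name}: {pct:+.1f}% from baseline")
    print(f"portfolio average: {portfolio_change:+.1f}%")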
Thinking in terms of the portfolio concept was a major challenge
for the group because NSF normally devotes so much attention to
the evaluation of individual facilities. The performance indicators
for the individual facilities would, of course, be available to
program managers and site visit teams, but, when used at that
level, they would be interpreted in context, with regard to the
performance expectations for that particular site. At the same
time, improving the generic, aggregate performance characteristics
of the portfolio became a goal for the Foundation as a whole.
(3) Lessons and Insights
Our first attempt at developing goals was only partially successful.
We found that national facility directors had difficulties in
translating these goals into practical performance measures and
indicators. The involvement of the facility directors as part
of a team approach was found to be far more effective, and resulted
in a "buy-in" from the groups involved.
NSF is continuing to develop these concepts, working with the
leaders of these facilities.
National Science Foundation
Paradigm Shifts as Performance Indicators
The National Science Foundation (NSF) is engaged in
a wide-ranging interactive process with staff to develop new performance
indicators for NSF programs. During this process, NSF staff have
suggested many creative ideas for further exploration. One of
these would use observed shifts in research paradigms to help
identify outcomes or impacts of research undertaken in the past.
Background for applying this approach to examining outcomes or
impacts of research in computer science is summarized in this
example.
Historically, the scientific method has been based on theory and
experimentation. A researcher would propose a theory to explain
some phenomenon, an experiment would be formulated to test the
theory, and observations of the experiment would be used to validate
the theory and/or propose modifications to the theory to account
for the observations.
With the advent of computing, the possibility of a new paradigm
emerged: computer simulations could be used in place of experiments,
with results of a simulation playing the role of observations,
leading to validation of theories or raising questions that required
modification to theories. In some cases the computation could
be used to complement experiments, in others it could actually
replace them. One of the earliest recognitions of the role of computation
in the scientific method occurred in the fluid dynamics community,
particularly aeronautics. For example, it is simply not possible
to run an experiment with a manned vehicle entering the earth's
atmosphere at hypersonic speeds in order to determine the effects
of heating. These "experiments" were done via computer
simulation and the result was that the space shuttle flew without
ever being tested in reentry.
There are innumerable other cases where computer simulation has
replaced experimentation, for a variety of reasons. As with
the space shuttle, it may simply not be possible to conduct the
appropriate experiment. A scientific example of this is the study
of the initial conditions required for the formation of galaxies;
one simply can't run the experiment. Another case occurs when
the experiment or the gathering of the data changes the environment,
thereby altering the experiment. For example, if one is trying
to determine the tensile strength of a pure material, the inclusion
of sensors to measure the buildup of stresses alters the material,
and thus the experiment. Finally, there are cases where the experiment
cannot or should not be done because it is life-threatening. An
example is nuclear testing where no full scale experiments have
been permitted for years and where a major computational program
has been proposed to support the simulations necessary to evaluate
the integrity of the nuclear stockpile.
There are many other potential areas where the paradigm shift
has not occurred for lack of either computer power or acceptance
on the part of the public. For example, drug design is still based
primarily on the experience of the scientist and on experimentation.
It is possible to do simulations of relatively small molecules
with the accuracy required to study "docking," the process
by which a drug binds with the existing biological structure,
but dealing with complex molecules such as proteins is still in
the future. Even further in the future is the use of simulations
to replace clinical testing to determine the effects of a drug.
Here, not only do we lack computational power, but also public
acceptance.
It should be clear that these paradigm shifts, where a field or
discipline accepts computer simulations as a partner in the scientific
method, are an indicator of the advancement of computing technology
and thus can be used as a performance indicator for computational
science. It should also be clear that it is very difficult to
predict in what fields and when such a shift will occur. Such
a prediction is predicated not only on a variety of technological
advancements, which are difficult enough to forecast, but also
on societal acceptance, be it from a relevant industry or the
public at large. Thus, paradigm shifts as performance indicators
are most effective if viewed by looking back to determine if and
when they happened rather than trying to look forward to predict
when they might occur. In this sense they are best suited to the
"alternative" form of performance assessment where they
could be used to help determine if an organization is exhibiting
"minimally effective" or "successful" performance.
It is relatively easy for a panel of experts to assess to what
degree computation is influencing and being used by a particular
field, and to determine when it is accepted by that field as an
equal component of the scientific method.
Such acceptance of a paradigm shift has taken place recently in
cosmology, where computation is now being used to test existing
theories and to help guide formulation of new ones. As identified
by a leading researcher in the field, several developments came
together to enable this shift. First, well-defined physical theories
that make testable predictions were developed. Second,
numerical algorithms that can accurately simulate the formation
of cosmological structures, such as galaxies and clusters of galaxies,
starting from primordial initial conditions were developed and
refined; these can now be combined into predictive numerical
codes. Third and most recently, computing power and memory have
reached a level where it is possible to model the universe in
full three dimensions, with time evolution, rather than in just
two dimensions. This last development is a crucial step generally
in a paradigm shift, as virtually all physical phenomena of interest
are three dimensional with some degree of time dependence.
NSF plans to explore the development of shifts in research paradigms
as a performance indicator. It is too soon to report general methodological
lessons at this time.
National Institutes of Health
Cost Savings Resulting from Biomedical Research
This example illustrates how the National Institutes
of Health (NIH) have used economic methods to compute savings
flowing from biomedical research. Over the years, NIH has transmitted
estimated benefits to Congress and the public through hearings,
publications, and other media.
The NIH publication, Cost Savings Resulting from NIH Research
Support (2nd edition), summarizes 34 case studies, including:
The estimated savings are based on the difference between estimated
direct plus indirect costs for a particular disease
before and after the medical innovation. Direct
costs include the costs of medical resources required
to provide health care in response to the illness or condition,
as well as nonmedical costs (custodial care, special diets, tutors,
transportation, special equipment, governmental and voluntary
community support programs) associated with the condition. Indirect
costs represent the productivity lost to society as a
result of premature mortality or lost work days due to morbidity.
In what is called the "human capital" approach, such
costs are valued in terms of lost earnings and expressed in terms
of dollars. Since the value of extra years of life or of reduced
pain and suffering due to the medical innovation is not
estimated, the estimates represent a conservative approach to
valuing the benefits of biomedical R&D.
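In outline, the estimated saving is simply the before-and-after
difference in total (direct plus indirect) costs. A short
illustration, with entirely hypothetical dollar figures:

    # Illustrative cost-savings arithmetic; the condition and all
    # dollar figures are hypothetical, not taken from an NIH case study.
    direct_before   = 4.0e9   # annual medical and nonmedical costs, pre-innovation
    indirect_before = 2.5e9   # annual lost earnings (mortality and morbidity)
    direct_after    = 2.8e9   # annual costs after the medical innovation
    indirect_after  = 1.1e9

    savings = (direct_before + indirect_before) - (direct_after + indirect_after)
    print(f"estimated annual savings: ${savings / 1e9:.1f} billion")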
The case studies reported in Cost Savings Resulting from NIH
Research Support do not include basic research. However, fundamental
science (whether it is basic or applied research) produces the
continuing insight and understanding about the mechanisms of life
and disease which are needed for the R&D and innovation that
lead to health care improvement. The linkages between fundamental
research and health care advances are complicated, long-term,
and impossible to allocate completely and clearly. Nevertheless,
the NIH calculations of savings flowing from biomedical research
provide insight about the size and importance of health care innovations
enabled by fundamental science.
In general, when the retrospective study period is long enough,
the appropriate data are available, and the trail of connections
between fundamental research and eventual long-run impacts is
sufficiently clear, then economic methods can help to illuminate
the sorts of contributions to over-arching national goals that
are enabled by fundamental research.
National Institute of Standards and Technology (NIST)
Economic Impacts of Research in Metrology
In the early 1980s, NIST began to examine rates of
return realized from investment in different types of technology
and in different stages of a technology's development by measuring
and quantifying the economic returns from the Institute's research
and services. The results, summarized below, provide another perspective
on the value that U.S. taxpayers realize from their investment
in NIST. Conducted by independent researchers under contract to
NIST, the studies have estimated the "social" rate of
return, which is the aggregate rate of return to all investment
in the technology generated by specific projects. The general
methodology used in studies of the economic impact of NIST laboratory
research is to compute the aggregate flow of benefits over time
and the aggregate costs for a particular investment. Then, the
present value of benefits and the present value of costs can be
computed in order to solve for the benefit-cost ratio. Or the
implicit internal rate of return can be computed from the costs
of the investment and the aggregate flow of benefits over time.
In these studies, the internal rate of return is computed in the
same way that rates of return are calculated from the time flows
of costs and benefits associated with a particular project or
investment in the business and financial communities. Estimated
internal rates of return are one indicator that corporations use
in making choices among alternative investment projects, including
R&D projects. Economists use the same approach when they estimate
the "private" rate of return to a particular firm on
its original investment and the "social" rate of return
on that investment--where the "social" rate of return
refers to benefits accruing to the firm and to all others, regardless
of whether they were involved in the original investment process.
The estimates summarized below are conservative because they are
based only on quantifiable benefits realized by companies and
consumers. Important qualitative benefits, such as enabling industry
standards or opening new avenues of research, are not included
in the estimates.
The first step in estimating costs and benefits for a particular
project is to conduct intensive interviews of the relevant NIST
manager(s) to determine the scope and nature of the project's
technical output and the stages of the economic process (R&D,
production, and marketing) where technical infrastructure produced
by the project would be absorbed. Next, the costs of the project
itself, the costs to industry of applying the project results
over time, and the economic benefits realized over time are estimated.
The NIST project cost and the net benefit series over time can
then be solved for the project's social rate of return.
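The underlying arithmetic can be sketched in a few lines. The cash
flows and discount rate below are hypothetical and are not drawn
from any NIST study:

    # Illustrative benefit-cost and internal-rate-of-return arithmetic.
    def present_value(flows, rate):
        """Discount a series of annual flows (year 0 first) at the given rate."""
        return sum(f / (1.0 + rate) ** t for t, f in enumerate(flows))

    def internal_rate_of_return(net_flows, lo=0.0, hi=10.0, tol=1e-6):
        """Find the rate at which net present value is zero, by bisection.
        Assumes an initial net outlay followed by positive net benefits."""
        while hi - lo > tol:
            mid = (lo + hi) / 2.0
            if present_value(net_flows, mid) > 0.0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    # Hypothetical flows, in millions of dollars, year 0 first.
    costs    = [10.0, 2.0, 2.0, 1.0, 1.0, 1.0]     # project plus industry adoption costs
    benefits = [0.0,  1.0, 6.0, 14.0, 20.0, 24.0]  # benefits realized over time
    net      = [b - c for b, c in zip(benefits, costs)]

    rate = 0.07  # assumed discount rate for the present-value calculation
    bcr  = present_value(benefits, rate) / present_value(costs, rate)
    irr  = internal_rate_of_return(net)
    print(f"benefit-cost ratio: {bcr:.1f} to 1")
    print(f"internal (social) rate of return: {100 * irr:.0f}%")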
For example, a study of the NIST project that developed the technical
basis for product-acceptance test methods in the optical fiber
industry began by identifying the fiber parameters for which more
accurate test methods were needed. Next, private sector firms
were surveyed to determine how the new test methods were being
used (e.g., to control production or to facilitate sale of the
product). The cost savings realized by the entire industry were
estimated from the data collected. These aggregate economic (social)
benefits were netted against costs. The net benefit stream and
the original investment cost were solved for the internal rate
of return. In addition to providing estimates of quantifiable
benefits, these studies have also documented qualitative benefits,
such as impacts on future R&D decisions.
Summary Results. The studies and their estimated rates
of return are:

Study area                                   Estimated rate of return
Conductivity of semiconductors                            63%
Wire bonding of semiconductor components                 140%
Electrical resistance of semiconductors                  181%
Electromigration in interconnects                        117%
Electromagnetic interference                             266%
Power and energy calibration services                    428%
Coordinate measuring machines                             97%
Optical fiber                                            423%
Spectral radiometry                                      145%
Real-time control system architecture                    149%
Integrated services digital network                      156%
Software conformance testing                              41%

The median rate of return for these twelve economic impact studies
is 147 percent. Across all project areas, the rate of return ranges
from 41 percent to 428 percent, indicating that significant benefits
flow back to the U.S. economy and society. These rates compare
favorably with those reported in studies of returns on other public
investments in technology and on private-sector R&D investments.
Key features of four of these studies are highlighted below.
Research on Power and Energy Measurement and Calibration Services.
Results of this study illustrate how the benefits of NIST
measurements fan out across industry and markets. The Institute
maintains the U.S. standard for the watthour; and it conducts
research to improve the measurement accuracy of the 2,000 standard
watthour meters used to calibrate the more than 2 million watthour
meters sold in the U.S. each year. In all, U.S. utilities monitor
more than 100 million watthour meters that record customers' power
usage, totaling more than 2,700 billion kilowatt hours annually
and generating industry revenues exceeding $180 billion. Through
its research, NIST has enabled a tenfold increase in the measurement
accuracy of watthour meters, reducing the uncertainty to 0.005
percent. The result has been an increase in the accuracy of customers'
bills, which translates into even greater assurance that consumers
are charged only for the power that they actually use. This improvement
and other benefits due to the traceability of meter measurements
to national standards have produced sizable returns to U.S. taxpayers.
A 1994 analysis estimates that total benefits exceed costs by
a ratio of 41 to 1. The estimated social rate of return was 428
percent.
Research on Optical Fiber Standards. NIST provided basic
measurement technology, as well as technical assistance in developing
industry standards, for the optical fiber industry in the 1980s.
The rate of return to NIST-conducted research in this area was
423 percent. Over the period studied, NIST provided technical
support to the industry as it promulgated 22 standards for this
complex area of technology. A 1991 economic impact study noted
that while the direct economic benefits to the industry were primarily
from substantial reductions in market transaction costs, an important
indirect effect of the NIST work was "a much faster rate
of growth for the optical fiber market and hence for the U.S.
optical fiber industry." The president of the Telecommunications
Industry Association said: "Without the NIST assistance and
leadership, the U.S. fiber optics industry would not be in the
competitive position it is today."
Research in Electromagnetic Compatibility/Interference Metrology.
The problem of interference among electronic and electrical devices
has grown enormously with the proliferation of these devices.
A 1991 study of the economic impact of NIST research in electromagnetic
compatibility/interference metrology conducted over the previous
decade found that organizations using NIST's research improved
research efficiency and reduced transaction costs. Based on these
cost-saving benefits, the study showed an estimated spillover
rate of return of more than 260 percent for this program.
Research on Methods to Prevent Failures of Integrated Circuits.
As semiconductor devices become increasingly dense (a state-of-the-art
microprocessor, for example, contains more than 3 million transistors),
design and manufacturing challenges also become increasingly severe.
NIST worked with the U.S. semiconductor industry in the 1980s
to develop improved methods to test for a specific problem--electromigration--that
causes the thin metal wires connecting integrated circuit components
to fail. Benefits to this industry, including reduced production
and transaction costs as well as improved research efficiencies,
led to an estimated rate of return of 117 percent for this NIST
project.
Fundamental Research. NIST also pursues fundamental research
to meet future industrial needs. Although we cannot estimate the
aggregate future benefits that will flow from the eventual applications
of knowledge produced by current research, we can
identify some probable areas of impact. For example, NIST has
a program of future-looking research in which it develops the
tools and fundamental understanding that will help it anticipate
and respond to measurement needs arising from advances in science
and technology and intensifying international competition. Examples
of recent research likely to have an economic impact are:
NIST-Industry Linkages. The NIST laboratories plan and
carry out their research in collaboration with industry. As a
result, the federal investment yields critically needed measurement
methods and other infrastructural technologies that open the way
to advances in research, improvements in processes and products,
efficiencies in the marketplace, and other benefits reaped by
companies and industries, and, through them, the economy.
National Institute of Standards and Technology (NIST)
Assessment Panels for Merit Review with Peer Evaluations
The seven technical Laboratories of NIST undergo annual
assessments by external panels convened by the National Research
Council. The panels consist of scientists, engineers and technical
managers from academia, industry and government. In 1995, the
panels had a total membership of 144 with about 25% from academia,
60% from industry and 15% from government. These assessment panels
visit the Laboratories twice a year, once as a group and once
on an individual basis, when they have an opportunity to interact
directly with the scientific staff. They produce written evaluations
of performance, missions, and short- and long-term goals for each
Laboratory as a whole and for each division. The written evaluation
is a detailed report with about 20-30 recommendations and questions
for the Laboratory to address before the panels reconvene.
Activities of the panels include: reviewing the technical programs
of NIST with respect to the needs of U.S. scientific and technological
communities; making reports to the NIST Director; and briefing
the statutory Visiting Committee on Advanced Technology and apprising
it of the balance and general effectiveness of the programs
of NIST. The panels also assist NIST in examining emerging technologies
expected to require research in metrology. The panels make recommendations
with regard to the following types of questions:
In Fiscal Year 1995, the evaluation panels focused on six specific
issues of concern:
This process for evaluating NIST programs has been used effectively
for 37 years. A major review of both the process and product is
being undertaken this year.
Research Round Table Discussion Paper
August 1, 1995
Developing and Presenting Performance Measures for Research
Programs
The implementation of the Government Performance
and Results Act (GPRA) of 1993 will provide many challenges to
managers across the Federal Government. Adapting GPRA requirements
to the Federal research environment will be especially challenging.[1]
Applied constructively, GPRA can have a positive effect on the
quality and innovativeness of our scientific endeavors. Conversely,
serious detrimental effects can occur if it is applied incorrectly.
Federal researchers and managers representing a cross-section
of departments and agencies have met over the last six months
in a Round Table forum to discuss the unique circumstances surrounding
the development of performance measures for research programs.
The following observations and model for research performance
measures are the result of these discussions.
Purpose:
This paper articulates an approach that Government research
organizations can use in applying the principles of GPRA to a
wide range of federally supported research activities.
Background:
In 1993, Congress passed and the President signed into law
P.L. 103-62, the Government Performance and Results Act (GPRA).
The intent of the statute was to increase the effectiveness, efficiency,
and accountability of the Federal Government.
GPRA requires each agency to develop a multiyear strategic plan,
to prepare annual performance plans setting measurable goals, and
to report annually on actual performance compared with those goals.
Observations from the Forum:
(1) The results of research program performance can be measured.
The indicators that can be used will vary between basic and applied
research programs.
(2) The Federal research community recognizes the importance and
desirability of measuring performance and results and reporting
them to the executive and legislative branches of government and
to the public. Such measures are also useful in the internal management
of these programs.
(3) Measures can be developed proactively by research organizations
in consultation with their customers and stakeholders. Careful
identification of the full range of customers, stakeholders, and
partners will aid the selection of appropriate performance measures.
(4) It is appropriate and in the public interest that the Federal
research community define how their achievements will be measured
and begin using the agreed-upon measures as soon as possible.
(5) The cause-effect relationships between research outputs and
their eventual outcomes are complex. Often, it is extremely difficult
to quantify these relationships empirically--even though obvious
logical relationships exist between the outputs and outcomes.
The difficulties arise from (a) the long time delays that often occur between research results
and their eventual impacts, (b) the fact that a specific outcome is usually the result of
many factors, not just a particular research program or project,
and (c) the fact that a single research output often has several outcomes,
often unforeseen, not a single unique outcome (see the attachment
for examples). Consequently, the cause-effect relationship between
research outputs and their resultant outcomes should be described
in terms of logical causality. Quantitative empirical demonstrations
should not be required and are, often, not even possible.
(6) As envisioned in GPRA, strategic planning is a prerequisite
to performance measure development. Performance measures should
be derived directly from a research program's goals and objectives.
They should measure the extent to which specific goals and/or
objectives are being accomplished.
(7) Performance measures should have value to the program measured.
In fact, measurements currently made for internal program management
will frequently provide key data for performance measures suitable
for GPRA.
A Performance Measurement Model for Research:
The following model, initially formulated by the Army Research
Laboratory and expanded by the government-wide Research Round
Table, describes an approach that addresses GPRA requirements
and improves the management of performance. It presents a method
of evaluation that is both equitable and informative about research
and development programs. It applies to all types of research,
which can be arrayed on a continuum extending from the most basic
research through specific applied research. Depending on where
a program falls on this continuum, certain types of evaluation
methods may be more pertinent than others. As research moves toward
the applied end of the continuum, more specific outcome measures
can be identified. No single measure can be used to assess the
success of research.
(1) Research can be evaluated using the following matrix. It arrays
dimensions of performance (relevance, productivity, quality) by
assessment methods (peer review, metrics, customer evaluation):
Assessment Method       Relevance   Productivity   Quality
Peer Review                XX            XX           XX
Metrics                    XX            XX           XX
Customer Evaluation        XX            XX           XX
XX to be entered as:
++ = Very Useful
+ = Somewhat Useful
o = Less Useful
(2) Definitions:
Relevance: The degree to which the program (or project) adds value
and is responsive, timely, and pertinent to customers' needs.
Productivity: The degree to which work yields useful results.
Quality: The degree to which work is considered to be scientifically excellent.
Peer Review: There are three types of peer review that address
different aspects of performance.
Prospective peer review generally addresses the relevance of proposed
research and can be used to ensure the relevance of the research
to the agency mission. Prospective review can also be an indicator
of the quality of the research hypothesis, especially in the context
of the competition for awards.
In-process peer review examines ongoing research. It can serve
as a quality check and a relevance check of projects and programs
while they are underway. It has particular usefulness for assessment
of the scientific quality and performance of intramural or Federal
laboratory research that may not have undergone peer review for
project selection.
Retrospective peer review generally addresses the scientific quality
of research that has been conducted.
Metrics: Standards of measurement that rely on counts of discrete
entities to infer levels of accomplishment; e.g. improved health
status, increased production, bibliometrics (publications and
references), or degrees awarded.
Customer Evaluation: Customers are any individuals who directly
or indirectly use the products of research. Customer evaluation
is the opinion of one or more customers about either (1) the extent
to which a research program directly benefits the customer or
(2) the extent to which the research is perceived as beneficial
to the public.
(3) The degree of usefulness of the information that each of the
three assessment methods provides with respect to the dimensions
of relevance, productivity, and quality depends on the particular
research work being conducted. For example, in basic research,
there may not be a specific customer identified since the purpose
of much of this work is to add to the body of knowledge in science.
In that case, customer evaluation would be very difficult to obtain.
In applied research, a specific customer is more likely to exist,
so customer information about relevance and productivity
is more useful. The table needs to be filled in (++, +, o) for
each particular research program being evaluated. Attached are
some examples of the usefulness of different types of measures
for specific basic and applied research programs.
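One way to picture the filled-in matrix is as a small table per
program. The ratings below are hypothetical examples in the spirit
of the discussion above (customer evaluation rated less useful for
basic research and more useful for applied research), not
prescriptions from the Round Table:

    # Illustrative filled-in matrices for two hypothetical programs.
    DIMENSIONS = ("relevance", "productivity", "quality")

    basic_research = {
        "peer review":         {"relevance": "+",  "productivity": "+",  "quality": "++"},
        "metrics":             {"relevance": "o",  "productivity": "+",  "quality": "+"},
        "customer evaluation": {"relevance": "o",  "productivity": "o",  "quality": "o"},
    }
    applied_research = {
        "peer review":         {"relevance": "+",  "productivity": "+",  "quality": "++"},
        "metrics":             {"relevance": "+",  "productivity": "++", "quality": "+"},
        "customer evaluation": {"relevance": "++", "productivity": "++", "quality": "+"},
    }

    def show(matrix):
        """Print one program's matrix in rows of method by dimension."""
        for method, ratings in matrix.items():
            cells = "   ".join(f"{ratings[d]:<2}" for d in DIMENSIONS)
            print(f"{method:<20} {cells}")

    show(basic_research)
    print()
    show(applied_research)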
(4) The assessment methods in the model--peer review, metrics,
and customer evaluation--will often be used together in a performance
evaluation process.
(5) Research outcomes are often not quantifiable. Therefore, research
measures should always be accompanied by narrative in order to
provide full opportunity for explanation, presentation of anecdotal
evidence of success, and discussion of the nature of non-metric
peer review and customer evaluation measures.
(6) Dogmatic tracking of metrics should be avoided when experience
shows they are not useful. Although it is important to be
consistent in the types of metrics and goals that are tracked,
flexibility in dropping or adding metrics can help in arriving
at the most useful set.
(7) Aggregation of measures of individual research projects to
the level of an overall program can be accomplished if consistent
peer review and customer evaluation protocols, as well as metrics,
are used across projects and time. The amount of aggregation needed
depends on the audience for the measure; i.e. high-level, external
reporting demands greater aggregation, while internal program
management needs little if any aggregation. In any case, the amount
of aggregation should relate to how one describes progress toward
achieving goals.
(8) The information from this model should be reported and be
understandable to lay as well as scientific audiences.
Conclusion:
The introduction and application of meaningful and accurate
performance measures into Federal agency research programs represents
both a significant opportunity and a challenge. Performance measures
can become a powerful tool to assist in the management of these
programs and to help meet the objectives of GPRA. Accordingly,
the Research Round Table offers its full support to the development
of such measures for use in reporting to the executive and legislative
branches of government and to the public, as well as for internal
management. At the same time, it is important to recognize the
complexity of the cause-effect relationship between research outputs
and their eventual outcomes. These complexities make it difficult
to establish quantifiable measures that consistently measure program
performance, and they create a potential for incorrect application
with a subsequent detrimental effect on the quality and innovativeness
of our scientific endeavors.
As a starting point for developing performance measures, the Research
Round Table offers the model outlined in this paper. This model
identifies dimensions of measurement and methods for obtaining
the necessary inputs. It stresses the value of both quantifiable
data and narrative statements. The Round Table participants recognize
the recommended approach as an evolving process for measuring
performance that takes advantage of experimentation and
innovation, encourages sharing of successful efforts, and allows
mistakes to be made and new directions taken.
Participant List:
HHS USDA NASA
AHCPR ARS DOJ
FDA/CBER CSREES DOT
FDA/CDER ERS DOEd
FDA/CDRH FS NIST
FDA/CFSAN NASS LOC
FDA/CVM NSA
FDA/NCTR Army
FDA/OPE ARL Other:
FDA/ORA COE Fed Focus
NIH/NIA MSI
NIH/NIAAA EPA NAPA
NIH/NIAMS DOE
NIH/NIDCD NRC
NIH/NCI DOI
NIH/NHLBI IO
NIH/OD NBS
NIH/OSPE NOAA
OASH USGS
OS USBM
[1] The Round Table discussed issues relating to both basic and applied research. It also
concluded that most, if not all, of its findings apply to classic "development" activities in the
research and development environment. But, since development activity was not fully
explored with respect to performance measurement, the paper addresses research alone.
Attachment to Research Round Table Discussion Paper
Examples of Unforeseen Research Outcomes
(1) Research done on rat brain tumors was later shown to have
an important role in understanding human breast cancer. In fact,
of the several genes now known to be involved in human breast cancer,
all but one were identified in the course of work on something
other than breast cancer.
(2) Fundamental agricultural research on Agrobacterium,
a common soil bacterium that causes crown gall disease in plants,
led to the discovery that the tumor-like growth occurs because
the bacterium transfers some of its genetic material (DNA) to
the host plant. This discovery led to a new genetic tool that
was instrumental in making bioengineering of improved crop plants possible.
(3) AIDS research also contributed to knowledge in other fields
including virology, immunology, microbiology, and molecular biology.
Research has led to a better understanding of the immune system,
new approaches to vaccine development, novel diagnostic techniques,
and new methods for evaluating drug treatments.
(4) The basic research conducted prior to the AIDS epidemic allowed
researchers to more quickly establish the link between the human
immunodeficiency virus and AIDS, develop a blood test for the
virus and develop treatments, such as AZT, for those suffering
from the disease.
(5) The Michelson-Morley experiments on the speed of light in
different directions provide a spectacular example of extremely
important unforeseen outcomes, leading as they did to Einstein's
formulation of the theory of relativity.
(6) The technology developed for recycling cobalt from scrap jet
engines using double membrane cells for electrorefining is now
to be used to upgrade the national cobalt stockpile, saving taxpayers
millions of dollars.
(7) The 1960s breakthrough of deciphering the genetic code has
led to the identification of genes linked to illnesses such as
breast and colon cancer, Huntington's and Alzheimer's disease,
and the inception of gene therapy treatments.