II. Assessment in the Context of GPRA

Underlying principles. Science agencies must devise assessment strategies that are appropriate to the nature of scientific processes and to the enabling role of fundamental science in support of overarching national goals. Appropriate assessment strategies should encourage the most effective use of American scientific prowess. They should be designed to sustain and advance the excellence and responsiveness of the research system, so that the nation generally, and the scientific community specifically, can respond to surprises, pursue detours, and revise program agendas in light of new scientific information and technical opportunities essential to the future well-being of all our people.

In thinking about performance indicators, it is important to remember that people will respond to them as performance incentives. Since such responses occur whether the indicators are balanced or skewed, any assessment methods that are adopted should accurately reflect the priorities of the program being evaluated. Performance indicators that create counterproductive incentives are worse than no indicators at all and should be avoided. Accordingly, performance indicators should provide positive incentives and encourage risk taking. The assessment of results should focus on measures and information that are useful to, and will be used by, program managers.

Since one purpose of GPRA is to provide a tool that is useful to program managers and stakeholders, agencies should avoid assessments that would be inordinately burdensome or costly.

Existing measures and related methods can capture important elements of research output, but significant aspects of that output, and most elements of the eventual outcomes and impacts, cannot be readily quantified using straightforward measurement techniques. The dynamic complexities of fundamental science and its relationships to national goals cannot be captured through simple quantitative measures that might flow from strict reliance on a linear assessment model developed for manufacturing processes.

Management of the science enterprise to assure world-class research has been built historically on the concept of merit review with peer evaluation--that is, proposed research projects are reviewed by scientific experts and funded based on their scientific merit. Under this system, potential projects or programs are evaluated against the standard of excellent research at the frontier of knowledge. A form of merit review with peer evaluation can also be used for retrospective evaluation of an agency's fundamental science program or programs. The results of such retrospective reviews can be combined with other sources of information about program performance to prepare retrospective program assessments.

Efforts to assess the varied and complex dimensions of fundamental science should use multiple sources and types of evidence. In addition to merit review with peer evaluations, retrospective performance reports might draw on quantitative indicators, qualitative indicators, descriptive indicators or narrative text, examples of outstanding accomplishments and of more typical levels of achievement, information about context, and findings from special studies and analyses.

Since conventional assessment methods focus on currently observable outputs, outcomes, and impacts of research programs, they can seriously understate the total value of any fundamental science program: the future applications of fundamental research cannot be fully anticipated. A period of experimentation will be required to develop better methods. Starting with the GPRA pilot studies now underway, the science agencies have launched a range of experimental and special efforts aimed at developing an effective set of assessment tools for their programs.

GPRA is intended not just as a better tool for planning and management but also as an aid to improved communication among program participants, Congress, and the public. Agencies should produce assessment reports that will inform future policy development and subsequent refinement of program plans. They should use assessment activities to communicate program results to the public and their elected representatives.

Basic principles for assessment of fundamental science in individual agencies are summarized in the box below:


Principles for Assessment of Fundamental Science Programs:
  • Begin with a clearly defined statement of program goals.
  • Develop criteria intended to sustain and advance the excellence and responsiveness of the research system.
  • Establish performance indicators that are useful to managers and encourage risk taking.
  • Avoid assessments that would be inordinately burdensome or costly or that would create counterproductive incentives.
  • Incorporate merit review and peer evaluation of program performance.
  • Use multiple sources and types of evidence; for example, a mix of quantitative and qualitative indicators and narrative text.
  • Experiment in order to develop an effective set of assessment tools.
  • Produce assessment reports that will inform future policy development and subsequent refinement of program plans.
  • Communicate results to the public and elected representatives.

    GPRA system. Under GPRA and the management philosophy on which it is based, the primary elements of an effective management system are strategic plans that lay out long-term goals and the resources required to meet them; annual performance plans that link agency operations to long-term agency goals; and performance reports that provide feedback to managers, policy makers, and the public on what was actually accomplished with the resources expended. The results stated in the performance reports should be linked to the goals of the prior year's plan and should include the findings of any program evaluations completed during that year.

    GPRA calls for submission of strategic plans, performance plans, and performance reports at the agency level. This paper focuses on activities at the agency level, but within the larger context of Federal goals for fundamental science. During the course of the "Assessment Process" (described in Appendix A), the Research Subcommittee of the Committee on Fundamental Science determined that techniques for developing plans and assessing results at the inter-agency level are in their infancy.

    Although performance reports are required annually, there is an emerging understanding that annual reports on fundamental science research can include achievements based on a wide mix of past activities. For example, each yearly report might include both (1) the program's progress toward that year's annual goals and (2) the program's accomplishments over, say, the prior twenty years. It should be possible to use a combination of performance reporting, program evaluation, and supplemental analyses to communicate the cumulative nature of scientific progress and the complexity of linkages over time.

    The GPRA legislation and accompanying documents depict for each agency an integrated system of planning, management, and assessment that links different levels of agency activities. The annual performance reports on key agency programs would draw on information available from detailed program evaluations within the agency and supporting analyses. These annual performance reports would use measures and information that had been useful to and, indeed, used by program managers.

    Performance reports. Since agency and departmental decision-makers, OMB, and members of Congress do not want to be inundated with reams of data and technical narrative, clear, concise material must be prepared for each program's performance report.

    Under GPRA, a program is an activity or project listed in the Federal budget; however, GPRA gives agencies the option to aggregate or disaggregate activities as long as that process does not omit or minimize the significance of any major function or operation of the agency. In practice, the definition of a program seems to be evolving to include a major function or operation of an agency or a major mission-directed goal that cuts across agency components or organizations.

    In light of the scarcity of data in relation to the complexity of outcomes of fundamental research, it will be necessary to draw on multiple sources and types of evidence to present a balanced picture of program accomplishments in the annual performance reports. In addition to evidence provided by retrospective merit review with peer evaluation, reports might include quantitative and qualitative indicators, descriptive indicators, examples of outstanding research accomplishments, information about more typical outcomes, material about context, information from other reviews and special assessment panels, and findings from special studies and analyses.

    GPRA "alternative" approach. The framers of GPRA recognized that, in rare instances, it may not be feasible to measure the results of a Federal program quantitatively; a program of basic research is cited as one such example (U.S. Senate 1993, page 5). If an agency, in consultation with the Director of OMB, determines that it is not feasible to express performance goals for a particular program in an objective, quantifiable, measurable form, the Director of OMB may authorize an alternative approach. Even under such an alternative form, GPRA will require a clear statement of a program's goals and clear standards for identifying whether or not the program is successful. The National Science Foundation's Science and Technology Centers Program explored this alternative approach in a GPRA pilot study, described in Appendix C.

    Detailed program evaluations. The framers of GPRA intended that evaluation practices that enhance and improve programs would be strongly supported by each agency. In view of the breadth of Federal goals and the diversity of fundamental science activities, evaluation practices should be tailored to each agency's mission and institutional structure and to the particular fields of science it pursues. One size does not fit all. Nevertheless, there are basic characteristics of effective program evaluation that should be common to all agencies:

    Appendix C provides an example of criteria and measures used in detailed program evaluation in the Department of Energy.

    Also appearing in Appendix C is a discussion paper by the Research Round Table, an ad hoc group of Federal researchers and managers. The paper proposes a strategy for combining information from customer surveys and other sources in order to evaluate research performance, using a model initially formulated by the Army Research Laboratory.


