National Institutes of Health Office of Extramural Research

Minutes of the Meeting of the Peer Review Oversight Group

July 18, 1996, 4:00-6:00 pm - July 19, 1996, 9:00 am - 4:30 pm

Conference Room 6, Building 31, NIH Campus, Bethesda, MD

In charging the Peer Review Oversight Group (PROG), Dr. Varmus emphasized the importance of peer review, and of having on this committee people representing program, review, and the scientific community. He indicated that the purpose of the committee is to address issues of review policy common to the entire NIH, rather than to focus on specific applications or study sections. Making decisions about extramural grants is one of the most important things done at NIH; therefore high quality scientific peer review is crucial. Dr. Varmus added that he did not anticipate that the committee would recommend changes abruptly or without experimentation, but that he looked forward to hearing from it frequently. He thanked both Dr. Baldwin for agreeing to chair the committee and the Division of Research Grants (DRG) for its willingness to be a major participant.

Dr. Baldwin presented the agenda, explaining that there were few topics in order to allow for in-depth discussion and for topic-selection from the group, and that the PROG will advise on but not manage review at NIH. She indicated that there are some issues that need not be discussed by PROG, as they have already been put in place: streamlined review, limited number of amended applications, and empowering special reviewers to assign scores -- these were relatively clear decisions which did not require or lend themselves to pilot studies or to major deliberation. PROG is not a rule- or policy-making body, but an advisory group. It can suggest pilots that the DRG or specific ICs may volunteer to conduct, and PROG may then make recommendations based on the data obtained. She emphasized that while PROG will discuss areas where peer review might be improved, scientific peer review at NIH is not a system in chaos; rather it is a model for peer review systems worldwide. In a time of reinvention and self-examination, we are seeking to make a good system better.

Possible Topics for PROG Consideration:

The first day of the meeting was a discussion of possible topics and procedures for future work by the PROG. The need for peer review to be dynamic and adaptable given changes in science was noted as a general area in which PROG could contribute. Topics mentioned included scientific progress and how that might be helped or hindered by the separation of review and program at NIH; the quality of review, and how that can be measured; similarities or differences among DRG and the Institute and Center (IC) review; how science maps to specific review groups; expertise within study sections (breadth vs. depth); how to continue to review those applications currently submitted but simultaneously create a nurturing environment for the next wave of science; how to manage review in low-volume areas of science; identification of high-risk research (including how to define it); how to combat the innate conservatism in science and in study sections (or how to collect data to determine whether it really exists); and how reviewers deal with non-scientific issues.

Some issues will inevitably be concerns of both PROG and the DRG Advisory Committee: Scientific Review Administrator (SRA) training; selection and supervision of SRAs; the roles of review and program staff; the reviewer selection and approval process; reviewer training/retraining; travel to scientific meetings for SRAs; balance between new and senior reviewers on study sections; communication among study sections; and role and training of study section chairs.

Rating of Grant Applications:

There was discussion of the timing of any implementation of recommendations from the Rating of Grant Applications (RGA) report. Not everything can or needs to be piloted, but some pilots should be done, and some of these will need to begin before the closing date for public comment. Dr. Baldwin pointed out that the starting point for peer review is always science, and that we need to be sure that peer review is always fair and perceived as fair. She went on to state that no decision has yet been made on any of the RGA recommendations, but as NIH moves into the decision mode it seems clear that, however it is obtained, there is a need for a single score for each application. If this were produced by an algorithm, that algorithm would be made public; whatever the conclusion of the RGA deliberations, it will be communicated to the scientific community.

Dr. Baldwin summarized the responses that have been obtained to date from within the NIH and from the extramural scientific community. Some recommendations appear to be non-controversial; these include the idea of having a higher score as the better score. Also, there is an emerging consensus against standardization by reviewer. There seems to be general acceptance of using criteria, but there may be a need to add one, and there is enthusiasm expressed for a reviewer-assigned global score.

The group discussed the criteria recommended by the RGA committee (significance, approach and feasibility) and those recommended by Dr. Yamamoto (impact, feasibility, creativity/uniqueness, and investigator/environment), and whether and how those overlapped. It was noted that these are not conceptually very different. The RGA criterion of feasibility roughly equates to Dr. Yamamoto's investigator/environment. There was some enthusiasm for the term impact instead of significance. The RGA term significance includes originality, while Dr. Yamamoto's impact criterion does not, but he would add creativity/uniqueness. The argument for inclusion of creativity was that without it as a separate criterion, smaller laboratories doing especially creative work would tend to lose out to large, well-established laboratories which would always score well in the areas of approach and/or environment; thus this criterion might be viewed as helping young investigators to be treated fairly. It was also noted that this criterion might provide a way to deal with emerging areas of science. The group felt that some word-crafting is needed to clearly define whatever criteria might be used, and that the criteria set would likely be a hybrid of those discussed. It was generally agreed that the criteria need to be broad enough to cover all mechanisms.

Review by criteria was thought to be a reasonable idea; group members felt having criteria would help to guide and focus reviewers' discussion, and having criteria specifically addressed in critiques/summary statements might increase the useful information provided to those making funding decisions. This might be enhanced by asking reviewers to include a final sentence indicating how the criteria contributed to the global score, if a global score were assigned by reviewers.

The group also discussed whether these criteria should be individually scored. It was noted that contracts are scored by criterion without difficulty, but there was some concern that such a scoring approach could limit flexibility. The idea of using a non-linear letter grade scale was met with mixed enthusiasm, and there were concerns such a system would be received negatively by the community of scientific investigators, and that it would not really send a clear message. Several of the group's members felt that the clearest message would be the written critiques of the reviewers, but there was concern expressed that reviewers often are reluctant to directly and clearly state negatives, e.g. that the proposed research is boring or lacks creativity. It was unanimously agreed that clarity in critiques is crucially important, and that if some sort of score or rating is given to each criterion, the written text must be consistent with that rating.

The opinion was expressed that an overall score taps the reviewer's scientific expertise and judgement, and it is important not to tie the overall score to an algorithm as this limits the reviewers' flexibility in responding to the unique aspects of each application. With a global score, criterion weights can change project by project, and additional factors can be included. There was some discussion of the rating scale, and some brief information was provided on an experiment currently taking place in NIGMS using rounded scores, in an attempt to eliminate what is considered to be overly fine precision. Additional information on this experiment will be provided to PROG members when it becomes available, and can be considered at the next meeting.

Dr. Baldwin summarized the sense of the group as follows. There is enthusiasm for using four explicit criteria, although there are differences of opinion about the exact wording for labeling and defining these, and whether to use adjectival descriptors or letter grades to rate them. There is enthusiasm for having reviewers assign a global score; a decision is still needed on the number of points on the rating scale, and it appears that there is little if any opposition to a reversal of the scale, with the higher number representing the better score. The issue of standardizing by reviewer did not generate enthusiasm and was tabled. The committee will communicate regarding pilots to be performed, and the next (Nov. 20-21) meeting will involve making decisions on these issues and recommending what would be an optimal "change package" so that changes could be implemented for fiscal year 1998.

Other review issues were mentioned that were not specifically related to the recommendations in the RGA report: Changing the instructions for grant application preparation to include a paragraph on each of the three/four criteria, rather than or in addition to the abstract, and changing reviewer instructions such that all reviewers read that portion of all applications.

Integration of Peer Review of NIAAA, NIDA, and NIMH with DRG:

Dr. Baldwin pointed out that the integration of the National Institute on Alcohol Abuse and Alcoholism (NIAAA) grant application review with that of the DRG offers a concrete example, and may serve as a model for the other institutes, although it is not the only way that integration can be accomplished. Drs. Faye Calhoun and Ken Warren of the NIAAA and Donna Dean of DRG presented the steps which were followed in the integration process. These included a great deal of planning and open communication not only among NIH staff but also between NIH staff and the members of the scientific community, through professional association meetings and less formal mechanisms. Essentially, four study sections (two from NIAAA and two from DRG) were combined and restructured into four new study sections in DRG based on the science to be reviewed, and one study section in NIAAA merged with a Special Emphasis Panel in DRG to form a new study section. Dr. Calhoun pointed out that the process of merging and restructuring study sections within DRG is not a new process, but having it happen across DRG and an institute is new. This required cooperation and communication, which will continue now that the integration has been accomplished.

This effort involved individual research project grant (R01 and R29) and fellowship applications. Reviewers from the original review groups suggested the recasting of scientific groupings for the new study sections. Reviewers from all of the "original" review groups were used in the newly formed study sections, and experienced, highly respected former reviewers were invited as special reviewers. While the new study sections have been functioning for only one or two rounds, the integration seems to be well-received by the scientific community.

Dr. Leshner commented on the importance of the issue, and said the National Institute on Drug Abuse (NIDA) sees this as the integration of their review into the entire NIH review process, adding that their review process does not differ from that of the DRG. He stated that NIDA reviews 1200-1400 applications per year, so that the impact of even half of these on DRG will be substantial, in terms of volume and in terms of scientific overlap with existing DRG study sections. He estimated that there also would be scientific overlap with 8-10 other Institutes and Centers (ICs).

It was suggested that perhaps the next step in review integration needs to be within specific scientific areas, and Dr. Dean pointed out that biopsychology might be a reasonable area to consider since it is a fairly circumscribed scientific area and one in which DRG currently has a staff vacancy. Another suggested area is basic neuroscience, which is estimated to involve approximately 150 applications in each of NIDA and the National Institute of Mental Health (NIMH) and would involve other ICs. It was pointed out that this might be a good starting point for addressing the issue of how decisions are made as to what portions of review should be performed in ICs and what in DRG. Dr. Baldwin pointed out that this is an NIH-wide issue in which all can benefit: in the NIAAA/DRG integration, they now have four new study sections better able to review applications. She added that DRG should not be viewed as static by the drug and mental health communities. Through these integration efforts, we should be able to develop a process through which emerging and smaller areas of science can be more easily managed within the review process, and areas of scientific overlap that could appropriately be reviewed within the DRG but are currently being reviewed within ICs may be identified. Dr. B

Web Posting:  12/11/2000