
Current Bibliographies in Medicine 96-8


Unified Medical Language System® (UMLS®)


January 1986 through December 1996

280 Citations

Prepared by
Catherine R. Selden, M.L.S.
Betsy L. Humphreys, M.L.S.

U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES
Public Health Service
National Institutes of Health

National Library of Medicine
Reference Section
8600 Rockville Pike
Bethesda, Maryland 20894

1997


National Library of Medicine Cataloging in Publication

Selden, Catherine
Unified Medical Language System (UMLS): January 1986 through December 1996 : 280 citations / prepared by Catherine R. Selden, Betsy L. Humphreys. -- Bethesda, Md. (8600 Rockville Pike, Bethesda 20894) : U.S. Dept. of Health and Human Services, Public Health Service, National Institutes of Health, National Library of Medicine, Reference Section ; Pittsburgh, PA : Sold by the Supt. of Docs., U.S. G.P.O., 1997.
-- (Current bibliographies in medicine ; 96-8)

1. Unified Medical Language System - bibliography 2. Vocabulary, Controlled - bibliography 3. Natural Language Processing - bibliography I. Humphreys, Betsy L. II. National Library of Medicine (U.S.). Reference Section III. Title IV. Series

NLM: ZW 1 N272 no. 96-8


Contents

Series Note

Foreword

Introduction

Search Strategy

Sample Citations

Bibliography


Series Note

Current Bibliographies in Medicine (CBM) is a continuation in part of the National Library of Medicine's Literature Search Series, which ceased in 1987 with No. 87-15. In 1989 it also subsumed the Specialized Bibliography Series. Each bibliography in the new series covers a distinct subject area of biomedicine and is intended to fulfill a current awareness function. Citations are usually derived from searching a variety of online databases. NLM databases utilized include MEDLINE®, AVLINE®, BIOETHICSLINE®, CANCERLIT®, CATLINE®, HEALTHSTAR™, POPLINE™ and TOXLINE®. The only criterion for the inclusion of a particular published work is its relevance to the topic being presented; the format, ownership, or location of the material is not considered.

Comments and suggestions on this series may be addressed to:

Karen Patrias, Editor
Current Bibliographies in Medicine
Reference Section
National Library of Medicine
Bethesda, MD 20894
Phone: 301-496-6097
Fax: 301-402-1384
Internet: ref@nlm.nih.gov

This bibliography, CBM 96-8, is the last publication in this series for calendar year 1996.

Ordering Information:

Current Bibliographies in Medicine is sold by the Superintendent of Documents, U.S. Government Printing Office, P.O. Box 371954, Pittsburgh, PA 15250-7954. Orders for individual bibliographies in the series ($5.50, $6.88 foreign) should be sent to the Superintendent of Documents citing the title, CBM number, and the GPO List ID number.

Internet Access:

The Current Bibliographies in Medicine series is also available at no cost to anyone with Internet access through the Library's World Wide Web site at http://www.nlm.nih.gov/pubs/resources.html.

Use of funds for printing this periodical has been approved by the Director of the Office of Management and Budget through September 30, 1997.


FOREWORD

This bibliography marks the tenth anniversary of the National Library of Medicine's Unified Medical Language System® (UMLS®) project, a long-term research and development effort with the ambitious goal of enabling computer systems to "understand" medical meaning. The project was proposed to Congress as essential to the development of advanced health information systems -- and as requiring an initial 5-10 year development phase which would cost $1-3 million per year. The Congress responded with generous and faithful support.

In 1986, we foresaw a future with widespread access to more powerful, less expensive computers, improved telecommunications, and a huge array of diverse machine-readable biomedical information sources. In such a future, health professionals and researchers would be able to obtain information relevant to practice or research decisions when and where needed -- but only if automated systems could interpret their inquiries correctly, identify databases likely to have information relevant to these inquiries, and retrieve the pertinent information from those sources. The UMLS project set out to design and build Knowledge Sources that could be used by computer programs to overcome the barriers to effective information retrieval caused by disparities in language and by the scattering of information across many databases and systems. We understood from the beginning that designing and building the Knowledge Sources would in fact be easier and less expensive than maintaining them over time in order to reflect new biomedical discoveries and concepts. For this reason, an institution like NLM was considered to be more appropriate for directing the UMLS project than a university department operating under short-term grant support.

Ten years later our predictions regarding computers, communications, and biomedical databases have proven to be roughly accurate, and the explosive growth of the Internet and the World-Wide Web has both simplified and magnified the problems the UMLS is designed to address. NLM has issued annual editions of the UMLS Knowledge Sources since 1990 and, in 1996, sent them to more than 700 system developers around the world. Use of the UMLS Knowledge Sources allows many different computer systems, including NLM's own Internet Grateful Med, to behave as if they had at least a limited understanding of the medical meaning in words and phrases used by questioners. Interest in controlling the language used in computer-based patient records, as an aid to decision support, quality control, and research, has spread from the medical informatics community to those delivering and paying for health care. In this environment, there is increasing interest in applying the UMLS products.

A long-term project of the breadth and complexity of the UMLS requires a team with knowledge and skills from many fields. From its inception, the UMLS effort has involved internal research and development by NLM staff, competitively awarded contracts and grants for research assistance from many U.S. informatics research groups, and volunteer UMLS users worldwide, who have tested the Knowledge Sources in different environments and provided valuable suggestions for their improvement. The UMLS project is also indebted to the producers of many important biomedical vocabularies and classifications who provided their terminologies to NLM for incorporation in the UMLS Metathesaurus.

Set out below are the research groups and principal investigators who have been NLM's major partners in developing and testing the UMLS Knowledge Sources:

Literally dozens of individuals from these and other institutions have made substantial contributions to the UMLS project. We thank them all for what has been a gratifying and productive association.

The UMLS has benefited from the talent, energy, and insight of many leaders in medical informatics and medical librarianship, most of whom are well-represented in this bibliography. One scientist whose number of UMLS publications does not adequately reflect his importance to the project is Marsden Scott Blois, M.D., Ph.D. Dr. Blois was awarded one of the first UMLS research contracts in 1986; he died in 1988 just after the project's exploratory phase concluded. His seminal thinking about medical information [1,2] greatly influenced the UMLS project and the design of the UMLS Metathesaurus. His NLM colleagues and friends believe that, like us, he would have been pleased -- but not satisfied -- with the UMLS achievements of the past decade.

Donald A.B. Lindberg, M.D.
Director, National Library of Medicine

Betsy L. Humphreys, M.L.S.
Assistant Director for Health Services Research Information, National Library of Medicine

1. Blois MS. Information and medicine: the nature of medical descriptions. Berkeley: University of California Press; 1984.
2. Blois MS. Medicine and the nature of vertical reasoning. N Engl J Med 1988 Mar 31;318(13):847-51.


INTRODUCTION

Unified Medical Language System

In 1986, the National Library of Medicine (NLM) began a long-term research and development project to build the Unified Medical Language System (UMLS®). The purpose of the UMLS is to aid the development of systems that help health professionals and researchers retrieve and integrate electronic biomedical information from a variety of sources. The UMLS approach involves the development of machine-readable Knowledge Sources that can be used by a wide variety of applications programs to compensate for differences in the way concepts are expressed in different machine-readable sources and by different users, to identify the information sources most relevant to a user inquiry, and to negotiate the telecommunications and search procedures necessary to retrieve information from these sources. The goal is to make it easy for users to link disparate information systems, including computer-based patient records, bibliographic databases, factual databases, and expert systems.

The UMLS project is directed by a multi-disciplinary team of NLM staff and involves medical informatics research groups across the United States working under competitively awarded contracts and grants. More than 700 volunteer users receive the annual editions of the UMLS products free of charge under the terms of a license agreement. The Knowledge Sources are iteratively refined and expanded based on feedback from those applying each successive version.

There are four UMLS Knowledge Sources: the Metathesaurus®, the SPECIALIST™ Lexicon, a Semantic Network and an Information Sources Map. Most heavily used to date, the Metathesaurus provides a uniform, integrated distribution format for more than 30 biomedical vocabularies and classifications, linking many different names for the same concepts. The Lexicon contains syntactic information for many Metathesaurus terms, component words, and English words, including verbs, that do not appear in the Metathesaurus. The Semantic Network contains information about the types or categories (e.g., "Disease or Syndrome," "Virus") to which all Metathesaurus concepts have been assigned and the permissible relationships among these types (e.g., "Virus" causes "Disease or Syndrome"). The Information Sources Map or directory contains both human-readable and machine-"processable" information about the scope, location, vocabulary, syntax rules, and access conditions of biomedical databases of all kinds. The references in this bibliography cover the structure and semantics of the UMLS Knowledge Sources, their development and maintenance, and assessments of their coverage and utility for particular purposes. Some of this literature reflects understandable confusion about the relationship of the Metathesaurus to its constituent vocabularies.
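The division of labor between the Metathesaurus (many names linked to one concept) and the Semantic Network (semantic types and the relationships permitted among them) can be pictured with a toy model. The following is a minimal Python sketch under stated assumptions, not NLM's distribution format or any UMLS software: the class and function names, the `HYPO-` identifiers, the example synonyms, and the crude lowercase normalization are all hypothetical. Only the semantic type names and the "Virus" causes "Disease or Syndrome" relationship come from the paragraph above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    concept_id: str      # hypothetical identifier, NOT a real UMLS concept identifier
    preferred_name: str
    names: set[str] = field(default_factory=set)           # synonyms drawn from different source vocabularies
    semantic_types: set[str] = field(default_factory=set)  # categories such as "Virus"


def normalize(term: str) -> str:
    """Crude normalization (lowercase, collapse whitespace); far simpler than real lexical tools."""
    return " ".join(term.lower().split())


class MiniMetathesaurus:
    """Hypothetical toy index mapping every name of a concept to the single concept it denotes."""

    def __init__(self) -> None:
        self._by_name: dict[str, Concept] = {}

    def add(self, concept: Concept) -> None:
        for name in concept.names | {concept.preferred_name}:
            self._by_name[normalize(name)] = concept

    def lookup(self, term: str) -> Concept | None:
        return self._by_name.get(normalize(term))


# Toy semantic network: permissible relationships between semantic types, stored as triples.
SEMANTIC_NETWORK = {("Virus", "causes", "Disease or Syndrome")}


def relation_permitted(source: Concept, relation: str, target: Concept) -> bool:
    """True if some pair of the two concepts' semantic types is linked by this relation."""
    return any(
        (s, relation, t) in SEMANTIC_NETWORK
        for s in source.semantic_types
        for t in target.semantic_types
    )


# Usage with illustrative (hypothetical) entries.
flu = Concept("HYPO-0001", "Influenza", {"Grippe", "Flu"}, {"Disease or Syndrome"})
virus = Concept("HYPO-0002", "Influenza virus", {"Flu virus"}, {"Virus"})
meta = MiniMetathesaurus()
meta.add(flu)
meta.add(virus)
```

In this sketch, `meta.lookup("  GRIPPE ")` returns the same concept as `meta.lookup("Influenza")`, and `relation_permitted(virus, "causes", flu)` is true while the reverse direction is not, because only the type-level triple is stored. Keeping relationships at the level of semantic types rather than individual concepts is what lets a small network categorize every concept.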

The UMLS Knowledge Sources were designed as multi-purpose tools, to facilitate the development of more effective biomedical information systems. As intended, they have been applied in a wide variety of research and development environments to many different tasks, including vocabulary development, knowledge representation, clinical data capture, linking patient data to knowledge sources, curriculum analysis, natural language processing, automated indexing, and information retrieval. This bibliography covers the full range of UMLS applications.

Particularly in its early years, but also more recently, the UMLS project commissioned exploratory and ancillary studies on such topics as user information needs, methods of organizing and merging vocabulary information, and information retrieval techniques and also developed specialized tools for use in the research effort. The bibliography also includes published articles describing these efforts.

NLM staff members and outside observers have viewed the UMLS project as an important complement to other initiatives affecting access to information and information technology, including Integrated Advanced Information Management Systems (IAIMS), the High Performance Computing and Communications Program, and the emerging National Information Infrastructure. Some commentators have been critical of the UMLS's purpose and its approach. The bibliography includes discussions of the relationship of the UMLS to other programs as well as commentaries on its potential value.

The references in this bibliography include journal articles, book chapters, technical reports, dissertations, and conference papers that include substantive discussions of the UMLS project, the UMLS Knowledge Sources, UMLS applications, and related studies carried out under the auspices of the UMLS project. In general, meeting abstracts, letters, comments, and editorials are included only if they present research findings or express opinions about the project not reflected elsewhere. Although both English and foreign language publications are cited, most references are to English language publications. References are arranged by subject and appear under only one topic. Abstracts have been included if permission was granted by copyright holders. Special copyright notices appear at the end of the abstracts if requested by copyright holders.

The compilers wish to thank Marlyn Schepartz, National Library of Medicine, for her expert production and editorial assistance; Alexa T. McCray, Ph.D. for help in organizing the references; and the many UMLS researchers who reviewed an early draft of the bibliography and provided additional relevant references.


Search Strategy

A variety of online databases are usually searched in preparing bibliographies in the CBM series. To assist you in updating or otherwise manipulating the material in this search, the strategy used for the NLM's MEDLINE database is given below. Please note that the search strategies presented here differ from individual demand searches in that they are generally broadly formulated, and irrelevant citations are edited out prior to printing.

SS1 = UNIFIED MEDICAL LANGUAGE SYSTEM OR VOCABULARY, CONTROLLED OR
NATURAL LANGUAGE PROCESSING
SS2 = (TW) UNIFIED AND MEDICAL AND LANGUAGE AND SYSTEM OR UMLS
SS3 = (TW) NATURAL AND LANGUAGE AND PROCESSING OR METATHESAURUS
SS4 = SEMANTIC@NETWORK
SS5 = KNOWLEDGE@SOURCES#
SS6 = INFORMATION@SOURCES@MAP
SS7 = 1 OR 2 OR 3 OR 4 OR 5 OR 6

GRATEFUL MED and INTERNET GRATEFUL MED

To make online searching easier and more efficient, the Library offers GRATEFUL MED, microcomputer-based software that provides a user-friendly interface to most NLM databases. This software was specifically developed for health professionals and features multiple choice menus and "fill in the blank" screens for easy search preparation. GRATEFUL MED runs on an IBM PC (or IBM-compatible) with DOS 2.0 or Windows or on a Macintosh, and requires a Hayes (or Hayes-compatible) modem. It may be purchased from the National Technical Information Service in Springfield, Virginia, for $29.95 (plus $3.00 per order for shipping). For your convenience, an order blank has been enclosed at the back of this bibliography.

INTERNET GRATEFUL MED is available from the World Wide Web. The user with Internet access and an NLM user account need only point a compatible Web browser (Netscape Navigator is strongly recommended) to http://igm.nlm.nih.gov. No other software at the user end is required.


Sample Citations

Citations are formatted according to the rules established for Index Medicus®. Sample journal and monograph citations appear below. For journal articles written in a foreign language, the English translation of the title is placed in brackets; for monographs, the title is given in the original language. In both cases the language of publication is shown by a three-letter abbreviation appearing at the end of the citation.

Journal Article:

Example:
Cimino JJ. Use of the Unified Medical Language System in patient care at the Columbia-Presbyterian Medical Center. Methods Inf Med 1995 Mar;34(1-2):158-64.

Order, with separating punctuation:
Authors. Article Title. Abbreviated Journal Title Date;Volume(Issue):Pages.

Book Chapter:

Example:
McCray AT. Representing biomedical knowledge in the UMLS semantic network. In: Broering NC, editor. High-performance medical libraries: advances in information management for the virtual era. Westport (CT): Meckler; 1993. p. 31-44.

Order, with separating punctuation:
Chapter Authors. Chapter Title. Book Editor. Book Title. Place of Publication: Publisher; Date. Pages.

For details of the formats used for references, see the following publication:
Patrias, Karen. National Library of Medicine recommended formats for bibliographic citation. Bethesda (MD): The Library; 1991 Apr. Available from: NTIS, Springfield, VA; PB91-182030.


TABLE OF CONTENTS FOR UMLS CBM

Overview and Conceptual Foundations

UMLS Knowledge Sources

UMLS Applications

Preliminary and Ancillary Studies


The UMLS in Relation to Other Programs

Commentaries and Opinions about UMLS


Overview and Conceptual Foundations


Hattery M. UMLS: guide and translator in the land of medical research. Inf Retr Libr Autom (US) 1992 Feb;27(9):1-6.

Humphreys BL, Lindberg DA. Building the Unified Medical Language System. Proc Annu Symp Comput Appl Med Care 1989:475-80. The National Library of Medicine's Unified Medical Language System (UMLS) project has moved from a period of background studies and exploration of alternatives to the actual construction of the first versions of important UMLS components. This paper discusses the UMLS development strategy and assumptions, describes briefly the UMLS components as currently envisioned, and then focuses on the content of the first version of the UMLS metathesaurus (Meta-1), its central vocabulary component. Copyright by and reprinted with permission of the American Medical Informatics Association.

Humphreys BL, Lindberg DA. The UMLS project: making the conceptual connection between users and the information they need. Bull Med Libr Assoc 1993 Apr;81(2):170-7. Conceptual connections between users and information sources depend on an accurate representation of the content of available information sources, an accurate representation of specific user information needs, and the ability to match the two. Establishing such connections is a principal function of medical librarians. The goal of the National Library of Medicine's Unified Medical Language System (UMLS) project is to facilitate the development of conceptual connections between users and relevant machine-readable information. The UMLS model involves a combination of three centrally developed Knowledge Sources (a Metathesaurus, a Semantic Network, and an Information Sources Map) and a variety of smart interface programs that make use of these Knowledge Sources to help users in different environments find machine-readable information relevant to their particular practice or research problems. The third experimental edition of the UMLS Knowledge Sources was issued in the fall of 1992. Current priorities for the UMLS project include developing applications that make use of the Knowledge Sources and using feedback from these applications to guide ongoing enhancement and expansion of the Knowledge Sources. Medical librarians are involved heavily in the direction of the UMLS project, in the development of the Knowledge Sources, and in their experimental application. The involvement of librarians in reviewing, testing, and providing feedback on UMLS products will increase the likelihood that the UMLS project will achieve its goal of improving access to machine-readable biomedical information. Copyright by and reprinted with permission of the Medical Library Association.

Humphreys BL, Lindberg DA. The Unified Medical Language System project: a distributed experiment in improving access to biomedical information. Medinfo 1992;7(Pt 2):1496-500. The goal of the US National Library of Medicine's UMLS project is to overcome the barriers to information access caused by the variety of ways the same biomedical concepts are expressed and by the fragmentation of useful biomedical information among disparate databases and systems. The UMLS strategy focuses on the development of new knowledge sources that can be used by a variety of intelligent programs to compensate for differences in the terminology employed by users and information sources and in database structure and content. The early versions of the UMLS Knowledge Sources are intended for use by system developers. They are available free of charge under the terms of an experimental agreement and have been distributed to 150 sites throughout the world. Priorities for annual enhancement to the UMLS components will be based on feedback received from those who are applying these new tools to a variety of information access problems.

Humphreys BL, Lindberg DA, Hole WT. Assessing and enhancing the value of the UMLS Knowledge Sources. Proc Annu Symp Comput Appl Med Care 1991:78-82. The goal of the UMLS Project is to give practitioners and researchers easy access to machine-readable information from diverse sources. Assessment of the first experimental versions of the UMLS Knowledge Sources is essential to measuring progress toward that goal and to identifying needed enhancements. As of July 30, 1991, copies of the first edition of the UMLS Knowledge Sources had been distributed to 143 individuals and institutions; 66 had provided initial feedback information. The information received indicates that the UMLS Knowledge Sources will undergo broad testing in the patient care, medical education, library service, and product development environments. Preliminary data support the hypothesis that expanded coverage of routine clinical concepts is needed. Key enhancements planned for 1992 and beyond include expanded coverage of ICD-9-CM and CPT. Copyright by and reprinted with permission of the American Medical Informatics Association.

Humphreys BL, Schuyler PL. The Unified Medical Language System: moving beyond the vocabulary of bibliographic retrieval. In: Broering NC, editor. High-performance medical libraries: advances in information management for the virtual era. Westport (CT): Meckler; 1993. p. 31-44. The National Library of Medicine's (NLM) Medical Subject Headings (MeSH) is an extensive biomedical thesaurus used to index, catalog, and retrieve citations to the biomedical literature. It is one of a number of source vocabularies for the Unified Medical Language System (UMLS), a major NLM research and development program designed to help users to retrieve and integrate information from a variety of disparate information sources. The information sources of interest include bibliographic databases, patient records systems, factual databanks, and knowledge bases. The UMLS project has produced three new Knowledge Sources: a Metathesaurus of concepts and terms from vocabularies and classifications used in different types of biomedical information sources; a Semantic Network of sensible relationships among the broad semantic types or categories to which all Metathesaurus concepts are assigned; and an Information Sources Map that describes the scope, content, and access conditions for publicly available biomedical information sources. The UMLS Knowledge Sources are intended for use by system developers and can be accessed by a variety of interface programs to interpret user inquiries, identify sources of information relevant to these queries, and retrieve the relevant information. A number of specific projects are underway to assess the usefulness of the current versions of the UMLS Knowledge Sources and to provide feedback that can guide their future development.

Lindberg DA, Humphreys BL. Computer systems that understand medical meaning. In: Scherrer JR, Cote RA, Mandil SH, editors. Computerized natural medical language processing for knowledge representation. Proceedings of the IFIP-IMIA WG6 International Working Conference; 1988 Sep 12-15; Geneva, Switzerland. Amsterdam: North-Holland; 1989. p. 5-17. The National Library of Medicine has begun the development of the Unified Medical Language System (UMLS). The UMLS project is an effort to build an increasingly intelligent automated system that understands biomedical terms and their interrelationships and uses this understanding to help users retrieve and organize information from machine-readable sources. Compared to many other efforts to build systems that understand medical meaning, the UMLS emphasizes breadth of scope, even at the sacrifice of depth of understanding. To some degree, it must attempt to encompass all of biomedicine and to provide access to many different types of automated information. This great breadth is partially offset by the defined and, in a sense, limited purpose of the UMLS project. The goal of the UMLS is to facilitate the retrieval and integration of information from a variety of machine-readable information sources, including descriptions of the biomedical literature, clinical records, factual databanks and medical knowledge bases. The UMLS will compensate for the differences in the terminologies used in these disparate systems and for variations in the language employed by users themselves.

Lindberg DA, Humphreys BL. Toward a unified medical language. In: EFMI - European Federation for Medical Informatics. Medical Informatics Europe '87. Proceedings of the 7th International Congress; 1987 Sep 21-25; Rome, Italy. Rome: Luigi Pozzi; 1987. p. 23-31.

Lindberg DA, Humphreys BL. The UMLS knowledge sources: tools for building better user interfaces. Proc Annu Symp Comput Appl Med Care 1990:121-5. The current focus of the National Library of Medicine's Unified Medical Language System (UMLS) project is the development, testing, and evaluation of the first versions of three new knowledge sources: the Metathesaurus, the Semantic Network, and the Information Sources Map. These three knowledge sources can be used by interface programs to conduct an intelligent interaction with the user and to make the conceptual link between the user's question and relevant machine-readable information. NLM is providing experimental copies of the initial versions of the UMLS knowledge sources in exchange for feedback on ways they can and should be improved. The hope is that the results of such experimentation will provide both immediate improvements in biomedical information service and useful suggestions for enhancements to the UMLS. Copyright by and reprinted with permission of the American Medical Informatics Association.

Lindberg DA, Humphreys BL, McCray AT. The Unified Medical Language System. Methods Inf Med 1993 Aug;32(4):281-91. In 1986, the National Library of Medicine began a long-term research and development project to build the Unified Medical Language System (UMLS). The purpose of the UMLS is to improve the ability of computer programs to understand the biomedical meaning in user inquiries and to use this understanding to retrieve and integrate relevant machine-readable information for users. Underlying the UMLS effort is the assumption that timely access to accurate and up-to-date information will improve decision making and ultimately the quality of patient care and research. The development of the UMLS is a distributed national experiment with a strong element of international collaboration. The general strategy is to develop UMLS components through a series of successive approximations of the capabilities ultimately desired. Three experimental Knowledge Sources, the Metathesaurus, the Semantic Network, and the Information Sources Map have been developed and are distributed annually to interested researchers, many of whom have tested and evaluated them in a range of applications. The UMLS project and current developments in high-speed, high-capacity international networks are converging in ways that have great potential for enhancing access to biomedical information.

Paterson G. UMLS knowledge sources and Canada: an overview. Bibl Medica Can 1996 Spring;17(3):102-4.

Schuyler P. Integrated access to medical and pharmacological information: the unified medical language system at the National Library of Medicine. In: Proceedings: 1990 International Chemical Information Conference; Montreux, Switzerland. New York: Springer-Verlag; 1990. p. 195-203.

Squires SJ. Access to biomedical information: the unified medical language system. Libr Trends 1993 Summer;42(1):127-52. The National Library of Medicine (NLM) is engaged in a long-term project to develop a Unified Medical Language System (UMLS) that will retrieve and integrate information from a variety of information resources. Two UMLS components use fundamental aspects of controlled vocabulary structure and management and their relationship to information retrieval that have general interest for librarianship. The UMLS project is described, along with its initial deployment in retrieval environments. Reprinted with permission from Library Trends. Copyright 1993 The Board of Trustees of the University of Illinois.


UMLS Knowledge Sources


Joubert M, Miton F, Fieschi M, Robert JJ. A conceptual graphs modeling of UMLS components. Medinfo 1995;8(Pt 1):90-4. The Unified Medical Language System (UMLS) of the U.S. National Library of Medicine is a complex collection of terms, concepts, and relationships derived from standard classifications. Potential applications would benefit from a high level representation of its components. This paper proposes a conceptual representation of both the Metathesaurus and the Semantic Network of the UMLS based on conceptual graphs. It shows that the addition of a dictionary of concepts to the UMLS knowledge base allows the capability to exploit it pertinently. This dictionary defines more precisely the core concepts and adds constraints on their use. Constraints are dedicated to guide an intelligent browsing of the UMLS knowledge sources.

Lipow SS, Campbell KE, Olson NE, Tuttle MS, Erlbaum MS, Fuller LF, Sherertz DS, Nelson SJ, Cole WG. Formal properties of the metathesaurus: An update. Proc Annu Symp Comput Appl Med Care 1995:944.

McCray AT. Representing biomedical knowledge in the UMLS semantic network. In: Broering NC, editor. High-performance medical libraries: advances in information management for the virtual era. Westport (CT): Meckler; 1993. p. 45-55. The Unified Medical Language System (UMLS) Semantic Network is one of three Knowledge Sources currently available as part of the National Library of Medicine's (NLM) UMLS Project. The purpose of the Network is to provide a consistent categorization of all concepts found in the Metathesaurus and to provide useful links between these concepts at the level of the semantic types. The Semantic Network is closely tied to the other two UMLS Knowledge Sources, the Metathesaurus and the Information Sources Map. Taken together, these three Knowledge Sources provide powerful tools for enhancing biomedical information retrieval.

McCray AT. UMLS semantic network. Proc Annu Symp Comput Appl Med Care 1989:503-7. The UMLS network of semantic types is one component of NLM's evolving Unified Medical Language System. This paper discusses the role of the semantic network in the overall system, then describes the evolution and current status of the network, and finally, concludes with a discussion of plans for further development. Copyright by and reprinted with permission of the American Medical Informatics Association.

McCray AT, Hole WT. The scope and structure of the first version of the UMLS Semantic Network. Proc Annu Symp Comput Appl Med Care 1990:126-30. The authors discuss the UMLS Semantic Network, one of three UMLS knowledge sources currently under development by the National Library of Medicine. They describe the structure and content of the network, and discuss the relationship between the network and the first version of the UMLS Metathesaurus. They address the assumptions and process involved in assigning semantic types to Metathesaurus concepts and conclude with a description of the distribution format of this knowledge source. Copyright by and reprinted with permission of the American Medical Informatics Association.

McCray AT, Nelson SJ. The representation of meaning in the UMLS. Methods Inf Med 1995 Mar;34(1-2):193-201. The UMLS knowledge sources provide detailed information about biomedical naming systems and databases. The Metathesaurus contains biomedical terminology from an increasing number of biomedical thesauri, and the Semantic Network provides a structure that encompasses and unifies the thesauri that are included in the Metathesaurus. This paper addresses some fundamental principles underlying the design and development of the Metathesaurus and Semantic Network. It begins with a description of the formal properties of the semantic network. It continues with consideration of the principle of semantic locality and how this is reflected in the UMLS knowledge sources. The paper concludes with a discussion of the issues involved in attempting to reuse knowledge and the potential for reuse of the UMLS knowledge sources.

Nelson SJ, Fuller LF, Erlbaum MS, Tuttle MS, Sherertz DD, Olson NE. The semantic structure of the UMLS Metathesaurus. Proc Annu Symp Comput Appl Med Care 1992:649-53. Meta-1.1, the UMLS metathesaurus, represents medical knowledge in the forms of names of concepts and links between those concepts. The representations of the semantic neighborhood of a concept can be thought of as dimensions of the property of semantic locality and include term information (broader, narrower, or otherwise related), the contextual information (parent-child, siblings in a hierarchy), the semantic types, and the co-occurrence data (links discovered empirically from concepts used to index the medical literature). The degree of redundancy of each of these dimensions was investigated by reviewing the extent of multiple presentations of concepts which appear as related to a given concept. The degree of overlap was surprisingly small. While the co-occurrence data finds some of the links represented by other dimensions, those links are but minute fractions of the vast amount of co-occurrence derived links. Because parent-child relationships are often subsumptive (or categorical) in nature, it might be expected that siblings usually share the same semantic types. While true in the aggregate, the wide variance in percent of types shared may reflect the intended usages of the source vocabularies. Noun phrases were extracted from the definitions of 40 concepts in Meta-1 in order to assess systematically the coverage of important concepts by Meta-1, and to assess whether the links between these definitional concepts, which may have special value, and the concept being defined were indeed present. Out of 161 of these definitional concepts, 29 were not represented in Meta-1, and 37 of those represented in Meta-1 had no direct link to the concept they were defining. Copyright by and reprinted with permission of the American Medical Informatics Association.

Nelson SJ, Tuttle MS, Cole WG, Sherertz DD, Sperzel WD, Erlbaum MS, Fuller LL, Olson NE. From meaning to term: semantic locality in the UMLS Metathesaurus. Proc Annu Symp Comput Appl Med Care 1991:209-13. The Unified Medical Language System Metathesaurus represents the results of a synthesis of existing biomedical naming systems (thesauri). The naming and other information about the meanings in the Metathesaurus can be used to find the preferred naming of that meaning in the source chosen by the user, by exploiting the property of semantic locality. The aspects of semantic locality in the Metathesaurus which can be thus exploited are the terms, the semantic types, the use of that term in a source context, and the co-occurrence of terms in MEDLINE. To find how a meaning is named in the source of choice, a user must exploit one of these aspects of semantic locality, entering a term somehow related to the term being sought, and navigating to the preferred term. While the first three of these aspects of semantic locality are normative, the last is empirical. Testing of the utility of the aspects of semantic locality in information retrieval would require a uniform interface with (1) no Metathesaurus, (2) the Metathesaurus without the aspects in question, and (3) the Metathesaurus including all the aspects. Other potential uses of empirically derived semantic locality include defining or suggesting potentially relevant concepts in a given situation. Copyright by and reprinted with permission of the American Medical Informatics Association.

Schuyler PL, Hole WT, Tuttle MS, Sherertz DD. The UMLS Metathesaurus: representing different views of biomedical concepts. Bull Med Libr Assoc 1993 Apr;81(2):217-22. The UMLS Metathesaurus is a compilation of names, relationships, and associated information from a variety of biomedical naming systems representing different views of biomedical practice or research. The Metathesaurus is organized by meaning, and the fundamental unit in the Metathesaurus is the concept. Differing names for a biomedical meaning are linked in a single Metathesaurus concept. Extensive additional information describing semantic characteristics, occurrence in machine-readable information sources, and how concepts co-occur in these sources is also provided, enabling a greater comprehension of the concept in its various contexts. The Metathesaurus is not a standardized vocabulary; it is a tool for maximizing the usefulness of existing vocabularies. It serves as a knowledge source for developers of biomedical information applications and as a powerful resource for biomedical information specialists. Copyright by and reprinted with permission of the Medical Library Association.

Tuttle M, Sherertz D, Olson N, Erlbaum M, Sperzel D, Fuller L, Nelson S. Using Meta-1--the 1st version of the UMLS Metathesaurus. Proc Annu Symp Comput Appl Med Care 1990:131-5. The National Library of Medicine (NLM) is developing the Unified Medical Language System (UMLS) to provide uniform access to the world's biomedical knowledge. The foundation of the UMLS is a metathesaurus of concept names, or terms. Meta-1, the first version of the Metathesaurus, was synthesized from existing biomedical nomenclatures and classification systems, and it contains in excess of 100,000 terms, including all those from MeSH and DSM, and a portion of those from SNOMED, ICD, CPT, LCSH, COSTAR and other sources. These names are arranged and labeled so as to help answer the questions, 'What is it called?' and 'Where can I find out more about it?' The first question is referred to as the naming problem, and the second as the location problem. Meta-1 is a source of lexical diversity and semantic locality with which to address these problems in biomedicine. While the NLM will be using Meta-1 in the UMLS, non-NLM developers and users may wish to use Meta-1 to help solve their own naming and location problems. Copyright by and reprinted with permission of the American Medical Informatics Association.

Tuttle MS, Nelson SJ, Fuller LF, Sherertz DD, Erlbaum MS, Sperzel WD, Olson NE, Suarez-Munist ON. The semantic foundations of the UMLS metathesaurus. Medinfo 1992;7(Pt 2):1506-11. The United States National Library of Medicine (NLM) has issued two (annual) versions of the Unified Medical Language System (UMLS) Metathesaurus, with the third scheduled for Fall, 1992. The UMLS project is a long-term initiative intended to develop, for use by both care-givers and researchers, a uniform interface to biomedical knowledge available in electronic form. The project has been in a component evaluation phase since the release of Meta-1.0, the first version of the Metathesaurus, and an accompanying Semantic Network, in October, 1990. The Metathesaurus, so called because it is a synthesis and enhancement of existing naming and classification systems, is the central naming component of the UMLS, a place where both users and programs can retrieve the names of biomedical concepts, and information about how the names and concepts relate to one another, and how they are used in selected machine-readable sources.

Tuttle MS, Olson NE, Campbell KE, Sherertz DD, Nelson SJ, Cole WG. Formal properties of the Metathesaurus. Proc Annu Symp Comput Appl Med Care 1994:145-9. The Metathesaurus is a machine-created, human-edited and enhanced synthesis of authoritative biomedical terminologies. Its formal properties permit it to be a) exploited by computers, and b) modified and enhanced without compromising that usage. If further constraints were imposed on the existence and identity of Metathesaurus relationships, i.e., if every Metathesaurus concept had a genus and a differentia, then the Metathesaurus could be converted into an Aristotelian Hierarchy. In this sense, a genus is a concept that classifies another concept, and a differentia is a concept that distinguishes the classified concept from all other concepts in the same class. Since, in principle, these constraints would make the Metathesaurus easier to leverage and maintain computationally, it is interesting to ask to what degree the maintenance and enhancement procedures now in place are producing a Metathesaurus that is also an Aristotelian Hierarchy. Given a liberal interpretation of the current Metathesaurus schema, the proportion of the Metathesaurus that is Aristotelian in each annual version is increasing in spite of dramatic concurrent increases in the number of Metathesaurus concepts. Without formality there is neither modifiability nor scalability. We need formal methods and computer-based tools that can help us with the task [of controlled medical vocabulary construction]. We need research in which controlled vocabulary development is the focus rather than a stepping stone for work on other theories and applications. Copyright by and reprinted with permission of the American Medical Informatics Association.
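The genus-and-differentia criterion described in the preceding abstract can be made concrete with a short Python sketch. It is an illustration only: the `Concept` record, its fields, and the sample concepts are hypothetical and do not reflect the actual Metathesaurus schema or the procedure the authors used to measure the Aristotelian proportion.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    # Hypothetical, simplified concept record (not the Metathesaurus file format).
    name: str
    genus: set = field(default_factory=set)        # concepts that classify this concept
    differentia: set = field(default_factory=set)  # concepts that distinguish it within its class


def is_aristotelian(concept: Concept) -> bool:
    """True when the concept has at least one genus and at least one differentia."""
    return bool(concept.genus) and bool(concept.differentia)


def aristotelian_proportion(concepts: list) -> float:
    """Fraction of concepts that satisfy the genus-plus-differentia constraint."""
    if not concepts:
        return 0.0
    return sum(is_aristotelian(c) for c in concepts) / len(concepts)


# Illustrative sample: the top-level concept has no genus, so it does not qualify.
sample = [
    Concept("Disease"),
    Concept("Pneumonia", genus={"Lung Diseases"}, differentia={"Inflammation"}),
    Concept("Viral Pneumonia", genus={"Pneumonia"}, differentia={"Viruses"}),
]
print(f"{aristotelian_proportion(sample):.2f}")  # prints 0.67
```

Tracking the proportion across annual versions, as the abstract describes, would amount to rerunning the same function on each version's concepts.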

Yang Y, Chute CG. A schematic analysis of the Unified Medical Language System. Proc Annu Symp Comput Appl Med Care 1991:204-8. The UMLS is a complex collection of medical terms and relationships derived from standard classifications. Appreciating the scope and layout of these relations from text descriptions of relational schema is difficult. The graphical technique of Logical Data Structure (LDS) representation was employed to illustrate the UMLS schema as a data abstraction, affording additional insights that might otherwise escape notice. An LDS representation of the Metathesaurus offers the following advantages: 1) the separation of a viewpoint from physical data structures enables a global outline of the contents; 2) the graphical map makes the interrelation of data visible; and 3) the logical entities explicitly reflect the decision-making which was implicit or ambiguous in the relational scheme. Copyright by and reprinted with permission of the American Medical Informatics Association.


Sherertz DD, Olson NE, Tuttle MS, Erlbaum MS. Source inversion and matching in the UMLS Metathesaurus. Proc Annu Symp Comput Appl Med Care 1990:141-5. One of three knowledge sources being developed as part of NLM's UMLS is a biomedical thesaurus, called the Metathesaurus. It contains inter-term relationships across six biomedical nomenclatures and classification systems, derivable from lexical mapping techniques. The first public version, META-1, was built in two stages: source inversion and source matching. Versions for the six sources were obtained in machine-readable form. Source specific techniques were derived empirically to analyze the information structure and content of each source. The results of each analysis were used to guide the inversion of the corresponding source, resulting in a homogeneous representation for all sources. The core concepts of META-1 come primarily from MEDLINE index terms (MeSH). Previous work on lexical mapping developed algorithmic methods to link concepts in different sources. These methods were refined iteratively, and used to implement a META-1 matching engine. The initial version of META-1 was constructed with this engine, by matching the META-1 core concepts to the other sources. This version of META-1 was edited and enhanced by domain experts, after the inclusion of supplementary information, to produce the first publicly released version of META-1. Copyright by and reprinted with permission of the American Medical Informatics Association.

Sherertz DD, Olson NE, Tuttle MS, Sperzel WD, Erlbaum MS, Fuller LF. The META-1 engine: a database methodology used in building the UMLS METATHESAURUS. Medinfo 1992;7(Pt 1):144-9. Three knowledge sources are being developed as part of the US NLM's UMLS project. The largest of these is a biomedical thesaurus, called the METATHESAURUS. META-1 and META-1.1 (or META-1*), the first two versions of the METATHESAURUS, contain term attributes and relationships across a number of biomedical nomenclatures and classification systems. Entries in META-1* result from human experts editing entries computed by an explicit database methodology. A database engine is implemented to manage the steps used to build up the entries of META-1* before editing, and to control the application of facts generated by editing the computed META-1* entries. This engine maintains and manipulates a database of facts about the entries in META-1*, and, prior to editing, produces a METATHESAURUS that is provably correct, in the sense that all of the inter-term relationships are derivable solely from data within the sources. The methodology used by this engine controls the management of complexity in the METATHESAURUS, as facts change and evolve, allowing many iterations of META-1* to be computed and analyzed.

Sherertz DD, Tuttle MS, Blois MS, Erlbaum MS. Intervocabulary mapping within the UMLS: the role of lexical matching. Proc Annu Symp Comput Appl Med Care 1988:201-6. Within the NLM's UMLS Project, one challenge is mapping concepts from one information resource to another. While a complete solution to this problem requires construction of a comprehensive biomedical thesaurus, the present research provides evidence that considerable progress can be made with a straightforward lexical approach. Furthermore, such a lexical approach is the only practical way to begin construction of, and maintain, any such thesaurus. Related research has demonstrated the regularity of word usage within the context of biomedicine. This regularity suggests that mapping between biomedical information resources that have a constrained vocabulary can use lexical matching techniques with considerable success. A method has been developed to map 'phrases' from candidate sources to MeSH. In one experiment, this method attempts to map 834 disease names from the disease descriptions composed at UCSF for the UMLS. In a second experiment, the same method attempts to map disease attributes from these diseases. Copyright by and reprinted with permission of the American Medical Informatics Association.
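The lexical approach summarized above can be sketched as normalize-then-look-up: reduce each phrase to a canonical key and match keys against an index built from the target vocabulary. The normalization steps here (lowercasing, dropping punctuation and an assumed stop-word list, sorting words) are illustrative assumptions rather than the method reported by Sherertz et al., and the three MeSH-style headings are sample data.

```python
import re

# Assumed stop-word list for illustration; a real mapper would need a curated one.
STOP_WORDS = {"of", "the", "and", "in", "with", "nos"}


def normalize(phrase: str) -> str:
    """Lowercase, split on non-alphanumerics, drop stop words, sort the words."""
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    return " ".join(sorted(w for w in words if w not in STOP_WORDS))


def build_index(terms):
    """Map each normalized key to the target-vocabulary terms that produce it."""
    index = {}
    for term in terms:
        index.setdefault(normalize(term), []).append(term)
    return index


def map_phrase(phrase, index):
    """Return candidate target terms for a source phrase (empty list if none)."""
    return index.get(normalize(phrase), [])


mesh_like = ["Diabetes Mellitus, Insulin-Dependent", "Carcinoma, Squamous Cell", "Hypertension"]
index = build_index(mesh_like)
print(map_phrase("insulin-dependent diabetes mellitus", index))
# prints ['Diabetes Mellitus, Insulin-Dependent']
print(map_phrase("squamous cell carcinoma", index))
# prints ['Carcinoma, Squamous Cell']
```

Sorting the words makes matching insensitive to the inverted word order common in MeSH headings; the trade-off is that phrases differing only in word order collapse to one key, which is one reason lexical matches still need human review.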

Sherertz DD, Tuttle MS, Olson NE, Erlbaum MS, Nelson SJ. Lexical mapping in the UMLS metathesaurus. Proc Annu Symp Comput Appl Med Care 1989:494-9. A critical knowledge source being developed as part of the NLM's UMLS (National Library of Medicine's Unified Medical Language System) project is a biomedical thesaurus, called the metathesaurus. Central to the metathesaurus will be interterm relationships, across several biomedical nomenclatures and classification systems, which are derivable from lexical mapping techniques. Previous UMLS research on intervocabulary mapping elaborated these techniques. During the Fall of 1988, they were extended and used to build Meta-0, a 2000-concept demonstration metathesaurus. Meta-0 was composed primarily of the most frequently occurring MEDLINE index terms from MeSH (Medical Subject Headings), and MeSH will be the main source of concepts for Meta-1, the initial public version of the metathesaurus. Review of Meta-0 suggested several refinements to the methodology for building Meta-1. These include labeling MeSH entry terms as lexical variants or synonyms before linking them to other sources. Later work refined algorithmic methods that detect lexical variants in MeSH. Copyright by and reprinted with permission of the American Medical Informatics Association.

Sperzel D, Erlbaum M, Fuller L, Sherertz D, Olson N, Schuyler P, Hole W, Savage A, Passarelli P, Tuttle M. Editing the UMLS Metathesaurus: review and enhancement of a computer knowledge source. Proc Annu Symp Comput Appl Med Care 1990:136-40. The paper describes the editing of Meta-1, the first official version of the National Library of Medicine's (NLM) Unified Medical Language System (UMLS) Metathesaurus. After a preliminary version of Meta-1 was generated by automated techniques, it was edited by domain experts. The goal of editing was to enhance approximately 30,000 Metathesaurus entries and to correct 'errors of commission' introduced by the automated techniques. Enhancements were made by assigning semantic types (such as 'Disease or Syndrome' or 'Virus') and lexical tags (such as 'eponym') to the Meta-1 entries. The production of Meta-1 required balancing the costs of human and computational resources appropriately, and it may illustrate a paradigm for the construction of large biomedical information resources. The tool-supported process of editing Meta-1, as well as some of the issues that arose during this endeavor, are presented. Despite its large scale, the Meta-1 editing task was accomplished within the specified constraints. Copyright by and reprinted with permission of the American Medical Informatics Association.

Sperzel WD, Tuttle MS. Updating the UMLS metathesaurus: A model. Proc Annu Symp Comput Appl Med Care 1989:488-93. The Unified Medical Language System (UMLS) is intended to support uniform access to machine-readable biomedical information resources. The foundation of the UMLS is a metathesaurus, which will link terms in different biomedical nomenclatures. Because the resources and nomenclatures continue to evolve, the metathesaurus must evolve with them. Thus, an important criterion for the design of the metathesaurus is the accommodation of change. A model of such accommodation is presented. A key design decision was the representation of the metathesaurus, and prospective updates, as a database of "facts." Particular emphasis is placed on database operations that use the results of internomenclature lexical matching to collapse entries from different nomenclatures into metathesaurus entries. The implementation of this model for a simplified version of the metathesaurus is described. Copyright by and reprinted with permission of the American Medical Informatics Association.

Sperzel WD, Tuttle MS, Olson NE, Erlbaum MS, Suarez-Munist O, Sherertz DD, Fuller LF. The Meta-1.2 engine: a refined strategy for linking biomedical vocabularies. Proc Annu Symp Comput Appl Med Care 1992:304-8. This paper presents a preliminary description of the database schema and associated procedures that are the foundation for the engine that will produce Meta-1.2. Meta-1.2 is the next incarnation of the Metathesaurus, which is one of the principal components of the National Library of Medicine's Unified Medical Language System (UMLS). We use the word engine as a generic term that includes a database and the programs that operate on it. While this design builds heavily upon previous work, it incorporates some major changes in philosophy. A major hypothesis is that the simple representation described here is suitable for any controlled vocabulary in the biomedical domain. Indeed, this hypothesis is central to a strategy for producing future versions of the Metathesaurus and for supporting collaboration with people who wish to contribute additional terms and relationships to the Metathesaurus. Another change involves the representation of classes and relationships. The revised database schema includes an explicit representation of the source or authority for relationships, which is analogous to the way that the sources of terms have been represented since the first version of the Metathesaurus. A sequence of steps utilizing the new representations to produce the Metathesaurus is presented. Copyright by and reprinted with permission of the American Medical Informatics Association.

Suarez-Munist ON, Tuttle MS, Olson NE, Erlbaum MS, Sherertz DD, Lipow SS, Cole WG, Keck KD, Davis AN, Hole WT, et al. MEME II supports the cooperative management of terminology. Proc AMIA Fall Symp 1996:84-8. Health care enterprises need enterprise-wide terminologies to compare, reuse and repurpose health care descriptions. But once they are created, these terminologies need to be maintained and enhanced to sustain their utility and that of the descriptions encoded with them. MEME II (Metathesaurus Enhancement and Maintenance Environment, Version II) supports the required activities and enables enterprises to leverage their investment in terminology and descriptions by permitting remote -- extra-enterprise -- enhancements to terminology to be incorporated locally, and local -- intra-enterprise -- enhancements to be shared remotely. MEME II represents all changes to terminologies as data, or "actions" that can be interpreted by an "action engine." These actions, or messages, represent semantic "units of work" that can be interpreted by other copies of MEME II. The exchange of update messages increases the likelihood that the comparability of terminology-based health care descriptions can be sustained. Copyright by and reprinted with permission of the American Medical Informatics Association.
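The idea in the preceding abstract, representing terminology changes as data that an engine replays, can be sketched in a few lines of Python. The three action kinds, the `Action` record, and the concept identifiers are hypothetical; the abstract does not describe MEME II's actual message format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    # Hypothetical action message; MEME II's real action vocabulary is not given in the abstract.
    kind: str     # "add_term", "remove_term", or "merge"
    concept: str  # target concept identifier
    arg: str      # a term, or for "merge" the concept to fold into the target


def apply_actions(terminology, actions):
    """Replay actions in order against a copy of {concept_id: set_of_terms}."""
    result = {cid: set(terms) for cid, terms in terminology.items()}
    for act in actions:
        if act.kind == "add_term":
            result.setdefault(act.concept, set()).add(act.arg)
        elif act.kind == "remove_term":
            result.get(act.concept, set()).discard(act.arg)
        elif act.kind == "merge":
            # Move every term of concept `arg` into `concept`, then drop `arg`.
            if act.arg != act.concept:
                merged = result.pop(act.arg, set())
                result.setdefault(act.concept, set()).update(merged)
        else:
            raise ValueError(f"unknown action kind: {act.kind}")
    return result


local = {"C1": {"Heart attack"}, "C2": {"Myocardial infarction"}}
update = [Action("merge", "C1", "C2"), Action("add_term", "C1", "MI")]
print(apply_actions(local, update))
```

Because the same ordered list of actions yields the same result wherever it is replayed, sites could exchange action lists instead of whole terminologies, which is the sharing pattern the abstract describes.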

Tuttle MS, Blois MS, Erlbaum MS, Sherertz DD, Nelson SJ. Toward a bio-medical thesaurus: Building the foundation of the UMLS. Proc Annu Symp Comput Appl Med Care 1988:191-5. The Unified Medical Language System (UMLS) is being designed to provide a uniform user interface to heterogeneous machine-readable biomedical information resources, such as bibliographic databases, genetic databases, expert systems and patient records. Such an interface will have to recognize different ways of saying the same thing and provide links to related ways of saying things. One way to represent the necessary associations is by using a domain thesaurus. As no such thesaurus exists, and because, once built, it will be both sizable and in need of continuous maintenance, its design should include a methodology for building and maintaining it. A methodology utilizing lexically expanded schema inversion and a design, called T. Lex, is proposed, which forms an approach to the problem of defining and building a biomedical thesaurus. It is argued that the semantic locality implicit in such a thesaurus will support model-based reasoning in biomedicine. Copyright by and reprinted with permission of the American Medical Informatics Association.

Tuttle MS, Sherertz DD, Erlbaum MS, Olson NE, Nelson SJ. Implementing meta-1: The first version of the UMLS metathesaurus. Proc Annu Symp Comput Appl Med Care 1989:493-7. The Unified Medical Language System (UMLS) is being designed to provide uniform access to computer-based resources in biomedicine. For the foreseeable future, the foundation of the UMLS will be a metathesaurus of concepts, synthesized from existing sources, including MeSH, SNOMED, ICD-9-CM, CPT-4, DSM-III, and other biomedical nomenclatures and classification systems. In Meta-1, the first version of the metathesaurus, the synthesis is being implemented using a three-part methodology: concept names (terms) and intrasource relationships, such as synonymy, have been extracted from each source and converted to a homogeneous representation; intersource lexical matches have been used to combine terms from different sources into metathesaurus entries; and some 30,000 of these entries, those containing MeSH terms and a selected sample of terms from other domains, will be reviewed by humans, enhanced, and modified, as appropriate. This methodology must eventually support incremental development and an audit trail, and it must preserve relationships added during human review. The 30,000 Meta-1 entries will contain in excess of 60,000 biomedical terms, and these terms will participate in more than 100,000 thesaurus relationships. These "normative" relationships will be supplemented by "empirical" relationships computed from certain UMLS resources. The first of the empirical relationships will be counts of the occurrence and co-occurrence of Meta-1 concepts in MEDLINE. Copyright by and reprinted with permission of the American Medical Informatics Association.

Tuttle MS, Sherertz DD, Erlbaum MS, Sperzel WD, Fuller LF, Olson NE, Nelson SJ, Cimino JJ, Chute CG. Adding your terms and relationships to the UMLS Metathesaurus. Proc Annu Symp Comput Appl Med Care 1991:219-23. The National Library of Medicine's Unified Medical Language System Metathesaurus contains the richest single corpus of biomedical names in existence. Yet, developers wishing to make use of the Metathesaurus will be confronted by users who want to add local terminology and relationships not already represented there. We urge developers to fill those needs, while, at the same time, they plan for the many consequences of unilateral Metathesaurus enhancement. Foremost among these consequences is the need to maintain local enhancements across subsequent releases of the Metathesaurus. These problems are illustrated via examples of candidate Metathesaurus enhancement terms in use at the Columbia-Presbyterian Medical Center (CPMC), at the Mayo Clinic, and in Current Disease Descriptions (CDD). Sharing and reuse of Metathesaurus enhancement methods may permit local enhancements to be used at other sites, and it may permit the global Metathesaurus utilization effort to benefit from economies of scale. Copyright by and reprinted with permission of the American Medical Informatics Association.

Tuttle MS, Suarez-Munist ON, Olson NE, Sherertz DD, Sperzel WD, Erlbaum MS, Fuller LF, Hole WT, Nelson SJ, Cole WG, et al. Merging terminologies. Medinfo 1995;8(Pt 1):162-6. A terminology is a systematic, authoritative collection of concept names, or terms, in some domain. No single terminology names all the important concepts in biomedicine. One approach to creating a more comprehensive biomedical terminology is to merge existing biomedical terminologies, as the UMLS Metathesaurus has done for the last six years. Because existing terminologies may overlap--for example, one terminology may name a concept also named by another terminology--the terminologies in the Metathesaurus must be merged. Some terminologies suggest merges through their structure or content, e.g., they suggest synonyms or connections to other terminologies; other merges can be suggested by algorithm. Regardless, all merges in the Metathesaurus must be approved by a human editor with appropriate domain knowledge. By the time Meta-'96 is released early in 1996, one prototype and seven released versions of the Metathesaurus will have been produced by a sequence of four qualitatively different methods, named for the way in which they merge terms: #1 Term Rewrite Rules, #2 Transitive Closure on Facts, #3 Fact-at-a-Time Concept Merging, and #4 Action-at-a-Time Object Processing. The development of each method has been constrained by the annual Metathesaurus release schedule. The first two methods made the best use of limited computational resources, and the last two make better use of human editing resources.
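Method #2 in the abstract above, Transitive Closure on Facts, can be illustrated with a union-find (disjoint-set) structure: pairwise "same meaning" facts are unioned, and each resulting set becomes a candidate concept. This is a generic sketch of transitive closure, not the Metathesaurus production code; the source-prefixed term strings are made up for the example.

```python
class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def merge_terms(facts):
    """Group terms into candidate concepts from pairwise synonymy facts."""
    uf = UnionFind()
    for a, b in facts:
        uf.union(a, b)
    groups = {}
    for term in list(uf.parent):
        groups.setdefault(uf.find(term), set()).add(term)
    return list(groups.values())


facts = [
    ("MeSH:Myocardial Infarction", "ICD:Acute myocardial infarction"),
    ("ICD:Acute myocardial infarction", "SNOMED:Heart attack"),
    ("MeSH:Hypertension", "SNOMED:High blood pressure"),
]
for group in merge_terms(facts):
    print(sorted(group))
```

Transitive closure is cheap to compute, but a single incorrect fact chains two unrelated groups together, which fits the abstract's point that every merge must still be approved by a human editor.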


Bishop CW, Ewing PD. Transferring knowledge from one system to another. Proc Annu Symp Comput Appl Med Care 1994:967. Although knowledge is contained in many systems, moving it from one system to another is not an easy task because each system is tailored in its own unique way and because knowledge configurations are usually copyrighted. To populate our FRAMEMED knowledge base we turned to the NLM Metathesaurus as a readily-available open source of knowledge. We were disappointed by the greatly variable granularity of the concepts and the lack of definitions that could be borrowed. Some reference books in electronic form seem attractive but reformatting will require excessive human intervention and copyright negotiation. Copyright by and reprinted with permission of the American Medical Informatics Association.

Campbell JR, Kallenberg GA, Sherrick RC. The clinical utility of META: an analysis for hypertension. Proc Annu Symp Comput Appl Med Care 1992:397-401. To evaluate the clinical completeness of the National Library of Medicine Metathesaurus (META), we coded the conceptual information found in 2000 problem oriented (SOAP) notes for hypertension from one COSTAR site. To minimize the effects of practice idiosyncrasy, we analyzed an additional 500 notes from a second, geographically remote site. Concepts occurring at either site numbered 1337. We classified concepts occurring at both sites as core concepts and these numbered 121. We attempted to find a matching concept of the proper semantic type in META for each of the items. All matching was done by program with a manual review by a physician. The overall success rate for matching was: [table: see text]. We observed the greatest frequency of unmatched concepts in physical examination, medications, symptoms, personal behavior, non-medical therapies and counselling. We conclude that the current release of META is not sufficiently rich to describe the process of care in the ambulatory management of hypertension. However, the construction and breadth of the current scheme holds promise for medical knowledge representation and translation. Copyright by and reprinted with permission of the American Medical Informatics Association.

Campbell JR, Payne TH. A comparison of four schemes for codification of problem lists. Proc Annu Symp Comput Appl Med Care 1994:201-5. We set out to evaluate the completeness of four major coding schemes in representation of the patient problem list: the Unified Medical Language System (UMLS, 4th edition), the Systematized Nomenclature of Medicine (SNOMED International), the Read coding system (version 2), and the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM). We gathered 400 problems from patient records at primary care sites in Omaha and Seattle. Matching these against the best description found in each of the coding schemes, we asked five medical faculty reviewers to rate the matches on a five-point Likert scale assessing their satisfaction with the results. For the four schemes, we computed the following rates of dissatisfaction, satisfaction, and average scores: [table: see text]. From this analysis, we conclude that UMLS and SNOMED performed substantially better in capturing the clinical content of the problem lists than READ or ICD-9-CM. No scheme could be considered comprehensive. Depending on the goal of systems developers, UMLS and SNOMED may offer different, and complementary, advantages. Copyright by and reprinted with permission of the American Medical Informatics Association.

Chute CG, Cohn SP, Campbell KE, Oliver DE, Campbell JR. The content coverage of clinical classifications. For The Computer-Based Patient Record Institute's Work Group on Codes & Structures. J Am Med Inform Assoc 1996 May-Jun;3(3):224-33. BACKGROUND AND OBJECTIVE: Patient conditions and events are the core of patient record content. Computer-based records will require standard vocabularies to represent these data consistently, thereby facilitating clinical decision support, research, and efficient care delivery. To address whether existing major coding systems can serve this function, the authors evaluated major clinical classifications for their content coverage. METHODS: Clinical text from four medical centers was sampled from inpatient and outpatient settings. The resultant corpus of 14,247 words was parsed into 3,061 distinct concepts. These concepts were grouped into Diagnoses, Modifiers, Findings, Treatments and Procedures, and Other. Each concept was coded into ICD-9-CM, ICD-10, CPT, SNOMED III, Read V2, UMLS 1.3, and NANDA; a secondary reviewer ensured consistency. While coding, the information was scored: 0 = no match, 1 = fair match, 2 = complete match. RESULTS: ICD-9-CM had an overall mean score of 0.77 out of 2; its highest subscore was 1.61 for Diagnoses. ICD-10 scored 1.60 for Diagnoses, and 0.62 overall. The overall score of ICD-9-CM augmented by CPT was not materially improved at 0.82. The SNOMED International system demonstrated the highest score in every category, including Diagnoses (1.90), and had an overall score of 1.74. CONCLUSION: No classification captured all concepts, although SNOMED did notably the most complete job. The systems in major use in the United States, ICD-9-CM and CPT, fail to capture substantial clinical content. ICD-10 does not perform better than ICD-9-CM. The major clinical classifications in use today incompletely cover the clinical content of patient records; thus analytic conclusions that depend on these systems may be suspect. Copyright by and reprinted with permission of the American Medical Informatics Association.

Chute CG, Yang Y, Tuttle MS, Sherertz DD, Olson NE, Erlbaum MS. A preliminary evaluation of the UMLS Metathesaurus for patient record classification. Proc Annu Symp Comput Appl Med Care 1990:161-5. The UMLS project seeks to provide a unified interface to biomedical knowledge resources. Patient medical records are an enormous repository of clinical intervention and outcome, and are drawing increasing attention in the pursuit of quality assurance, outcomes research, and epidemiologic analysis. The authors sought to evaluate an unedited version of the preliminary UMLS Metathesaurus, Meta-1, for the automated coding of medical diagnosis and surgical procedures. Identical evaluations were undertaken using SNOMED and the Mayo Clinic indexing lexicon. Meta-1 performed comparably to the comparison clinical indexing system, although all systems exhibited problems associated with clinical attribute levels and modifier combinations. Copyright by and reprinted with permission of the American Medical Informatics Association.

Cimino JJ. Representation of clinical laboratory terminology in the Unified Medical Language System. Proc Annu Symp Comput Appl Med Care 1991:199-203. The Unified Medical Language System (UMLS) was examined to determine its coverage of clinical laboratory terminology in use at the Columbia-Presbyterian Medical Center (CPMC). The Metathesaurus (Meta-1) contains exact matches for 30% of 1460 CPMC laboratory terms and near matches for an additional 42%, with better coverage of atomic-level concepts (substance terms) than complex ones (tests and panels). The Semantic Network includes types for representing laboratory procedures (2), measured substances (at least 56) and sampled substances (at least 14), but no type to represent specimens. Few of the UMLS semantic relationships are applicable to the CPMC vocabulary. These results have implications for the utility of the UMLS for linking clinical databases to electronic medical information sources. Copyright by and reprinted with permission of the American Medical Informatics Association.

Evans DA. The language of medicine and the modeling of information. In: Evans DA, Patel VL, editors. Advanced models of cognition for medical training and practice. Berlin: Springer-Verlag; 1992. p. 43-67.

Friedman C. The UMLS coverage of clinical radiology. Proc Annu Symp Comput Appl Med Care 1992:309-13. The informational content of clinical radiology reports was examined to determine the coverage of the Unified Medical Language System (UMLS) in relation to the terminology used by physicians in the Radiology Department of Columbia Presbyterian Medical Center (CPMC). The UMLS semantic network contained 17 semantic types which were compatible with the types of clinical information in the reports. The type of semantic categories missing from the UMLS consisted mainly of modifier information relating to certainty, degree, and change type of information. This type of information formed a substantial part of the domain. Although most of the informational categories were found in the UMLS semantic network, most of the domain terms were not. Our results strongly suggest that the UMLS could be a significant tool for developing clinical text processing applications if it were extended to cover clinical domains. Copyright by and reprinted with permission of the American Medical Informatics Association.

Fuller L, Hole W, Olson N, Schuyler P, Tuttle M. Drug and chemical entries in Meta-1. Proc Annu Symp Comput Appl Med Care 1990:146-50. Thirty-two percent of the concepts in the National Library of Medicine's UMLS Metathesaurus, Meta-1, are the names of biomedically important chemicals. The paper describes the origin of the chemical terms included in Meta-1, the structure and information content of these records, and the potential uses of these concepts to access chemical information in biomedical databases. Data is also presented quantitatively describing the subjects of the articles in the biomedical literature and the frequency with which articles include chemical subjects. Copyright by and reprinted with permission of the American Medical Informatics Association.

Huff SM, Warner HR. A comparison of Meta-1 and HELP terms: implications for clinical data. Proc Annu Symp Comput Appl Med Care 1990:166-9. Terms from the HELP System's vocabulary were matched with Meta-1 terms on a word by word basis as well as on a phrase by phrase basis with the goal of exploring what steps might need to be taken if some future version (Meta-N) of the UMLS Metathesaurus were to be used to represent clinical data. Word by word matching revealed that 54% of HELP words were present in Meta-1, while 8% of HELP phrases had a corresponding phrase. The words that did not match in HELP were mostly adjectives and adverbs after taking into account misspellings and abbreviations. Phrase matches were low because of the inclusion of adjectives and adverbs in HELP clinical terms. If some future version of the Metathesaurus is to be used for representation of clinical data additional terms are needed as well as a grammar that permits construction of clinical phrases that include modifiers and time references. Copyright by and reprinted with permission of the American Medical Informatics Association.

Lange LL. Representation of everyday clinical nursing language in UMLS and SNOMED. Proc AMIA Fall Symp 1996:140-4. Everyday clinical nursing language is informal and idiosyncratic. Whether the everyday language of nurses can be represented by standardized vocabulary systems, such as the UMLS and SNOMED, was the focus of the study. Computer systems that allow clinicians to pick terms that are familiar are likely to be better accepted and thus more effective than systems that impose formal terminologies on users. Nursing phrases were extracted from handwritten shift notes, reduced to atomic-level terms, and matched to UMLS and SNOMED. Exact matches were obtained for 56% of terms in UMLS and 49% in SNOMED. Fifty-nine semantic types and 24 different source vocabularies were represented by the terms. Nursing vocabularies were represented by only 5% of source vocabulary citations. Copyright by and reprinted with permission of the American Medical Informatics Association.

Lindberg CH. A comparison of the language of patient charts at Presbyterian University Hospital, Pittsburgh, PA, and the Unified Medical Language System of the National Library of Medicine (online medical records) [dissertation]. Pittsburgh (PA): University of Pittsburgh; 1994. 176 p. Available from: University Microfilms, Ann Arbor, MI; 9426718. This dissertation matches the language of online medical records at the Presbyterian University Hospital in Pittsburgh with the language of the Unified Medical Language System (UMLS), an experimental thesaurus of biomedicine developed by the National Library of Medicine. In the future, the UMLS may serve as an interface between medical charts and the information in databases such as Medline. The main research question is whether the UMLS captures the concepts found in medical charts and can thus be used to bridge the gap between chart terminology and MeSH, the vocabulary used to access Medline. A sample of 50 records from two diagnoses was compared to the UMLS. A statistical formula selected potentially useful non-matches, which were then analyzed to determine whether the missing concepts are wholly or partially present in another form in MeSH. Seventy percent of the missing terms are either broader terms than MeSH headings or synonyms of chart terms/concepts. Approximately 19% of the dropped terms could not form a strong map to MeSH. The study indicates that, since so many chart terms are found in some form, a chart-generated search can link to many related concepts in MeSH, but a significant number of chart concepts must be added to the UMLS or cross referenced to MeSH. Provided by UMI.

Moving toward international standards in primary care informatics: clinical vocabulary. Conference Summary Report. 1995 Nov 1-2; New Orleans. Rockville (MD): Agency for Health Care Policy and Research; 1996 Oct. 35 p. (AHCPR pub. no. 96-0069) Available also from http://www.ahcpr.gov/research/pcinform/.

Mullins HC, Scanland PM, Collins D, Treece L, Petruzzi P Jr., Goodson A, Dickinson M. The efficacy of SNOMED, Read codes, and UMLS in coding ambulatory family practice clinical records. Proc AMIA Fall Symp 1996:135-9. This study was initially developed as a traditional quantitative study to determine the level of match of identified clinical terms in three (3) clinical vocabularies. To address concerns raised by a review of the literature and our own experience, a supplemental study to collect qualitative data was added. Dictated progress notes from a stratified sample of patient visits over a period of four (4) years were used to obtain a representative sample of terms. A total of 144 progress notes were selected taking into consideration the usual demographics plus additional variables. From the 144 clinical notes, 864 terms were extracted and evaluated by level of match. The within-term effect was highly significant (F=58.69, p<.001), indicating significant differences in the mean level of match for the three coding systems. Qualitative findings suggest that this and other published studies may not answer questions about the "efficacy of available clinical vocabularies in coding ambulatory family practice clinical records", and additional studies are needed which must be carefully structured and utilize a standardized procedure. Copyright by and reprinted with permission of the American Medical Informatics Association.

O'Keefe KM, Sievert M, Mitchell JA. Mendelian inheritance in man: diagnoses in the UMLS. Proc Annu Symp Comput Appl Med Care 1993:735-9. Because they deal with many distinct but rare inheritance diseases, geneticists have difficulty translating from their codes to other biomedical coding schemes. The objective of this research was to investigate the potential uses and difficulties of using the UMLS Metathesaurus for genetic diagnoses and to make recommendations to UMLS developers for improvements in UMLS for common genetic disorders. The 110 most common Mendelian Inheritance in Man disorders from the Missouri Genetic Disease Program over the period of one year were translated into MeSH, ICD and SNOMED. The more common diseases are more likely to be mapped than the rarer ones. Diseases with a proven genetic inheritance pattern are more likely to be mapped than those with speculated inheritance patterns. Approximately one third of all diagnoses were not mapped across all three coding schemes in Meta-1.2. The ICD coding scheme was found to be too broad to be meaningful for genetic diagnosis or epidemiological purposes. MeSH and SNOMED need to be made more specific and complete, and all of the new version of SNOMED needs to be included in the Metathesaurus. Copyright by and reprinted with permission of the American Medical Informatics Association.

Payne TH, Martin DR. How useful is the UMLS metathesaurus in developing a controlled vocabulary for an automated problem list? Proc Annu Symp Comput Appl Med Care 1993:705-9. We are developing a set of problem list phrases to be used in the automated problem list of a prototype clinical computing system. Because of the large number of terms in the Unified Medical Language System (UMLS) and the links between them, we are experimenting with the use of the UMLS as the foundation for our problem list phrase set. We have found the UMLS to be very useful for this project, but that it lacks many phrases clinicians wish to include in the problem list. Internal linkages between phrases provided in the UMLS are not well suited to our needs. We plan to continue our use of the UMLS but to add problem list phrases and linkages between phrases to support browsing and decision support applications. Copyright by and reprinted with permission of the American Medical Informatics Association.

Rosenberg KM, Coultas DB. Acceptability of Unified Medical Language System terms as substitute for natural language general medicine clinic diagnoses. Proc Annu Symp Comput Appl Med Care 1994:193-7. The acceptability of using the Unified Medical Language System (UMLS) concept phrases to substitute for physicians' diagnosis statements was investigated. Physician diagnosis statements recorded in the University of New Mexico's General Medicine Clinic were input into a computer program that automatically finds the best matching UMLS concept phrases. The computer program written in C++ integrates UMLS searching and browsing with a graphical user interface. Five attending physicians in the Department of Internal Medicine rated the acceptability of the UMLS concept phrase as a substitute for the original physician statement. One hundred and ninety-five patients' notes were examined with 447 diagnosis statements recorded of which 271 statements were unique. Attending physicians rated their satisfaction with the automated UMLS substitutes on a scale of 1 (extremely dissatisfied) to 5 (extremely satisfied). Intrarater (mean 0.94) and interrater correlations (mean 0.75) were high. The mean rating was 4.0 (quite satisfied). Most (73%) of the substitutions were satisfactory (rating of 4 or 5), 16% were neutral (rating of 3), and 11% were unsatisfactory (rating of 1 or 2). A review of the substitutions showed a frequent lack of clinical modifier terms in UMLS as has been previously described. Comparison to a previous study shows the broader term coverage of UMLS to be a more acceptable source of diagnosis codes than using International Classification of Diseases revision 9 alone. These results suggest that UMLS can be an effective tool for coding unconstrained physician diagnoses. Copyright by and reprinted with permission of the American Medical Informatics Association.

Sato L, McClure RC, Rouse RL, Schatz CA, Greenes RA. Enhancing the Metathesaurus with clinically relevant concepts: anatomic representations. Proc Annu Symp Comput Appl Med Care 1992:388-91. To create a comprehensive taxonomy for medical concepts it is necessary to identify gaps and reconcile differences that exist between clinical, bibliographic, and other source vocabularies. As part of the Unified Medical Language System project, we have proposed enhancements to the Metathesaurus by the inclusion of terms from two source vocabularies with different unique perspectives or views. This process has disclosed a number of issues that arise as complexity increases. These issues must be resolved if the resultant Metathesaurus is to support the variety of uses for which it is intended. Copyright by and reprinted with permission of the American Medical Informatics Association.

Warren JJ, Campbell JR, Palandri MK, Stoupa RA. Analysis of three coding schemes: can they capture nursing care plan concepts? Proc Annu Symp Comput Appl Med Care 1994:962.

Zielstorff RD, Cimino C, Barnett GO, Hassan L, Blewett DR. Representation of nursing terminology in the UMLS Metathesaurus: a pilot study. Proc Annu Symp Comput Appl Med Care 1992:392-6. To see whether the National Library of Medicine's Metathesaurus (tm) includes terminology relevant to clinical nursing practice, two widely used nursing vocabularies were matched against the Meta. The two nursing vocabularies are 1) the North American Nursing Diagnosis List of Approved Diagnoses; and 2) the Omaha System, a vocabulary of problems and interventions developed by the Omaha Visiting Nurses Association. First, the terms were scanned against Meta in their native form, with phrases and combinations intact. This produced a relatively low percentage of exact matches (12%). Next, the terms were separated into core concepts and modifiers and the analysis was repeated. The percentage of exact matches to terms in Meta increased to 32%. However, the semantic types of the split terms often were not equivalent to the semantic types of the phrases from which the split terms were derived; also, in some cases, terms returned as exact matches had different meanings in Meta. Automatic scanning for lexical matches is a helpful first step in searching for vocabulary representation in Meta, but term-by-term search for context, semantic type and definition is essential. However, it seems clear that representation of nursing terminology in the Metathesaurus needs to be expanded. Copyright by and reprinted with permission of the American Medical Informatics Association.


Masys DR. An evaluation of the source selection elements of the prototype UMLS Information Sources Map. Proc Annu Symp Comput Appl Med Care 1992:295-8. The Information Sources Map (ISM) is a component of the National Library of Medicine's Unified Medical Language System (UMLS) project. The ISM is intended to provide both human-readable and machine-interpretable information about the content, scope, and access conditions for various information sources such as databases, expert systems, and the organizations which make these information sources available. Automated source selection is supported by three types of indexing in the ISM: Medical Subject Heading (MeSH) terms and subheadings; Semantic Types from the UMLS Semantic Network; and Semantic Type Relations, which depict pairs of semantic types joined by a relationship chosen from the Semantic Network. This paper reports a study of the recall and precision of the source selection elements in the prototype version of the ISM. Copyright by and reprinted with permission of the American Medical Informatics Association.

Masys DR, Humphreys BL. Structure and function of the UMLS information sources map. Medinfo 1992;7(Pt 2):1518-21. A health professional seeking information which is represented in computerized format faces formidable difficulties in determining which information sources may be relevant to a particular question, and in gaining access to that information. The first version of a new UMLS data file called the Information Sources Map (ISM) has been created to address this problem. The ISM is intended to provide both human-readable and machine-interpretable information about the content, scope, and access conditions for various information sources such as databases, expert systems, and the organizations which make these information sources available. The first prototype of the ISM features indexing elements which support automated source selection; subsequent versions will encode the logic necessary to connect to and retrieve information from the sources represented in the ISM file.

Miller PL, Clyman JI, Frawley SJ, Paton JA, Powsner SM, Roderer N, Shifman MA. NetMenu and a prototype UMLS information sources map. Proc Annu Symp Comput Appl Med Care 1993:957. This paper describes NetMenu and an Information Sources Map (ISM), two tools under development to help the biomedical user find out about a range of network-based information sources, and to connect automatically to a chosen source. The prototype ISM is developed as part of the Unified Medical Language System (UMLS) project of the National Library of Medicine. Both NetMenu and the ISM are operational at Yale University School of Medicine and are available for use elsewhere. Copyright by and reprinted with permission of the American Medical Informatics Association.

Miller PL, Frawley SJ, Wright L, Roderer NK, Powsner SM. Lessons learned from a pilot implementation of the UMLS information sources map. J Am Med Inform Assoc 1995 Mar-Apr;2(2):102-15. OBJECTIVE: To explore the software design issues involved in implementing an operational information sources map (ISM) knowledge base (KB) and system of navigational tools that can help medical users access network-based information sources relevant to a biomedical question. DESIGN: A pilot biomedical ISM KB and associated client-server software (ISM/Explorer) have been developed to help students, clinicians, researchers, and staff access network-based information sources, as part of the National Library of Medicine's (NLM) multi-institutional Unified Medical Language System (UMLS) project. The system allows the user to specify and constrain a search for a biomedical question of interest. The system then returns a list of sources matching the search. At this point the user may request 1) further information about a source, 2) that the list of sources be regrouped by different criteria to allow the user to get a better overall appreciation of the set of retrieved sources as a whole, or 3) automatic connection to a source. RESULTS: The pilot system operates in client-server mode and currently contains coded information for 121 sources. It is in routine use from approximately 40 workstations at the Yale School of Medicine. The lessons that have been learned are that: 1) it is important to make access to different versions of a source as seamless as possible, 2) achieving seamless, cross-platform access to heterogeneous sources is difficult, 3) significant differences exist between coding the subject content of an electronic information resource versus that of an article or a book, 4) customizing the ISM to multiple institutions entails significant complexities, and 5) there are many design trade-offs between specifying searches and viewing sets of retrieved sources that must be taken into consideration. CONCLUSION: An ISM KB and navigational tools have been constructed. In the process, much has been learned about the complexities of development and evaluation in this new environment, which are different from those for Gopher, wide area information servers (WAIS), World-Wide-Web (WWW), and MOSAIC resources. Copyright by and reprinted with permission of the American Medical Informatics Association.

Miller PL, Paton JA, Clyman JI, Powsner SM. Prototyping an institutional IAIMS/UMLS information environment for an academic medical center. Bull Med Libr Assoc 1992 Jul;80(3):281-7. The paper describes a prototype information environment designed to link network-based information resources in an integrated fashion and thus enhance the information capabilities of an academic medical center. The prototype was implemented on a single Macintosh computer to permit exploration of the overall information architecture and to demonstrate the various desired capabilities prior to full-scale network-based implementation. At the heart of the prototype are two components: a diverse set of information resources available over an institutional computer network and an information sources map designed to assist users in finding and accessing information resources relevant to their needs. The paper describes these and other components of the prototype and presents a scenario illustrating its use. The prototype illustrates the link between the goals of two National Library of Medicine initiatives, the Integrated Academic Information Management System (IAIMS) and the Unified Medical Language System (UMLS). Copyright by and reprinted with permission of the Medical Library Association.

Miller PL, Wright LW, Frawley SJ, Clyman JI, Powsner SM. Selecting relevant information resources in a network-based environment: the UMLS information sources map. Medinfo 1992;7(Pt 2):1512-7. The paper describes experience in building a prototype information sources map (ISM) as part of the Unified Medical Language System (UMLS) project of the National Library of Medicine. A test set of 112 representative medically-related information sources was compiled. The purpose was to explore what type of coded information should be included in the ISM to help select sources relevant to a user query. For each ISM entry (describing a particular information source), two general types of coded information were included. (1) Coding using MeSH terms and Meta-1 semantic types was used to characterize the source's subject content. (2) Additional coded information described the source's use. The paper discusses experience in coding the information sources to create the prototype ISM, and describes a study to assess the utility of the different coded information in selecting sources relevant to a user query.

Shifman MA, Clyman JI, Paton JA, Powsner SM, Roderer NK, Miller PL. NetMenu. Experience in the implementation of an institutional menu of information sources. Proc Annu Symp Comput Appl Med Care 1993:554-8. NetMenu is a program developed at Yale University, which enables straightforward access to online information systems. NetMenu has been deployed in several diverse settings within the medical center. In the hospital, NetMenu functions as a front-end for the clinical workstation, providing access to the hospital information system, the clinical laboratory computer, a drug database and several bibliographic databases. The medical libraries utilize NetMenu for both medical education workstations and for scholarly information workstations. This paper describes the initial experience in the implementation, support, and maintenance of NetMenu as an institutional menu of information sources. Copyright by and reprinted with permission of the American Medical Informatics Association.

Silverstein SM, Miller PL, Cullen MR. An information sources map for Occupational and Environmental Medicine: guidance to network-based information through domain-specific indexing. Proc Annu Symp Comput Appl Med Care 1993:616-20. This paper describes a prototype information sources map (ISM), an on-line information source finder, for Occupational and Environmental Medicine (OEM). The OEM ISM was built as part of the Unified Medical Language System (UMLS) project of the National Library of Medicine. It allows a user to identify sources of on-line information appropriate to a specific OEM question, and connect to the sources. In the OEM ISM we explore a domain-specific method of indexing information source contents, and also a domain-specific user interface. The indexing represents a domain expert's opinion of the specificity of an information source in helping to answer specific types of domain questions. For each information source, an index field represents whether a source might provide useful information in an occupational, industrial, or environmental category. Additional fields represent the degree of specificity of a source in individual question types in each category. The paper discusses the development, design, and implementation of the prototype OEM ISM. Copyright by and reprinted with permission of the American Medical Informatics Association.


McCray AT, Divita G. ASN.1: defining a grammar for the UMLS knowledge sources. Proc Annu Symp Comput Appl Med Care 1995:868-72. The Unified Medical Language System (UMLS) project provides resources on an experimental basis to the research community. In 1995 the four UMLS Knowledge Sources were provided in an additional data format, Abstract Syntax Notation One (ASN.1). The benefits of ASN.1 are that it provides a standard, formal grammar for complex data and allows exchange of that data in a way which is independent of the particular software and hardware environment in which the data are created and stored. The paper begins with an introduction to the ASN.1 standard itself. It continues with a discussion of the ASN.1 implementation of the UMLS Knowledge Sources and some of the consequences for the newly released UMLS Knowledge Source Server. It concludes with a discussion of some of the benefits of using ASN.1 encoded data. Copyright by and reprinted with permission of the American Medical Informatics Association.

McCray AT, Razi A. The UMLS Knowledge Source server. Medinfo 1995;8(Pt 1):144-7. The UMLS Knowledge Source server is an evolving tool for accessing information stored in the UMLS Knowledge Sources. The system architecture is based on the client-server paradigm wherein remote site users send their requests to a centrally managed server at the U.S. National Library of Medicine. The client programs can run on platforms supporting the TCP/IP communication protocol. Access to the system is provided through a command-line interface and through an Application Programming Interface.

McCray AT, Razi AM, Bangalore AK, Browne AC, Stavri PZ. The UMLS knowledge source server: a versatile internet-based research tool. Proc AMIA Fall Symp 1996:164-8. The National Library of Medicine's Unified Medical Language System (UMLS) project regularly distributes a set of Knowledge Sources to the research community. In 1995 the UMLS data were made available for the first time through the Internet-based UMLS Knowledge Source Server. The server can be accessed through three different client interfaces. The World Wide Web interface allows users to browse and explore the data and to see how those data are organized in the UMLS. The command-line interface is best suited for batch processing, and the application programming interface allows developers at remote sites to embed calls in their application programs to the Knowledge Source Server. Copyright by and reprinted with permission of the American Medical Informatics Association.

McCray AT, Srinivasan S, Browne AC. Lexical methods for managing variation in biomedical terminologies. Proc Annu Symp Comput Appl Med Care 1994:235-9. Access to biomedical terminologies is hampered by the high degree of variability inherent in natural language terms and in the terminologies themselves. The lexicon, lexical programs, databases, and indexes included with the 1994 release of the UMLS Knowledge Sources are designed to help users manage this variability. We describe these resources and illustrate their flexibility and usefulness in providing enhanced access to data in the UMLS Metathesaurus. Copyright by and reprinted with permission of the American Medical Informatics Association.

Nelson SJ, Sherertz DD, Tuttle MS, Erlbaum MS. Using MetaCard: a Hypercard browser for biomedical knowledge sources. Proc Annu Symp Comput Appl Med Care 1990:151-4. As part of the Unified Medical Language System (UMLS) project a large metathesaurus (Meta-1) was built. We have adapted a Hypercard browser of Meta-1 (MetaCard) to enable a user to continue the browsing process, extending from the Metathesaurus to a variety of different knowledge sources. These knowledge sources include Current Disease Description (CDD), Physician Data Query (PDQ), and Mendelian Inheritance in Man (MIM). MetaCard can also be linked to Grateful Med, the NLM program which is used to search MEDLINE. A user can, with minimal training, use MetaCard to access these four different knowledge sources. The links (how one goes from one knowledge source to another) have been built on the basis of disease names. In organizing these links, it was helpful to use CDD as an additional source of knowledge about how diseases are named in various sources. Further plans are to expand the use of the Metathesaurus in building links to knowledge sources. The question with each method of linking used will be to what extent the method provides a robust linkage whose utility can be anticipated. While future refinements or developments of links may give additional functionality, the current linkages are sufficient to provide a useful browsing tool. We believe that this is so because much of the knowledge in each of these sources is organized around diseases. Navigation is easy because of the similarities of the Hypercard interfaces to each of the knowledge sources: a common set of conventions (e.g. point and click) helps make it work. Copyright by and reprinted with permission of the American Medical Informatics Association.

Sherertz D, Tuttle M, Cole W, Erlbaum M, Olson N, Nelson S. A HyperCard implementation of Meta-1: the first version of the UMLS Metathesaurus. Proc Annu Symp Comput Appl Med Care 1989:1017-8. The Unified Medical Language System (UMLS) is being designed to provide uniform access to computer-based resources in biomedicine. For the foreseeable future, the foundation of the UMLS will be a "metathesaurus of concepts," synthesized from existing biomedical nomenclatures. Meta-1, the first version of the Metathesaurus, will contain all of MeSH, a selection of terms from primary care, clinical medicine, and other domains, and all terms from SNOMED, ICD-9-CM, and CPT-4 which "match" them--about 30,000 terms. In addition, Meta-1 contains information about the occurrence and co-occurrence of its terms in selected resources, such as MEDLINE. As Meta-1 will contain about 100MB of terms and relationships, it is unlikely that it will be "printed." Instead, some UMLS applications will support Metathesaurus "browsing." One way of browsing Meta-1 will be via the Apple Macintosh application called HyperCard. A demonstration of a HyperCard interface, called "Meta-Card" will first acquaint viewers with the contents of the pre-human-review version of Meta-1, and second, illustrate how an object-oriented interface can be programmed to support various visual metaphors, e.g. "click-to-get-more-information," and "click-to-follow-a-semantic-link," and the notion of a Metathesaurus esthetic. Copyright by and reprinted with permission of the American Medical Informatics Association.

Tuttle MS, Sperzel WD, Olson NE, Erlbaum MS, Suarez-Munist O, Sherertz DD, Nelson SJ, Fuller LF. The homogenization of the Metathesaurus schema and distribution format. Proc Annu Symp Comput Appl Med Care 1992:299-303. The third version of the UMLS Metathesaurus, Meta-1.2, to be released in October 1992, will have a simpler schema and simpler distribution formats than the first two versions, Meta-1.0 and Meta-1.1 released in October 1990 and 1991, respectively. For one thing, it will have only a single kind of entry (Concept), rather than three (Concept, Related, and Synonym). Further, the Relational Format will consist of four logical relations, or tables, instead of the nearly three score different tables used to represent the same kind of information in Meta-1.1. These four tables will contain, respectively, (1) the names of each concept, (2) the relationships between concepts, (3) attributes of the concepts, and (4) a word-based index into the concept names. We argue that the new schema and formats provide a better conceptual model of the Metathesaurus, and represent the information contained there more uniformly. Even though these changes are incremental and evolutionary, both users and software developers should find Meta-1.2 significantly easier to understand, and the information contained in it significantly easier to use. Copyright by and reprinted with permission of the American Medical Informatics Association.


UMLS Applications


Campbell KE, Musen MA. Representation of clinical data using SNOMED III and conceptual graphs. Proc Annu Symp Comput Appl Med Care 1992:354-8. None of the coding schemes currently contained within the Unified Medical Language System (UMLS) is sufficiently expressive to represent medical progress notes adequately. Some coding schemes suffer from domain incompleteness, others suffer from the inability to represent modifiers and time references, and some suffer from both problems. The recently released version of the Systematized Nomenclature of Medicine (SNOMED III) is a potential solution to the data-representation problem because it is relatively domain complete, and because it uses a generative coding scheme that will allow the construction of codes that contain modifiers and time references. SNOMED III does have an important weakness, however. SNOMED III lacks a formalized system for using its codes; thus, it fails to ensure consistency in its use across different institutions. Application of conceptual-graph formalisms to SNOMED III can ensure such consistency of use. Conceptual-graph formalisms will also allow mapping of the resulting SNOMED III codes onto relational data models and onto other formal systems, such as first-order predicate calculus. Copyright by and reprinted with permission of the American Medical Informatics Association.

Carenini G, Moore JD. Using the UMLS Semantic Network as a basis for constructing a terminological knowledge base: a preliminary report. Proc Annu Symp Comput Appl Med Care 1993:725-9. Sharing and reuse of knowledge bases is recognized in Artificial Intelligence and Medical Informatics as beneficial, but difficult. Reusing an existing knowledge base can save considerable time and effort during the knowledge engineering phase, and facilitates integration of systems. However, the degree to which knowledge can be shared among different applications is still mainly an empirical question. In this paper, we describe the preliminary results of our attempt to reuse the UMLS Semantic Network as an ontology for the knowledge base of a patient education system. Copyright by and reprinted with permission of the American Medical Informatics Association.

Cimino JJ. Controlled medical vocabulary construction: methods from the Canon Group [editorial]. J Am Med Inform Assoc 1994 May-Jun;1(3):296-7.

Cimino JJ. Formal descriptions and adaptive mechanisms for changes in controlled medical vocabularies. Methods Inf Med 1996 Sep;35(3):202-10. Comment in: Methods Inf Med 1996 Sep;35(3):211-7; Methods Inf Med 1996 Sep;35(3):218-9. (Eng). Standard controlled medical vocabularies are typically based on a coding scheme, while medical informatics applications tend to have a more formal conceptual foundation. When such applications attempt to use data coded with standard vocabularies, problems can arise when the standard vocabulary changes over time. A formal taxonomy is presented for describing the semantic changes which can occur in a vocabulary, such as simple addition, refinement, precoordination, disambiguation, redundancy, obsolescence, discovered redundancy, major name changes, minor name changes, code reuse, and changed codes. The methods used to effect change in one concept-based vocabulary (the Medical Entities Dictionary) are described, and the utility of the approach is demonstrated by applying it to the changes appearing in the 1994 release of the International Classification of Diseases, Ninth Edition, with Clinical Modifications (ICD-9-CM).

Cimino JJ, Clayton PD. Coping with changing controlled vocabularies. Proc Annu Symp Comput Appl Med Care 1994:135-9. For the foreseeable future, controlled medical vocabularies will be in a constant state of development, expansion and refinement. Changes in controlled vocabularies must be reconciled with historical patient information which is coded using those vocabularies and stored in clinical databases. This paper explores the kinds of changes that can occur in controlled vocabularies, including adding terms (simple additions, refinements, redundancy and disambiguation), deleting terms, changing terms (major and minor name changes), and other special situations (obsolescence, discovering redundancy, and precoordination). Examples are drawn from actual changes appearing in the 1993 update to the International Classification of Diseases (ICD9-CM). The methods being used at Columbia-Presbyterian Medical Center to reconcile its Medical Entities Dictionary and its clinical database are discussed. Copyright by and reprinted with permission of the American Medical Informatics Association.

Cimino JJ, Clayton PD, Hripcsak G, Johnson SB. Knowledge-based approaches to the maintenance of a large controlled medical terminology. J Am Med Inform Assoc 1994 Jan-Feb;1(1):35-50. OBJECTIVE: Develop a knowledge-based representation for a controlled terminology of clinical information to facilitate creation, maintenance, and use of the terminology. DESIGN: The Medical Entities Dictionary (MED) is a semantic network, based on the Unified Medical Language System (UMLS), with a directed acyclic graph to represent multiple hierarchies. Terms from four hospital systems (laboratory, electrocardiography, medical records coding, and pharmacy) were added as nodes in the network. Additional knowledge about terms, added as semantic links, was used to assist in integration, harmonization, and automated classification of disparate terminologies. RESULTS: The MED contains 32,767 terms and is in active clinical use. Automated classification was successfully applied to terms for laboratory specimens, laboratory tests, and medications. One benefit of the approach has been the automated inclusion of medications into multiple pharmacologic and allergenic classes that were not present in the pharmacy system. Another benefit has been the reduction of maintenance efforts by 90%. CONCLUSION: The MED is a hybrid of terminology and knowledge. It provides domain coverage, synonymy, consistency of views, explicit relationships, and multiple classification while preventing redundancy, ambiguity (homonymy) and misclassification. Copyright by and reprinted with permission of the American Medical Informatics Association.

Cimino JJ, Hripcsak G, Johnson SB, Clayton PD. Designing an introspective, multipurpose, controlled medical vocabulary. Proc Annu Symp Comput Appl Med Care 1989:513-8. The medical vocabulary used in clinical information systems must be more than a simple list of terms. We agree that such a vocabulary must have synonymy, domain completeness, and multiple classifications providing consistent views and explicit relationships, while remaining unambiguous and non-redundant. We examine the abilities of existing controlled vocabularies (ICD9-CM, SNOMED, MeSH, CMIT, CPT4, COSTAR, HELP, DXPLAIN, and UMLS) to meet these goals and propose an enhanced vocabulary structure based on a directed, acyclic semantic net. This structure provides a representation which permits introspection by the vocabulary maintenance system responsible for providing a terminology which meets the seven requirements. The vocabulary, called the Medical Entities Dictionary (MED), will serve a variety of applications. Copyright by and reprinted with permission of the American Medical Informatics Association.

Cimino JJ, Hripcsak G, Johnson SB, Friedman C, Fink DJ, Clayton PD. UMLS as knowledge base: a rule-based expert system approach to controlled medical vocabulary management. Proc Annu Symp Comput Appl Med Care 1990:175-9. The National Library of Medicine is developing a Unified Medical Language System (UMLS) which addresses the need for integration of several large, nationally accepted vocabularies. This is important to the clinical information system under development at the Columbia-Presbyterian Medical Center (CPMC). The authors are using UMLS components as the core of their effort to integrate existing local CPMC vocabularies which are not among the source vocabularies of the UMLS. They are also using the UMLS to build a knowledge base of vocabulary structure and content such that logical rules can be developed to assist in the management of the integrated vocabularies. At present, the UMLS Semantic Network is used to organize terms which describe laboratory procedures. The authors have developed a set of rules for identifying undesirable conditions in the vocabulary. They have applied these rules to 526 laboratory test terms and have found ten cases (2%) of definite redundancy and sixty-eight cases (13%) of potential redundancy. The rules have also been used to organize the terminology in new ways that facilitate its management. Using the UMLS model as a vocabulary knowledge base allows the authors to apply an expert system approach to vocabulary integration and management. Copyright by and reprinted with permission of the American Medical Informatics Association.

Evans DA, Cimino JJ, Hersh WR, Huff SM, Bell DS. Toward a medical-concept representation language. The Canon Group. J Am Med Inform Assoc 1994 May-Jun;1(3):207-17. The Canon Group is an informal organization of medical informatics researchers who are working on the problem of developing a deeper representation formalism for use in exchanging data and developing applications. Individuals in the group represent experts in such areas as knowledge representation and computational linguistics, as well as in a variety of medical subdisciplines. All share the view that current mechanisms for the characterization of medical phenomena are either inadequate (limited or rigid) or idiosyncratic (useful for a specific application but incapable of being generalized or extended). The Group proposes to focus on the design of a general schema for medical-language representation including the specification of the resources and associated procedures required to map language (including standard terminologies) into representations that make all implicit relations visible, reveal hidden attributes, and generally resolve ambiguous or vague references. The Group is proceeding by examining large numbers of texts (records) in medical sub-domains to identify candidate concepts and by attempting to develop general rules and representations for elements such as attributes and values so that all concepts may be expressed uniformly. Copyright by and reprinted with permission of the American Medical Informatics Association.

Fowler J, Buffone G, Moreau D. The architecture of a distributed medical dictionary. Medinfo 1995;8(Pt 1):126-30. Exploiting high-speed computer networks to provide a national medical information infrastructure is a goal for medical informatics. The Distributed Medical Dictionary under development at Baylor College of Medicine is a model for an architecture that supports collaborative development of a distributed online medical terminology knowledge-base. A prototype is described that illustrates the concept. Issues that must be addressed by such a system include high availability, acceptable response time, support for local idiom, and control of vocabulary.

Humphreys BL. Comment on "Toward data standards for clinical nursing information". J Am Med Inform Assoc 1994 Nov-Dec;1(6):472-4.

Levesque Y, LeBlanc AR, Maksud M. MD Concept: a model for integrating medical knowledge. Proc Annu Symp Comput Appl Med Care 1994:252-6. Many integrated clinical information systems depend on large knowledge bases containing a dictionary of terms as well as specific information about each term and the relationships between terms. We propose a knowledge base model called MD Concept which is based on a semantic network and uses an object-oriented paradigm and relational tables. A prototype has been developed which integrates the Unified Medical Language System (UMLS) with other databases including the Systematized Nomenclature of Medicine (SNOMED II), the Diagnostic and Statistical Manual of Mental Disorders (DSM-IIIR) and a pharmaceutical database. We demonstrate how a user can easily navigate in this knowledge world using a browser. Copyright by and reprinted with permission of the American Medical Informatics Association.

Rada R. Maintaining thesauri and metathesauri. Int Classif 1990;17(3-4):158-64. Maintaining a thesaurus is a time-consuming task which should go hand-in-hand with the indexing of information and should be supported by software. To connect different document databases their respective thesauri should be related. The most straightforward way to support this by computer is to map the terms of one thesaurus to those of another. Such a mapping creates one kind of metathesaurus. As citation systems are extended to include full text online, a new thesaurus may be used to index individual paragraphs of a document, and a metathesaurus may apply to a universe of paragraphs. To illustrate these principles several computer systems are described which help people maintain thesauri and metathesauri. Particular success has been had by the National Library of Medicine with its Medical Subject Headings and its Unified Medical Language System. Copyright International Society for Knowledge Organization.

Rosse C, Ben Said M, Eno KR, Brinkley JF. Enhancements of anatomical information in UMLS knowledge sources. Proc Annu Symp Comput Appl Med Care 1995:873-7. Although anatomical terminology forms a part of biomedical structured vocabularies, available sources lack the requisite granularity, semantic types and relationships for comprehensively and consistently representing anatomical concepts in machine-readable form. Thoracic angiology was selected as a proof-of-concept experiment for in-depth representation of symbolic information in gross anatomy through the enhancement of semantic types, concepts and relationships in UMLS. Provided the representation of concepts is comprehensive, hierarchies generated with four types of simple relationships are capable of displaying anatomical information from the systemic viewpoint with sufficient detail to meet the needs of applications in basic science education and in the practice of surgical subspecialties. Copyright by and reprinted with permission of the American Medical Informatics Association.

Wigertz OB, Clayton PD, Hripcsak G, Linnarsson R. Knowledge representation and data model to support medical knowledge base transportability. In: Talmon JL, Fox J, editors. Knowledge based systems in medicine: methods, applications and evaluation. Proceedings of the Workshop System Engineering in Medicine; 1989 Mar 16-18; Maastricht, Netherlands. Berlin: Springer-Verlag; 1991. p. 80-90. The authors discuss the thesis that it is possible to design a data model and a representation of the medical decision-making knowledge in a sufficiently high level and modular format to ease transportability between institutions and systems. They discuss some of the prerequisites of this uniform representation of medical terms, medical data and medical logic. They discuss UMLS (unified medical language system) and its metathesaurus. The possible impact of the sharing of knowledge bases on the development and use of medical decision support systems is also considered.


Anderson CL, Hukill M, Wang W, Bangerter B, Kattelman M, Hartmann-Voss K. A practical approach to structuring data in an integrated expert system. Proc Annu Symp Comput Appl Med Care 1990:599-603. At the onset of creating an expert system integrated into existing clinical information systems, the need for a special data dictionary which could map to diverse terminology in a variety of domains at a variety of health care delivery sites was identified. Based upon research conducted on existing medical nomenclatures, a methodology for structuring the expert system data dictionary was produced. The dictionary is segmented into specific domains (e.g. pharmacy, laboratory, etc.) as determined by traditional health care specialties. Terms within each domain are categorized further with guidance from working conventions commonly found within that specialty (e.g. ICD-9-CM, CPT-4, etc.). Emphasis was placed on selecting protocols consistent with those used in the Unified Medical Language System to accommodate future translatability. Copyright by and reprinted with permission of the American Medical Informatics Association.

Chute CG, Cesnik B, van Bemmel JH. Medical data and knowledge management by integrated medical workstations: summary and recommendations. Int J Biomed Comput 1994 Jan;34(1-4):175-83. The health care professional workstation will function as an interface between the user and the patient data as well as an interface to pertinent medical knowledge. Appropriate knowledge focus will require the workstation to recognize the concepts and structure of patient data, and understand the scope and access methods of knowledge sources. Issues are organized around five major themes: (i) structure, (ii) reliability and validation, (iii) views, (iv) location, and (v) ethical and legal. Conventional database representations can effectively address data structure and format variations that will inevitably persist in local data stores. The reliability of data and the validation of knowledge are critical issues that may determine the ultimate utility of clinical workstations. Alternative views of patient information and knowledge sources represent the true power of an intelligent data portal, represented by a well-designed clinical workstation. Both data and knowledge are optimally represented in decentralized information networks, although the confidentiality and ownership of this information must be respected. Evolutionary progress toward consistent representations of knowledge and patient data will be facilitated by the establishment of self-documentation standards for the developers of data encoding systems and knowledge sources, perhaps extended from the preliminary model afforded by the Unified Medical Language System (UMLS).

Cimino JJ. Data storage and knowledge representation for clinical workstations. Int J Biomed Comput 1994 Jan;34(1-4):185-94. The representation of patient information for use in clinical workstations is a complex problem. Ideally, it should be addressed in a way that allows multiple uses of the data, including simple manual review, sharing and pooling across institutions, and as input to knowledge-based decision support systems. To a great extent, this means coding information with controlled medical vocabularies, but it does not mean that all information must be codable before workstations are feasible. This paper defines some of the choices, both current and future, that are available to address the needs of controlled medical vocabularies for representing data and knowledge in clinical workstations and explores some of the implications of those choices.

Cimino JJ, Barnett GO. Automated translation between medical terminologies using semantic definitions. MD Comput 1990 Mar-Apr;7(2):104-9. Published erratum appears in MD Comput 1990 Jul-Aug;7(4):268. Automatic translation of medical terms from one controlled vocabulary into another is essential to the integration of diverse medical informatics systems. We have developed a strategy in which medical terms are represented in a standard format that provides semantic description of the terms. We demonstrate the representational power of our method by showing that a subset of medical terms (procedures) from diverse vocabularies can be described in this manner. We assess the potential usefulness of our approach for facilitating automatic translation by finding the closest match for MeSH cardiovascular procedures with ICD-9 procedures. Copyright 1990 Springer-Verlag.

Cimino JJ, Johnson SB, Hripcsak G, Sideli RV, Fink DJ, Friedman C, Clayton PD. One year's experience with the Unified Medical Language System (UMLS) in academia and patient care. Medinfo 1992;7(Pt 2):1501-5. The first edition of the Unified Medical Language System (UMLS), released in September 1990 by the National Library of Medicine, consists of a metathesaurus of 78862 interrelated terms and a semantic network for categorizing these terms and representing additional relationships between them. The integration of information systems at the Columbia-Presbyterian Medical Center requires attention to the problem of reconciling the diverse controlled vocabularies used by each system. One of the purposes of the UMLS is to facilitate automatic methods for such reconciliation. The authors have been exploring the suitability of the UMLS for various aspects of the vocabulary management tasks, including vocabulary organization, vocabulary representation and automated vocabulary translation. The UMLS is demonstrated to have specific value for vocabulary representation and translation; however, many areas are identified where further development of its knowledge sources would be useful.

France FH. Standards for nomenclature (HIS). In: Bakker AR, Ehlers CTh, Bryant JR, Hammond WE, editors. Hospital information systems: scope-design-architecture. Proceedings of the IMIA Working Conference; 1991 Sep 7-11; Gottingen, Germany. Amsterdam: North-Holland; 1992. p. 167-74. Uniform HISs which allow one to retrieve medical terms with the same meaning are discussed. The ICD-10 (International Classification of Diseases, 10th Revision) is expected to be published in 1993 by the World Health Organisation (WHO), with an alphabetic index containing former ICD-9 codes (conversion table). Each country has to decide when ICD-10 will be in use for hospitalized patients. Mapping extensions of ICD-9 to ICD-10 will take more time, as will conversion tables to DSM III, SNOMED, or MeSH, such as those performed by UMLS (Unified Medical Language System of the National Library of Medicine-USA). ICD-9-CM is updated regularly in the USA, but it is insufficient to describe laboratory tests, X-ray procedures, and drugs; ICCS (International Classification of Clinical Services) contains such information, complementary to ICD-9-CM. ICCS might be tested for mapping other codes, as a kind of metacode, in order to detect differences between national nomenclatures, when they exist. Up to now it remains 'Canadian-American' rather than 'international'. Standards for nomenclature should also include rules for coding, as well as elements of semantics in order to enable similar interpretation.

Houtchens BA, Allen A, Clemmer TP, Lindberg DA, Pedersen S. Telemedicine protocols and standards: development and implementation. J Med Syst 1995 Apr;19(2):93-119.

Humphreys BL, Hole WT, McCray AT, Fitzmaurice JM. Planned NLM/AHCPR large-scale vocabulary test: using UMLS technology to determine the extent to which controlled vocabularies cover terminology needed for health care and public health. J Am Med Inform Assoc 1996 Jul-Aug;3(4):281-7. The National Library of Medicine (NLM) and the Agency for Health Care Policy and Research (AHCPR) are sponsoring a test to determine the extent to which a combination of existing health-related terminologies covers vocabulary needed in health information systems. The test vocabularies are the 30 that are fully or partially represented in the 1996 edition of the Unified Medical Language System (UMLS) Metathesaurus, plus three planned additions: the portions of SNOMED International not in the 1996 Metathesaurus, the Read Clinical Classification, and the Logical Observation Identifiers, Names, and Codes (LOINC) system. These vocabularies are available to testers through a special interface to the Internet-based UMLS Knowledge Source Server. The test will determine the ability of the test vocabularies to serve as a source of controlled vocabulary for health data systems and applications. It should provide the basis for realistic resource estimates for developing and maintaining a comprehensive "standard" health vocabulary that is based on existing terminologies. Copyright by and reprinted with permission of the American Medical Informatics Association.

Masys DR. Of codes and keywords: standards for biomedical nomenclature. Acad Med 1990 Oct;65(10):627-9.

McCormick KA, Lang N, Zielstorff R, Milholland DK, Saba V, Jacox A. Toward standard classification schemes for nursing language: recommendations of the American Nurses Association Steering Committee on Databases to Support Clinical Nursing Practice. J Am Med Inform Assoc 1994 Nov-Dec;1(6):421-7. The American Nurses Association (ANA) Cabinet on Nursing Practice mandated the formation of the Steering Committee on Databases to Support Clinical Nursing Practice. The Committee has established the process and the criteria by which to review and recommend nursing classification schemes based on the ANA Nursing Process Standards and elements contained in the Nursing Minimum Data Set (NMDS) for inclusion of nursing data elements in national databases. Four classification schemes have been recognized by the Committee for use in national databases. These classification schemes have been forwarded to the National Library of Medicine (NLM) for inclusion in the Unified Medical Language System (UMLS) and to the International Council of Nurses for the development of a proposed International Classification of Nursing Practice. Copyright by and reprinted with permission of the American Medical Informatics Association.

Pretschner DP. Data analysis and information modelling: objects, codes, concepts. Radiat Prot Dosimetry 1995;57(1-4):175-84.

Prokosch HU, Dudeck J, Michel A. Standards for data dictionaries (HIS). In: Bakker AR, Ehlers CTh, Bryant JR, Hammond WE, editors. Hospital information systems: scope-design-architecture. Proceedings of the IMIA Working Conference; 1991 Sep 7-11; Gottingen, Germany. Amsterdam: North-Holland; 1992. p. 189-95. A controlled vocabulary is a basic requirement in the development and implementation of HIS. A medical data dictionary (MDD) has to provide a framework for such a controlled vocabulary and additionally the possibility to define semantic relationships and mappings to other MDDs. Several useful models for MDDs have been developed in recent years. Among these the UMLS seems to be the most comprehensive one. Unfortunately this project is still limited to the US. In order to achieve one standard framework for MDD development, researchers from Europe and all over the world will have to carefully analyze the work which has already been done within UMLS and try to agree on one common approach.

Prokosch HU, Kamm S, Wieczorek D, Dudeck J. Knowledge representation in pharmacology. A possible application area for the Arden Syntax? Proc Annu Symp Comput Appl Med Care 1991:243-7. In 1990 the Arden Syntax was proposed as a first version of a standardized syntax for the representation of medical knowledge. For the evaluation of the practicability of this first release we have analyzed the medical and pharmacological knowledge applied in the process of drug prescription. The separation of declarative (e.g. in a semantic network) and procedural knowledge is a basic issue of our research. We therefore propose to further extend the Arden syntax with declarative knowledge representation facilities. One way to do this may be the incorporation of a standardized medical data dictionary (e.g. the UMLS Metathesaurus) which promotes the representation of medical terms in a semantic network. Furthermore the problem of 'institution-specific knowledge', which is especially important for the issue of knowledge sharing between different institutions, is analyzed based on examples of knowledge modules for monitoring drug allergies and drug-drug interactions. Copyright by and reprinted with permission of the American Medical Informatics Association.

Rocha RA, Beatriz MD, Huff SM. Automated translation between medical vocabularies using a frame-based interlingua. Proc Annu Symp Comput Appl Med Care 1993:690-4. The integration of clinical systems almost always requires a translation phase, where vocabularies are compared and the similar concepts are matched. The lack of standards in the area of medical concept representation makes this task very difficult. The authors describe the development of a frame-based application that automatically translates terms found in one vocabulary to another. The application implements an innovative scoring algorithm that ranks the best matches using an exponential scale. Preliminary results and the comparison against a manual process in the same domain are also discussed. Copyright by and reprinted with permission of the American Medical Informatics Association.

Rocha RA, Huff SM. Using digrams to map controlled medical vocabularies. Proc Annu Symp Comput Appl Med Care 1994:172-6. A program for matching between controlled medical vocabularies has been developed which adopts methods used in the domain of Information Retrieval. This program combines a stemmer based on fragments of words (digrams) with a similarity function. The proposed stemmer did not require any knowledge about word-formation rules and helped the identification of several kinds of word variants. The adopted similarity function assigned the highest score to the best candidate match in 99.0% of the cases. Copyright by and reprinted with permission of the American Medical Informatics Association.

Rocha RA, Huff SM, Haug PJ, Warner HR. Designing a controlled medical vocabulary server: the VOSER project. Comput Biomed Res 1994 Dec;27(6):472-507. The authors describe their experience designing a controlled medical vocabulary server created to support the exchange of patient data and medical decision logic. The first section introduces practical and theoretical premises that guided the design of the vocabulary server. The second section describes a series of structures needed to implement the proposed server, emphasizing their conformance to the design premises. The third section introduces potential applications that provide services to end users and also a group of tools necessary for maintaining the server corpus. In the fourth section, the authors propose an implementation strategy based on a common framework and on the participation of groups from different health-related domains. Copyright 1994 Academic Press.

Walker DC, Walters RF. Developing a multilingual index to access health-care terminologies. M Comput 1993 Sep;1(4):32-3, 36-42.

Walker DC, Walters RF, Cimino JJ, Dujols P, Li Ensheng, Giere W, Kiuchi T, Lamberts H, Moore WG, Roger FH, Satomura Y, Stitt FW. Internationalization of health care terminology. Medinfo 1992;7(Pt 2):1444-51. The Unified Medical Language System (UMLS) of the National Library of Medicine (NLM) provides a terminology interface for biomedical resources in the USA. A project to develop a multinational terminology resource could effectively extend the existing thesaurus of the UMLS and help the internationalization of medical information. This paper summarizes the current state of the UMLS. Some terminologies with 'terminology browsers' are briefly described. The formation of a more encompassing multilingual master index to access various terminologies and their 'browsers' is advocated. An international body to initiate and coordinate any such project would be needed.

Zeng Q, Cimino JJ. Mapping medical vocabularies to the Unified Medical Language System. Proc AMIA Fall Symp 1996:105-9. This paper presents our work in automated mapping of medical vocabularies to the National Library of Medicine's Unified Medical Language System (UMLS). We used the UMLS Knowledge Source (KS) tool to map terms from several sources to UMLS Metathesaurus concepts. We compared performance of the KS tools with our own Minimal Representable Units Method (MRUM). The KS tools were able to map terms from 13% to 54% of the time, depending on the term set and the KS options used. Our MRUM method mapped between 96% and 99% of the terms. Based on our experience, we believe that questions remain about the best method by which the UMLS can be used to achieve automated term translation. Copyright by and reprinted with permission of the American Medical Informatics Association.

Zielstorff RD, Lang NM, Saba VK, McCormick KA, Milholland DK. Toward a uniform language for nursing in the US: work of the American Nurses Association Steering Committee on databases to support clinical practice. Medinfo 1995;8(Pt 2):1362-6. This paper reports on the work of the American Nurses Association Steering Committee on Databases to Support Clinical Practice, in existence since 1989. Responding to its broad charges, the Steering Committee has laid down the foundations for its work in declaring the nursing process as the framework for nursing data in database systems, and in endorsing the Nursing Minimum Data Set as the set of minimum elements for any system designed to carry health-related data that reflects nursing care. In addition, the Steering Committee has begun initiatives to: 1) promote the inclusion of nursing-related data in large health-related databases, and 2) develop a Uniform Language for nursing through a phased approach. The Steering Committee also works directly with the International Council of Nurses to promote the inclusion of nursing data in internationally used classification systems and to develop an international language that describes nursing care.

Burgun A, Bodenreider O, Denier P, Delamarre D, Botti G, Lukacs B, Mayeux D, Bremond M, Kohler F, Fieschi M, et al. Knowledge acquisition from the UMLS sources: application to the description of surgical procedures. Medinfo 1995;8(Pt 1):75-9. The re-usability of lexicons and knowledge in medicine is a crucial challenge. The Unified Medical Language System (UMLS) project has attempted to provide a repository of concepts, semantically categorized, for the biomedical domain. This paper describes some results concerning the relevance of UMLS structures for specific purposes. We have focused on the description of surgical procedures. Discussion concerns synonymy of terms, granularity of concepts, and ontology. Preliminary work on the exploitation of interconcept links by a computerized application reveals a heterogeneous implementation of those relationships. However, the UMLS provides a powerful knowledge base for developers.

Burgun A, Botti G, Lukacs B, Mayeux D, Seka LP, Delamarre D, Bremond M, Kohler F, Fieschi M, Le Beux P. A system that facilitates the orientation within procedure nomenclatures through a semantic approach. Med Inf (Lond) 1994 Oct-Dec;19(4):297-310.

Burgun A, Delamarre D, Botti G, Lukacs B, Mayeux D, Bremond M, Kohler F, Fieschi M, Le Beux P. Designing a sub-set of the UMLS knowledge base applied to a clinical domain: methods and evaluation. Proc Annu Symp Comput Appl Med Care 1994:968. The UMLS is a complex collection of interconnected biomedical concepts derived from standard nomenclatures. Designing a specific subset of the UMLS knowledge base relevant to a medical domain is a prerequisite for the development of specialized applications based on UMLS. We have developed a method based on the selection of the appropriate terms in original nomenclatures and the capture of a set of UMLS terms that are linked to them in the network to a certain degree. We have tested it as the foundation for a concept base applied to urology. Results depend on the exhaustiveness of the relationships between the Meta-1 concepts. A preliminary analysis of the sub-base reveals that some adaptations of vocabulary and ontology are required for clinical applications. Copyright by and reprinted with permission of the American Medical Informatics Association.

Gangemi A, Galanti M, Galeazzi E, Rossi Mori A. Beyond UMLS: computational semantics for medical records. Medinfo 1992;7(Pt 1):703-8. Computational semantics is a promising approach for effective processing of clinical data, integrating medical records, information retrieval applications, statistical databases and knowledge-based systems. The authors introduce 'semiotic codes', a generalization of medical language and artificial coding systems, and describe their power by a scale: (C1) non-combinatorial Codes (icons); (C2) combinatorial Codes, with a finite set of signs (conventional coding systems); (C3) combinatorial Codes, with a potentially infinite set of signs (combinatorial expressions); (C4) combinatorial Codes, with calculable synonymy (formal languages); (C5) creative Codes with unpredictable synonymy and homonymy (natural languages). In order to convert each potential Semiotic Code to another, one needs a system of class C4, which is computationally 'more powerful' than any other. It is based on formal expressions that allow definitions of concepts to be compared to decide their equivalence. It acts as an interlingua to represent medical concepts in the computer, in a neutral way with respect to other semiotic codes and applications.

Hardy B, Burgun A, Le Beux P. Accessing to knowledge base terms using UMLS concepts. In: Brender J, Christensen JP, Scherrer JR, McNair P, editors. Medical Informatics Europe '96: Human facets in information technologies. Washington: IOS Press; 1996. p. 164-8. (Studies in health technology and informatics; 34). This paper presents the first phase of a project to build a terminology server based on the UMLS project and using both lexical mapping and conceptual relations. To build a user-to-data knowledge base, the PERL language was chosen in association with World Wide Web technology and a relational database (ORACLE tm).

Hersh WR, Campbell EH, Evans DA, Brownlow ND. Empirical, automated vocabulary discovery using large text corpora and advanced natural language processing tools. Proc AMIA Fall Symp 1996:159-63. A major impediment to the full benefit of electronic medical records is the lack of a comprehensive clinical vocabulary. Most existing vocabularies do not allow the full expressiveness of clinical diagnoses and findings that are often qualified by modifiers relating to severity, acuity, and temporal factors. One reason for the lack of expressivity is the inability of traditional manual construction techniques to identify the diversity of language used by clinicians. This study used advanced natural language processing tools to identify terminology in a clinical findings domain, compare its coverage with the UMLS Metathesaurus, and quantify the effort required to discover the additional terminology. It was found that substantial numbers of phrases and individual modifiers were not present in the UMLS Metathesaurus and that modest effort in human time and computer processing was needed to obtain the larger quantity of terms. Copyright by and reprinted with permission of the American Medical Informatics Association.

Nelson SJ, Olson NE, Fuller L, Tuttle MS, Cole WG, Sherertz DD. Identifying concepts in medical knowledge. Medinfo 1995;8(Pt 1):33-6. The barrier word method of identifying nominal phrases in text, using a very long barrier word list, was evaluated on two different sets of text. In a sample of 10 paragraphs from the Medical Knowledge Self-Assessment Program of the American College of Physicians, the yield of nominal phrases as a percent of total chunks isolated was 66%. Some 500,000 chunks were isolated from Principles and Practice of Oncology (PPO); 38% of these chunk-occurrences were of chunks that matched 10,000 concept names in Meta-1.4, the most recent version of the UMLS Metathesaurus. Fifty paragraphs from PPO were chosen at random, and co-occurrences of concepts in those paragraphs were reviewed. Forty-two of the paragraphs had unique or infrequently occurring co-occurrences which closely described the major thrust of the paragraph.

Nelson SJ, Cole WG, Tuttle MS, Olson NE, Sherertz DD. Recognizing new medical knowledge computationally. Proc Annu Symp Comput Appl Med Care 1993:409-13. Can new medical knowledge be recognized computationally? We know knowledge is changing, and our knowledge-based systems will need to accommodate that change in knowledge on a regular basis if they are to stay successful. Computational recognition of these changes seems desirable. While low-level objects in the computational universe, bits and characters, are unlikely to change much over time, higher-level objects of language, where meaning begins to emerge, may show change. An analysis of ten arbitrarily selected paragraphs from the Medical Knowledge Self-Assessment Program of the American College of Physicians was used as a test bed for nominal phrase recognition. While there were words not known to Meta-1.2, only 8 of the 32 concepts new to the primary author were pointed to by new words. Use of a barrier word method was successful in identifying 23 of the 32 new concepts. Use of co-occurrence (in sentences) of putative nominal phrases may reduce the amount of human effort involved in recognizing the emergence of new relationships. Copyright by and reprinted with permission of the American Medical Informatics Association.

Rindflesch TC, Aronson AR. Ambiguity resolution while mapping free text to the UMLS Metathesaurus. Proc Annu Symp Comput Appl Med Care 1994:240-4. We propose a method for resolving ambiguities encountered when mapping free text to the UMLS Metathesaurus. Much of the research in medical informatics involves the manipulation of free text. The Metathesaurus contains extensive information which supports solutions to problems encountered while processing such text. After discussing the process of mapping free text to the Metathesaurus and describing the ambiguities which are often the result of such mapping, we provide examples of rules designed to eliminate mapping ambiguities. These rules refer to the context in which the ambiguity occurs and crucially depend on semantic types obtained from the Metathesaurus. We have conducted a preliminary test of the methodology and the results obtained indicate that the rules successfully resolve ambiguity around 80% of the time. Copyright by and reprinted with permission of the American Medical Informatics Association.

Sneiderman CA, Rindflesch TC, Aronson AR. Finding the findings: identification of findings in medical literature using restricted natural language processing. Proc AMIA Fall Symp 1996:239-43. The ability to search the biomedical literature based on findings would provide enhanced access to information. We describe a computer program called FINDX which relies on the UMLS Metathesaurus and restricted natural language processing to identify findings in free text. Such identification can serve as a filtering mechanism while selecting relevant papers. After discussing the salient characteristics of findings on which FINDX depends, we report on the results of an experiment in which we tested the program on a set of MEDLINE abstracts pertaining to the diagnosis of Parkinson Disease. Copyright by and reprinted with permission of the American Medical Informatics Association.

Cimino JJ. Use of the Unified Medical Language System in patient care at the Columbia-Presbyterian Medical Center. Methods Inf Med 1995 Mar;34(1-2):158-64. The Unified Medical Language System (UMLS) project at the United States National Library of Medicine contains and organizes a large number of terms from controlled medical vocabularies. This study examines the suitability of the UMLS for representing patient care information as it exists in the Columbia-Presbyterian Medical Center (CPMC) clinical information system. Comparisons were made between the semantic types, semantic relations and medical concepts of the UMLS and the data model entities, semantic classes, semantic relations and concepts in the CPMC system. Results of the comparison demonstrate that the UMLS structural model is appropriate for representing CPMC vocabularies and patient data and that the UMLS concepts provide excellent coverage of CPMC concepts in many areas. Recommendations are made for enhancing UMLS structure to provide additional coverage of the CPMC model. It is concluded that content expansion to provide better coverage of clinical terminology is possible within the current UMLS model.

Cimino JJ, Barnett GO. The physician's workstation: recording a physical examination using a controlled vocabulary. Proc Annu Symp Comput Appl Med Care 1987:287-91. A system has been developed which runs on MS-DOS personal computers and serves as an experimental model of a physician's workstation. The program provides an interface to a controlled vocabulary which allows rapid selection of appropriate terms and modifiers for entry of clinical information. Because it captures patient descriptions, it has the ability to serve as an intermediary between the physician and computer-based medical knowledge resources. At present, the vocabulary permits rapid, reliable representation of cardiac physical examination findings. Copyright 1987 IEEE. Reprinted, with permission.

Lindberg DA, Humphreys BL. The Unified Medical Language System (UMLS) and computer-based patient records. In: Ball MJ, Collen MF, editors. Aspects of the computer-based patient record. New York: Springer-Verlag; 1992. p. 165-75.

Lowe HJ. Image Engine: an object-oriented multimedia database for storing, retrieving and sharing medical images and text. Proc Annu Symp Comput Appl Med Care 1993:839-43. This paper describes Image Engine, an object-oriented, microcomputer-based, multimedia database designed to facilitate the storage and retrieval of digitized biomedical still images, video, and text using inexpensive desktop computers. The current prototype runs on Apple Macintosh computers and allows network database access via peer-to-peer file sharing protocols. Image Engine supports both free text and controlled vocabulary indexing of multimedia objects. The latter is implemented using the TView thesaurus model developed by the author. The current prototype of Image Engine uses the National Library of Medicine's Medical Subject Headings (MeSH) vocabulary (with UMLS Meta-1 extensions) as its indexing thesaurus. Copyright by and reprinted with permission of the American Medical Informatics Association.

Lowe HJ, Buchanan BG, Cooper GF, Vries JK. Building a medical multimedia database system to integrate clinical information: an application of high-performance computing and communications technology. Bull Med Libr Assoc 1995 Jan;83(1):57-64. The rapid growth of diagnostic-imaging technologies over the past two decades has dramatically increased the amount of nontextual data generated in clinical medicine. The architecture of traditional, text-oriented, clinical information systems has made the integration of digitized clinical images with the patient record problematic. Systems for the classification, retrieval, and integration of clinical images are in their infancy. Recent advances in high-performance computing, imaging, and networking technology now make it technologically and economically feasible to develop an integrated, multimedia, electronic patient record. As part of the National Library of Medicine's Biomedical Applications of High-Performance Computing and Communications program, we plan to develop Image Engine, a prototype microcomputer-based system for the storage, retrieval, integration, and sharing of a wide range of clinically important digital images. Images stored in the Image Engine database will be indexed and organized using the Unified Medical Language System Metathesaurus and will be dynamically linked to data in a text-based, clinical information system. We will evaluate Image Engine by initially implementing it in three clinical domains (oncology, gastroenterology, and clinical pathology) at the University of Pittsburgh Medical Center. Copyright by and reprinted with permission of the Medical Library Association.

Stitt FW. The problem-oriented medical synopsis: coding, indexing, and classification sub-model. Proc Annu Symp Comput Appl Med Care 1994:964. A clinical information system consists of four major components: the clinical database, decision support, data analysis (including outcomes), and the development system. We have created such a system using generally available database methodology. The system is documented using a conceptual model, a physical model, and sub-models for individual components. A key sub-model of the clinical database, for record-keeping, has been defined for coding, indexing, and classification of the medical narrative typically encountered in medical records. We describe an approach to the development of the coding component that results in a hybrid system for recording information, locating indexed information, and summarizing it for analysis of outcomes. These are based on a primary term list (the problem glossary); SNOMed, the Systematized Nomenclature of Medicine (3rd edition); and ICD-9-CM. The relationship with the UMLS is also discussed. Copyright by and reprinted with permission of the American Medical Informatics Association.

Stitt FW. A standards-based clinical information system for HIV/AIDS. Medinfo 1995;8(Pt 1):402. Objective: To create a clinical data repository to interface the Veterans Administration (VA) Decentralized Hospital Computer Program (DHCP) and a departmental clinical information system for the management of HIV patients. This system supports record-keeping, decision-making, reporting, and analysis. The database development was designed to overcome two impediments to successful implementations of clinical databases: (i) lack of a standard reference data model, and (ii) lack of a universal standard for medical concept representation. Background: Health Level Seven (HL7) is a standard protocol that specifies the implementation of interfaces between two computer applications (sender and receiver) from different vendors or sources of electronic data exchange in the health care environment. This eliminates or substantially reduces the custom interface programming and program maintenance that would otherwise be required. HL7 defines the data to be exchanged, the timing of the interchange, and the communication of errors to the application. The formats are generic in nature and must be configured to meet the needs of the two applications involved. The standard conceptually operates at the seventh level of the ISO model for Open Systems Interconnection (OSI). The OSI model simply defines the data elements that are exchanged as abstract messages, and does not prescribe the exact bit stream of the messages that flow over the network. Lower-level network software developed according to the OSI model may be used to encode and decode the actual bit stream. The OSI protocols are not universally implemented and, therefore, a set of encoding rules for defining the exact representation of a message must be specified. The VA has created an HL7 module to assist DHCP applications in exchanging health care information with other applications using the HL7 protocol.
The DHCP HL7 module consists of a set of utility routines and files that provide a generic interface to the HL7 protocol for all DHCP applications. Setting: The VA's DHCP core modules are in standard use at 169 hospitals, and the role of the VA system in health care delivery has been discussed elsewhere. This development was performed at the Miami VA Medical Center Special Immunology Unit, where a database was created for an HIV patient registry in 1987. Over 2,300 patients have been entered into a database that supports a problem-oriented summary of the patient's clinical record. The interface to the VA DHCP was designed and implemented to capture information from the patient treatment file, pharmacy, laboratory, radiology, and other modules. Results: We obtained from Columbia-Presbyterian Medical Center in New York a suite of programs, written in ANSI C, for implementing the HL7 encoding rules. This toolkit isolates our application programs from the details of the HL7 encoding rules and allows them to deal with abstract messages at the programming level. While HL7 has become a standard for healthcare message exchange, SQL (Structured Query Language) is the standard for database definition, data manipulation, and query. The target database provides clinical workstation functionality. Medical concepts are encoded using a preferred terminology derived from over 15 sources that include the Unified Medical Language System and SNOMed International.

Trace D, Naeymi-Rad F, Almeida FD, Moidu K, Haines D. A longitudinal medical record (IMR). Proc Annu Symp Comput Appl Med Care 1993:911. The Intelligent Medical Record (IMR) is currently being used in patient care activities at Norwalk Hospital in Norwalk, CT, and at Cook County Hospital in Chicago, IL. IMR has evolved into a multi-encounter patient record, stored in a multi-user database management system. The graphical user interface is designed to support physician efforts to capture patient data, create progress notes, and produce a longitudinal medical record. The authors describe the program's major components, including its multi-encounter knowledge management system, problem list, progress notes, point-of-entry query, and interface to UMLS. Copyright by and reprinted with permission of the American Medical Informatics Association.

Wagner MM. An automatic indexing method for medical documents. Proc Annu Symp Comput Appl Med Care 1991:1011-7. This paper describes MetaIndex, an automatic indexing program that creates symbolic representations of documents for the purpose of document retrieval. MetaIndex uses a simple transition network parser to recognize a language that is derived from the set of main concepts in the Unified Medical Language System Metathesaurus (Meta-1). MetaIndex uses a hierarchy of medical concepts, also derived from Meta-1, to represent the content of documents. The goal of this approach is to improve document retrieval performance by better representation of documents. An evaluation method is described, and the performance of MetaIndex on the task of indexing the Slice of Life medical image collection is reported. Copyright by and reprinted with permission of the American Medical Informatics Association.

Wagner MM, Cooper GF. Evaluation of a Meta-1-based automatic indexing method for medical documents. Comput Biomed Res 1992 Aug;25(4):336-50. This paper describes MetaIndex, an automatic indexing program that creates symbolic representations of documents for the purpose of document retrieval. MetaIndex uses a simple transition network parser to recognize a language that is derived from the set of main concepts in the Unified Medical Language System Metathesaurus (Meta-1). MetaIndex uses a hierarchy of medical concepts, also derived from Meta-1, to represent the content of documents. The goal of this approach is to improve document retrieval performance by better representation of documents. An evaluation method is described, and the performance of MetaIndex on the task of indexing the Slice of Life medical image collection is reported. Copyright 1992 Academic Press.

Breene M, Jasmin R, Eisner J. Computer-based curriculum analysis. A customized approach using external standards and the UMLS. Proc Annu Symp Comput Appl Med Care 1993:919. This article describes the beta test stage of the first software product developed by The American Association of Dental Schools (AADS) Curriculum Database Consortium, which is known as CATs (Curriculum Analysis Tools). The consortium wanted to develop a software tool that could adapt to the special circumstances of each of the fifty Dental Schools and Allied Dental Programs, while accurately and comprehensively representing their curricula. The product has satisfied the principal requirements for user customization, cross-referencing to external standards, and content searching by keyword using the UMLS. The product provides an intuitive method of curriculum analysis with powerful tools for viewing and interpreting the results. Copyright by and reprinted with permission of the American Medical Informatics Association.

Clyman SG. UMLS linked to a system for authoring simulations used in evaluation of physicians. Proc Annu Symp Comput Appl Med Care 1993:921. The National Board of Medical Examiners (NBME) develops examinations used in licensing physicians in the United States. To complement existing multiple-choice examinations, NBME has developed and is studying uncued, dynamic, computer-based simulations (CBX) of the patient care environment. In CBX, physicians type free-text orders for diagnostic studies, procedures, consultants, medications, and other therapies. As simulated time passes, patient conditions evolve in response to physician management decisions; multiple outcomes are possible. One impediment to widespread use of CBX is the technical expertise, time, and cost associated with it. A new case authoring system called SEEDS (Simulation Environment Engineering and Development System) will increase the efficiency of this process. The National Library of Medicine's Unified Medical Language System (UMLS) is used in the development of SEEDS to link UMLS and CBX terms and concepts. Copyright by and reprinted with permission of the American Medical Informatics Association.

Eisner J. Curriculum Analysis Tools (CATs). A cooperative approach to the design of curriculum databases. Proc Annu Symp Comput Appl Med Care 1993:766-70. In 1990, a small group of dental schools agreed to pool their resources and cooperate in the design and programming of curriculum analysis software. After two and one-half years, the consortium had grown to include more than 50 institutions. Its efforts have been endorsed by the American Association of Dental Schools, and it is now beta testing its first software product, known as Curriculum Analysis Tools (CATs). The process by which the software has been developed, as well as its current design, offers a unique blend of flexibility and creativity, which could perhaps be adopted by other health professionals. Copyright by and reprinted with permission of the American Medical Informatics Association.

Eisner J. Multi-national, multi-lingual, multi-professional CATs: (Curriculum Analysis Tools). Medinfo 1995;8(Pt 2):1706. A consortium of dental schools and allied dental programs was established in 1991 with the express purpose of creating a curriculum database program that was end-user modifiable. In April of 1994, a beta version (Beta 2.5 written in FoxPro(TM) 2.5) of the software CATs, an acronym for Curriculum Analysis Tools, was released for use by over 30 of the consortium's 60 member institutions, while the remainder either waited for the Macintosh (TM) or Windows (TM) versions of the program or were simply not ready to begin an institutional curriculum analysis project. Shortly after this release, the design specifications were rewritten based on a thorough critique of the Beta 2.5 design and coding structures and on user feedback. The result was Beta 3.0, which has been designed to accommodate any health professions curriculum, in any country that uses English or French as one of its languages. Given the program's extensive use of screen generation tools, it was quite easy to offer screen displays in a second language. As more languages become available as part of the Unified Medical Language System, used to document curriculum content, the program's design will allow their incorporation. When the software arrives at a new institution, the choice of language and health profession will have been preselected, leaving the Curriculum Database Manager to identify the country where the member institution is located.
With these 'macro' end-user decisions completed, the database manager can turn to a more specific set of end-user questions including: 1) will the curriculum view selected for analysis be created by the course directors (provider entry of structured course outlines) or by the students (consumer entry of class session summaries)?; 2) which elements within the provided course outline or class session modules will be used?; 3) which, if any, internal curriculum validation measures will be included?; and 4) which, if any, external validation measures will be included? External measures can include accreditation standards, entry-level practitioner competencies, an index of learning behaviors, an index of discipline integration, or others defined by the institution. When data entry, which is secure to the course level, is complete, users may choose to browse a variety of graphic representations of their curriculum, or either preview or print a variety of reports that offer more detail about the content and adequacy of their curriculum. The progress of all data entry can be monitored by the database manager over the course of an academic year, and all reports contain extensive missing-data reports to ensure that the user knows whether they are studying complete or partial data. Institutions using the beta version of the program have reported considerable satisfaction with its functionality and have also offered a variety of design and interface enhancements. The anticipated release date for Curriculum Analysis Tools (CATs) is the first quarter of 1995.

Fowler J, Wheeler DA, Camerino PW, Bat O, Burch PE. Development of a faculty research interest resource. Proc AMIA Fall Symp 1996:363-72. We have developed a faculty research interests resource by "mining" MEDLINE for relationships that are not directly queryable through the normal MEDLINE schema. Faculty citations are retrieved and World-Wide Web pages built to interconnect authors, their citations, and the MeSH terms that have been assigned to these citations. The design and development of the resource are discussed and examples of the results illustrated. Copyright by and reprinted with permission of the American Medical Informatics Association.

Kanter SL. Using the UMLS to represent medical curriculum content. Proc Annu Symp Comput Appl Med Care 1993:762-5. Recent innovations in medical education have highlighted the need for faculty involved with the curriculum to carefully examine curricular content with goals of detecting omissions and unwanted redundancies of subject matter, adding and integrating new content, and deleting old content. A number of medical schools have attempted to deal with these issues by developing a database of curricular content information, most often using faculty- or student-selected keywords to represent each unit of instruction. However, several problems have been identified with this method, and achieving the goals mentioned above remains a formidable task. This paper outlines an alternative method that uses the resources of the UMLS to characterize a medical concept by the semantic types of its co-occurring terms. This approach can facilitate achievement of the aforementioned goals. Copyright by and reprinted with permission of the American Medical Informatics Association.

Kanter SL, Miller RA, Tan M, Schwartz J. Using POSTDOC to recognize biomedical concepts in medical school curricular documents. Bull Med Libr Assoc 1994 Jul;82(3):283-7. Recognition of the biomedical concepts in a document is prerequisite to further processing of the document: medical educators examine curricular documents to discover the coverage of certain topics, detect unwanted redundancies, integrate new content, and delete old content; and clinicians are concerned with terms in patient medical records for purposes ranging from creation of an electronic medical record to identification of medical literature relevant to a particular case. POSTDOC (POSTprocessor of DOCuments) is a computer application that (1) accepts as input a free-text, ASCII-formatted document and uses the Unified Medical Language System (UMLS) Metathesaurus to recognize relevant main concept terms; (2) provides term co-occurrence data and thus is able to identify potentially increasing correlations among concepts within the document; and (3) retrieves references from MEDLINE files based on user identification of relevant subjects. This paper describes a formative evaluation of POSTDOC's ability to recognize UMLS Metathesaurus biomedical concepts in medical school lecture outlines. The precision and recall varied over a wide range and were deemed not yet acceptable for automated creation of a database of concepts from curricular documents. However, results were good enough to warrant further study and continued system development. Copyright by and reprinted with permission of the Medical Library Association.

Zucker J, Chase H, Molholt P, Bean C, Kahn RM. A comprehensive strategy for designing a web-based medical curriculum. Proc AMIA Fall Symp 1996:41-5. In preparing for a full-featured online curriculum, it is necessary to develop scalable strategies for software design that will support the pedagogical goals of the curriculum and which will address the issues of acquisition and updating of materials, of robust content-based linking, and of integration of the online materials into other methods of learning. A complete online curriculum, as distinct from an individual computerized module, must provide dynamic updating of both content and structure and an easy pathway from the professor's notes to the finished online product. At the College of Physicians and Surgeons, we are developing such strategies including a scripted text conversion process that uses the Hypertext Markup Language (HTML) as structural markup rather than as display markup, automated linking by the use of relational databases and the Unified Medical Language System (UMLS), integration of text, images, and multimedia along with interface designs which promote multiple contexts and collaborative study. Copyright by and reprinted with permission of the American Medical Informatics Association.


Baud RH, Rassinoux A-M, Lovis C, Wagner J, Griesser V, Michel P-A, Scherrer J-R. Knowledge sources for natural language processing. Proc AMIA Fall Symp 1996:70-4. This paper aims at reviewing the problem of feeding Natural Language Processing (NLP) tools with convenient linguistic knowledge in the medical domain. A syntactic approach lacks the potential to solve a number of typical situations with ambiguities and is clearly insufficient for quality treatment of natural language. On the other hand, a conceptual approach relies on some modelling of the domain, of which the elaboration is a long-term process and where the ultimate solutions are far from being recognised and universally accepted. In-between is the beauty of the compromise. How can we significantly improve the coverage of linguistic knowledge in the years to come? Copyright by and reprinted with permission of the American Medical Informatics Association.

Evans DA, Brownlow ND, Hersh WR, Campbell EM. Automating concept identification in the electronic medical record: an experiment in extracting dosage information. Proc AMIA Fall Symp 1996:388-92. We discuss the development and evaluation of an automated procedure for extracting drug-dosage information from clinical narratives. The process was developed rapidly using existing technology and resources, including categories of terms from UMLS96. Evaluations over a large training and smaller test set of medical records demonstrate an approximately 80% rate of exact and partial matches on target phrases, with few false positives and a modest rate of false negatives. The results suggest a strategy for automating general concept identification in electronic medical records. Copyright by and reprinted with permission of the American Medical Informatics Association.

Evans DA, Chute CG, Handerson SK, Yang Y, Monarch IA, Hersh WR. 'Latent semantics' as a basis for managing variation in medical terminologies. Medinfo 1992;7(Pt 2):1462-8. The authors are exploiting a version of latent semantic indexing as a general solution to the problem of managing language variation. They treat medical terms as the 'documents' to be retrieved by natural-language expressions of concepts, taken as 'queries'. In experiments, they have focused on (1) establishing a basis for the decomposition of concepts (terms) by lexical items and (2) exploiting existing medical thesauri to create Lexical-Item x Term spaces. They have demonstrated the ability to interpret natural-language statements of medical findings in multiple medical terminologies simultaneously (e.g. INTERNIST-I/QMR, PTXT, and NLM/UMLS META-1 vocabularies) and also to derive concept-relation spaces from collections of terms in the NLM/UMLS Metathesaurus (META-1). The power of this approach is that it does not depend on detailed semantic representations or on word-for-word correspondences among terms and that multiple vocabularies can be represented side-by-side.

Johnson SB, Aguirre A, Peng P, Cimino J. Interpreting natural language queries using the UMLS. Proc Annu Symp Comput Appl Med Care 1993:294-8. This paper describes AQUA (A QUery Analyzer), the natural language front end of a prototype information retrieval system. AQUA translates a user's natural language query into a representation in the Conceptual Graph formalism. The graph is then used by subsequent components to search various resources such as databases of the medical literature. The focus of the parsing method is on semantics rather than syntax, with semantic restrictions being provided by the UMLS Semantic Net. The intent of the approach is to provide a method that can be emulated easily in applications that require simple natural language interfaces. Copyright by and reprinted with permission of the American Medical Informatics Association.

Joubert M, Fieschi M, Robert JJ. A conceptual model for information retrieval with UMLS. Proc Annu Symp Comput Appl Med Care 1993:715-9. Information retrieval in large information databases is a non-deterministic process which generally needs a sequence of search steps. One of the main problems end-users face is parsing their questions efficiently into the query language that the computer systems allow. Conceptual graphs were initially designed for natural language analysis and understanding. Due to their closeness to semantic networks, their expressiveness is powerful enough to be applied to knowledge representation and use by computer systems. This work demonstrates that conceptual graphs are a suitable means to model end-users' queries on the basis of the thesaurus and the semantic network of the UMLS project. Copyright by and reprinted with permission of the American Medical Informatics Association.

Joubert M, Fieschi M, Robert JJ, Tafazzoli A. Users conceptual views on medical information databases. Int J Biomed Comput 1994 Oct;37(2):93-104. As information databases we consider all the kinds of information repositories that are handled by computer systems. When querying very large information databases, end-users are often faced with the problem of parsing their questions efficiently into the query languages of the computer systems. Conceptual graphs were initially designed for natural language analysis and understanding. Due to their closeness to semantic networks, their expressiveness is powerful enough to be applied to knowledge representation and use by computer systems. This work demonstrates that conceptual graphs are a suitable means to model both the information in patient databases and the queries to these databases, and that operations on graphs can compute the pattern matching process needed to provide the answers. A prototype that exploits this model is presented. Experiments have been made with the material furnished by the Unified Medical Language System project (version 2, 1992) of the National Library of Medicine, USA.

Joubert M, Robert J-J, Miton F, Fieschi M. The project ARIANE: conceptual queries to information databases. Proc AMIA Fall Symp 1996:378-82. As information databases we consider all the collections of data records indexed by key-words, stored and delivered by computer systems. In previous research we demonstrated the value of designing a conceptual model, in the conceptual graphs formalism, and of implementing a computational model for information retrieval in large information databases. These models are based on the UMLS knowledge sources. This paper briefly reviews these models and describes tests done in querying a patient database and a bibliographical database. Copyright by and reprinted with permission of the American Medical Informatics Association.

Lamiell JM, Wojcik ZM, Isaacks J. Computer auditing of surgical operative reports written in English. Proc Annu Symp Comput Appl Med Care 1993:269-73. We developed a script-based scheme for automated auditing of natural language surgical operative reports. Suitable operations (appendectomy and breast biopsy) were selected, then audit criteria and operation scripts conforming with our audit criteria were developed. Our LISP parser was context and expectation sensitive. Parsed sentences were represented by semigraph structures and placed in a textual database to improve efficiency. Sentence ambiguities were resolved by matching the narrative textual database to the script textual database and employing the Unified Medical Language System (UMLS) Knowledge Sources. All audit criteria questions were successfully answered for typical operative reports by matching parsed audit questions to the textual database. Copyright by and reprinted with permission of the American Medical Informatics Association.

McCray AT. Extending a natural language parser with UMLS knowledge. Proc Annu Symp Comput Appl Med Care 1991:194-8. Over the past several years our research efforts have been directed toward the identification of natural language processing methods and techniques for improving access to biomedical information stored in computerized form. To provide a testing ground for some of these ideas we have undertaken the development of SPECIALIST, a prototype system for parsing and accessing biomedical text. The system includes linguistic and biomedical knowledge. Linguistic knowledge involves rules and facts about the grammar of the language. Biomedical knowledge involves rules and facts about the domain of biomedicine. The UMLS knowledge sources, Meta-1 and the Semantic Network, as well as the UMLS test collection, have recently contributed to the development of the SPECIALIST system. Copyright by and reprinted with permission of the American Medical Informatics Association.

McCray AT, Aronson AR, Browne AC, Rindflesch TC, Razi A, Srinivasan S. UMLS knowledge for biomedical language processing. Bull Med Libr Assoc 1993 Apr;81(2):184-94. This paper describes efforts to provide access to the free text in biomedical databases. The focus of the effort is the development of SPECIALIST, an experimental natural language processing system for the biomedical domain. The system includes a broad coverage parser supported by a large lexicon, modules that provide access to the extensive Unified Medical Language System (UMLS) Knowledge Sources, and a retrieval module that permits experiments in information retrieval. The UMLS Metathesaurus and Semantic Network provide a rich source of biomedical concepts and their interrelationships. Investigations have been conducted to determine the type of information required to effect a map between the language of queries and the language of relevant documents. Mappings are never straightforward and often involve multiple inferences. Copyright by and reprinted with permission of the Medical Library Association.

Murphy SN, Barnett GO. Achieving automated narrative text interpretation using phrases in the electronic medical record. Proc AMIA Fall Symp 1996:532-6. Stereotypic phrases are used by clinicians throughout the medical record, as seen in an analysis of our COSTAR medical record database. These phrases are often associated with an underlying semantic concept; for example the phrase CLEAR LUNGS may be linked with the concept "normal lung exam" for a particular physician. Formalizing these associations with concepts from the UMLS using the MEDPhrase application allowed us to automate interpretation of narrative text within our electronic medical record. Copyright by and reprinted with permission of the American Medical Informatics Association.

Peng P, Aguirre A, Johnson SB, Cimino JJ. Generating MEDLINE search strategies using a librarian knowledge-based system. Proc Annu Symp Comput Appl Med Care 1993:596-600. We describe a librarian knowledge-based system that generates a search strategy from a query representation based on a user's information need. Together with the natural language parser AQUA, the system functions as a human/computer interface, which translates a user query from free text into a BRS Onsite search formulation, for searching the MEDLINE bibliographic database. In the system, conceptual graphs are used to represent the user's information need. The UMLS Metathesaurus and Semantic Net are used as the key knowledge sources in building the knowledge base. Copyright by and reprinted with permission of the American Medical Informatics Association.

Pietrzyk PM. Free text analysis. Int J Biomed Comput 1995 Apr;39(1):139-44. In the context of hospital information systems (HIS) medical free text analysis is reviewed with respect to current automated approaches to literature retrieval, case retrieval and fact retrieval from textual data in the patient record. The Unified Medical Language System (UMLS) project has enormously stimulated current research. It is expected that UMLS knowledge sources and SNOMED III (which need a translation into other languages as soon as possible) as well as the conceptual graphs formalism, could become standards to utilize free text information contained in HIS databases.

Sager N, Lyman M, Bucknall C, Nhan N, Tick LJ. Natural language processing and the representation of clinical data. J Am Med Inform Assoc 1994 Mar-Apr;1(2):142-60. OBJECTIVE: Develop a representation of clinical observations and actions and a method of processing free-text patient documents to facilitate applications such as quality assurance. DESIGN: The Linguistic String Project (LSP) system of New York University utilizes syntactic analysis, augmented by a sublanguage grammar and an information structure that are specific to the clinical narrative, to map free-text documents into a database for querying. MEASUREMENTS: Information precision (I-P) and information recall (I-R) were measured for queries for the presence of 13 asthma-health-care quality assurance criteria in a database generated from 59 discharge letters. RESULTS: I-P, using counts of major errors only, was 95.7% for the 28-letter training set and 98.6% for the 31-letter test set. I-R, using counts of major omissions only, was 93.9% for the training set and 92.5% for the test set. Copyright by and reprinted with permission of the American Medical Informatics Association.

Satomura Y, do Amaral MB. Automated diagnostic indexing by natural language processing. Med Inf (Lond) 1992 Jul-Sep;17(3):149-63.

Volot F, Zweigenbaum P, Bachimont B, Ben Said M, Bouaud J, Fieschi M, Boisvieux JF. Structuration and acquisition of medical knowledge. Using UMLS in the conceptual graph formalism. Proc Annu Symp Comput Appl Med Care 1993:710-4. The use of a taxonomy, such as the concept type lattice (CTL) of Conceptual Graphs, is a central structuring piece in a knowledge-based system. The knowledge it contains is constantly used by the system, and its structure provides a guide for the acquisition of other pieces of knowledge. We show how UMLS can be used as a knowledge resource to build a CTL and how the CTL can help the process of acquisition for other kinds of knowledge. We illustrate this method in the context of the MENELAS natural language understanding project. Copyright by and reprinted with permission of the American Medical Informatics Association.


Aronson AR. The effect of textual variation on concept based information retrieval. Proc AMIA Fall Symp 1996:373-7. Accounting for textual variation in the documents and queries processed by information retrieval systems is considered essential for achieving good retrieval. Recent research has called into question several of the techniques used to support this endeavor. This paper reports on experiments with a concept based information retrieval system which relies on a program called MetaMap to account for textual variation in the process of mapping biomedical text such as MEDLINE bibliographic citations to the UMLS Metathesaurus. The experiments confirm that the effort expended in handling textual variation is well-spent for at least one type of concept based information retrieval. Copyright by and reprinted with permission of the American Medical Informatics Association.

Chute CG, Yang Y. An evaluation of concept based latent semantic indexing for clinical information retrieval. Proc Annu Symp Comput Appl Med Care 1992:639-43. Latent Semantic Indexing (LSI) of surgical case report text using ICD-9-CM procedure codes and index terms was evaluated. The precision-recall performance of this two-step matrix retrieval process was compared with the SMART document retrieval system, surface word matching, and humanly assigned procedure codes. Human coding performed best; two-step LSI did less well than surface matching or SMART. This evaluation suggests that concept-based LSI may be compromised by its two-stage nature and its dependence upon a robust term database linked to main concepts. However, the potential elegance of partial-credit concept matching merits the continued evaluation of LSI for clinical case retrieval. Copyright by and reprinted with permission of the American Medical Informatics Association.

Chute CG, Yang Y, Evans DA. Latent Semantic Indexing of medical diagnoses using UMLS semantic structures. Proc Annu Symp Comput Appl Med Care 1991:185-9. The relational files within the UMLS Metathesaurus contain rich semantic associations to main concepts. We invoked the technique of Latent Semantic Indexing to generate information matrices based on these relationships and created semantic vectors using singular value decomposition. Evaluations were made on the complete set and subsets of Metathesaurus main concepts with the semantic type Disease or Syndrome. Real number matrices were created with main concepts, lexical variants, synonyms, and associated expressions. Ancestors, children, siblings, and related terms were added to alternative matrices, preserving the hierarchical direction of the relation as the imaginary component of a complex number. Preliminary evaluation suggests that this technique is robust. A major advantage is the exploitation of semantic features which derive from a statistical decomposition of UMLS structures, possibly reducing dependence on the tedious construction of semantic frames by humans. Copyright by and reprinted with permission of the American Medical Informatics Association.
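As background for the latent semantic indexing entries in this section, the core step is a rank-k truncated singular value decomposition of a term-by-concept (or term-by-document) matrix X. The notation below is the standard textbook form, not taken from the cited papers; the superscript * denotes the conjugate transpose, which reduces to the ordinary transpose when X is real (the complex-valued matrices described above are the case where the distinction matters):

$$X \approx X_k = U_k \Sigma_k V_k^{*}$$

Here U_k and V_k hold the first k left and right singular vectors and Σ_k is the diagonal matrix of the k largest singular values. Rows or columns of X are then compared in the reduced k-dimensional space, typically by cosine similarity, which is what allows partial-credit matches between items that share no exact words.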

Doszkocs TE, Sass RK. An associative semantic network for machine-aided indexing, classification and searching. In: Fidel R, Kwasnik BH, Smith PJ, editors. Proceedings of 3rd ASIS SIG/CR Classification Research Workshop; 1992 Oct 25; Pittsburgh, PA. Medford (NJ): Learned Information; 1993. p. 15-35. Capturing and exploiting textual database associations has played a pivotal role in the evolution of automated information systems. A variety of statistical, linguistic and artificial intelligence approaches have been described in the literature. Many of these R&D concepts and techniques are now being incorporated into commercially available search systems and services. This paper discusses prior work and reports on research in progress aimed at creating and utilizing a global semantic associative database, AURA (Associative User Retrieval Aid), to facilitate machine-assisted indexing, classification and searching in the large-scale information processing environment of NLM's core bibliographic databases, MEDLINE and CATLINE. AURA is a semantic network of over two million natural language phrases derived from more than a million MEDLINE titles. These natural language phrases are associatively linked to NLM's MeSH (Medical Subject Headings) and UMLS (Unified Medical Language System) metathesaurus controlled vocabulary and classification resources. Reproduced with permission of the American Society for Information Science.

Harbourt AM, Syed EJ, Hole WT, Kingsland LC 3d. The ranking algorithm of the Coach browser for the UMLS metathesaurus. Proc Annu Symp Comput Appl Med Care 1993:720-4. This paper presents the novel ranking algorithm of the Coach Metathesaurus browser which is a major module of the Coach expert search refinement program. An example shows how the ranking algorithm can assist in creating a list of candidate terms useful in augmenting a suboptimal Grateful Med search of MEDLINE. Copyright by and reprinted with permission of the American Medical Informatics Association.

Hersh WR. Evaluation of Meta-1 for a concept-based approach to the automated indexing and retrieval of bibliographic and full-text databases. Med Decis Making 1991 Oct-Dec;11(4 Suppl):S120-4. SAPHIRE is a concept-based approach to information retrieval in the biomedical domain. Indexing and retrieval are based on a concept-matching algorithm that processes free text to identify concepts and map them to their canonical form. This process requires a large vocabulary containing a breadth of medical concepts and a diversity of synonym forms, which is provided by the Meta-1 vocabulary from the Unified Medical Language System Project of the National Library of Medicine. This paper describes the use of Meta-1 in SAPHIRE and an evaluation of both entities in the context of an information retrieval study. Copyright 1991 Hanley and Belfus.

Hersh WR, Hickam DH. A comparative analysis of retrieval effectiveness for three methods of indexing AIDS-related abstracts. In: Proceedings of the 54th Annual Meeting of the American Society for Information Science; 1991 Oct 27-31; Washington, DC. Medford (NJ): Learned Information; 1991. p. 211-25. SAPHIRE is an experimental information retrieval system featuring concept-based automated indexing, natural language input, and relevance-based retrieval. This experiment evaluates SAPHIRE's indexing capability in a three-way comparison of retrieval effectiveness versus traditional MEDLINE indexing and title-abstract word indexing. SAPHIRE's recall and precision values are inferior to those of both other methods in a MEDLINE-style Boolean searching environment, although additional experiments suggest that better retrieval performance is obtained when SAPHIRE's natural language input and relevance ranking features are used.

Hersh WR, Hickam DH. A comparison of retrieval effectiveness for three methods of indexing medical literature. Am J Med Sci 1992 May;303(5):292-300. Conventional approaches to indexing medical literature include the human assignment of terms from a controlled vocabulary, such as MeSH, or the computer assignment of all words in the title and abstract as indexing terms. Human indexing suffers from inconsistency, while word-based indexing suffers from the multiple meanings of words. SAPHIRE is a computer program designed to provide indexing using controlled terms that are assigned by computer, based on their occurrence in the title and abstract. In this first evaluation of SAPHIRE, the authors compared the retrieval performance of the three indexing approaches--human-based MEDLINE with text words; machine-based SAPHIRE with text words; and text words only--for searches by measuring recall and precision for each search using a test collection of 200 abstracts. The abstracts were judged by human reviewers for relevance as applied to 12 literature queries. The results suggest that text word indexing is more effective than indexing with MeSH terms. SAPHIRE's indexing performance was slightly inferior but the program has other advantageous features. Copyright Southern Society for Clinical Investigation; published by Lippincott-Raven Publishers.

Hersh WR, Hickam DH. A comparison of two methods for indexing and retrieval from a full-text medical database. Med Decis Making 1993 Jul-Sep;13(3):220-6. The objective of this study was to compare how well medical professionals are able to retrieve relevant literature references using two computerized literature searching systems that provide automated (non-human) indexing of content. The first program was SAPHIRE, which features concept-based indexing, free-text input of queries, and ranking of retrieved references for relevance. The second program was SWORD, which provides single-word searching using Boolean operators (AND, OR). Sixteen fourth-year medical students participated in the study. The database for searching was six volumes from the 1989 Yearbook series. The queries were ten questions generated on teaching rounds. All subjects searched half the queries with each program. After the searching, each subject was given a questionnaire about prior experience and preferences about the two programs. Recall (proportion of relevant articles retrieved from the database) and precision (proportion of relevant articles in the retrieved set) were measured for each search done by each participant. Mean recall was 57.6% with SAPHIRE; it was 58.6% with SWORD. Precision was 48.1% with SAPHIRE vs 57.6% with SWORD. Each program was rated easier to use than the other by half of the searchers, and preferences were associated with better searching performance for that program. Both systems achieved recall and precision comparable to existing systems and may represent effective alternatives to MEDLINE and other retrieval systems based on human indexing for searching medical literature. Copyright 1993 Hanley and Belfus.
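The recall and precision measures defined in the abstract above can be computed for a single search as shown in this minimal Python sketch. The function name and the document IDs are hypothetical illustrations, not code or data from the study; a study reporting mean recall would average these per-search values across all searches.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: set of document IDs returned by the search (hypothetical IDs)
    relevant:  set of document IDs judged relevant in the whole database
    """
    hits = len(retrieved & relevant)  # relevant articles that were retrieved
    # Precision: proportion of relevant articles in the retrieved set.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: proportion of the database's relevant articles that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 10 references retrieved, 4 of them relevant,
# out of 8 relevant references in the whole database.
retrieved = set(range(1, 11))
relevant = {2, 5, 7, 9, 20, 21, 22, 23}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.40 recall=0.50
```

Returning 0.0 for an empty retrieved or relevant set is a design choice made here to avoid division by zero; evaluation studies may instead exclude such searches.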

Hersh WR, Hickam DH. An evaluation of interactive boolean and natural language searching with an online medical textbook. J Am Soc Inf Sci 1995 Aug;46(7):478-89. Few studies have compared the interactive use of Boolean and natural language searching systems. We studied the use of three retrieval systems by senior medical students searching on queries generated by actual physicians in a clinical setting. The searchers were randomized to search on two of three different retrieval systems: a Boolean system, a word-based natural language system, and a concept-based natural language system. Our results showed no statistically significant differences in recall or precision among the three systems. Likewise, we found no user preference for any system over the others. In the course of this study we did find, however, a number of problems with traditional measures of retrieval evaluation when applied to the interactive search setting. Copyright 1995 by and reproduced with permission of John Wiley and Sons.

Hersh WR, Hickam DH, Leone TJ. Words, concepts, or both: optimal indexing units for automated information retrieval. Proc Annu Symp Comput Appl Med Care 1992:644-8. What is the best way to represent the content of documents in an information retrieval system? This study compares the retrieval effectiveness of five different methods for automated (machine-assigned) indexing using three test collections. The consistently best methods are those that use indexing based on the words that occur in the available text of each document. Methods used to map text into concepts from a controlled vocabulary showed no advantage over the word-based methods. This study also looked at an approach to relevance feedback which showed benefit for both word-based and concept-based methods. Copyright by and reprinted with permission of the American Medical Informatics Association.

Jenders RA, Estey G, Martin M, Hamilton G, Ford-Carleton P, Thompson BT, Oliver DE, Eccles R, Barnett GO, Zielstorff RD, et al. Indexing guidelines: applications in use of pulmonary artery catheters and pressure ulcer prevention. Proc Annu Symp Comput Appl Med Care 1994:802-6. In a busy clinical environment, access to knowledge must be rapid and specific to the clinical query at hand. This requires indices which support easy navigation within a knowledge source. We have developed a computer-based tool for trouble-shooting pulmonary artery waveforms using a graphical index. Preliminary results of domain knowledge tests for a group of clinicians exposed to the system (N = 33) show a mean improvement on a 30-point test of 5.33 (p < 0.001) compared to a control group (N = 19) improvement of 0.47 (p = 0.61). Survey of the experimental group (N = 25) showed 84% (p = 0.001) found the system easy to use. We discuss the application of lessons learned in indexing this domain area to computer-based indexing of guidelines for pressure ulcer prevention. Copyright by and reprinted with permission of the American Medical Informatics Association.

Merz RB, Cimino C, Barnett GO, Blewett DR, Gnassi JA, Grundmeier R, Hassan L. A pre-search estimation algorithm for MEDLINE strategies with qualifiers. Proc Annu Symp Comput Appl Med Care 1994:910-4. Inexperienced users of online medical databases often have difficulty formulating their queries. Systems designed to assist them usually do not estimate how effective the initial search strategy will be before performing an actual search. Consequently, the search may find an overwhelming number of citations, or retrieve nothing at all. We have developed an estimation algorithm to predict the outcome of a MEDLINE search. The portion of the algorithm described here estimates retrieval for strategies containing qualifiers. In test searches, the estimate reduced the trial-and-error of strategy formulation. However, the accuracy of the estimate fell short of expectations. Our results show that pre-search estimation for strategies with qualifiers cannot be performed effectively with only the occurrence data that is presently available. They further imply that automated search intermediaries can benefit from medical knowledge which expresses the relationships that exist between terms. Copyright by and reprinted with permission of the American Medical Informatics Association.

Richwine PW. A study of MeSH and UMLS for subject searching in an online catalog. Bull Med Libr Assoc 1993 Apr;81(2):229-33.

Rindflesch TC, Aronson AR. Semantic processing in information retrieval. Proc Annu Symp Comput Appl Med Care 1993:611-5. Intuition suggests that one way to enhance the information retrieval process would be the use of phrases to characterize the contents of text. A number of researchers, however, have noted that phrases alone do not improve retrieval effectiveness. In this paper we briefly review the use of phrases in information retrieval and then suggest extensions to this paradigm using semantic information. We claim that semantic processing, which can be viewed as expressing relations between the concepts represented by phrases, will in fact enhance retrieval effectiveness. The availability of the UMLS domain model, which we exploit extensively, significantly contributes to the feasibility of this processing. Copyright by and reprinted with permission of the American Medical Informatics Association.

Robert JJ, Joubert M, Nal L, Fieschi M. A computational model of information retrieval with UMLS. Proc Annu Symp Comput Appl Med Care 1994:167-71. A high level representation of data would clarify the complex collection of medical concepts, terms and relationships derived from standard classifications that the Unified Medical Language System contains. A conceptual model is described which represents the data structure. A second objective of this conceptual model is to provide users with the capability to build queries to information databases as easily as possible on the basis of this data structure. The methods used to build this model are semantic networks and conceptual graphs. The object-oriented computational model which implements this conceptual model is detailed. It reuses part of the generic C++ classes of the National Institutes of Health library. New classes are added to this library to implement the needed functionalities. Copyright by and reprinted with permission of the American Medical Informatics Association.

Yang Y, Chute CG. Words or concepts: the features of indexing units and their optimal use in information retrieval. Proc Annu Symp Comput Appl Med Care 1993:685-9. Words or concepts: which is the better choice for indexing the contents of documents? The answer depends on what method is used for retrieval. This paper studies the effects of using canonical concepts versus document words in different retrieval systems with a testing collection of MEDLINE documents. In our tests, for a retrieval system which does not use any human knowledge, using words yielded better retrieval results, while using concepts suffered from a vocabulary difference between canonical expressions of concepts and non-canonical words in queries or documents. For a system which depends on the UMLS synonym set for a mapping from queries or documents to canonical concepts, the retrieval results were slightly better than the case of not using the synonyms, but still worse than the systems using words. For the systems which automatically learn empirical connections between words and concepts from examples in the testing collection, the vocabulary problem was effectively solved, and the results of using concepts were competitive or better, compared to those using words. Copyright by and reprinted with permission of the American Medical Informatics Association.

Hersh W, Hickam D. Information retrieval in medicine: the SAPHIRE experience. Medinfo 1995;8(Pt 2):1433-7. Information retrieval systems are proliferating in biomedical settings, but many problems in indexing, retrieval, and evaluation of systems continue to exist. The SAPHIRE Project was undertaken to seek solutions to these problems. This paper summarizes the evaluation studies that have been done with SAPHIRE, highlighting the lessons learned and laying out the challenges ahead to all medical information retrieval efforts.

Hersh W, Hickam DH, Haynes RB, McKibbon KA. Evaluation of SAPHIRE: an automated approach to indexing and retrieving medical literature. Proc Annu Symp Comput Appl Med Care 1991:808-12. An analysis of SAPHIRE, an experimental information retrieval system featuring automated indexing and natural language retrieval, was performed on MEDLINE references using data previously generated for a MEDLINE evaluation. Compared with searches performed by novice and expert physicians using MEDLINE, SAPHIRE achieved comparable recall and precision. While its combined recall and precision performance did not equal the level of librarians, SAPHIRE did achieve a significantly higher level of absolute recall. SAPHIRE has other potential advantages over existing MEDLINE systems. Its natural language interface does not require knowledge of MeSH, and it provides relevance ranking of retrieved references. Copyright by and reprinted with permission of the American Medical Informatics Association.

Hersh W, Leone TJ. The SAPHIRE server: a new algorithm and implementation. Proc Annu Symp Comput Appl Med Care 1995:858-62. SAPHIRE is an experimental information retrieval system implemented to test new approaches to automated indexing and retrieval of medical documents. Due to limitations in its original concept-matching algorithm, a modified algorithm has been implemented which allows greater flexibility in partial matching and different word order within concepts. With the concomitant growth in client-server applications and the Internet in general, the new algorithm has been implemented as a server that can be accessed via other applications on the Internet. Copyright by and reprinted with permission of the American Medical Informatics Association.

Hersh WR, Greenes RA. SAPHIRE--an information retrieval system featuring concept matching, automatic indexing, probabilistic retrieval, and hierarchical relationships. Comput Biomed Res 1990 Oct;23(5):410-25. SAPHIRE (Semantic and Probabilistic Heuristic Information Retrieval Environment) is an experimental computer program designed to test new techniques in automated information retrieval in the biomedical domain. A main feature of the program is a concept-finding algorithm that processes free text to find canonical concepts. The algorithm is designed to handle a wide variety of synonyms and convert them to canonical form. This allows natural language to be used for query input and also serves as the basis for a new approach to automatic indexing based on a combination of probabilistic and linguistic methods. Copyright 1990 Academic Press.

Hersh WR, Hickam D. Information retrieval in medicine. The SAPHIRE experience. J Am Soc Inf Sci 1995 Dec;46(10):743-7. Information retrieval systems are being used increasingly in biomedical settings, but many problems still exist in indexing, retrieval, and evaluation. The SAPHIRE Project was undertaken to seek solutions for these problems. This article summarizes the evaluation studies that have been done with SAPHIRE, highlighting the lessons learned and laying out the challenges ahead to all medical information retrieval efforts. Copyright 1995 by and reproduced with permission of John Wiley and Sons.

Hersh WR, Hickam DH, Haynes RB, McKibbon KA. A performance and failure analysis of SAPHIRE with a MEDLINE test collection. J Am Med Inform Assoc 1994 Jan-Feb;1(1):51-60. OBJECTIVE: Assess the performance of the SAPHIRE automated information retrieval system. DESIGN: Comparative study of automated and human searching of a MEDLINE test collection. MEASUREMENTS: Recall and precision of SAPHIRE were compared with those attributes of novice physicians, expert physicians, and librarians for a test collection of 75 queries and 2,334 citations. Failure analysis assessed the efficacy of the Metathesaurus as a concept vocabulary; the reasons for retrieval of nonrelevant articles and nonretrieval of relevant articles; and the effect of changing the weighting formula for relevance ranking of retrieved articles. RESULTS: Recall and precision of SAPHIRE were comparable to those of both physician groups, but less than those of librarians. CONCLUSION: The current version of the Metathesaurus, as utilized by SAPHIRE, was unable to represent the conceptual content of one-fourth of physician-generated MEDLINE queries. The most likely cause for retrieval of nonrelevant articles was the presence of some or all of the search terms in the article, with frequencies high enough to lead to retrieval. The most likely cause for nonretrieval of relevant articles was the absence of the actual terms from the query, with synonyms or hierarchically related terms present instead. There were significant variations in performance when SAPHIRE's concept-weighting formulas were modified. Copyright by and reprinted with permission of the American Medical Informatics Association.

Hersh WR, Pattison-Gordon E, Evans DA. Adaption of Meta-1 for SAPHIRE, a general purpose information retrieval system. Proc Annu Symp Comput Appl Med Care 1990:156-60. The Unified Medical Language System (UMLS) Project of the National Library of Medicine (NLM) has produced Meta-1, a metathesaurus featuring over 40,000 concepts and their synonyms from several commonly-used medical vocabularies. The authors have adapted Meta-1 for use in SAPHIRE, an information retrieval system featuring automated indexing and probabilistic retrieval. They have also built DESYGNS, a semantic network system designed to contain Meta-1 concepts along with their semantic relationships. Future plans include improved concept matching, improved indexing capability, and the use of semantic relationships. Copyright by and reprinted with permission of the American Medical Informatics Association.

Jachna JS, Powsner SM, Miller PL. Augmenting GRATEFUL MED with the UMLS Metathesaurus: an initial evaluation. Bull Med Libr Assoc 1993 Jan;81(1):20-8. Clinicians in patient care settings must be able to locate relevant recent medical literature quickly. Computer literacy is increasing, but many clinicians remain ill at ease with search strategies for online bibliographic databases. As part of an ongoing project to simplify the translation of clinical questions into effective searches, a Unified Medical Language System (UMLS) Metathesaurus tool was designed. The authors compared bibliographic searches by relatively inexperienced users employing only GRATEFUL MED to searches done using GRATEFUL MED augmented with this tool. The users were clinicians examining questions related to a test set of clinical cases. Their problems and successes were monitored; the results suggest that the addition of a thesaurus helps resolve some problems in citation retrieval that trouble the novice user. By helping the user understand indexing terms in context and by reducing typing errors, a thesaurus can help provide an intelligent solution to lexical mismatches in bibliographic retrieval. Copyright by and reprinted with permission of the Medical Library Association.

Joubert M, Riouall D, Fieschi M, Botti G, Proudhon H. Contextual aids for medical information retrieval. Medinfo 1992;7(Pt 2):1522-7. The paper presents the first results of work that the authors began in 1990. This research concerns the representation and use of medical concepts in medical information management. Customisation of the user interfaces and management of users' contexts are addressed. These capabilities enhance the material furnished by UMLS.

Kingsland LC 3d, Harbourt AM, Syed EJ, Schuyler PL. Coach: applying UMLS knowledge sources in an expert searcher environment. Bull Med Libr Assoc 1993 Apr;81(2):178-83. With the development of the Unified Medical Language System (UMLS) Knowledge Sources, the National Library of Medicine (NLM) has produced a resource of great potential for improving the searching of MEDLINE. The Coach expert searcher system, an in-house research project at NLM, is designed to help users of the GRATEFUL MED front-end software improve MEDLINE search and retrieval capabilities. This paper describes the Coach program, the knowledge sources it uses, and some of the ways it applies elements of the UMLS Metathesaurus to facilitate access to the biomedical literature. Copyright by and reprinted with permission of the Medical Library Association.

Kingsland LC 3d, Syed EJ, Lindberg DA. Coach: an expert searcher program to assist Grateful Med users searching MEDLINE. Medinfo 1992;7(Pt 1):382-6. The Coach expert searcher system is being developed as an in-house research project at the USA National Library of Medicine (NLM). It brings to bear the Unified Medical Language System (UMLS) Metathesaurus and ten additional knowledge sources to assist Grateful Med users seeking help in improving retrieval from the ELHILL mainframe retrieval engine. Initial work has concentrated on MEDLINE and its backfiles, and in particular on the problem of null retrieval. Subsequent functions will address searching problems relevant to other MEDLARS files. Coach offers true ELHILL multifile searches, ELHILL sorting of output citations before download, and considerable flexibility in the print format of downloaded results. It works interactively with the user, with Grateful Med and with ELHILL.

Cimino JJ. Linking patient information systems to bibliographic resources. Methods Inf Med 1996 Jun;35(2):122-6. Medical informatics researchers have explored a number of ways to integrate medical information resources into patient care systems. Particular attention has been given to the integration of on-line bibliographic resources. This paper presents an information model which breaks down the integration task into three components, each of which answers a question: what is the user's question?, where can the answer be found?, and how is the retrieval strategy composed? Twelve experimental systems are reviewed and their methods for addressing one or more of these questions are described.

Cimino JJ, Aguirre A, Johnson SB, Peng P. Generic queries for meeting clinical information needs. Bull Med Libr Assoc 1993 Apr;81(2):195-206. This paper describes a model for automated information retrieval in which questions posed by clinical users are analyzed to establish common syntactic and semantic patterns. The patterns are used to develop a set of general-purpose questions called generic queries. These generic queries are used in responding to specific clinical information needs. Users select generic queries in one of two ways. The user may type in questions, which are then analyzed, using natural language processing techniques, to identify the most relevant generic query; or the user may indicate patient data of interest and then pick one of several potentially relevant questions. Once the query and medical concepts have been determined, an information source is selected automatically, a retrieval strategy is composed and executed, and the results are sorted and filtered for presentation to the user. This work makes extensive use of the National Library of Medicine's Unified Medical Language System (UMLS): medical concepts are derived from the Metathesaurus, medical queries are based on semantic relations drawn from the UMLS Semantic Network, and automated source selection makes use of the Information Sources Map. The paper describes research currently under way to implement this model and reports on experience and results to date. Copyright by and reprinted with permission of the Medical Library Association.

Cimino JJ, Johnson SB, Aguirre A, Roderer N, Clayton PD. The MEDLINE Button. Proc Annu Symp Comput Appl Med Care 1992:81-5. We have developed a computerized method for performing bibliographic searches directly from patient data involving five steps: 1) identifying specific patient data which raises a question in the mind of the user, 2) selection (from a list of generic questions) of a small number of questions which fit the selected patient data, 3) automated translation of the patient data into appropriate terms used for bibliographic indexing, 4) conversion of the question selected by the user into a search strategy, and 5) transfer of the search strategy to a search engine for a bibliographic database. We have modified the Columbia-Presbyterian Clinical Information System to experiment with this method. The first implementation converts patient diagnoses and procedures coded in ICD9-CM into Medical Subject Headings (MeSH) and searches Medline using BRS/Onsite. Challenges include development of a useful set of generic questions and translation from ICD9-CM to MeSH using the Unified Medical Language System (UMLS). Copyright by and reprinted with permission of the American Medical Informatics Association.

Cimino JJ, Johnson SB, Peng P, Aguirre A. From ICD9-CM to MeSH using the UMLS: a how-to guide. Proc Annu Symp Comput Appl Med Care 1993:730-4. One purpose of the Unified Medical Language System (UMLS) is to facilitate conversion of terms from one controlled medical vocabulary to another. We examined our ability to convert International Classification of Diseases, 9th Edition, Clinical Modification (ICD9-CM) to Medical Subject Headings (MeSH) using the UMLS. We describe a method which mapped 30.4% of ICD9-CM to UMLS. Of these, 95.0% were linked to MeSH, of which translation was straightforward in 90.4%. We discuss the use of these translations for retrieval from MeSH-indexed databases, such as Medline. Copyright by and reprinted with permission of the American Medical Informatics Association.

Cimino JJ, Sideli RV. Using the UMLS to bring the library to the bedside. Med Decis Making 1991 Oct-Dec;11(4 Suppl):S116-20. This paper presents an algorithm that can be used to convert ICD9 terms to related MeSH terms. Preliminary evaluation indicates that together, the algorithm and the UMLS provide a reasonable resource for facilitating such conversions. Copyright 1991 Hanley and Belfus.

Lowe HJ, Walker WK, Polonkey SE, Jiang F, Vries JK, McCray AT. The image engine HPCC project: a medical digital library system using agent-based technology to create an integrated view of the electronic medical record. In: Adam N, Halem M, Yesha Y, editors. Proceedings of the 3rd Forum on Research and Technology Advances in Digital Libraries, ADL '96. Los Alamitos: IEEE Computer Society Press; 1996. p. 45-56.

Merz RB, Cimino C, Barnett GO, Blewett DR, Gnassi JA, Grundmeier R, Hassan L. Q & A: a query formulation assistant. Proc Annu Symp Comput Appl Med Care 1993:498-502. Inexperienced users of online medical databases often do not know how to formulate their queries for effective searches. Previous attempts to help them have provided some standard procedures for query formulation, but depend on the user to enter the concepts of a query properly so that the correct search strategy will be formed. Intelligent assistance specific to a particular query often is not given. Several systems do refine the initial strategy based on relevance feedback but usually do not make an effort to determine how well-formed a query is before actually performing the search. As part of the Interactive Query Workstation (IQW), we have developed an expert system, Questions and Answers (Q&A), that assists in formulating an initial strategy given concepts entered by the user and that determines if the strategy is well-formed, refining it when necessary. Copyright by and reprinted with permission of the American Medical Informatics Association.

Miller PL, Frawley SJ. Trade-offs in producing patient-specific recommendations from a computer-based clinical guideline: a case study. J Am Med Inform Assoc 1995 Jul-Aug;2(4):238-42. This case study explored 1) how much online clinical data is required to obtain patient-specific recommendations from a computer-based clinical practice guideline, 2) whether the availability of increasing amounts of online clinical data might allow a higher specificity of those recommendations, and 3) whether that increased specificity is necessarily desirable. The "quick reference guide" version of the guideline for acute postoperative pain management in adults, developed by the Agency for Health Care Policy and Research, was analyzed. Patient-specific data items that might be used to tailor the computer's output for a particular case were grouped into rough categories depending on how likely they were to be available online and how readily they might be determined from online clinical data. The patient-specific recommendations were analyzed to determine to what degree the amount of text produced depended on the online availability of different categories of data. An examination of example recommendations, however, illustrated that high specificity may not always be desirable. The study provides a concrete illustration of how the richness of online clinical data can affect patient-specific recommendations, and describes a number of related design trade-offs in converting a clinical guideline into an interactive, computer-based form. Copyright by and reprinted with permission of the American Medical Informatics Association.

Miller RA, Gieszczykiewicz FM, Vries JK, Cooper GF. CHARTLINE: providing bibliographic references relevant to patient charts using the UMLS Metathesaurus Knowledge Sources. Proc Annu Symp Comput Appl Med Care 1992:86-90. A successful medical informatics program helps its users to match their information needs as closely and efficiently as possible to the capabilities of the system. CHARTLINE is a computer program whose input is a free text, natural language patient chart in ASCII format. Using the UMLS Metathesaurus Knowledge Sources, CHARTLINE can suggest bibliographic references relevant to the patient case described in the chart. The program does not attempt to understand the natural language content of the chart. CHARTLINE only recognizes UMLS Metathesaurus Main Concept terms (or their synonyms) as they occur in the medical text, since those terms represent the tokens used to index the literature. The program depends on user feedback to determine which topics of a large number of potentially relevant subjects are of interest to the user. Copyright by and reprinted with permission of the American Medical Informatics Association.

Nelson SJ, Sherertz DD, Tuttle MS. Issues in the development of an information retrieval system: the Physician's Information Assistant. Medinfo 1992;7(Pt 1):371-5. There are many models of electronic information retrieval systems. The diversity of these systems, each requiring the user to learn new and idiosyncratic operating methods, is inhibiting to the user. In an effort to find a useful set of conventions of information organization and user navigation, the authors have developed the Physician's Information Assistant (PIA). The model is that of a hypertext browser which reflects an assessment of the manner in which a user is most likely to approach the knowledge (the user model). The PIA uses existing knowledge bases and resources, including a semantic network browser and a communications program which handles literature searches, and achieves interoperability over these multiple resources. Browsing allows a wide variety of cognitive styles and supports answering poorly-formed queries. The user must exercise judgment in determining if the information found is relevant to the clinical situation. Difficulties with large volumes of text on a CRT screen can occur; nontextual visual representations of information, such as maps, appear to be helpful in reducing the amount of text. The methodology used in building the PIA appears to be adaptable to a variety of knowledge sources and other resources. The consistent user metaphor provides an example of the type of conventions that may be necessary to enable users to approach multiple knowledge sources.

Nelson SJ, Sherertz DD, Tuttle MS, Abarbanel RA, Olson NE, Erlbaum MS, Sperzel WD. The Physician's Information Assistant. Proc Annu Symp Comput Appl Med Care 1991:950-2.

Patil RS, Silva JS, Swartout WR. An architecture for a health care provider's workstation. Int J Biomed Comput 1994 Jan;34(1-4):285-99. This paper presents an architecture for a health care provider's workstation designed to assist health care providers in performing their daily activities. The design is based on the concept of a clinician's associate which acts as an intelligent intermediary between the provider and a diverse collection of clinical, administrative, and educational information sources. The architecture is designed to be hardware platform independent, to work across different I/O capabilities, and to be open, allowing specialized applications to be easily integrated with the system and their functionality delivered through a common user environment.

Powsner SM, Miller PL. Automated online transition from the medical record to the psychiatric literature. Methods Inf Med 1992 Sep;31(3):169-74. Psych Topix is a knowledge-based program which guides the clinician from an on-line clinical report to a search of the psychiatric literature or of other relevant databases. It provides this guidance by using an outline of key topics in a clinical field to provide "concept-based" links. Each topic is augmented with an activation expression to signal when that topic is potentially relevant to a case, and with database search expressions to allow focused retrieval of information. The bibliographic retrieval component of Psych Topix is currently operational as part of the daily, routine operation of a psychiatric consultation service. The system is also implemented in a demonstration mode to provide retrieval from three additional textual databases. The current Psych Topix system provides a working demonstration of the clinical feasibility of using concept-based links to facilitate the focused, automated transition between on-line medical databases.

Powsner SM, Miller PL. From patient reports to bibliographic retrieval: a Meta-1 front-end. Proc Annu Symp Comput Appl Med Care 1991:526-30. A software front-end has been programmed to help construct Medline query expressions from selected text in clinical records. The user clicks to choose pertinent words or phrases from the text with a pointing device and the words are translated into Medical Subject Headings (MeSH). The National Library of Medicine's Unified Medical Language System Meta-1 Thesaurus is used to look up the words selected by the user. The software traces through chains of synonyms to assemble a small set of MeSH indexing terms. The user then makes the final selection from among the MeSH terms and combines chosen terms using logical connectives to form a Medline query which is passed on to Grateful Med. This approach provides the clinical user with a natural starting point, the text of a patient report, with no need to know MeSH terminology. The software handles the translation that otherwise would necessitate looking up terms in MeSH guidebooks, as well as the added drudgery of checking different synonyms. Preliminary evaluation of this approach with clinical trainees indicated that they find the front-end a straightforward way to search for literature relevant to a clinical case. Having a tool for immediate translation from clinical terminology to indexing terminology seems to be an important factor. Apparently minor issues in interface design, such as keeping the clinical report displayed simultaneously along with the search under construction, and keeping both visible during the search itself, seem to help orient the user. Copyright by and reprinted with permission of the American Medical Informatics Association.

Powsner SM, Miller PL. Linking bibliographic retrieval to clinical reports. PsychTopix. Proc Annu Symp Comput Appl Med Care 1989:431-5. PsychTopix is an expert system which guides the clinician from a computer based clinical record to a focused bibliographic retrieval. Specifically, PsychTopix searches the literature for references pertinent to a psychiatric consultation report. It provides intelligent guidance by drawing on a knowledge base of current issues in psychiatry. It scans the report to determine which clinical topics (issues) are likely to be relevant to the case. The clinician then selects those topics for which he or she desires a literature search. There is no need for the user to know the mechanics or protocols of computerized literature retrieval. PsychTopix initiates a Medline search and presents the references retrieved to the user. More generally, PsychTopix provides an effective demonstration of the key topics approach to medical database linkage. Copyright by and reprinted with permission of the American Medical Informatics Association.

Powsner SM, Riely CA, Barwick KW, Morrow JS, Miller PL. Automated bibliographic retrieval based on current topics in hepatology: Hepatopix. Comput Biomed Res 1989 Dec;22(6):552-64. The Hepatopix computer program helps the physician go from a computerized clinical record directly to a computerized search of the medical literature. The program uses a hierarchical list of current (key) topics in hepatology to offer "intelligent" searches. Each topic has associated "selection logic" and a tested Medline search. Starting with a liver biopsy case record, Hepatopix evaluates the selection logic to determine which topics may be pertinent to the case (based on clinical findings, lab tests, and critical words or phrases in the summary). The physician then picks those topics which are interesting enough to warrant a literature search. The citations are retrieved using search strategies stored for each topic and presented. Hepatopix is operational with over 200 topics in the realm of liver neoplasms, operating on liver biopsy case summaries from the Klatskin Database of Liver Biopsies. Besides demonstrating clinical case-directed bibliographic retrieval, it demonstrates the utility of a "key topics" list as a bridge between medical databases. Copyright 1989 Academic Press.

Sherertz DD, Tuttle MS, Olson NE, Hsu GT, Carlson RW, Fagan LM, Acuff RD, Cole WG, Nelson SJ. Accessing oncology information at the point of care: experience using speech, pen, and 3-D interfaces with a knowledge server. Medinfo 1995;8(Pt 1):792-5. Oncologists' information needs arise at diverse times and settings. For example: Is superior vena cava syndrome a medical emergency? Our collaborative group is developing a system that supports an interface with combinations of spoken, gestural, and simulated three-dimensional manipulation to help an oncologist focus on the information need, not the system. The system requires a small amount of input from the oncologist, and then anticipates what information is pertinent to the patient at hand, based on the sources it has available. The system makes use of a Knowledge Server to find relevant information. The Knowledge Server uses selected data for the particular patient from a Computer-based Patient Record (CPR) to provide context for the information needs. The Knowledge Server leverages the Unified Medical Language System (UMLS) resources as well as relevant communications standards. A layered interaction protocol is used to help manage the fulfillment of information needs. Each of the oncology knowledge sources is transformed into a uniform representation that utilizes both its formal schema (e.g., its table of contents) and its concepts and words indexed through the UMLS Metathesaurus. Our focus on the appropriate use of information from a CPR, and on anticipating oncologists' information needs, resulted from our study of several longitudinal patient scenarios. We believe that our use of scenario-based design techniques will help to ensure the system's success.

Tuttle MS, Sherertz DD, Fagan LM, Carlson RW, Cole WG, Schipma PB, Nelson SJ. Toward an interim standard for patient-centered knowledge-access. Proc Annu Symp Comput Appl Med Care 1993:564-8. Most care-giver knowledge needs arise at the point of care and are patient-centered. Many of these knowledge needs can be met using existing on-line knowledge sources, but the process is too time-consuming, currently, for even the computer-proficient. We are developing a set of public domain standards aimed at bringing potentially relevant knowledge to the point of care in a straight-forward and timely fashion. The standards will a) make use of selected items from a Computer-based Patient Record (CPR), e.g., a diagnosis and measure of severity, b) anticipate certain care-giver knowledge needs, e.g., therapy, protocols, complications, and c) try to satisfy those needs from available knowledge sources, e.g., knowledge-bases, citation databases, practice guidelines, and on-line textbooks. The standards will use templates, i.e., fill-in-the-blank structures, to anticipate knowledge needs and UMLS Metathesaurus enhancements to represent the content of knowledge sources. Together, the standards will form the specification for a Knowledge-Server (KS) designed to be accessed from any CPR system. Plans are in place to test an interim version of this specification in the context of medical oncology. We are accumulating anecdotal evidence that a KS operating in conjunction with a CPR is much more compelling to users than either a CPR or a KS operating alone. Copyright by and reprinted with permission of the American Medical Informatics Association.

Tuttle MS, Sherertz DD, Olson NE, Nelson SJ, Erlbaum MS, Keck KD, Davis AN, Suarez-Munist ON, Lipow SS, Cole WG. Toward reusable software components at the point of care. Proc AMIA Fall Symp 1996:150-4. An architecture built from five software components -- a Router, Parser, Matcher, Mapper, and Server -- fulfills key requirements common to several point-of-care information and knowledge processing tasks. The requirements include problem-list creation, exploiting the contents of the Electronic Medical Record for the patient at hand, knowledge access, and support for semantic visualization and software agents. The components use the National Library of Medicine Unified Medical Language System to create and exploit lexical closure -- a state in which terms, text and reference models are represented explicitly and consistently. Preliminary versions of the components are in use in an oncology knowledge server. Copyright by and reprinted with permission of the American Medical Informatics Association.

van Mulligen EM. UMLS-based access to CPR data. Proc AMIA Fall Symp 1996:819. This abstract describes the results of a project that explores the use of the Unified Medical Language System (UMLS) in browsing a computer-based patient record (CPR). The project consisted of a number of steps: the mapping between CPR terms and UMLS concepts, the development of an algorithm that explores the CPR data using this mapping, and the implementation of a first prototype browser that visualizes "found" data. A second issue in this project has been the direct access to online medical literature (MEDLINE) using the UMLS concepts found in the CPR data. In this project, we used a preliminary version of the Open Records for Patient Care (ORCA) CPR that consisted only of the history and physical examination data of patients suffering from heart failure. Copyright by and reprinted with permission of the American Medical Informatics Association.


Barber S, Fowler J, Long KB, Dargahi R, Meyer B. Integrating the UMLS into VNS retriever. Proc Annu Symp Comput Appl Med Care 1992:273-7. We are developing a networked resource for the National Library of Medicine's Unified Medical Language System. We call this resource the UMLS Retriever, which is an instance of our VNS Retriever architecture. Our prototype user interface makes use of the Virtual Notebook System Browser. The development of a networked UMLS service will result in numerous advantages to our user community. Copyright by and reprinted with permission of the American Medical Informatics Association.

Cimino C, Barnett GO. Standardizing access to computer-based medical resources. Proc Annu Symp Comput Appl Med Care 1990:33-7. The paper describes a working Interactive Query Workstation (IQW). The IQW allows users to query multiple resources: a medical knowledge base (DXplain), a clinical database (COSTAR/MQL), a bibliographic database (MEDLINE), a cancer database (PDQ), and a drug interaction database (PDR). The IQW has evolved from requiring alteration of resource code to using off-the-shelf products (Kappa & Microsoft Windows) to control resources. Descriptions of each resource were developed to allow IQW to access these resources. There are three components to these descriptions: information on how data is sent and received from a resource, information on types of queries to which a resource can respond, and information on what types of information are needed to execute a query. These components form the basis of a standard description of resources. Copyright by and reprinted with permission of the American Medical Informatics Association.

Cimino C, Barnett GO, Blewett DR, Hassan LJ, Grundmeier R, Merz R, Kahn JA, Gnassi JA. Interactive query workstation: a demonstration of the practical use of UMLS knowledge sources. Proc Annu Symp Comput Appl Med Care 1992:823-4. The Interactive Query Workstation (IQW) has been developed to provide clinicians with a uniform program interface for retrieving medical-related information from various computer-based information resources. These resources can vary in content (bibliographic databases, drug information, general medical text databases), function (article retrieval, differential diagnosis, drug interaction detection, or drug dosage and administration information), and media formats (local hard disk, CD-ROM, local area network, or distant telecommunication link). IQW allows modular addition of new resources as well as extension of previously installed resources. The National Library of Medicine's three Unified Medical Language System (UMLS) Knowledge Sources, the Metathesaurus (Meta), the Semantic Network, and the Information Sources Map (ISM) have been incorporated into many aspects of IQW. Meta provides information about medical terminology and aids IQW in isolating the basic concepts from a clinician's question. The Semantic Network provides information about the categorization of concepts and possible relations between concepts. It also assists IQW in determining which queries are appropriate for a set of concepts contained in the clinician's question. The ISM provides information about the content available from computer-based resources and aids IQW in selecting an appropriate resource from which to collect information. The computer-based resource selection is performed without user intervention. This interactive demonstration shows an environment which increases the accessibility of medical information to clinicians by utilizing the three UMLS Knowledge Sources. Copyright by and reprinted with permission of the American Medical Informatics Association.

Cimino C, Barnett GO, Hassan L, Blewett DR, Piggins JL. Interactive query workstation: standardizing access to computer-based medical resources. Comput Methods Programs Biomed 1991 Aug;35(4):293-9. Methods of using multiple computer-based medical resources efficiently have previously required either the user to manage the choice of resource and terms, or specialized programming. Standardized descriptions of what resources can do and how they may be accessed would allow the creation of an interface for multiple resources. This interface would assist a user in formulating queries, accessing the resources and managing the results. This paper describes a working prototype, the Interactive Query Workstation (IQW). The IQW allows users to query multiple resources: a medical knowledge base (DXplain), a clinical database (COSTAR/MQL), a bibliographic database (MEDLINE), a cancer database (PDQ), and a drug interaction database (PDR). Descriptions of each resource were developed to allow IQW to access these resources. The descriptions are composed of information on how data are sent and received from a resource, information on types of query to which a resource can respond, and information on what types of information are needed to execute a query. These components form the basis of a standard description of resources.

Clyman JI, Powsner SM, Paton JA, Miller PL. Using a network menu and the UMLS Information Sources Map to facilitate access to online reference materials. Bull Med Libr Assoc 1993 Apr;81(2):207-16. As computer technology advances, clinicians and biomedical researchers are becoming more dependent upon information from online databases and information systems. By using specially configured computer workstations and high-speed computer networks, it is now possible to access this information in a rapid and straightforward manner. To empower users by providing these capabilities, the authors are assembling a variety of network workstations to be located throughout Yale-New Haven Medical Center. At the heart of the workstation is NetMenu, a program designed to help users connect to a number of important online information systems, including a hospital order entry and results reporting system, a drug reference, bibliographic retrieval systems, and educational programs. In addition, as part of the National Library of Medicine's Unified Medical Language System (UMLS) project, the authors have developed a local prototype of the UMLS Information Sources Map (ISM) and a companion query assistant program to complement the NetMenu in helping users select and connect automatically to information services relevant to a particular question. The ISM query assistant draws from a listing of many online information sources accessible via local and international networks. Copyright by and reprinted with permission of the Medical Library Association.

Dolin RH. Internet medical resources [letter]. Ann Intern Med 1996 Feb 1;124(3):375. Comment on: Ann Intern Med 1995 Jul 15;123(2):123-31.

Freiburger G, Levy SR, LePoer PM, Murray J, Heinold S, Warfield T. UMLS workstation project. Progress to date. Proc Annu Symp Comput Appl Med Care 1993:877. In 1992, the Health Science Library at the University of Maryland was awarded a three-year grant from the National Library of Medicine to create a Windows-based interface to the Unified Medical Language System (UMLS). This interface will use the UMLS Knowledge Sources to assist users searching various databases available at the Library, including the online catalog, PsycLIT, CINAHL, MEDLINE, and HSL Current Contents. This paper traces the evolution of the interface during its first 18 months by illustrating the revisions made to the prototype. Revisions are based on user feedback elicited in an iterative process of design, development, and review. Copyright by and reprinted with permission of the American Medical Informatics Association.

Gnassi JA, Barnett GO. A survey of electronic drug information resources and identification of problems associated with the differing vocabularies used to key them. Proc Annu Symp Comput Appl Med Care 1994:631-5. Drug information resources are increasingly becoming electronically available. They differ in scope, granularity, and purpose. These considerations have shaped the selection of dissimilar drug name keys, complicating access. An abbreviated and simplified historical context of the development of official controlled vocabularies and their relationships is followed by a review of the kinds of information available in several electronic drug information resources. The key vocabularies used are discussed with examples. Problems using the differing terms of the resource vocabularies are identified. Copyright by and reprinted with permission of the American Medical Informatics Association.

Gnassi JA, Bormel JI, Blewett DR, Kim RJ, Barnett GO. A medical information resource server: one stop shopping on the Internet. Proc Annu Symp Comput Appl Med Care 1994:1025.

Hersh WR, Brown KE, Donohoe LC, Campbell EM, Horacek AE. CliniWeb: managing clinical information on the World Wide Web. J Am Med Inform Assoc 1996 Jul-Aug;3(4):273-80. The World Wide Web is a powerful new way to deliver on-line clinical information, but several problems limit its value to health care professionals: content is highly distributed and difficult to find, clinical information is not separated from non-clinical information, and the current Web technology is unable to support some advanced retrieval capabilities. A system called CliniWeb has been developed to address these problems. CliniWeb is an index to clinical information on the World Wide Web, providing a browsing and searching interface to clinical content at the level of the health care student or provider. Its database contains a list of clinical information resources on the Web that are indexed by terms from the Medical Subject Headings disease tree and retrieved with the assistance of SAPHIRE. Limitations of the processes used to build the database are discussed, together with directions for future research. Copyright by and reprinted with permission of the American Medical Informatics Association.

Loonsk JW, Lively R, TinHan E, Litt H. Implementing the Medical Desktop: tools for the integration of independent information resources. Proc Annu Symp Comput Appl Med Care 1991:574-7. The increasing availability of medical information resources has moved the Medical Desktop from a theoretical construct to a practical necessity. Many micro-computers are becoming available in clinical and academic settings that can access several medical information applications. These computers are usually not powerful workstations that are part of a clinically oriented information support system, but are personal computers with varied capabilities. The applications on these computers come from different sources, are accessed through different user interfaces and do not share data well. The de facto Medical Desktop this situation presents will discourage most end-users because the combination of applications is complex, the applications are poorly integrated, and individual applications are inconsistent. At the State University of New York at Buffalo School of Medicine and Biomedical Sciences we have developed several Microsoft Windows-based tools that accept a systems-level diversity of resources, but work toward the construction of a coherent Medical Desktop. These tools include a lexical term linker, a resource database, and a context sensitive help system that is tailored to locally available resources. Copyright by and reprinted with permission of the American Medical Informatics Association.

Rodgers RP. Automated retrieval from multiple disparate information sources. The World Wide Web and the NLM's Sourcerer project. J Am Soc Inf Sci 1995 Dec;46(10):755-64. The burgeoning amount of information available via the internet has heightened awareness of the need for improved tools for resource identification. The U.S. National Library of Medicine's (NLM) Sourcerer project is developing software which accepts a user query, automatically identifies appropriate information resources, and facilitates connection to those sources for information retrieval. The current Sourcerer prototype utilizes the multimedia/multiplatform/multiprotocol network-based hypertext system known as World Wide Web. It also relies upon the knowledge sources of the Unified Medical Language System (UMLS). The UMLS is the result of a long-term project of NLM. It comprises a large Metathesaurus of biomedical concepts (coupled with a semantic network and syntactical/lexical software tools) and the Information Sources Map (ISM), a database of records describing specific biomedical information resources. Recent advances in the standardization of information exchange over computer networks, coupled with the tools provided by UMLS, facilitate query refinement and augmentation, connection to resources, and retrieval from resources. Daunting challenges remain with respect to optimizing resource descriptions, defining optimal algorithms for searching for sources, optimizing user interface design, and organizing retrieved information. Copyright 1995 by and reproduced with permission of John Wiley and Sons.

Sperzel WD, Abarbanel RM, Nelson SJ, Erlbaum MS, Sherertz DD, Tuttle MS, Olson NE, Fuller LF. Biomedical database inter-connectivity: an experiment linking MIM, GENBANK, and META-1 via MEDLINE. Proc Annu Symp Comput Appl Med Care 1991:190-3. The linkage of disparate biomedical databases is an important goal of the Unified Medical Language System (UMLS) Project. We conducted an experiment to investigate the feasibility of using UMLS resources to link databases in clinical genetics and molecular biology. References from MIM (Mendelian Inheritance in Man) were lexically mapped to the equivalent citations in MEDLINE. The MeSH major subject headings by which the citations in a particular MIM entry had been indexed were used to develop a genetic-disorder-centered view of the world in Meta-1 (the first official version of the UMLS Metathesaurus). Our hypothesis was that these MeSH subject headings could provide access to a semantic neighborhood in Meta-1 that would be relevant to a particular genetic disorder. By browsing in this semantic neighborhood, a user could select various combinations of terms with which to search MEDLINE through an interface between Meta-1 and Grateful Med. Such searches might retrieve citations that were more recent than those in MIM or that provided useful supplementary information. Since some MEDLINE records contain pointers to entries in GENBANK, information about genetic sequences related to a particular clinical genetic disorder could also be retrieved. This scenario was implemented for a small number of MIM entries, providing a concrete demonstration that linking disparate electronic databases in an important subdomain of biomedicine is relatively straightforward. Copyright by and reprinted with permission of the American Medical Informatics Association.

Tuttle MS, Cole WG, Sherertz DD, Nelson SJ. Navigating to knowledge. Methods Inf Med 1995 Mar;34(1-2):214-31. One way to fulfill point-of-care knowledge needs is to present caregivers with a visual representation of the available "answers". Using such a representation, caregivers can recognize what they want, rather than have to recall what they need, and then navigate to an appropriate answer. Given selected pieces of information from a computer-based patient record, an interface can anticipate certain knowledge needs by initializing caregiver navigation in a semantic neighbourhood of answers likely to be relevant to the patient at hand. These notions draw heavily on two collaborative projects - the U.S. National Library of Medicine Unified Medical Language System and the U.S. National Cancer Institute Knowledge Server. Both these projects support navigation because they make the structure of medical knowledge explicit in a way that can be exploited by human interfaces.


Preliminary and Ancillary Studies


Cimino C, Barnett GO. Analysis of physician questions in an ambulatory care setting. Proc Annu Symp Comput Appl Med Care 1991:995-9. We collected 38 questions generated by physicians based on their active patient medical records. Each question was associated with a single term in a specific record (Key Term). These questions were analyzed with respect to word content and concept content. Concepts were matched to the National Library of Medicine's Metathesaurus (Meta-1). Thirty-seven Key Terms matched completely to Meta-1 terms. Each question matched to an average of 4.1 Meta-1 terms for a total of 156 concepts. Based on word count, these 156 concepts accounted for 40 percent, stop words accounted for 39 percent, and numbers and drug trade names accounted for less than 1 percent of the words. The remaining 20 percent of the words could be matched to 69 concepts not in Meta-1. Review of all concepts showed that they could be divided into medical terms (Noun Concepts), modifiers (Modifier Concepts), and concepts that provided context for the questions (Relation Concepts). The majority of Relation Concepts did not match to Meta-1. A vocabulary of Relation Concepts would provide a useful starting point for a computer system designed to aid physicians in answering clinical questions. Copyright by and reprinted with permission of the American Medical Informatics Association.

Cimino C, Barnett GO. Analysis of physician questions in an ambulatory care setting. Comput Biomed Res 1992 Aug;25(4):366-73. We collected 69 questions generated by physicians based on their active patient medical records. Each question was associated with a single term in a specific record (Key Term). These questions were analyzed with respect to word content and concept content. Concepts were matched to the National Library of Medicine's Metathesaurus (Meta-1). Sixty-eight Key Terms were completely matched by Meta-1 terms. Each question matched to an average of 3.7 Meta-1 terms for a total of 255 concepts. Based on word count, these 255 concepts accounted for 43%, stop words accounted for 36%, and numbers and drug trade names accounted for 3% of the words. The remaining 18% of the words could be matched to 143 concepts not in Meta-1. Review of all concepts showed that they could be divided into medical terms (Noun Concepts), modifiers (Modifier Concepts), and concepts that provided context for the questions (Relation Concepts). The majority of Relation Concepts did not match concepts in Meta-1. A vocabulary of Relation Concepts would provide a useful starting point for a computer system designed to aid physicians in answering these questions. Copyright 1992 Academic Press.

Forsythe DE, Buchanan BG, Osheroff JA, Miller RA. Expanding the concept of medical information: an observational study of physicians' information needs. Comput Biomed Res 1992 Apr;25(2):181-200. The authors describe an empirical study of information needs in four clinical settings in internal medicine in a university teaching hospital. In contrast to the retrospective data often used in previous studies, this research used ethnographic techniques to facilitate direct observation of communication about information needs. On the basis of this experience, the authors address two main issues: how to identify and interpret expressions of information needs in medicine and how to broaden one's conception of 'information needs' to account for the empirical data. Copyright 1992 Academic Press.

Osheroff JA, Forsythe DE, Buchanan BG, Bankowitz RA, Blumenfeld BH, Miller RA. Physicians' information needs: analysis of questions posed during clinical teaching. Ann Intern Med 1991 Apr 1;114(7):576-81. OBJECTIVE: To describe information requests expressed during clinical teaching. SETTING: Residents' work rounds, attending rounds, morning report, and interns' clinic in a university-based general medicine service. SUBJECTS: Attending physicians, medical house staff, and medical students in a general medicine training program. METHODS: An anthropologist observed communication among study subjects and recorded in field notes expressions of a need for information. We developed a coding scheme for describing information requests and applied the coding scheme to the data recorded. Based on assigned codes, we created a subset of strictly clinical requests. MEASUREMENTS: Five hundred nineteen information requests recorded during 17 hours of observed clinical activity were selected for detailed analysis. These requests related to the care of approximately 90 patients by 24 physicians and medical students. Sixty-five requests were excluded because they were not strictly clinical, leaving a subset of 454 clinical questions for analysis. MAIN RESULTS: On average, five clinical questions were raised for each patient discussed. Three hundred thirty-seven requests (74%) concerned patient care. Of these 337 questions, 175 (52%) requested a fact that could have been found in a medical record. Seventy-seven (23%) of these questions, motivated by the needs of patient care, were potentially answerable by a library, a textbook, a journal, or MEDLINE. Eighty-eight (26%) of the questions asked for patient care required synthesis of patient information and medical knowledge. CONCLUSIONS: Clinicians in the study settings requested information frequently. Many of these information needs required the synthesis of patient information and medical knowledge and thus were potentially difficult to satisfy. A typology is proposed that characterizes information needs as consciously recognized, unrecognized, and currently satisfied. Copyright Annals of Internal Medicine; http://www.acponline.org.


Bicknell EJ, Sneiderman CA, Rada RF. Computer-assisted merging and mapping of medical knowledge bases. Proc Annu Symp Comput Appl Med Care 1988:158-64. A user-interactive rule-based computer program, developed for the purpose of mapping and merging multiple hierarchical thesauri, is described. The program, called DynaSaurI (dynamic thesaurus integration), was tested on subsets of two medical thesauri, the Systematized Nomenclature of Medicine (SNOMED) and the Medical Subject Headings (MeSH). Each thesaurus is treated as a knowledge base, containing information both in the terms and in the various types of relationships between terms. DynaSaurI uses combinations of string matching and tree browsing to propose a ranked array of related MeSH terms for each SNOMED term presented. A medical expert selected the closest match from the array of terms proposed by DynaSaurI and entered the appropriate type of relationship between the terms. The information acquired from the expert was then used to refine DynaSaurI's mapping rules. With DynaSaurI, all 84 SNOMED concepts were successfully merged with and mapped to MeSH main headings; 74 (88%) were mapped by the initial DynaSaurI pass, and the remainder by incorporating the user's responses. All merging utilized the relationship descriptions (e.g., "is narrower than") provided by the online interaction with the medical expert. Copyright by and reprinted with permission of the American Medical Informatics Association.

From the president: reviewing nursing concepts to be included in the 1995 version of the Unified Medical Language System. Nurs Diagn 1995 Apr-Jun;6(2):53-4.

Kannry JL, Wright L, Shifman M, Silverstein S, Miller PL. Portability issues for a structured clinical vocabulary: mapping from Yale to the Columbia medical entities dictionary. J Am Med Inform Assoc 1996 Jan-Feb;3(1):66-78. To examine the issues involved in mapping an existing structured controlled vocabulary, the Medical Entities Dictionary (MED) developed at Columbia University, to an institutional vocabulary, the laboratory and pharmacy vocabularies of the Yale New Haven Medical Center. 200 Yale pharmacy terms and 200 Yale laboratory terms were randomly selected from database files containing all of the Yale laboratory and pharmacy terms. These 400 terms were then mapped to the MED in three phases: mapping terms, mapping relationships between terms, and mapping attributes that modify terms. 73% of the Yale pharmacy terms mapped to MED terms. 49% of the Yale laboratory terms mapped to MED terms. After certain obsolete and otherwise inappropriate laboratory terms were eliminated, the latter rate improved to 59%. 23% of the unmatched Yale laboratory terms failed to match because of differences in granularity with MED terms. The Yale and MED pharmacy terms share 12 of 30 distinct attributes. The Yale and MED laboratory terms share 14 of 23 distinct attributes. The mapping of an institutional vocabulary to a structured controlled vocabulary requires that the mapping be performed at the level of terms, relationships, and attributes. The mapping process revealed the importance of standardization of local vocabulary subsets, standardization of attribute representation, and term granularity. Copyright by and reprinted with permission of the American Medical Informatics Association.

Lowe HJ, Barnett GO. Understanding and using the medical subject headings (MeSH) vocabulary to perform literature searches. JAMA 1994 Apr 13;271(14):1103-8. Comment in: JAMA 1995 Jan 18;273(3):184; discussion 184-5.

Masarie FE Jr, Miller RA. Medical Subject Headings and medical terminology: an analysis of terminology used in hospital charts. Bull Med Libr Assoc 1987 Apr;75(2):89-94. Terminology used by health professionals in everyday written discourse was compared with terminology in a standardized medical vocabulary, the Medical Subject Headings (MeSH). Fifty written hospital charts were selected at random and analyzed by a computer program that identified MeSH terms in the charts. The charts were analyzed against two related MeSH vocabularies: one containing MeSH terms and one containing both MeSH terms and backwards cross-reference terms. When small words such as articles and prepositions were disregarded, approximately 50% of the words in a medical chart were found to be MeSH-related terminology. In addition, about 40% of MeSH-related words in the charts were either MeSH terms or backwards cross-reference terms. Copyright by and reprinted with permission of the Medical Library Association.

McCray AT, Browne AC, Moore DL. The semantic structure of neo-classical compounds. Proc Annu Symp Comput Appl Med Care 1988:165-8. The automated analysis of neo-classical compounds in the medical domain has been proposed and carried out by a number of researchers in recent years. This paper discusses the semantics of these compounds. The results indicate that neo-classical compounds are semantically underdetermined by their constituent parts. Thus, automated analysis of these compounds will need to be supplemented by human review. Copyright by and reprinted with permission of the American Medical Informatics Association.

Miller PL, Smith P, Morrow JS, Riely CA, Powsner SM. Semantic relationships and MeSH. Proc Annu Symp Comput Appl Med Care 1988:174-9. This paper compares bibliographic retrieval using current MeSH (Medical Subject Headings) to bibliographic retrieval using explicitly coded semantic relationships between index terms. In a previous study, 10 lists of abstracts, each list containing 20-40 papers discussing a specific pair of terms, were analyzed to identify the specific relationship(s) between those terms discussed in each paper. In the present study, we analyze how well current MeSH coding, using topical subheadings and check tags, can selectively retrieve those papers discussing each semantic relationship. Copyright by and reprinted with permission of the American Medical Informatics Association.


Ball S, Wright L, Miller P. SENEX, an object-oriented biomedical knowledge base. Proc Annu Symp Comput Appl Med Care 1989:85-9. An object-oriented knowledge base, SENEX, in the domain of neurodegeneration and loss of memory in aging is being developed. Initially, the focus is on three sets of issues in the representation of biomedical information. First, the authors are seeking to extend the medical subject headings (MeSH) nomenclature to include new classes of biomedical entities and to include relationships among those entities. Second, they are structuring biomedical information rather than categorizing text for bibliographic retrieval. Third, they are exploring ways in which such information could be used in an interactive system created for purposes of education and for designing basic research experiments. The current behavior of SENEX, which is being developed using the Common Lisp Object System (CLOS), is described. Various issues raised and plans for future development are discussed. Copyright by and reprinted with permission of the American Medical Informatics Association.

Ball SS, Mah VH, Miller PL. SENEX: a computer-based representation of cellular signal transduction processes in the central nervous system. Comput Appl Biosci 1991 Apr;7(2):175-87. The SENEX project is exploring knowledge representation in the neurobiology of ageing through object-oriented programming. SENEX is built from a classification structure of biologic entities and significant relationships among them. For example, an enzyme is an entity and an enzymatic reaction is a relationship among enzyme, cofactor(s), substrate(s) and product(s). There are currently 2600 classes of entities and 50 classes of relationships in SENEX. The class structure serves several functions. One function is to interrelate general and specific categories of molecular and morphologic entities. For example, tyrosine kinase and serine/threonine kinase are specific types of the more general class of protein kinase enzymes. Another function of the class structure is to serve as a network through which inheritance of attributes may occur. For example, the attribute 'subunits' is inherited by all subclasses of the general class multisubunit protein. Information may be accessed through links established in the class structure and through links relating one object as part of another. Relationships form the basis of separate modules within SENEX. This paper describes the types of relationships currently used and planned in the representation of age-related changes in cellular signal transduction processes of mammalian central nervous systems. We also describe tools for specific retrieval of relationships and for tracing links in complex reaction cascades. Application of these tools to identifying possible signal transduction pathways to guide further exploration through experimentation is discussed. Reprinted by permission of Oxford University Press.

Barr CE, Komorowski HJ, Pattison-Gordon E, Greenes RA. Conceptual modeling for the unified medical language system. Proc Annu Symp Comput Appl Med Care 1988:148-51. The Unified Medical Language System was proposed by the National Library of Medicine to facilitate the exchange and utilization of information from multiple sources. We are using semantic networks as the knowledge representation scheme in a prototype system to explore how to accomplish these goals. Conceptual modeling helps define a complete and consistent set of objects and relationships to include in the semantic net. Both top-down and bottom-up approaches were found useful in the seven-step process of building the semantic network. Theoretical and practical issues are discussed as well as future extensions to the current prototype. Copyright by and reprinted with permission of the American Medical Informatics Association.

Berman L, Cullen M, Miller PL. Automated integration of external databases: a knowledge-based approach to enhancing rule-based expert systems. Comput Biomed Res 1993 Jun;26(3):230-41. Expert system applications in the biomedical domain have long been hampered by the difficulty inherent in maintaining and extending large knowledge bases. We have developed a knowledge-based method for automatically augmenting such knowledge bases. The method consists of automatically integrating data contained in commercially available, external, online databases with data contained in an expert system's knowledge base. We have built a prototype system, named DBX, using this technique to augment an expert system's knowledge base as a decision support aid and as a bibliographic retrieval tool. In this paper, we describe this prototype system in detail, illustrate its use, and discuss the lessons we have learned in its implementation. Copyright 1993 Academic Press.

Berman L, Cullen MR, Miller PL. Automated integration of external databases: a knowledge-based approach to enhancing rule-based expert systems. Proc Annu Symp Comput Appl Med Care 1992:227-33. Expert system applications in the biomedical domain have long been hampered by the difficulty inherent in maintaining and extending large knowledge bases. We have developed a knowledge-based method for automatically augmenting such knowledge bases. The method consists of automatically integrating data contained in commercially available, external, on-line databases with data contained in an expert system's knowledge base. We have built a prototype system, named DBX, using this technique to augment an expert system's knowledge base as a decision support aid and as a bibliographic retrieval tool. In this paper, we describe this prototype system in detail, illustrate its use and discuss the lessons we have learned in its implementation. Copyright by and reprinted with permission of the American Medical Informatics Association.

Canfield K, Bray B, Huff S. Representation and database design for clinical information. Proc Annu Symp Comput Appl Med Care 1990:350-3. Semantic discourse analysis and sublanguage methods are used to create a database model from free-text echocardiography reports. This model dictates the object structure of a semantic model that is implemented in a relational database form and evaluated for representational adequacy. The semantic frame structure of this relational patient database is a flexible projection of a hierarchical dictionary. The Unified Medical Language System (UMLS) Metathesaurus could be used as such a dictionary. The result is a structured clinical report database model that is built on a standard dictionary and is generalizable to other domains. Copyright by and reprinted with permission of the American Medical Informatics Association.

Canfield KC, Bray BE, Huff SM, Warner HR. Database capture of natural language echocardiographic reports: A UMLS approach. Proc Annu Symp Comput Appl Med Care 1989:559-63. We describe a prototype system for semiautomatic database capture of free-text echocardiography reports. The system is very simple and uses a Unified Medical Language System compatible architecture. We use this system and a large body of texts to create a patient database and develop a comprehensive hierarchical dictionary for echocardiography. Copyright by and reprinted with permission of the American Medical Informatics Association.

Chang S-Y. KANES: knowledge acquisition for a neuroradiology expert system [dissertation]. [Salt Lake City]: The University of Utah; 1991. 151 p. Available from: University Microfilms, Ann Arbor, MI; 9133706. One of the most severe obstacles to applying medical informatics to solve practical medical problems is acquiring the knowledge base. The Iliad knowledge base is among the most comprehensive medical knowledge bases in existence. The amount of effort devoted to its creation and maintenance is a testimony to the difficulty of building academic-quality, comprehensive knowledge bases for practical medical applications. In addition to the difficulty of knowledge acquisition, the accuracy and reliability of the knowledge base are also major concerns for the developers and users of the Iliad expert system. The major emphasis of this project was to use a patient database to improve the efficiency of creating a knowledge base for the neuroradiology domain and to increase the accuracy of the Iliad expert system based on this knowledge base. The project introduced a knowledge engineering model to automatically generate disease profiles in neuroradiology. The model used a new technique to collect patient data, obtain important statistics, calculate finding utility, and extract the best findings for the diagnostic frames. Knowledge Acquisition for a Neuroradiology Expert System (KANES) is an efficient, accurate, and easy-to-use personal computer program that can assist knowledge engineers in managing the patient database and executing the analysis tasks described above. The experience with the KANES program using a relational Database Management System (DBMS) can be extended to a larger environment where information is gathered from multiple sources and where real-time decisions need to be made.
The knowledge engineering session in the Department of Medical Informatics is a good example of such an environment where different categories of individuals need and generate information and make decisions related to teaching, research, management, and administration. Under contract from the Unified Medical Language System (UMLS) project of the National Library of Medicine, a database is being built to contain clinical data from multiple sources like QMR, HELP, and Iliad. As the quality of the data in medical information systems improves, such databases will become an important resource for all of the probabilities that drive computerized diagnostic systems. The KANES program has the potential to expand to be a decision support system for all medical domains that would help the groups in the knowledge engineering session to make decisions more easily and more appropriately. Provided by UMI.

Cimino JJ, Elkin PL, Barnett GO. As we may think: the concept space and medical hypertext. Comput Biomed Res 1992 Jun;25(3):238-63. Hypertext, a medium for presenting written material in a nonsequential manner, is gaining popularity as a format for medical text. The structure of traditional hypertext documents (hyperdocuments) includes author-created links among text segments. This structure poses challenges for those who create and maintain hyperdocuments, while reading them can introduce disorientation and cognitive overload. An alternative model is presented in which text segments are linked to the concepts which they contain and the concepts are linked to each other in a semantic network called the Concept Space. The concepts and semantic links attempt to approximate potential topics of interest, allowing the reader to browse the hyperdocument in an individualized manner, rather than in an author-designated one. The concept space approach offers advantages for both the author and the reader. Copyright 1992 Academic Press.

Cimino JJ, Mallon LJ, Barnett GO. Automated extraction of medical knowledge from Medline citations. Proc Annu Symp Comput Appl Med Care 1988:180-4. The Medline database consists of over six million citations to the medical literature, indexed by the National Library of Medicine with the use of Medical Subject Headings (MeSH) and Subheadings. We propose that analysis of MeSH Headings and Subheadings in Medline citations will reveal the interrelationships among medical concepts described in the original articles. We have developed a rule-based system which postulates relationships based on the co-occurrence of MeSH Headings in Medline citations. At present, the rule base consists of 504 rules which propose 57 relationships. When this rule base was applied to a test of 673 citations, 93% of the proposed relationships were determined to be correct (96%, after correction of a transcription error in the rule base). We believe this approach has great potential, both for assisting acquisition of medical knowledge and for improving the quality of Medline retrievals. Copyright by and reprinted with permission of the American Medical Informatics Association.

Elkin PL, Cimino JJ, Lowe HJ, Aronow DB, Payne TH, Pincetl PS, Barnett GO. Mapping to MeSH (the art of trapping MeSH equivalence from within narrative text). Proc Annu Symp Comput Appl Med Care 1988:185-90. A tool for identifying medical subject headings (MeSH) terms found in narrative text is evaluated and discussed. The system's task is to discern unique matches from within MeSH for noun phrases in narrative text. The program consists of a medical morphological reduction routine, coupled with data structures created for the MeSH vocabulary searcher, MicroMeSH. To evaluate this tool, a study was undertaken within which three physicians were asked to review citations from the Annals of Internal Medicine and three paragraphs from a Textbook of Cardiology. They were asked to identify all of the important medical concepts within both the citations and text. Then these same physicians used MicroMeSH to see which of these medical concepts could be mapped to MeSH. These results were compared to the output of our automated system. Of the concepts which were found to be medical concepts by our panel of experts, 89% were identified using MicroMeSH as being represented in MeSH. The automated system was able to translate 90% of those medical concepts to the same MeSH term as identified by MicroMeSH. The remaining 10% of the concepts, which were recognized by the experts and not by the system, were terms where the medical knowledge of the experts allowed them to find matches which had no representation, morphologically or through an Entry Term, in the MeSH vocabulary. The specific algorithms, details of evaluation and potential usefulness of the system are discussed. Copyright by and reprinted with permission of the American Medical Informatics Association.

Evans DA. Pragmatically-structured, lexical-semantic knowledge bases for unified medical language systems. Proc Annu Symp Comput Appl Med Care 1988:169-73. Unified medical language systems must accommodate expressions ranging from fixed-form standardized vocabularies to the free-text, natural language of medical charts. Such ability will depend on the identification, representation, and organization of the concepts that form the useful core of the biomedical conceptual domain. The MedSORT-II and UMLS Projects at Carnegie Mellon University have established a feasible design for the development of lexicons and knowledge bases to support the automated processing of varieties of expressions (in the subdomain of clinical findings) into uniform representations. The essential principle involves incorporating lexical-semantic typing restrictions in a pragmatically-structured knowledge base. The approach does not depend on exhaustive knowledge representation, but rather takes advantage of selective, limited relations among concepts. In particular, the Projects have demonstrated that practical, comprehensive, and accurate processing of natural-language expressions is attainable with partial knowledge bases, which can be rapidly prototyped. Copyright by and reprinted with permission of the American Medical Informatics Association.

Greenes RA, McClure RC, Pattison-Gordon E, Sato L. The findings--diagnosis continuum: implications for image descriptions and clinical databases. Proc Annu Symp Comput Appl Med Care 1992:383-7. As part of the Unified Medical Language System (UMLS) project, we have been exploring the use of semantic net representation to build a medical ontology that can adapt to the needs and perspective of differing kinds of users with varying purposes. A principal objective is to facilitate indexing and retrieval of objects in a variety of target databases, using their own source vocabularies, while maintaining the representation of concepts to which these source vocabularies refer in a single consistent form, so that retrievals that span resource types can be accommodated. In addition, a particular area of deficiency of the existing UMLS Metathesaurus is that of clinical findings, a part of the problem being the multiple alternative views and granularity levels at which clinical findings are described in different target databases. The problem is particularly obvious when one examines the way in which image findings are described, which may be at a purely perceptual level, or at varying levels of aggregation into higher level observations or interpretations. We have developed a recursive model for representing observations and interpretations in a semantic net along a continuum of degree of aggregation, that appears to lend itself well to adaptation to varying perspectives. Copyright by and reprinted with permission of the American Medical Informatics Association.

Komorowski J, Barr CE, Pattison-Gordon E, Greenes RA. Knowledge modeling for the Unified Medical Language System. In: Kangassalo H, Ohsuga S, Jaakkola H, editors. Information modelling and knowledge bases. Amsterdam: IOS; 1990. p. 313-17.

Masarie FE Jr, Miller RA, Bouhaddou O, Giuse NB, Warner HR. An interlingua for electronic interchange of medical information: using frames to map between clinical vocabularies. Comput Biomed Res 1991 Aug;24(4):379-400. The proliferation of medical knowledge has led to the development of extensive dictionaries for electronically accessing information resources. The task of standardizing terminology used for electronic hospital records and for knowledge bases for medical expert systems and indexing the medical literature cannot easily be met by developing a single, monolithic official medical vocabulary. Developing a monolithic vocabulary would require a massive effort, and its existence would not guarantee its use by third-party payors, by practicing clinicians, or by developers of electronic medical information systems. Recognizing this, the National Library of Medicine (NLM) has begun to develop the Unified Medical Language System (UMLS) as a means of promoting electronic information exchange among systems with controlled vocabularies. The authors describe a frame-based system developed as an experimental approach to mapping between controlled clinical vocabularies. Copyright 1991 Academic Press.

Miller PL, Barwick KW, Morrow JS, Powsner SM, Riely CA. Towards a conceptual schema of medical knowledge: facilitating transition between different computer-based forms of clinical information. In: Hammond WE, editor. Proceedings of the AAMSI Congress 88; 1988 May 5-7; San Francisco. Washington (DC): American Association for Medical Systems and Informatics; 1988. p. 77-82.

Nelson SJ, Sherertz DD, Erlbaum MS, Tuttle MS. Representing medical knowledge in the form of structured text. The development of current disease descriptions. Proc Annu Symp Comput Appl Med Care 1989:66-70. As part of the Unified Medical Language System (UMLS) initiative, about 900 diseases have been described using "structured text." Structured text is words and short phrases entered under labelled contexts. Vocabulary is not controlled. The contexts comprise a template for the disease description. The structured text is both manipulable by machine and readable by humans. Use of the template was natural, and only a few problems arose in using the template. Instructions to disease description composers must be explicit in definitions of the contexts. Diseases to be described are chosen, after clustering related diseases, according to the distinctions that physicians practicing in the area under question believe are important. Limiting disease descriptions to primitive observations and to entities otherwise described within the corpus appears to be both feasible and desirable. Copyright by and reprinted with permission of the American Medical Informatics Association.

Elkin PL, McLatchey J, Packer M, Hoffer E, Cimino C, Studney D, Barnett GO. Automated batch searching of MEDLINE for DXplain. Proc Annu Symp Comput Appl Med Care 1989:436-40. To obtain references for the diseases in the DXplain database a generic search strategy was created and then combined with a communication protocol for MEDLINE. The system's efficacy has been tested on the concepts contained in the DXplain disease names. The system takes the DXplain disease name and identifies MeSH Terms or their equivalent from within this unstructured input. These terms are then utilized to search MEDLINE. How the searches are constructed and the order in which they are performed depend upon a user-defined script (protocol for querying MEDLINE). These scripts can be run repetitively to cover multiple concepts. This technique for searching MEDLINE was used to download citations that will be used as references for diseases in the DXplain Database. DXplain is a medical diagnostic aid program developed and maintained at Massachusetts General Hospital (MGH). The script used was a 28-step algorithm which was designed to download the most recent review articles about each of the DXplain diseases. The system provides the user with the ability to specify the number of articles which he/she would like returned from MEDLINE. This paper describes the technique by which the articles were retrieved, as well as the review process and the success rate of the system in identifying appropriate articles. Copyright by and reprinted with permission of the American Medical Informatics Association.

Miller PL, Barwick KW, Morrow JS, Powsner SM, Riely CA. Semantic relationships and medical bibliographic retrieval: a preliminary assessment. Comput Biomed Res 1988 Feb;21(1):64-77. This paper describes a project exploring whether semantic relationships between bibliographic terms may effectively partition the clinical literature. To address this question, a set of semantic relationships was identified between pairs of bibliographic terms taken from four categories: (1) diseases, (2) treatments, (3) tests, and (4) patient characteristics. The MEDLINE system of the National Library of Medicine was used to generate lists of abstracts relating to pairs of clinical terms. Each list of abstracts was examined to identify the semantic relationships, if any, which applied to the two terms in each paper. The study suggests that semantic relationships may play a potentially valuable role in assisting computer-based medical bibliographic retrieval. The degree to which relationships partition the literature is strongly dependent on the underlying semantics of the particular bibliographic terms involved. Copyright 1988 Academic Press.

Miller PL, Morrow JS, Powsner SM, Riely CA. Semantically assisted medical bibliographic retrieval: an experimental computer system. Bull Med Libr Assoc 1988 Apr;76(2):131-6. An experimental computer-based bibliographic retrieval system has been implemented to explore how semantic (conceptual) relationships between MeSH terms might assist the retrieval process. To construct the experimental system's database, lists of abstracts were produced using MEDLINE. Each list contained papers discussing a specified pair of terms. Each abstract was then analyzed to determine the specific relationship(s) between the two terms discussed in that paper. The project then explored how these semantic relationships could be incorporated into the computer to enhance bibliographic retrieval. Copyright by and reprinted with permission of the Medical Library Association.

Miller PL, Smith P, Morrow JS, Riely CA, Powsner SM. Capturing the semantic relationship between clinical terms with current MeSH bibliographic coding. Comput Methods Programs Biomed 1988 Nov-Dec;27(3):205-11. This paper compares bibliographic retrieval using current MeSH (Medical Subject Headings) to bibliographic retrieval using explicitly coded semantic relationships between index terms. In a previous study, ten lists of abstracts, each list containing 20-40 papers discussing a specific pair of terms, were analyzed to identify the specific relationship(s) between those terms discussed in each paper. In the present study, we analyze how well current MeSH coding, using topical subheadings and check tags, can selectively retrieve those papers discussing each semantic relationship.

Powsner SM, Barwick KW, Morrow JS, Riely CA, Miller PL. Coding semantic relationships for medical bibliographic retrieval: a preliminary study. Proc Annu Symp Comput Appl Med Care 1987:108-12. It is suggested that the coding of semantic relationships may permit more precise searches of the medical literature than conventional key/index term coding with Boolean operators for retrieval. Such semantic coding captures the distinction between papers concerned with how hepatitis B (HB) may cause/predispose to liver neoplasms (LN) and papers concerned with how HB may affect outcome in patients with LN. These distinctions were demonstrated by retrieving sets of MEDLINE abstracts, each set relevant to two clinical terms. Each abstract was then reviewed to determine the implied semantic relationship(s) between the two terms. Even in the restricted realm of liver diseases a number of very different relationships between terms are addressed in the literature. In addition, coding 'no relationship' allows articles discussing LN and HB independently to be avoided. It is concluded that semantic-relationship coding may prove to be very helpful for retrieving concise reference lists, to support clinical decisions. Copyright 1987 IEEE. Reprinted, with permission.

Radow DP, Blake M, Howard E, Jones C, Milgrom L, Ostergard M, Shaffer E. Using the Metathesaurus for bibliographic retrieval: a pre-implementation study. Proc Annu Symp Comput Appl Med Care 1994:980.

Appel RD, Komorowski HJ, Barr CE, Greenes RA. Intelligent focusing in knowledge indexing and retrieval - the relatedness tool. Proc Annu Symp Comput Appl Med Care 1988:152-7. Most present day information retrieval systems use the presence or absence of certain words to decide which documents are appropriate for a user's query. This approach has had certain successes, but it fails to capture relationships between concepts represented by the words, and hence reduces the potential specificity of both indexing and searching of documents. A richer representation of the semantics of documents and queries, and methods for reasoning about these representations, have been provided by artificial intelligence. Navigational tools for browsing and authoring knowledge bases (KBs) add a convenient technique for focusing in the complex landscape of semantic representations. The center of such representations is usually a frame or a semantic network system. We are developing a prototype Unified Medical Language System (UMLS) taxonomy to represent objects and relationships in medicine. One focus of our research is improved methods for indexing and querying repositories of biomedical literature. The technique which we propose is based on the notion of relatedness of concepts. To this end we define heuristics which find related concepts and apply them to the UMLS taxonomy. Preliminary results from experiments with the implemented heuristics demonstrate their potential usefulness. Copyright by and reprinted with permission of the American Medical Informatics Association.

Komorowski HJ, Greenes RA, Barr C, Pattison-Gordon E. Browsing and authoring tools for a Unified Medical Language System. In: User-oriented content-based text and image handling. RIAO 88 Program. Conference with Presentation of Prototypes and Operational Demonstrations; 1988 Mar 21-24; Cambridge, MA. Paris: C.I.D.; 1988. p. 624-41.

Komorowski HJ, Greenes RA, Pattison-Gordon E. The use of fisheye views for displaying semantic relationships in a medical taxonomy. Proc Annu Symp Comput Appl Med Care 1987:113-6. One of the critical issues for development of the unified medical language system (UMLS), or taxonomy of medical terms, is the identification of semantic features and relationships that should be represented in the UMLS and the design of the appropriate structure for storing and displaying these features and relationships. One approach, the use of the 'fisheye view' for displaying a region of interest in a semantic network, is discussed. This approach narrows the field of view at any given time to include only the most important and immediate 'landmarks', or items. Any of these could then be focused on, or another level of detail accessed. Copyright 1987 IEEE. Reprinted, with permission.

Lowe H, Barnett GO, Scott J, Mallon L, Ryan-Blewett D. Remote Access MicroMeSH: demonstration of an enhanced microcomputer system for searching the MEDLINE database. Proc Annu Symp Comput Appl Med Care 1989:1009-11. Remote Access MicroMeSH (RAMM) is a powerful but easy-to-use microcomputer system for searching the medical literature. RAMM uses MicroMeSH, a system for accessing the National Library of Medicine's (NLM's) Medical Subject Headings (MeSH) vocabulary, to facilitate offline creation and refinement of highly specific MEDLINE search queries. Using these queries, RAMM automatically searches and retrieves citations from the MEDLINE databases through the NLM's MEDical Literature Analysis and Retrieval System (MEDLARS). As search query creation and citation review are performed offline, the cost of online searching is minimized. Copyright by and reprinted with permission of the American Medical Informatics Association.

Lowe HJ, Barnett GO. MicroMeSH: a microcomputer system for searching and exploring the National Library of Medicine's Medical Subject Headings (MeSH) vocabulary. Proc Annu Symp Comput Appl Med Care 1987:717-20. MicroMeSH is a microcomputer-based tool for searching and exploring a complete, keyworded version of the National Library of Medicine's Medical Subject Headings (MeSH) vocabulary. MeSH is used to index the MEDLINE database. MicroMeSH allows the user to retrieve MeSH headings rapidly, using either a powerful search operation or a user-friendly MeSH Tree Walker. MicroMeSH can translate many commonly used biomedical terms to equivalent MeSH headings. The program also provides access to the MeSH subheadings vocabulary. The authors describe the operation of MicroMeSH and the computer hardware required to use the system. Copyright 1987 IEEE. Reprinted, with permission.

Lowe HJ, Barnett GO, Scott J, Mallon L, Blewett DR. Remote Access MicroMeSH: evaluation of a microcomputer system for searching the MEDLINE database. Proc Annu Symp Comput Appl Med Care 1989:445-7. Remote Access MicroMeSH (RAMM) is a powerful but easy-to-use microcomputer system for searching the MEDLINE database. RAMM incorporates MicroMeSH, a microcomputer implementation of the National Library of Medicine's (NLM's) Medical Subject Headings (MeSH) vocabulary. RAMM facilitates the creation of highly specific MEDLINE search queries. The goals in creating RAMM were to provide a system that could be used to search the medical literature and to teach the basic skills required to use MeSH and MEDLINE. During the past two years RAMM has been used by clinicians, library professionals, researchers, and students at Harvard Medical School and at selected academic sites in the US and Canada. In February of 1989, an effort to formally evaluate RAMM was begun. This paper describes the preliminary results of that evaluation. Copyright by and reprinted with permission of the American Medical Informatics Association.

Fu LS. A public domain unified medical language system (UMLS) patient database (hospital information systems, event definition structures) [dissertation]. [Salt Lake City]: The University of Utah; 1992. 132 p. Available from: University Microfilms, Ann Arbor, MI; 9236271. This document presents research carried out under the Unified Medical Language System (UMLS) and introduces its design, conceptual framework, and experimental implementation. Three hypotheses have been proposed and tested with real patient cases from two hospital information systems (HELP and VA). Primary conclusions of this work were: (1) The total number of MOI entries will not grow indefinitely. For a given domain, it tends to level off as more data sources are added. (2) To date, Event Definition structures can correctly represent a selected subset of dictionary entries ranging from 83% to 99%. (3) The current automatic instantiation algorithm has proven to have an 87% success rate. (4) From the evidence shown using this prototype with the UMLS patient database, it is possible to combine electronic patient records from different patient information systems into a unified structure. Moreover, this prototype allows the user to visually inspect the distribution of relevant medical variables, which demonstrates discrimination between disease and nondisease groups independent of the source of data. Several vital future enhancements are also discussed. Provided by UMI.

Fu LS, Bouhaddou O, Huff SM, Sorenson DK, Warner HR. Toward a public domain UMLS patient database. Proc Annu Symp Comput Appl Med Care 1990:170-4. The paper describes a unified structure with an associated vocabulary to represent and store patient cases derived from different computerized patient databases. The unified structure is based on the concept of event definitions which are generic templates for representing clinical data in a patient database. An implementation of this structure has been evaluated using patient cases from two expert systems (Iliad and QMR) and a hospital information system (HELP). The primary focus of the UMLS patient database is to accumulate patient information from different sources and provide enhanced statistical estimates of clinically important variables. Inter-communication and navigation among medical information systems are other potential benefits of this unified computerized medical record system. Copyright by and reprinted with permission of the American Medical Informatics Association.

Fu LS, Huff S, Bouhaddou O, Bray B, Warner H. Estimating frequency of disease findings from combined hospital databases: a UMLS project. Proc Annu Symp Comput Appl Med Care 1991:373-7. Merging data from the Salt Lake VA hospital database and the LDS hospital HELP system into a UMLS sponsored unified patient database has demonstrated that distribution of variables within a disease is hospital independent. Although disease prevalence is clearly not the same among hospitals, analysis of data within a disease group across hospitals can be done using such a merged database. This unified patient database would allow study of unusual diseases not possible using data from a single institution. Copyright by and reprinted with permission of the American Medical Informatics Association.

Schuyler PL, McCray AT, Schoolman HM. A test collection for experimentation in bibliographic retrieval. Medinfo 1989;6(Pt 2):910-12. To establish an environment in which various search and retrieval experiments can be performed, a subset of MEDLINE, the National Library of Medicine's bibliographic database, has been created according to the following requirements: all citations have a 1986 journal publication year, are in English, and have an author-prepared abstract. The resulting file contains approximately 167,000 citations. This is a reasonable size that satisfies several necessary conditions for conducting search and retrieval experiments. Accompanying this file are approximately one hundred and fifty questions asked by users of the MEDLINE database and approximately three thousand citations retrieved from the experimental file in response to these queries. Assessments of the relevance of the retrieved citations to the questions asked are also available.


The UMLS in Relation to Other Programs


Cimino JJ. Review paper: coding systems in health care. Methods Inf Med 1996 Dec;35(4-5):273-84. Computer-based patient data which are represented in a coded form have a variety of uses, including direct patient care, statistical reporting, automated decision support, and clinical research. No standard exists which supports all of these functions. Abstracting coding systems, such as ICD, CPT, DRGs, and MeSH, fail to provide adequate detail, forcing application developers to create their own coding schemes for systems. Some of these schemes have been put forward as possible standards, but they have not been widely accepted. This paper reviews existing schemes used for abstracting, electronic record systems, and comprehensive coding. It also discusses the remaining impediments to acceptance of standards and the current efforts to overcome them, including SNOMED, the Gabrieli Medical Nomenclature, the Read Clinical Codes, GALEN, and the Unified Medical Language System (UMLS).

Cimino JJ, Sengupta S. IAIMS and UMLS at Columbia-Presbyterian Medical Center. Med Decis Making 1991 Oct-Dec;11(4 Suppl):89S-93S. The authors use an example to illustrate combining Integrated Academic Information Management System (IAIMS) components (applications) into an integral whole, to facilitate using the components simultaneously or in sequence. They examine a model for classifying IAIMS systems, proposing ways in which the Unified Medical Language System (UMLS) can be exploited by them. Copyright 1991 Hanley and Belfus.

Frawley SJ. Building a database of data sets for health services research. Proc Annu Symp Comput Appl Med Care 1994:377-81. The Database of Data Sets (DB/DS) for Health Services Research will be an online searchable directory of data sets which are available, often with restrictions and confidentiality safeguards, for use by health care researchers. The DB/DS project is aimed at a wide audience, and intends to include a very broad range of health care data sets, ranging from state hospital discharge data bases, to national registries and health survey data sets, to institutional clinical databases. The intended users are the same community of researchers, policy-makers, administrators and practitioners who are served by the National Library of Medicine's current bibliographic databases. This paper describes a pilot phase of the DB/DS project in which the issues involved in creating such a database were explored with an initial set of 20 representative data sets. Copyright by and reprinted with permission of the American Medical Informatics Association.

Henry SB, Holzemer WL, Reilly CA, Campbell KE. Terms used by nurses to describe patient problems: can SNOMED III represent nursing concepts in the patient record? J Am Med Inform Assoc 1994 Jan-Feb;1(1):61-74. OBJECTIVE: To analyze the terms used by nurses in a variety of data sources and to test the feasibility of using SNOMED III to represent nursing terms. DESIGN: Prospective research design with manual matching of terms to the SNOMED III vocabulary. MEASUREMENTS: The terms used by nurses to describe patient problems during 485 episodes of care for 201 patients hospitalized for Pneumocystis carinii pneumonia were identified. Problems from four data sources (nurse interview, intershift report, nursing care plan, and nurse progress note/flowsheet) were classified based on the substantive area of the problem and on the terminology used to describe the problem. A test subset of the 25 most frequently used terms from the two written data sources (nursing care plan and nurse progress note/flowsheet) was manually matched to SNOMED III terms to test the feasibility of using that existing vocabulary to represent nursing terms. RESULTS: Nurses most frequently described patient problems as signs/symptoms in the verbal nurse interview and intershift report. In the written data sources, problems were recorded as North American Nursing Diagnosis Association (NANDA) terms and signs/symptoms with similar frequencies. Of the nursing terms in the test subset, 69% were represented using one or more SNOMED III terms. Copyright by and reprinted with permission of the American Medical Informatics Association.

Integrated Academic Information Management Systems (IAIMS). Bull Med Libr Assoc 1992 Jul;80(3):241-3. This paper reviews the proceedings of the symposia sponsored by the Integrated Academic Information Management Systems Association. IAIMS is described from the perspectives of information management, knowledge management, and information technology. Papers delivered at the symposia are reviewed. An overview of the National Library of Medicine and the IAIMS initiative at NLM, the use of an IAIMS project at Georgetown University to bring together multiple sources of information, and the linking of an IAIMS system to the Unified Medical Language System initiative at Yale are also discussed. Copyright by and reprinted with permission of the Medical Library Association.

Lindberg DA. Global information infrastructure. Int J Biomed Comput 1994 Jan;34(1-4):13-9. The High Performance Computing and Communications Program (HPCC) is a multiagency federal initiative under the leadership of the White House Office of Science and Technology Policy, established by the High Performance Computing Act of 1991. It has been assigned a critical role in supporting the international collaboration essential to science and to health care. Goals of the HPCC are to extend USA leadership in high performance computing and networking technologies; to improve technology transfer for economic competitiveness, education, and national security; and to provide a key part of the foundation for the National Information Infrastructure. The first component of the National Institutes of Health to participate in the HPCC, the National Library of Medicine (NLM), recently issued a solicitation for proposals to address a range of issues, from privacy to 'testbed' networks, 'virtual reality,' and more. These efforts will build upon the NLM's extensive outreach program and other initiatives, including the Unified Medical Language System (UMLS), MEDLARS, and Grateful Med. New Internet search tools are emerging, such as Gopher and 'Knowbots'. Medicine will succeed in developing future intelligent agents to assist in utilizing computer networks. Our ability to serve patients is so often restricted by lack of information and knowledge at the time and place of medical decision-making. The new technologies, properly employed, will also greatly enhance our ability to serve the patient.

Lindberg DA. The IAIMS opportunity: the NLM view. Bull Med Libr Assoc 1988 Jul;76(3):224-5.

McCormick KA, Zielstorff R. Building a Unified Nursing Language System (UNLS). In: Nursing data systems: the emerging framework. Washington (DC): American Nurses Publishing; 1995. p. 143-9.

Paton JA, Belanger A, Cheung KH, Grajek S, Branch KA, Ikeda N, Sette L, Miller PL, Fryer RK. Online bibliographic information: integration into an emerging IAIMS environment. Proc Annu Symp Comput Appl Med Care 1992:605-9. The Medical Library at Yale University has developed an online free-text database containing Current Contents citations. The database was designed to be integrated into an emerging campus-wide information environment. To this end Current Contents at Yale was designed with a user interface familiar to the Yale community, an alerting service based on electronic mail, and search expansion using the National Library of Medicine's Meta-1 metathesaurus. Copyright by and reprinted with permission of the American Medical Informatics Association.

Paton JA, Clyman JI, Lynch P, Miller PL, Sittig DF, Berson BZ. Strategic planning for IAIMS: prototyping as a catalyst for change. Proc Annu Symp Comput Appl Med Care 1990:709-13. Yale School of Medicine has developed a prototype integrated computing and information environment as part of its strategic IAIMS planning. The prototype consists of a menu system and underlying network communications programs and networks to access a variety of medical information resources at Yale and elsewhere. This prototype has been used in testing user needs, in designing a technical architecture, in exploring related institutional issues, and as a basis for research in integrated access to medical information using UMLS tools and concepts. Copyright by and reprinted with permission of the American Medical Informatics Association.

Roderer NK. Dissemination of medical information: organizational and technological issues in health sciences libraries. Libr Trends 1993 Summer;42(1):108-26. This article describes five programs that have been particularly significant to the evolution of biomedical communications over the last twenty years: the National Network of Libraries of Medicine (NNLM), Integrated Academic Information Management Systems (IAIMS), National Research and Education Network (NREN), Unified Medical Language System (UMLS), and the electronic journal. The major implications that each of these programs will continue to have for health sciences librarianship are examined. Reprinted with permission from Library Trends. Copyright 1993 The Board of Trustees of the University of Illinois.

Scherrer JR. Medical languages: use, definition and processing in ward information systems (WIS). In: Adlassnig KP, Grabner G, Bengtsson S, Hansen R, editors. Medical informatics Europe 1991. Proceedings; 1991 Aug 19-22; Vienna, Austria. Berlin: Springer-Verlag; 1991. p. 19-27.

Scherrer JR. [New architectures destined for hospital computer networks opening the medical world to more communication facilities of every kind]. Schweiz Med Wochenschr 1990 Dec 8;120(49):1866-71. (Fre).

Siegel ER. High priority research at NLM. In: Information: the transformation of society. Proceedings of the 50th Annual Meeting of the American Society for Information Science; 1987 Oct 4-8; Boston, MA. Medford (NJ): Learned Information; 1987. p. 275-6. A growing body of medically related machine-readable data of at least four types is identified. These include: biomedical literature, clinical records, medically relevant data banks, and knowledge bases. The development of a unified medical language system (UMLS) is discussed. The use of artificial intelligence techniques in diagnosis and management is explored. Reproduced with permission of the American Society for Information Science.

Smith KA. Medical information systems (National Library of Medicine information services). Bull Am Soc Inf Sci 1986 Apr-May;12(4):17-8. This description of information services from the National Library of Medicine (NLM) highlights a new system for retrieving information from NLM's databases (GRATEFUL MED); a formal Regional Medical Library Network; DOCLINE; the Unified Medical Language System; and Integrated Academic Information Management Systems. Research and development and the future are discussed. Reproduced with permission of the American Society for Information Science.

Tilley CB. Medical databases and health information systems. Annu Rev Inf Sci Tech 1990;25:313-82.


Commentaries and Opinions about UMLS


Bishop CW. Alternate approaches to a UMLS. Med Decis Making 1991 Oct-Dec;11(4 Suppl):S99-102. A scheme for the continuing development of Meta-1, a taxonomy of medical subjects based on MeSH and other systems, is described. The objective is a single, structured classification for medical knowledge. Copyright 1991 Hanley and Belfus.

Bishop CW, Ewing P. Representing medical knowledge: reconciling the present or creating the future? MD Comput 1992 Jul-Aug;9(4):218-25. Modern technology has sparked the creation of computing systems that perform many medically related tasks, but communication between these systems is limited, in part by differences in the terminology used for various purposes and in part by the changing nature of medical concepts. The Unified Medical Language System represents an attempt to find a means of translation between diverse knowledge systems. An alternative, which we propose, is to agree on a knowledge base for the future and make use of present accomplishments in moving toward that goal. Copyright 1992 Springer-Verlag.

Cimino JJ. Saying what you mean and meaning what you say: coupling biomedical terminology and knowledge. Acad Med 1993 Apr;68(4):257-60.

Evans DA. Medical language processing: issues and methods in unifying medical concepts: reflections on the UMLS project. In: Managing information and technology. Proceedings of the 52nd Annual Meeting of the American Society for Information Science; 1989 Oct 30-Nov 2; Washington, DC. Medford (NJ): Learned Information; 1989. p. 252-3. Several medically related language processing projects are examined: (1) the UMLS project, focusing on development of a "meta-thesaurus" for concepts in biomedicine and an "information sources map" to coordinate access to information across databases; (2) NYU's Medical Language Processor (MLP), a 5-stage computer processing system which analyzes sentences so that their informational elements are recognized and labeled for mapping into the correct database field; and (3) the Systematized Nomenclature of Medicine (SNOMED), which is a comprehensive indexing system for clinical medicine. Reproduced with permission of the American Society for Information Science.

Rudin JL. DART (Diagnostic Aid and Resource Tool): a computerized clinical decision support system for oral pathology. Compendium 1994 Nov;15(11):1316, 1318, 1320 passim.

Tuttle MS, Nelson SJ. The role of the UMLS in 'storing' and 'sharing' across systems. Int J Biomed Comput 1994 Jan;34(1-4):207-37. We will argue that 'sharing', 're-use', 're-purposing', and 'addition' of health care information is difficult, intrinsically; that the best way to overcome the difficulty is to start doing it, as soon as possible, and that the UMLS Knowledge Sources provide the best place to start. We recommend that the UMLS be used as a default source of biomedical concept names and relationships, as a comprehensive, data-based, 'reference model', and as an example of a large, ecumenical, evolving, continuously updated source of re-usable health care information.


Last updated: 24 April 1998
First published: 24 April 1998