NSF Award Abstract - #0205448 AWSFL008-DS3

ITR: Mining the Bibliome -- Information Extraction from the Biomedical Literature

NSF Org: IIS
Latest Amendment Date: July 28, 2004
Award Number: 0205448
Award Instrument: Continuing grant
Program Manager: Sylvia J. Spengler
    IIS DIV OF INFORMATION & INTELLIGENT SYSTEMS
    CSE DIRECT FOR COMPUTER & INFO SCIE & ENGINR
Start Date: September 1, 2002
Expires: August 31, 2007 (Estimated)
Expected Total Amount: $3,499,759 (Estimated)
Investigator: Aravind K. Joshi joshi@linc.cis.upenn.edu (Principal Investigator current)
    Mark Liberman (Co-Principal Investigator current)
    Martha S. Palmer (Co-Principal Investigator current)
    Susan B. Davidson (Co-Principal Investigator current)
    Fernando C. Pereira (Co-Principal Investigator current)
Sponsor: U of Pennsylvania
    Research Services
    Philadelphia, PA 19104-6205, 215/898-7293
NSF Program: 1687 ITR MEDIUM (GROUP) GRANTS
Field Application: 0000099 Other Applications NEC
Program Reference Code: 1655, 9218, HPCC

Abstract

EIA-0205448 Joshi, Aravind University of Pennsylvania

The major goal is the development of qualitatively better methods for automatically extracting information from the biomedical literature, relying on recent research in high-accuracy parsing and shallow semantic analysis. The special focus will be on information relevant to drug development, in collaboration with researchers in the Knowledge Integration and Discovery Systems group at GlaxoSmithKline.

This project will also address several database research problems, including methods for modeling complex, incomplete and changing information using semistructured data, and also ways to connect the text analysis process to an information integration environment that can deal with the wide variety of extant bioinformatic data models, formats, languages and interfaces.

The engine of recent progress in language processing research has been linguistic data: text corpora, treebanks, lexicons, test corpora for information retrieval and information extraction, and so on. Much of this data has been created by Penn researchers and published by Penn's Linguistic Data Consortium. Hence, one of our major goals is to develop and publish new linguistic resources in three categories: a large corpus of biomedical text annotated with syntactic structures (`Treebank') and shallow semantic structures (proposition bank or `Propbank'); several large sets of biomedical abstracts and full-text articles annotated with entities and relations of interest to drug developers, such as enzyme inhibition by various compounds or genotype/phenotype connections (`Factbanks'); and broad-coverage lexicons and tools for the analysis of biomedical texts.
