Lister Hill Center Logo  
Search Tips
About the Lister Hill Center
Blue Arrow
Blue Arrow
Blue Arrow
Blue Arrow
Innovative Research
Blue Arrow
Blue Arrow
Blue Arrow
Blue Arrow
Blue Arrow
Publications and Lectures
Blue Arrow
Blue Arrow
Blue Arrow
Training and Employment
Blue Arrow
Blue Arrow
LHNCBC: Document Abstract
Year: 2003Adobe Acrobat Reader
Download Free Adobe Acrobat Reader
LHNCBC-2003-014
Automated Labeling of Bibliographic Data Extracted from Biomedical Online Journals
Kim J, Le DX, Thoma GR
Proc. SPIE Electronic Imaging. 2003 Jan;5010: 47-56.
A prototype system has been designed to automate the extraction of bibliographic data (e.g., article title, authors, abstract, affiliation and others) from online biomedical journals to populate the National Library of Medicine's MEDLINE database. This paper describes a key module in this system: the labeling module that employs statistics and fuzzy rule-based algorithms to identify segmented zones in an article's HTML pages as specific bibliographic data. Results from experiments conducted with 1,149 medical articles from forty-seven journal issues are presented.
PDF