LHNCBC: Document Abstract
Year: 2001
LHNCBC-2001-011
Automated Labeling in Document Images
Kim J, Le DX, Thoma GR
Proc. SPIE, Document Recognition and Retrieval VIII. 2001 Jan;4307:111-22.
The National Library of Medicine (NLM) is developing an automated system to produce bibliographic records for its MEDLINE database. This system, named Medical Article Record System (MARS), employs document image analysis and understanding techniques and optical character recognition (OCR). This paper describes a key module in MARS called the Automated Labeling (AL) module, which labels all zones of interest (title, author, affiliation, and abstract) automatically. The AL algorithm is based on 120 rules that are derived from an analysis of journal page layouts and features extracted from OCR output. Experiments carried out on more than 11,000 articles in over 1,000 biomedical journals show the accuracy of this rule-based algorithm to exceed 96%.
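The rule-based approach described in the abstract can be illustrated with a minimal sketch. The abstract does not list the 120 rules, so everything below is an assumption made for illustration: the `Zone` fields, the thresholds, the keyword list, and the four rules are hypothetical and are not taken from MARS or the AL module.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Zone:
    """A text zone as it might come out of OCR (hypothetical fields)."""
    text: str
    top: float        # vertical position as a fraction of page height (0 = top)
    font_size: float  # average character height in points


# A rule pairs a label with a predicate over zone features.
Rule = Tuple[str, Callable[[Zone], bool]]

# Hypothetical keyword list; a real system would derive such cues from data.
AFFILIATION_WORDS = ("university", "department", "institute", "hospital")

# Four invented rules, checked in order. The real module uses 120 rules
# whose content is not given in the abstract.
RULES: List[Rule] = [
    ("title", lambda z: z.top < 0.3 and z.font_size >= 14),
    ("abstract", lambda z: z.text.lower().startswith("abstract")
        or len(z.text.split()) > 80),
    ("affiliation", lambda z: any(w in z.text.lower() for w in AFFILIATION_WORDS)),
    ("author", lambda z: z.top < 0.5 and "," in z.text
        and len(z.text.split()) <= 20),
]


def label_zone(zone: Zone) -> Optional[str]:
    """Return the label of the first matching rule, or None if no rule fires."""
    for label, predicate in RULES:
        if predicate(zone):
            return label
    return None


if __name__ == "__main__":
    page = [
        Zone("Automated Labeling in Document Images", top=0.10, font_size=18),
        Zone("Kim J, Le DX, Thoma GR", top=0.20, font_size=11),
        Zone("Department of Radiology, Example University", top=0.25, font_size=10),
        Zone("Abstract. The system labels zones.", top=0.40, font_size=10),
        Zone("Page 111", top=0.95, font_size=9),
    ]
    for zone in page:
        print(label_zone(zone), "|", zone.text)
```

Checking rules in a fixed order keeps the sketch simple; a production system would more likely combine evidence from several rules before committing to a label.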