Converting paper documents into electronic files through document imaging helps the Department of Energy (DOE) manage, store, access and archive the organizational information we have “locked up” in paper documents.
The DOE document imaging system utilizes high-quality document scanners, a top-end six-engine Optical Character Recognition (OCR) system, and Quality Controls.
Once converted, these electronic files can be indexed and searched, stored more easily, and accessed and distributed faster, more easily and more cheaply than their paper originals.
DOE's standard file format for static document archival is Acrobat Image + Text.
This is an Acrobat file that contains the actual scans of the pages for viewing and printing purposes, and has the OCR'd text behind the image for indexing and searching.
Because the file contains the actual scanned page, no information is lost: all handwriting, charts, photos, etc. can be viewed and printed.
With the OCR'd text behind the image, these files can be indexed and searched like any other text-based file, and the text can be copied or exported if desired.
Acrobat files can be indexed by the Acrobat program itself, by document storage and management systems, and by search engines running on a PC, file server or web server.
As an example, a directory of 100,000 pages on a local hard drive is searched in under one second.
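To illustrate why indexed text can be searched so quickly, here is a minimal sketch of an inverted index in Python. It is not the indexing method used by Acrobat or by any DOE system; the function names (`tokenize`, `build_index`, `search`) and the sample page text are hypothetical, and the OCR'd text is assumed to have already been extracted from each page.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index mapping each word to the pages containing it.

    pages: dict of page id -> OCR'd text (hypothetical sample data below).
    """
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the ids of pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Start from the pages matching the first word, then narrow down.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical OCR'd text for three scanned pages.
pages = {
    "report.pdf#1": "Annual energy budget summary",
    "report.pdf#2": "Budget tables and handwritten notes",
    "memo.pdf#1": "Energy policy memo",
}
index = build_index(pages)
print(sorted(search(index, "energy budget")))  # prints ['report.pdf#1']
```

The index is built once, when the documents are scanned and OCR'd. Each later query only looks up a few words and intersects the matching page sets, so search time depends mainly on the number of query words and matching pages rather than on the total size of the archive.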