NIST Scientific and Technical Databases

NIST Special Database 8

NIST Machine-Print Database of Gray Scale and Binary Images (MPDB)

A sample of the data contained in this database is available via anonymous ftp at sequoyah.ncsl.nist.gov in the files sd8-README.txt and sd8.tar.Z [699K].

The NIST machine-print database contains gray scale and binary images of machine-printed pages.

There are 360 digitized pages on three CD-ROM discs. The set contains a total of 3,063,168 characters, an average of about 8,509 characters per page.
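The per-page average follows directly from the two totals quoted above. A minimal sketch of the arithmetic, using only the figures stated in the text (the function name is illustrative, not part of the database software):

```python
# Figures quoted in the database description.
TOTAL_CHARACTERS = 3_063_168
TOTAL_PAGES = 360


def average_characters_per_page(total_characters: int, pages: int) -> float:
    """Return the mean number of characters per page."""
    return total_characters / pages


average = average_characters_per_page(TOTAL_CHARACTERS, TOTAL_PAGES)
print(f"{average:.1f} characters per page")
```

The exact quotient is fractional, so the published figure is a rounded value.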

A reference file is included for each page. These reference files are the ASCII text pages that were used to generate the original hardcopy that was digitized.

This database is being distributed for use in the development and testing of Optical Character Recognition (OCR) systems on a common set of images. This allows vendors to report results with respect to this common image set.

You may browse the Users' Guide to see how this database works.

Each disc in this three-disc set holds approximately 593 megabytes of compressed image data. Uncompressed, the data on each disc amounts to about 1.1 gigabytes (an average compression ratio of 1.85:1, using JPEG and CCITT Group 4 compression).
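As a rough consistency check, the quoted ratio can be compared with the two per-disc sizes. This sketch assumes 1 gigabyte = 1000 megabytes (the text does not say which convention is used) and treats both sizes as the approximations they are:

```python
# Approximate per-disc sizes quoted in the text.
COMPRESSED_MB = 593
UNCOMPRESSED_GB = 1.1
MB_PER_GB = 1000  # assumption: decimal gigabytes


def compression_ratio(uncompressed_mb: float, compressed_mb: float) -> float:
    """Return the ratio of uncompressed size to compressed size."""
    return uncompressed_mb / compressed_mb


ratio = compression_ratio(UNCOMPRESSED_GB * MB_PER_GB, COMPRESSED_MB)
print(f"approximately {ratio:.2f}:1")
```

Under that assumption the per-disc sizes agree with the stated average to two decimal places. With 1024 megabytes per gigabyte the result would be slightly higher.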

The database has the following features:

  • 3 font styles: Bold, Italics, and Normal
  • 6 font types: Courier, Helvetica, New Century Schoolbook, Optima, Palatino, and Times Roman
  • 10 point sizes: 4, 5, 6, 7, 8, 10, 11, 12, 15, 17, and 20
  • randomly ordered and sequentially ordered pages
  • 360 unique pages each having a gray scale and binary representation
  • 12 pixels/mm resolution
  • 360 text files containing page reference answers
  • image format documentation and example software
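For readers who think in dots per inch, the 12 pixels/mm scanning resolution can be converted, and the nominal height of a given point size estimated in pixels. This is a sketch under stated assumptions: one point is taken as 1/72 inch (the desktop-publishing point), and a point size is a nominal body height, so actual glyphs will be somewhat smaller. The function names are illustrative.

```python
PIXELS_PER_MM = 12      # resolution quoted in the feature list
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72    # assumption: desktop-publishing point


def pixels_per_inch(pixels_per_mm: float) -> float:
    """Convert a resolution in pixels/mm to pixels/inch."""
    return pixels_per_mm * MM_PER_INCH


def point_size_in_pixels(points: float, pixels_per_mm: float = PIXELS_PER_MM) -> float:
    """Estimate the nominal body height of a point size in pixels."""
    millimetres = points / POINTS_PER_INCH * MM_PER_INCH
    return millimetres * pixels_per_mm


print(f"{pixels_per_inch(PIXELS_PER_MM):.1f} pixels per inch")
for size in (4, 10, 20):
    print(f"{size} pt is roughly {point_size_in_pixels(size):.1f} pixels tall")
```

The smallest sizes span fewer than twenty pixels in height at this resolution, which is one reason small print is a demanding test case for character segmentation.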

Suitable for automated machine-print research, development, and evaluation, the data set can be used for:

  • algorithm development
  • system training and testing
  • character segmentation: separating a full-page image into characters
  • character recognition: identifying specific machine-printed characters

The database is a valuable tool for measurement and comparison of system performance on machine-print pages.
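One common way to compare recognition output with an ASCII reference page is character accuracy based on edit distance. The sketch below is illustrative only: it is not the scoring method NIST distributes or prescribes, the function names are hypothetical, and the sample strings are invented for demonstration.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - (edit distance / reference length), floored at zero."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    errors = edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


# Invented sample strings; real use would read a reference file and OCR output.
reference_text = "The quick brown fox"
ocr_output = "The qu1ck brovvn fox"
print(f"character accuracy: {character_accuracy(reference_text, ocr_output):.3f}")
```

Because every system is scored against the same reference text for the same page image, figures computed this way are directly comparable across systems.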

System Requirements: CD-ROM drive with software to read ISO-9660 format.

Price: $90.00. Special pricing for multiple copies available. Call for details.

For more information on Special Database 8 please contact:

Standard Reference Data Program
National Institute of Standards and Technology
100 Bureau Dr., Stop 2310
Gaithersburg, MD 20899-2310


(301) 975-2008 (VOICE) / (301) 926-0416 (FAX)

The scientific contact for this database is:

Michael Garris
National Institute of Standards and Technology
100 Bureau Drive, Stop 8940
Building 225, Room A216
Gaithersburg, MD 20899-8940
(301) 975-2928
michael.garris@nist.gov

Keywords: ASCII Reference, automated character recognition, automated data capture, binary, character recognition, font size, full page, Grayscale Image Database, machine print, NIST, OCR, optical character recognition, software recognition, style.

Create Date: 6/02
Last Update: Thursday, 06-Mar-03 15:42:04