NIST Scientific and Technical Databases


NIST Special Database 2

NIST Structured Forms Reference Set of Binary Images (SFRS)


The NIST Structured Forms Database consists of 5,590 pages of binary, black-and-white images of synthesized documents.

The documents in this database are 12 different tax forms from the IRS 1040 Package X for the year 1988. These include Forms 1040, 2106, 2441, 4562, and 6251 together with Schedules A, B, C, D, E, F, and SE.

Eight of these forms have two pages (form faces), so the database represents 20 different form faces.
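The face count follows from the form list: each of the 12 forms contributes one face, and each of the eight two-page forms contributes one more. The short sketch below just restates that arithmetic; the description does not say which eight forms have two pages, so only the count is used.

```python
# The twelve forms named in the description: five numbered forms plus seven schedules.
FORMS = ["1040", "2106", "2441", "4562", "6251",
         "Schedule A", "Schedule B", "Schedule C", "Schedule D",
         "Schedule E", "Schedule F", "Schedule SE"]

# Only the number of two-page forms is given, not which forms they are.
NUM_TWO_PAGE_FORMS = 8

def count_form_faces(num_forms: int, num_two_page: int) -> int:
    """One face per form, plus one extra face for each two-page form."""
    return num_forms + num_two_page

print(len(FORMS), count_form_faces(len(FORMS), NUM_TWO_PAGE_FORMS))  # 12 20
```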

Although the document images look like real forms filled out by individuals, they were derived and synthesized automatically by computer.

The database represents 900 simulated tax submissions, averaging 6.2 form faces per submission. In total it comprises approximately 5.9 gigabytes of uncompressed image data, along with image format documentation and example software.
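As a rough sanity check on these figures, the sketch below estimates the uncompressed size of one binary page image and of the whole set. It assumes, without confirmation from this description, that every form face is a US Letter page (8.5 x 11 in) stored at 1 bit per pixel with rows padded to whole bytes; the actual image dimensions are given in the Users' Guide and may differ slightly. The stated 5.9 GB also includes documentation and software, which are small by comparison.

```python
# Assumptions (not stated in the description): every form face is a US Letter
# page (8.5 x 11 in), stored uncompressed at 1 bit per pixel, with each image
# row padded to a whole byte.
DPI = 300                        # resolution given in the feature list
PAGE_W_IN, PAGE_H_IN = 8.5, 11.0
NUM_IMAGES = 5590
NUM_SUBMISSIONS = 900

def uncompressed_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Size in bytes of one uncompressed 1-bit-per-pixel image."""
    width_px = round(width_in * dpi)     # 2550 for 8.5 in at 300 ppi
    height_px = round(height_in * dpi)   # 3300 for 11 in at 300 ppi
    bytes_per_row = (width_px + 7) // 8  # pad each row to a whole byte
    return bytes_per_row * height_px

per_image = uncompressed_bytes(PAGE_W_IN, PAGE_H_IN, DPI)
total = per_image * NUM_IMAGES
print(per_image)                                                   # 1052700
print(f"{total / 1e9:.1f} GB")                                     # 5.9 GB
print(f"{NUM_IMAGES / NUM_SUBMISSIONS:.1f} faces per submission")  # 6.2 faces per submission
```

Under these assumptions each page is about 1.05 MB, and 5,590 pages come to roughly 5.88 GB, consistent with the approximate total quoted above.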

The database has the following features:

  • 900 simulated tax submissions
  • 5,590 images of completed structured form faces
  • 300 pixel/inch resolution
  • 5,590 text files containing entry field answers
  • 20 tables of entry field types and contexts
  • image format documentation and example software

Suitable for both document processing and automated data capture research, development, and evaluation, the data set can be used for:

  • forms identification
  • field isolation: locating the entry fields on the form
  • character segmentation: separating entry field values into characters
  • character recognition: identifying specific machine-printed characters
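For machine-printed field values, one common and simple approach to character segmentation is a vertical projection profile: count the black pixels in each column of an isolated field image and treat all-white columns as gaps between characters. The sketch below is a generic illustration of that technique, not the method used by NIST or the example software. It assumes the field has already been isolated and is represented as a list of rows of 0/1 pixels (1 = black), and it does not handle touching or broken characters.

```python
from typing import List, Optional, Tuple

Bitmap = List[List[int]]  # rows of pixels; 1 = black ink, 0 = white background

def segment_columns(field: Bitmap, min_width: int = 1) -> List[Tuple[int, int]]:
    """Split a binary field image into character spans using the vertical
    projection profile. Columns with no black pixels are treated as gaps.
    Returns (start_col, end_col_exclusive) pairs from left to right."""
    if not field:
        return []
    width = len(field[0])
    # Number of black pixels in each column.
    profile = [sum(row[c] for row in field) for c in range(width)]
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for c, ink in enumerate(profile):
        if ink and start is None:
            start = c                      # entering a run of inked columns
        elif not ink and start is not None:
            if c - start >= min_width:
                spans.append((start, c))   # close the run at this gap column
            start = None
    if start is not None and width - start >= min_width:
        spans.append((start, width))       # run reaches the right edge
    return spans

field = [
    [1, 1, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 1],
]
print(segment_columns(field))  # [(0, 2), (4, 5), (6, 8)]
```

Raising `min_width` is a crude way to discard isolated specks of noise narrower than a real character.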

This database is a valuable tool for measuring system performance and comparing systems on complex forms.
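Because each form face comes with a text file of entry field answers, a recognition system's output can be scored against that reference. The sketch below computes two common measures, exact-match field accuracy and character error rate (edit distance divided by the number of reference characters). It is an illustration only: the layout of the answer files is described in the Users' Guide and is not reproduced here, so it assumes answers and system output have already been loaded into dictionaries mapping a field name to a string, and the field names used are hypothetical.

```python
from typing import Dict

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def field_accuracy(reference: Dict[str, str], hypothesis: Dict[str, str]) -> float:
    """Fraction of reference fields recognized exactly; missing fields are errors."""
    if not reference:
        return 0.0
    correct = sum(1 for name, truth in reference.items()
                  if hypothesis.get(name) == truth)
    return correct / len(reference)

def character_error_rate(reference: Dict[str, str], hypothesis: Dict[str, str]) -> float:
    """Total edit distance over total reference characters; a missing field
    is scored as an empty string."""
    ref_chars = sum(len(truth) for truth in reference.values())
    if ref_chars == 0:
        return 0.0
    errors = sum(edit_distance(truth, hypothesis.get(name, ""))
                 for name, truth in reference.items())
    return errors / ref_chars

# Hypothetical field names and values, for illustration only.
ref = {"form_id": "1040", "wages": "12,345", "last_name": "SMITH"}
hyp = {"form_id": "1040", "wages": "12,346"}
print(f"{field_accuracy(ref, hyp):.3f}")        # 0.333
print(f"{character_error_rate(ref, hyp):.3f}")  # 0.400
```

Exact-match field accuracy reflects what matters for data capture (a field is either usable or not), while the character error rate gives partial credit and is more useful for comparing recognizers that are close in quality.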

System Requirements: CD-ROM drive with software to read ISO-9660 format.

Price: $90.00. Special pricing for multiple copies available. Call for details.

A PDF version of the Users' Guide is available.



For more information on Special Database 2 please contact:

Standard Reference Data Program
National Institute of Standards and Technology
100 Bureau Dr., Stop 2310
Gaithersburg, MD 20899-2310

(301) 975-2008 (VOICE) / (301) 926-0416 (FAX)

The scientific contact for this database is:

Michael Garris
National Institute of Standards and Technology
100 Bureau Drive, Stop 8940
Building 225, Room A216
Gaithersburg, MD 20899-8940
(301) 975-2928
michael.garris@nist.gov

Keywords: ASCII Reference, automated character recognition, automated data capture, Binary Image Database, forms identification, image format documentation, IRS, NIST, Machine Print, OCR, optical character recognition, printed characters, software recognition, synthesized documents, tax forms.



Create Date: 6/02
Last Update: Friday, 19-Mar-04 08:17:35