Centers for Disease Control and Prevention

National Center for Chronic Disease Prevention and Health Promotion
Tobacco Industry Documents

Department of Health and Human Services


About the Minnesota Select Set

The Minnesota Select Set is a subset, containing approximately 380,000 pages, of the roughly 27 million pages made available during the Minnesota litigation, State of Minnesota vs Philip Morris Inc., et al. The case was filed in 1994 and concluded in May 1998. The Minnesota Tobacco Document Depository was established on Nov. 7, 1995. Minnesota attorneys "selected" the documents in this set as key to the trial. When the documents were loaded into this database, objective coding was added along with OCR text. Although the OCR text is not viewable at this time (the GIF page images are), including it in the database structure allows the documents to be fully searched. The Minnesota Select Set presented on this Web site gives the public easy, text-searchable access to a valuable portion of the tobacco industry documents.

How to use the Minnesota Select Set
On this site, users can locate documents by entering information into any or all of eight fields: company, title, author, beginning Bates Number, ending Bates Number, box number, date published, and text. Keyword searching of the entire document is also available. After the search terms are entered, the user is shown a citation for each matching document. The user can then view a GIF image of the document directly. GIF images are used because they are readable in all browsers and do not require plug-in software. The document can also be printed or downloaded (the download procedure varies by browser). The Minnesota Select Set gives users a selected set of about 30,000 documents drawn from the approximately 4 million tobacco industry documents stored in the Minnesota Tobacco Document Depository, which may also be available on the tobacco companies' Web sites.

Bates Number Problems
Alpha/numeric Bates Numbers pose particular retrieval problems. For some companies, the space between the alphabetic and numeric portions of a Bates Number is recognized, and users can include the space in their search. For other companies, the space is not recognized, and a "0" (zero) must be inserted between the alphabetic and numeric portions instead. If a search by Bates Number does not retrieve a document, try deleting the space or replacing it with a "0" (zero).
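The retry order described above can be sketched in a few lines of Python. This is a minimal illustration, not the site's own search code: the function name `bates_variants` is hypothetical, the sample Bates Number "ABC 123" is made up, and the sketch assumes an alpha/numeric Bates Number consists of letters, whitespace, then digits.

```python
import re

# Assumed form of an alpha/numeric Bates Number: letters, whitespace, digits
# (e.g. the made-up "ABC 123"). Real Bates formats may differ.
_ALPHA_NUMERIC = re.compile(r"^([A-Za-z]+)\s+(\d+)$")


def bates_variants(bates: str) -> list[str]:
    """Return candidate spellings of a Bates Number, in the order to try them.

    Hypothetical helper: it only generates search strings and does not
    query any database.
    """
    bates = bates.strip()
    match = _ALPHA_NUMERIC.match(bates)
    if match is None:
        # Not of the assumed "letters, space, digits" form: nothing to rewrite.
        return [bates]
    alpha, numeric = match.groups()
    return [
        f"{alpha} {numeric}",  # as entered, with whitespace collapsed to one space
        f"{alpha}{numeric}",   # space deleted
        f"{alpha}0{numeric}",  # space replaced by a "0" (zero)
    ]


if __name__ == "__main__":
    for candidate in bates_variants("ABC 123"):
        print(candidate)
```

Running the script prints `ABC 123`, `ABC123`, and `ABC0123`, one per line, which a user would enter in turn until the document is found. Purely numeric Bates Numbers are returned unchanged.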

Box Number Problems
At this time, Philip Morris documents cannot be searched by box number. Additionally, result sets for Philip Morris display Bates Numbers instead of box numbers. This will be corrected in the near future.

OCR Problems
A large portion of the documents are of very poor quality. Many are old, yellowed, and worn, and many were typed on typewriters with broken characters. Compounding this, some documents were copied and faxed so often that they are almost illegible. Handwritten marginalia is also extremely difficult to capture by OCR. Because of these issues, much of the OCR text contains errors, such as "toaco" instead of "tobacco." The OCR text will not be displayed at this time. Work is in progress to find ways to correct these problems.





This page last reviewed March 05, 2002

Office on Smoking and Health