About the Minnesota Select Set
The Minnesota Select Set is a subset of approximately 380,000 pages drawn from the total 27 million pages that were made available during the Minnesota litigation, State of Minnesota vs. Philip Morris Inc., et al. The trial opened in 1994 and concluded in May of 1998. The Minnesota Tobacco Document Depository was established on Nov. 7, 1995. The Minnesota Select Set documents were "selected" by Minnesota attorneys as key to the trial.
When the documents were loaded into this database, objective coding was added along with OCR text. Although the OCR text is not viewable at this time (the GIF images, however, are viewable), the addition of the OCR text to the database structure allows the documents to be fully searched. The Minnesota Select Set presented on this Web site provides the public with easy, text-searchable access to a valuable portion of tobacco industry documents.
How to use the Minnesota Select Set
On this site, users can locate documents by entering information into any or all of eight fields: company, title, author, beginning Bates Number, ending Bates Number, box number, date published, and text. Keyword searching of the entire document is also available.
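As a rough illustration of "any or all of eight fields", the search form can be pictured as a record with eight optional entries, where only the entries the user fills in take part in the search. The sketch below is an assumption for illustration only: the class name, the field identifiers, and the sample values are hypothetical and do not describe how this site is implemented.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SelectSetQuery:
    """Hypothetical model of the eight search fields; every field is optional."""

    company: Optional[str] = None
    title: Optional[str] = None
    author: Optional[str] = None
    beginning_bates_number: Optional[str] = None
    ending_bates_number: Optional[str] = None
    box_number: Optional[str] = None
    date_published: Optional[str] = None
    text: Optional[str] = None

    def filled_fields(self) -> dict:
        """Return only the fields the user actually filled in, trimmed of whitespace."""
        return {
            name: value.strip()
            for name, value in asdict(self).items()
            if value and value.strip()
        }


# Illustrative values only: search on two of the eight fields.
query = SelectSetQuery(company="Philip Morris", text="nicotine")
print(query.filled_fields())
```

Leaving a field empty simply drops it from the search, which mirrors the "any or all" wording above.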
Once the field values have been entered, the user is shown a citation for each matching document. The user can then look directly at a GIF image of the document. GIF images are used because they are readable on all browsers and do not require plug-in software. The document can also be printed or downloaded (downloading varies according to the browser that is used).
A search of the Minnesota Select Set provides users with a selected set (about 30,000 documents) of the approximately 4 million tobacco industry documents that are stored in the Minnesota Tobacco Document Depository and that may also be available on the tobacco companies' Web sites.
Bates Number Problems
Alpha/numeric Bates Numbers pose particular retrieval problems. For some companies, the space between the alpha and numeric portions of a Bates Number is recognized, and users can search using the space. For other companies, the space is not recognized, and a "0" (zero) must be inserted between the alpha and numeric portions of the Bates Number. If you do not retrieve a document using the Bates Number, try deleting the space or replacing the space with a "0" (zero).
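The retry advice above can be sketched as a small helper that lists the forms of a Bates Number worth trying in turn: as typed, with the space deleted, and with the space replaced by "0". This is a minimal sketch, not part of the site; the function name `bates_variants` and the sample number "ABC 123456" are made up for illustration.

```python
def bates_variants(bates_number: str) -> list:
    """Return the Bates Number forms to try, in order, without duplicates."""
    # Collapse stray whitespace so "ABC   123456" is treated like "ABC 123456".
    cleaned = " ".join(bates_number.split())
    candidates = [
        cleaned,                    # as typed, with the space
        cleaned.replace(" ", ""),   # space deleted
        cleaned.replace(" ", "0"),  # space replaced with "0" (zero)
    ]
    # dict.fromkeys keeps the first occurrence of each form and preserves order.
    return list(dict.fromkeys(candidates))


# Hypothetical Bates Number, used only to show the three forms.
for variant in bates_variants("ABC 123456"):
    print(variant)
```

A purely numeric Bates Number contains no space, so the helper returns it once, unchanged.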
Box Number Problems
At this time, Philip Morris documents cannot be searched by box number. Additionally, result sets for Philip Morris will display Bates Numbers instead of box numbers. This will be corrected in the near future.
OCR Problems
A great portion of the documents are of very poor quality. A large percentage are old, yellowed, and worn. Many were typed on typewriters with broken characters. Compounding this, some documents have been reproduced and faxed so often that they are almost illegible. Handwritten marginalia are also extremely difficult to capture by OCR. Because of all these issues, much of the OCR text contains errors such as "toaco" instead of "tobacco." The OCR text will not be displayed at this time. Work is in progress to find solutions to these problems.