Searching for Words and Phrases
Your search can include words and phrases separated by commas. By default,
words and phrases in the query are stemmed, meaning the search is broadened
to include the stemmed variations of these words.
The query below will search for the phrase "press releases" and stemmed
variations of the word "wetlands":
-
press releases, wetlands
You may want to search for the word "wetlands" and not the word along
with all of its stemmed variations. To do this, you just delimit the search
term in double-quotation marks. For example, the following query will
search for the phrase "Region 1" and the word "wetlands":
-
"Region 1", "wetlands"
If searching for chemical ID numbers like 83-32-9 you can get better
results by leaving out the hyphens and submitting the search as 83 32
9.
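As a rough illustration of the hyphen tip above, a script that prepares queries could normalize chemical ID numbers before submitting them. The sketch below is in Python; the function name cas_to_search_terms is hypothetical and is not part of the EPA search engine.

```python
def cas_to_search_terms(chemical_id):
    """Replace the hyphens in a chemical ID number with spaces."""
    # Split on hyphens and drop empty pieces (e.g. from a doubled hyphen).
    parts = [part for part in chemical_id.split("-") if part]
    return " ".join(parts)


query = cas_to_search_terms("83-32-9")
print(query)
```

The resulting string can then be typed or pasted into the search box in place of the hyphenated number.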
Note that searches are not case-sensitive by default. This means you
can use "Region 1" or "region 1" in the above examples and get the same
search results.
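The comma and double-quote rules in this section can be sketched as a small helper that assembles a query string. This is only an illustration in Python; build_simple_query and its exact parameter are hypothetical names, and the sketch assumes nothing about the search engine beyond the syntax described above.

```python
def build_simple_query(terms, exact=()):
    """Join terms with commas; wrap exact-match terms in double quotes."""
    exact_terms = set(exact)
    parts = []
    for term in terms:
        if term in exact_terms:
            parts.append(f'"{term}"')  # quoted: searched without stemming
        else:
            parts.append(term)         # unquoted: stemmed by default
    return ", ".join(parts)


# Stemmed search, as in the first example of this section.
stemmed_query = build_simple_query(["press releases", "wetlands"])

# Exact search, as in the second example of this section.
exact_query = build_simple_query(
    ["Region 1", "wetlands"], exact=["Region 1", "wetlands"]
)
print(stemmed_query)
print(exact_query)
```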
Using Query Language
You can use operators and modifiers to apply logic to your query and
pinpoint the exact information you are interested in. Popular operators
are: AND, OR, and NEAR. A modifier can be used with an operator to further
define your question for the search engine. Frequently-used modifiers
are: MANY and NOT. By default, the words "and," "or," and "not" are interpreted
as query language; all other query language elements, such as the NEAR
operator, are interpreted as ordinary search words unless surrounded by
angle brackets (for example, <NEAR>).
Sample query expressions using query language are below.
The AND operator selects documents that contain all of the search elements
you specify. To find documents that contain both "Region 1" and at least
one stemmed variation of the word "wetlands" you can use the following
query:
-
"Region 1" and wetlands
The OR operator selects documents that show evidence of at least one
of the search elements. To find documents that contain either "Region
1" or at least one stemmed variation of the word "wetlands" you can use
the following query:
-
"Region 1" or wetlands
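If queries are assembled by a script rather than typed by hand, the AND and OR examples above might be generated as follows. This is a minimal Python sketch; the function name combine is hypothetical, and it only formats the query text described in this section.

```python
def combine(operator, *elements):
    """Join search elements with the plain word 'and' or 'or'."""
    op = operator.lower()
    if op not in ("and", "or"):
        # Other operators need angle brackets, so they are rejected here.
        raise ValueError(f"unsupported operator: {operator}")
    return f" {op} ".join(elements)


all_query = combine("and", '"Region 1"', "wetlands")
any_query = combine("or", '"Region 1"', "wetlands")
print(all_query)
print(any_query)
```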
Searching Zones and Fields
Zones and Fields can be searched to help narrow a search set or to retrieve
specific types of documents. The collection administrator defines the
zones and fields that are available on the EPA search engine and many
of these have been created to elicit customized search results for specific
document types on the EPA Web site.
Zones
Zones are specific regions of a document to which searches can be limited.
Using the zone filter, the Verity search engine builds zone information
into the collection's full-word index, which allows quick and efficient
searches over zones.
To search for a term in a specific zone, the <IN> operator is used.
For example, to search for a URL that contains the word "water", type:
-
water <IN> URL_ZONE
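The zone query shown above follows a fixed pattern: the term, the <IN> operator, then the zone name. A minimal Python sketch of that pattern is below; zone_query is a hypothetical helper name, and the zone name is passed through without being checked against the collection.

```python
def zone_query(term, zone):
    """Format a query that limits a term to one zone."""
    # Pattern taken from the example above: term <IN> ZONE_NAME
    return f"{term} <IN> {zone}"


url_query = zone_query("water", "URL_ZONE")
print(url_query)
```

Any zone name defined for the collection could be substituted for URL_ZONE.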
Most zones are defined by tags found in the HTML code of a web page.
Below is a list of potentially useful zones used on the EPA Web site.
ZONE | DESCRIPTION
AREA_ZONE | Metatag field which indicates the Lotus Notes database from which a dynamically generated web page is created.
BLOCKQUOTE | HTML tag used to create block quotes, or indented text.
BODY | HTML tag used to indicate the main body of text on a web page.
CAPTION | HTML tag used to display a caption or title either directly above or below a table.
CITE | HTML style element indicating text that is used as a citation. It is usually rendered as italic text.
CODE | HTML style element indicating text that is a sample of computer code. It is usually rendered with a fixed-width font.
DESCRIPTION_ZONE | Metatag field used to provide a brief summary description of a web page.
FORM | HTML element used to delimit the range of data fields for a form on a web page.
GROUP_ZONE | Eight (8) character code (TSSMS) that indicates the specific office or division of EPA responsible for a web page.
H1 | HTML tag that identifies a Level 1 Heading.
H2 | HTML tag that identifies a Level 2 Heading.
H3 | HTML tag that identifies a Level 3 Heading.
H4 | HTML tag that identifies a Level 4 Heading.
H5 | HTML tag that identifies a Level 5 Heading.
H6 | HTML tag that identifies a Level 6 Heading.
HEAD | Top level element which encapsulates information about the HTML document.
KEYWORDS_ZONE | Metatag field used to provide keywords that describe the content of a web page.
NAME_ZONE | Name of individual who last updated a web page.
STYLE | HTML element used in the document HEAD section to indicate style information for the entire document.
TEXTAREA | HTML tag that indicates a multi-line text entry field (used on forms).
TITLE | HTML tag indicating the title of the web page.
URL_ZONE | The Uniform Resource Locator (URL) which uniquely identifies the web page and its location.
Fields
Fields are extracted from the document and stored in the collection
for retrieval and searching, and can be returned on a results list.
Many commercial word processor applications include fields that can
be searched.
A region of text must first be defined as a zone in order to be a field.
Depending on how an administrator has defined the collection, a region
of text can be only a zone, or it can be both a field and a zone.
To search for a term in a specific field, the <CONTAINS> operator is
used. For example, to search for the word "water" in the HTML metadata
keywords, type:
-
Keywords <CONTAINS> water
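The field query shown above puts the field name first, then the operator, then the term. A minimal Python sketch of that pattern follows; field_query is a hypothetical helper name, and the field name is not validated against the collection.

```python
def field_query(field, term):
    """Format a query that looks for a term inside one field."""
    # Pattern taken from the example above: FieldName <CONTAINS> term
    return f"{field} <CONTAINS> {term}"


keywords_query = field_query("Keywords", "water")
print(keywords_query)
```

Note that the order differs from a zone query, where the term comes before the operator.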
Many of the fields defined for the EPA Web site provide information
about file attributes, as well as content. Below is a list of potentially
useful fields used on the EPA Web site.
FIELD | DESCRIPTION
AUTHOR | Metatag field that indicates the author of the document file.
AREA | Metatag field which indicates the Lotus Notes database from which a dynamically generated web page is created.
COMMENTS | Metatag field used to provide author comments.
CREATED | Date the file was created.
DATE | Last date the file was modified or created.
DESCRIPTION | Metatag field used to provide a brief summary description of a web page.
GROUP | Eight (8) character code (TSSMS) that indicates the specific office or division of EPA responsible for a web page.
KEYWORDS | Metatag field used to provide keywords that describe the content of a web page.
MIME-TYPE | Software application file type. Indexed file types on the EPA Web site include: application/pdf, text/plain, and application/html.
MODIFIED | Last date the file was modified or created.
NAME | User name of file owner.
SIZE | File size in bytes.
SNIPPET | The first 400 printable characters of a document.
SUBJECT | Field that indicates the subject matter of the file.
TITLE | HTML tag indicating the title of the web page.
URL | The Uniform Resource Locator (URL) which uniquely identifies the web page and its location.
Proximity Search Methods
There are several search methods for doing proximity searches. A proximity
search looks for documents containing search terms within close proximity
of each other. The following operators enable proximity search methods:
NEAR, PHRASE, SENTENCE, PARAGRAPH.
The NEAR operator selects documents containing specified search terms
within close proximity to each other. Document scores are calculated based
on the relative number of words between search terms; the closer the search
terms, the higher the score. To find documents that contain the phrase
"Region 1" and stemmed variations of the word "wetlands" within close
proximity to each other, you can use this query:
-
"Region 1"<NEAR>wetlands
The SENTENCE and PARAGRAPH operators are used to specify a search within
a sentence or paragraph. The syntax for using these operators is similar.
To find documents that contain the phrase "Region 1" and stemmed variations
of the word "wetlands" within the same paragraph, you can use this query:
-
"Region 1"<PARAGRAPH>wetlands
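Because the proximity operators share one syntax, a script could generate any of them from the same template. The Python sketch below is only an illustration; proximity_query is a hypothetical name, and the list of operators is taken from this section.

```python
PROXIMITY_OPERATORS = ("NEAR", "PHRASE", "SENTENCE", "PARAGRAPH")


def proximity_query(left, operator, right):
    """Format a proximity query such as left<NEAR>right."""
    op = operator.upper()
    if op not in PROXIMITY_OPERATORS:
        raise ValueError(f"not a proximity operator: {operator}")
    # The operator must be wrapped in angle brackets to be read as
    # query language rather than as an ordinary search word.
    return f"{left}<{op}>{right}"


near_query = proximity_query('"Region 1"', "NEAR", "wetlands")
paragraph_query = proximity_query('"Region 1"', "paragraph", "wetlands")
print(near_query)
print(paragraph_query)
```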
Excluding Information
Want to exclude something from a search? That's what the NOT modifier
does. For example, to find documents containing stemmed variations of
the words "ORD" and "pesticide" in close proximity to each other, but
not stemmed variations of the word "water", you enter this query:
-
ORD<NEAR>pesticide<AND> <NOT>water
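The exclusion query above can be read as an existing expression followed by <AND> and a <NOT>-modified term. A minimal Python sketch of that composition is below; exclude is a hypothetical helper name, and the spacing simply mirrors the example.

```python
def exclude(expression, unwanted):
    """Append an exclusion clause to an existing query expression."""
    # Mirrors the example above: expression<AND> <NOT>unwanted
    return f"{expression}<AND> <NOT>{unwanted}"


exclusion_query = exclude("ORD<NEAR>pesticide", "water")
print(exclusion_query)
```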