Searching for Words and Phrases
Using Query Language
Searching Zones and Fields
Proximity Search Methods
Excluding Information
Searching for Words and Phrases
Your search can include words and phrases separated by commas.
By default, words and phrases in the query
are stemmed, meaning the search is broadened to include the stemmed
variations of these words.
The query below will search for the phrase "press releases" and
stemmed variations of the word "wetlands":
- press releases, wetlands
To search for the word "wetlands" itself, without its stemmed
variations, enclose the search term in double quotation marks. For example, the
following query will search for the phrase "Region 1" and the exact word
"wetlands":
- "Region 1", "wetlands"
When searching for chemical ID numbers such as 83-32-9, you can get better results by leaving out the hyphens and submitting the
search as 83 32 9.
Note that searches are not case-sensitive by default. This means you
can use "Region 1" or "region 1" in the above examples and get the same
search results.
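The rules above can be sketched as a small query-building helper. This is only an illustration of the comma, quotation-mark, and hyphen conventions described in this section; the function names (build_query, quote_exact, normalize_chemical_id) are hypothetical and are not part of the search engine.

```python
def quote_exact(term: str) -> str:
    """Wrap a term in double quotation marks so it is searched without stemming."""
    return '"' + term.strip().strip('"') + '"'


def normalize_chemical_id(chem_id: str) -> str:
    """Replace the hyphens in an ID such as 83-32-9 with spaces."""
    return " ".join(part for part in chem_id.split("-") if part)


def build_query(terms, exact=()):
    """Join terms with commas; any term also listed in `exact` is quoted."""
    exact_set = {t.lower() for t in exact}
    parts = [quote_exact(t) if t.lower() in exact_set else t for t in terms]
    return ", ".join(parts)


# Stemmed search for both elements.
print(build_query(["press releases", "wetlands"]))
# Exact search for both elements.
print(build_query(["Region 1", "wetlands"], exact=["Region 1", "wetlands"]))
# Chemical ID with hyphens removed.
print(normalize_chemical_id("83-32-9"))
```

The resulting strings can be pasted into the search box as they are.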
Using Query Language
You can use operators and modifiers to apply logic to your query and
pinpoint the exact information you are interested in. Commonly used operators are
AND, OR, and NEAR. A modifier can be used with an operator to
further refine your query. Frequently used
modifiers are MANY and NOT. By default, the words "and," "or," and "not" are
interpreted as query language; all other query language elements,
such as the NEAR operator, are interpreted as ordinary words unless surrounded by
angle brackets (for example, <NEAR>). Sample query expressions using query language are below.
The AND operator selects documents that contain all of the search elements
you specify. To find documents that contain both
"Region 1" and at least one stemmed variation of the word "wetlands" you
can use the following query:
- "Region 1" and wetlands
The OR operator selects documents that show evidence of at least one of
the search elements. To find documents that contain either
"Region 1" or at least one stemmed variation of the word
"wetlands" you can use the following query:
- "Region 1" or wetlands
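As a rough sketch, the AND and OR examples above could be generated by a helper like the one below. The name combine is hypothetical. The helper relies only on the rule stated in this section that the words "and" and "or" are interpreted as query language by default.

```python
def combine(operator: str, *elements: str) -> str:
    """Join search elements with the word 'and' or 'or'."""
    op = operator.lower()
    if op not in ("and", "or"):
        raise ValueError(f"unsupported operator: {operator}")
    if len(elements) < 2:
        raise ValueError("at least two search elements are required")
    return f" {op} ".join(elements)


# Documents containing both elements.
print(combine("and", '"Region 1"', "wetlands"))
# Documents containing at least one of the elements.
print(combine("or", '"Region 1"', "wetlands"))
```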
Searching Zones and Fields
Zones and fields can be searched to narrow a search set or to retrieve specific types
of documents. The collection administrator defines the zones and fields that are available
on the EPA search engine, and many of these have been created to produce customized search results
for specific document types on the EPA Web site.
Zones
Zones are specific regions of a document to which searches can be limited. Using the zone
filter, the Verity search engine builds zone information into the collection’s full-word
index, which allows quick and efficient searches over zones.
To search for a term in a specific zone, the operator <IN> is used. For example,
to search for a URL that contains the word "water", type:
- water <IN> URL_ZONE
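A zone query of this form could be assembled as follows. This is a minimal sketch; the helper name in_zone is hypothetical, and it only reproduces the term, operator, zone order shown in the example above.

```python
def in_zone(term: str, zone: str) -> str:
    """Build a query that limits `term` to the named zone."""
    if not term.strip() or not zone.strip():
        raise ValueError("both a term and a zone name are required")
    return f"{term.strip()} <IN> {zone.strip()}"


print(in_zone("water", "URL_ZONE"))
```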
Most zones are defined by tags found in the HTML code of a Web page. Below is a list of
potentially useful zones used on the EPA Web site.
ZONE | DESCRIPTION |
AREA_ZONE | Metatag field that indicates the Lotus Notes database from which a dynamically generated Web page is created. |
BLOCKQUOTE | HTML tag used to create block quotes or indented text. |
BODY | HTML tag used to indicate the main body of text on a Web page. |
CAPTION | HTML tag used to display a caption or title either directly above or below a table. |
CITE | HTML style element indicating text that is used as a citation. It is usually rendered as italic text. |
CODE | HTML style element indicating text that is a sample of computer code. It is usually rendered with a fixed-width font. |
DESCRIPTION_ZONE | Metatag field used to provide a brief summary description of a Web page. |
FORM | HTML element used to delimit the range of data fields for a form on a Web page. |
GROUP_ZONE | Eight (8) character code (TSSMS) that indicates the specific office or division of EPA responsible for a Web page. |
H1 | HTML tag that identifies a Level 1 Heading. |
H2 | HTML tag that identifies a Level 2 Heading. |
H3 | HTML tag that identifies a Level 3 Heading. |
H4 | HTML tag that identifies a Level 4 Heading. |
H5 | HTML tag that identifies a Level 5 Heading. |
H6 | HTML tag that identifies a Level 6 Heading. |
HEAD | Top level element that encapsulates information about the HTML document. |
KEYWORDS_ZONE | Metatag field used to provide keywords that describe the content of the Web page. |
NAME_ZONE | Name of individual who last updated a Web page. |
STYLE | HTML element used in the document HEAD section to indicate style information for the entire document. |
TEXTAREA | HTML tag that indicates a multi-line text entry field (used on forms). |
TITLE | HTML tag indicating the title of the Web page. |
URL_ZONE | The Uniform Resource Locator (URL), which uniquely identifies the Web page and its location. |
Fields
Fields are extracted from the document and stored in the collection for retrieval and
searching, and can be returned on a results list. Many commercial word processor applications
include fields that can be searched.
A region of text must first be defined as a zone in order to be a field. Depending on how
an administrator has defined the collection, a region of text can be only a zone, or it
can be both a field and a zone.
To search for a term in a specific field, the operator <CONTAINS> is used. For example,
to search for the word "water" in the HTML metadata keywords, type:
- Keywords <CONTAINS> water
Many of the fields defined for the EPA Web site provide information about file attributes,
as well as content. Below is a list of potentially useful fields used on the EPA
Web site.
FIELD | DESCRIPTION |
AUTHOR | Metatag field that indicates the author of the document file. |
AREA | Metatag field which indicates the Lotus Notes database from which a dynamically generated Web page is created. |
COMMENTS | Metatag field used to provide author comments. |
CREATED | Date the file was created. |
DATE | Last date the file was modified or created. |
DESCRIPTION | Metatag field used to provide a brief summary description of a Web page. |
GROUP | Eight (8) character code (TSSMS) that indicates the specific office or division of EPA responsible for a Web page. |
KEYWORDS | Metatag field used to provide keywords that describe the content of a Web page. |
MIME-TYPE | Software application file type. Indexed file types on the EPA Web site include: application/pdf, text/plain, and application/html. |
MODIFIED | Last date the file was modified or created. |
NAME | User name of file owner. |
SIZE | File size in bytes. |
SNIPPET | The first 400 printable characters of a document. |
SUBJECT | Field that indicates the subject matter of the file. |
TITLE | HTML tag indicating the title of the Web page. |
URL | The Uniform Resource Locator (URL) which uniquely identifies the Web page and its location. |
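The field names listed above can also be used to check a field query before it is submitted. The sketch below is illustrative only: the helper name field_contains is hypothetical, and the case-insensitive comparison is an assumption based on the earlier example, which writes the field as "Keywords" while the table lists it as KEYWORDS.

```python
# Field names taken from the table above.
KNOWN_FIELDS = {
    "AUTHOR", "AREA", "COMMENTS", "CREATED", "DATE", "DESCRIPTION",
    "GROUP", "KEYWORDS", "MIME-TYPE", "MODIFIED", "NAME", "SIZE",
    "SNIPPET", "SUBJECT", "TITLE", "URL",
}


def field_contains(field: str, term: str) -> str:
    """Build a field query, rejecting field names not in the table."""
    # Assumption: field names are matched without regard to case.
    if field.upper() not in KNOWN_FIELDS:
        raise ValueError(f"unknown field: {field}")
    return f"{field} <CONTAINS> {term}"


print(field_contains("Keywords", "water"))
```

Checking the name first catches simple typing mistakes, such as entering a zone name like KEYWORDS_ZONE where a field name is expected.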
Proximity Search Methods
A proximity search looks for documents containing search terms within
close proximity of each other. The following operators enable proximity
search methods: NEAR, PHRASE, SENTENCE, and PARAGRAPH.
The NEAR operator selects documents containing specified search terms
within close proximity to each other. Document scores are calculated
based on the relative number of words between search terms; the closer
the search terms, the higher the score. To find documents that contain
the phrase "Region 1" and stemmed variations of the word "wetlands" within
close proximity to each other, you can use this query:
- "Region 1"<NEAR>wetlands
The SENTENCE and PARAGRAPH operators are used to specify a search within
a sentence or paragraph. The syntax for using these operators is similar.
To find documents that contain the phrase "Region 1" and stemmed variations
of the word "wetlands" within the same paragraph, you can use this
query:
- "Region 1"<PARAGRAPH>wetlands
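The proximity examples above share one pattern: two search elements joined by an operator in angle brackets. A minimal sketch of that pattern follows. The helper name proximity is hypothetical, and the operator list is the one given at the start of this section.

```python
PROXIMITY_OPERATORS = {"NEAR", "PHRASE", "SENTENCE", "PARAGRAPH"}


def proximity(operator: str, left: str, right: str) -> str:
    """Join two search elements with a proximity operator in angle brackets."""
    op = operator.upper()
    if op not in PROXIMITY_OPERATORS:
        raise ValueError(f"not a proximity operator: {operator}")
    return f"{left}<{op}>{right}"


print(proximity("near", '"Region 1"', "wetlands"))
print(proximity("paragraph", '"Region 1"', "wetlands"))
```

The angle brackets are added for every operator because, as noted under Using Query Language, operators such as NEAR are otherwise interpreted as ordinary words.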
Excluding Information
Want to exclude something from a search? That's what the NOT modifier
does. For example, to find documents containing stemmed variations of the
words "ORD" and "pesticide" in close proximity to each other,
but not stemmed variations of the word "water", you enter this query:
- ORD<NEAR>pesticide<AND> <NOT>water
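An exclusion of this kind could be appended to an existing query with a helper such as the one below. This is a sketch only: the name exclude is hypothetical, and the output simply reproduces the operator layout of the example above.

```python
def exclude(query: str, unwanted: str) -> str:
    """Append an <AND> <NOT> clause that excludes `unwanted` from `query`."""
    if not unwanted.strip():
        raise ValueError("a term to exclude is required")
    return f"{query}<AND> <NOT>{unwanted.strip()}"


print(exclude("ORD<NEAR>pesticide", "water"))
```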