Preparing Documents for iPlanet Compass Server


Your Documents and iPlanet Compass Server

There are two aspects to preparing your documents for iPlanet Compass Server:

iPlanet Compass Server is based on an index of documents. When you perform a search, it is the information in the index that is examined. When a document is listed, it is the information in the index that is displayed. iPlanet Compass Server extracts two different kinds of information from each document to build its index:

The following guidelines describe how to prepare your documents so that iPlanet Compass Server users can more easily find the information that they need.

Note, however, that:


Making It Easier for Readers to Find Your Documents

By default, the server index contains four different kinds of information:

You can use iPlanet Compass Server to search the index for any or all of these kinds of information. For example, if you enter "Abraham Lincoln" in the search box, you will find documents about Abraham Lincoln and documents written by Abraham Lincoln.


About Keywords

Keywords are words that identify the contents of a document. You use iPlanet Compass Server to search for documents containing the keywords you are interested in.

For example, the keywords for an essay on the life of Thomas Jefferson might include Jefferson, presidents, America, United States, Declaration of Independence, Constitution, history, Monticello, founders, founding fathers, revolution, and so on. If you wanted to find documents containing information about Jefferson and Monticello, you would search for those two keywords.

Keywords are the most important element in a document search. By making sure that the index contains the right keywords for your documents, you make it easier for users to find the information you want them to have.

Having the right keywords to describe your document is far more important than having many keywords. (Although iPlanet Compass Server can accommodate up to 1 MB of keywords for each document, it is unlikely that any index entry would ever approach that limit.)

By default, the server index obtains its list of keywords from four different document sources:


Document Content and Keywords

By default, the index's list of keywords for a document comes from two content sources: the document's headings and its opening text.

(Keywords can also come from META tags.)

Headings and Keywords

The most obvious place to look for keywords is in chapter and section headings.

The unique words in all <h1>, <h2>, and <h3> headings are automatically listed as keywords. For example, the heading Lincoln at Gettysburg would produce two useful keywords: Lincoln and Gettysburg.

To ensure that your document's headings are helpful in generating keywords, follow these rules:

By default, the server only generates keywords from the first three heading levels. (Your administrator can specify more or fewer levels.)
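The heading rule can be illustrated with a short Python sketch. It is not Compass Server's own code; it simply collects the unique words inside <h1>, <h2>, and <h3> elements using the standard html.parser module. The small STOP_WORDS set is a hypothetical assumption, added so that a word like "at" is skipped as in the Lincoln at Gettysburg example; this guide does not say which common words the server ignores.

```python
from html.parser import HTMLParser

# Default heading levels used for keywords (your administrator can change this).
HEADING_TAGS = {"h1", "h2", "h3"}

# Hypothetical stop-word list; the server's real list is not documented here.
STOP_WORDS = {"a", "an", "and", "at", "of", "the", "to"}


class HeadingKeywordParser(HTMLParser):
    """Collects unique, non-stop-word words that appear inside <h1>-<h3>."""

    def __init__(self, heading_tags=HEADING_TAGS):
        super().__init__()
        self.heading_tags = heading_tags
        self.depth = 0       # greater than 0 while inside a counted heading
        self.keywords = []   # unique words in order of first appearance

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase, so <H1> also matches.
        if tag in self.heading_tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.heading_tags and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth:
            return
        for word in data.split():
            if word.lower() not in STOP_WORDS and word not in self.keywords:
                self.keywords.append(word)


def heading_keywords(html):
    parser = HeadingKeywordParser()
    parser.feed(html)
    parser.close()
    return parser.keywords


print(heading_keywords("<h1>Lincoln at Gettysburg</h1><h4>Notes</h4>"))
# ['Lincoln', 'Gettysburg']
```

The <h4> text is ignored because only the first three levels are counted by default.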

Opening Text and Keywords

Much of the information that iPlanet Compass Server uses to generate keywords comes from the text of the document itself. By default, all the unique words in the first 4,000 bytes of text (approximately the first 800 words) are listed as keywords. (Your administrator can increase or decrease the number of bytes from which keywords are taken.)

Keep in mind, though, that to the server the "first text" is whatever immediately follows the <body> tag in the document file. If the first text is routing information, reference citations, acknowledgments, and so forth, those words are what get listed as keywords.

For search purposes, a good rule of thumb is to begin each document with a concise summary or overview of its contents. That way, the keywords taken from the first text are the important keywords that you want listed in the index.
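A rough Python sketch of the opening-text rule follows. It assumes, for illustration only, that markup is stripped before the 4,000-byte limit is applied and that a "word" is a run of letters, digits, apostrophes, or hyphens; this guide does not specify either detail. The usage line shows why an acknowledgment at the top of the body crowds out more useful words.

```python
import re

# Default from this guide: keywords come from the first 4,000 bytes of text
# after the <body> tag (your administrator can change this).
OPENING_TEXT_BYTES = 4000


def opening_text_keywords(html, limit=OPENING_TEXT_BYTES):
    """Return the unique words in the first `limit` bytes of text after <body>."""
    match = re.search(r"<body\b[^>]*>", html, flags=re.IGNORECASE)
    body = html[match.end():] if match else html
    # Assumption: remaining tags are removed so only readable text is counted.
    text = re.sub(r"<[^>]+>", " ", body)
    # Truncate by bytes, dropping any character split at the boundary.
    opening = text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")
    keywords = []
    for word in re.findall(r"[\w'-]+", opening):
        if word not in keywords:
            keywords.append(word)
    return keywords


page = "<html><body><p>Thanks to Jane.</p><p>Quarterly sales summary</p></body></html>"
print(opening_text_keywords(page, limit=10))  # ['Thanks', 'to']
```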



Using META Tags to Make Searching Easier

In addition to shaping your document content to make iPlanet Compass Server searches more effective, you can also use META tags to help users find the information that they need. You can:


META Tags and Keywords

You can use META tags to specify keywords to be included in the index. When specifying keywords with META tags, keep these principles in mind:

You can use the following META tags to add keywords to the server index:

See Working with META information for instructions on adding META tags to your documents.


Describing Your Documents

By default, document lists contain two pieces of information about each document: its title and a short description.

Document lists produced by a search also display search relevance indicators (boxes), and a link to the item's category if it has one.

Document Titles

The title displayed in a list is the document's search title. When users browse by category, each category's documents are listed alphabetically by title.

You specify the search title you want to use for your document with a <title> META tag.

If you do not include a <title> META tag in your document, the list displays the document's URL (web address) as the title. Since that may not be helpful to readers, it is good practice to always include a <title> META tag in every document.
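The title rule is a simple fallback, sketched here in Python. The function names and the URL in the usage lines are hypothetical placeholders; the sketch only mirrors the documented behavior of showing the <title> text when it is present and the document's URL otherwise.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Captures the text between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def list_title(html, url):
    """Title shown in a document list: the <title> text, else the URL."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    title = " ".join(parser.title.split())  # tidy stray whitespace
    return title or url


url = "http://your.host.com/q3.html"  # hypothetical address
print(list_title("<head><title>Q3 Sales</title></head>", url))   # Q3 Sales
print(list_title("<head></head><body>Q3 figures</body>", url))   # http://your.host.com/q3.html
```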

For many documents, the search title is the same as the formal title that readers see when they view the document. The two do not have to match, however, and it is not unusual for them to differ. (See Using the <Title> META Tag for information on the different uses of title tags.)

Document Descriptions

By default, document lists contain descriptions of every item. You specify the description you want displayed for your document with a <description> META tag.

If you do not include a <description> META tag in your document, the document list displays the first 20 to 30 words of document content as the description. That is, whatever words immediately follow the <body> tag. If those words are headers, bylines, acknowledgments, navigation links, frame descriptors, or other miscellaneous information, they won't provide a very useful description. Thus, it is good practice to always include a <description> META tag in every document.
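The description fallback can be sketched in Python as follows. Because the guide only says "the first 20 to 30 words," the 25-word default here is an assumption, as is splitting words on whitespace. The usage line shows a byline becoming the description when no <description> tag is present.

```python
from html.parser import HTMLParser

FALLBACK_WORDS = 25  # assumption: the guide says only "20 to 30 words"


class DescriptionParser(HTMLParser):
    """Records a <META name="description"> value and the words after <body>."""

    def __init__(self):
        super().__init__()
        self.description = None
        self.in_body = False
        self.body_words = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            fields = dict(attrs)  # attribute names arrive lowercased
            if (fields.get("name") or "").lower() == "description":
                self.description = fields.get("content") or ""
        elif tag == "body":
            self.in_body = True

    def handle_data(self, data):
        if self.in_body:
            self.body_words.extend(data.split())


def list_description(html, n_words=FALLBACK_WORDS):
    """Use the description tag if present, else the opening body words."""
    parser = DescriptionParser()
    parser.feed(html)
    parser.close()
    if parser.description:
        return parser.description
    return " ".join(parser.body_words[:n_words])


page = "<html><body><p>By J. Smith</p><p>Sales rose in every region.</p></body></html>"
print(list_description(page, n_words=3))  # By J. Smith
```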


Categorizing Your Documents

At most sites, documents are grouped into categories and subcategories. Once a document has been assigned to a category, it will be listed in alphabetic order whenever a reader browses that category. When a document is found by a search, its listing will contain a link back to that document's category so that users can browse for similarly categorized items.

Your site administrator creates the categories and specifies the rules that the server will use when automatically assigning a document to a category or categories.

A document can be assigned to more than one category. By default, a document can be placed in as many as three different categories. (Your site administrator can increase or decrease that number.) When a document has multiple categories, one of those categories is primary. When a document is listed as a result of a search, the link to similarly categorized documents links to the document's primary category.

You can use a META <Classification> tag to explicitly place a document in a particular category. Categories that you assign with a classification tag are in addition to any categories automatically assigned by the server. A category explicitly specified with a classification tag becomes that document's primary category with precedence over any categories automatically generated by iPlanet Compass Server. If you specify multiple categories with a classification tag, the first one is the primary one.
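One way to picture these rules is the Python sketch below. It assumes, as a simplification, that categories from a classification tag come first, that automatically assigned categories fill any remaining slots, and that the three-category default caps the combined list; this guide does not say how the server resolves a conflict between that cap and explicitly assigned categories. The function name is hypothetical.

```python
MAX_CATEGORIES = 3  # default from this guide; your administrator can change it


def assign_categories(explicit, automatic, max_categories=MAX_CATEGORIES):
    """Merge explicit and automatic categories; the first entry is primary.

    Assumption: explicit (classification tag) categories take precedence,
    duplicates are dropped, and the combined list is capped.
    """
    merged = []
    for category in list(explicit) + list(automatic):
        if category not in merged:
            merged.append(category)
    return merged[:max_categories]


cats = assign_categories(["History:American"],
                         ["Politics", "History:American", "Law", "Travel"])
print(cats[0])  # History:American (the primary category)
print(cats)     # ['History:American', 'Politics', 'Law']
```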


Working with META information

META information is information about a document (as opposed to the document's contents). Types of META information include:

Some META information is generated, maintained, and displayed by the server containing the document. Other types you enter into the document itself using META tags. There are two ways of adding META tags to your documents:


Adding META information by Hand

You can use a text editor to add META information to your web documents. META information is specified with META tags.

Important: When using a text editor to add any kind of information to a document, you must always save your document in ASCII format. (Some popular word processors and editors use names like text, text-only, or DOS-text instead of ASCII.) If you do not know how to set your editor to work in, or save as, ASCII text, ask your site administrator for assistance before adding META tags by hand.

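Web documents (also known as HTML documents) have two parts: the head, enclosed by the <head> and </head> tags, which holds META information such as the title; and the body, enclosed by the <body> and </body> tags, which holds the content that readers see.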

All META tags (except the title tag) have the same format:
<META name="xxx" content="zzz">.
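where xxx is the name of the META tag (for example, Author or Keywords) and zzz is the information you are assigning to it.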

For example, the META tag specifying the author for this page looks like this:

<META NAME="Author" CONTENT="Sun Microsystems, Inc.">

The following portion of an HTML document defines the document title, author, description, category, and keywords:

<HTML> 
<HEAD> 
  <TITLE>Declaration of Independence</TITLE> 
  <META name="Author" content="Thomas Jefferson"> 
  <META name="Description" content="Statement of 
principles and enumeration of grievances by American 
colonists to British monarchy"> 
  <META name="Keywords" content="Continental Congress, 
human rights, independence, America, democracy, 
July 4th 1776, Philadelphia, Liberty Bell, taxation">
  <META name="Classification" content="History:American:Documents">
</HEAD> 
</HTML>
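If you want to confirm that the META tags in a page read back as intended, a short Python script built on the standard html.parser module can list them. This is a checking aid written under our own assumptions, not part of Compass Server: it lowercases tag names and collapses the line breaks used to wrap long content values, which helps you spot a tag whose value is not what you meant.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <META name="..." content="..."> pairs keyed by lowercase name."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        fields = dict(attrs)  # attribute names arrive lowercased
        name = fields.get("name")
        if name:
            content = fields.get("content") or ""
            # Collapse the line breaks used to wrap long values in the source.
            self.meta[name.lower()] = " ".join(content.split())


def read_meta(html):
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return parser.meta


head = """<HEAD>
  <TITLE>Declaration of Independence</TITLE>
  <META name="Author" content="Thomas Jefferson">
  <META name="Classification" content="History:American:Documents">
</HEAD>"""
print(read_meta(head))
# {'author': 'Thomas Jefferson', 'classification': 'History:American:Documents'}
```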

In addition to the standard META tags described here, your site administrator can define other META tags that you can use and that the server recognizes.


Standard META Tags Used by iPlanet Compass Server

By default, the following META tags are recognized and used by iPlanet Compass Server. (Your site administrator can add or delete recognized META tags.)

Using the <Title> META Tag

See also Document Titles.

You use the <title> tag to assign a search title to your document. Search titles are used for the following online purposes:

To create a search title:

  1. Enter the <title> and </title> tags between the <head> and </head> tags at the top of your document.
  2. Enter your search title between the <title> and </title> tags.

For example, to create the search title This is the Title, your document would look like this:

<head>
<title>This is the Title</title>
</head>

Note that unlike other META tags, the <title> tag does not use the word META.

Using the <Keywords> META Tag

See About Keywords and Document Content and Keywords for information on how iPlanet Compass Server uses keywords.

You use the <keywords> META tag to specify keywords for your document. This tag uses the standard <META name="keywords" content="  "> format.

To create keywords:

  1. Enter a <META name="keywords" content="  "> between the <head> and </head> tags at the top of your document.
  2. Enter your keywords between the quotation marks of the content field.

For example, to add the keywords Netscape browser, web, HTML, Compass, search, search engine, and document, your document would look like this:

<head>
<title>This is the Title</title>
<META name="keywords" content="Netscape browser, web, HTML, 
Compass, search, search engine, document">
</head>
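The example separates keywords with commas so that phrases such as "search engine" stay together. The Python helper below splits a keywords value the same way; treating the comma as the separator is an assumption drawn from this example, not a documented server rule, and the function name is hypothetical.

```python
def split_keywords(content):
    """Split a keywords content value into entries at commas.

    Assumption: commas separate entries, so multi-word phrases stay intact;
    line breaks and extra spaces inside an entry are collapsed.
    """
    return [" ".join(item.split()) for item in content.split(",") if item.strip()]


value = """Netscape browser, web, HTML,
Compass, search, search engine, document"""
print(split_keywords(value))
# ['Netscape browser', 'web', 'HTML', 'Compass', 'search', 'search engine', 'document']
```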

See META Tags and Keywords for general principles that you should use when specifying keywords. Also note that:

Using the <Author> META Tag

You use the <author> META tag to specify the individuals or organization that created your document. This tag uses the standard <META name="author" content="  "> format.

To specify authors:

  1. Enter a <META name="author" content="  "> between the <head> and </head> tags at the top of your document.
  2. Enter the author's name between the quotation marks of the content field.

For example, to add the author Harriet Stowe, your document would look like this:

<head>
<title>This is the Title</title>
<META name="author" content="Harriet Stowe">
</head>

You can specify multiple authors by separating the author names with semicolons. For example, to specify both Sun Microsystems and C. Brookes as authors, your tag would look like this:
<META NAME="Author" CONTENT="Sun Microsystems; C. Brookes">
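The semicolon rule can be checked with a tiny Python helper. The function is hypothetical and only mirrors the separator described above. Semicolons, rather than commas, are what make this work for organization names such as "Sun Microsystems, Inc.", which contain a comma of their own.

```python
def split_authors(content):
    """Split an author content value at semicolons, trimming spaces."""
    return [name.strip() for name in content.split(";") if name.strip()]


print(split_authors("Sun Microsystems; C. Brookes"))
# ['Sun Microsystems', 'C. Brookes']
print(split_authors("Sun Microsystems, Inc.; C. Brookes"))
# ['Sun Microsystems, Inc.', 'C. Brookes']
```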

Using the <Description> META Tag

See also Describing Your Documents.

You use the <description> META tag to specify the short description of your document that you want readers to see when the document is displayed in a list. The words in your description are also added to the list of document keywords.

This tag uses the standard <META name="description" content="  "> format.

To specify a description:

  1. Enter a <META name="description" content="  "> between the <head> and </head> tags at the top of your document.
  2. Enter your description between the quotation marks of the content field.

For example, to add the description An analysis of third quarter sales by region and product line, your document would look like this:

<head>
<title>This is the Title</title>
<META name="description" content="An analysis of 
third quarter sales by region and product line.">
</head>

Your description can be as long as necessary, but keep in mind that most users will prefer to see a brief summary rather than an extensively detailed review.

See Document Descriptions for additional information.

Using the <Expire> META Tag

You use the <expires> META tag to specify when your document should be dropped from the iPlanet Compass Server index. (Note that the expiration date affects only the index listing; it does not cause the document to be removed from the server where it resides.)

Once the expiration date you specify has passed, your document will be removed from the index the next time your administrator purges expired documents. See Removing Your Documents from the Index for additional information.

This tag uses the standard <META name="expires" content="  "> format.

To specify an expiration date:

  1. Enter a <META name="expires" content="  "> between the <head> and </head> tags at the top of your document.
  2. Enter the date between the quotation marks of the content field.

For example, to specify that your document should be dropped from the index listing after January 9, 1998, your document would look like this:

<head>
<title>This is the Title</title>
<META name="expires" content="1/9/98">
</head>
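The expiration check can be sketched in Python. It assumes the month/day/two-digit-year form shown in the example; this guide does not list the other date formats the server accepts. Passing the check only makes a document eligible for removal; it actually leaves the index at the next purge your administrator runs.

```python
from datetime import date, datetime


def is_expired(content, today):
    """Return True once the date in an expires tag has passed.

    Assumes the M/D/YY form used in the example. Python's %y maps
    69-99 to 1969-1999 and 00-68 to 2000-2068.
    """
    expires = datetime.strptime(content.strip(), "%m/%d/%y").date()
    return today > expires


print(is_expired("1/9/98", date(1998, 1, 9)))   # False: the date has not yet passed
print(is_expired("1/9/98", date(1998, 1, 10)))  # True: eligible for the next purge
```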

Keep in mind the following points:

Using the <Classification> META Tag

See also Categorizing Your Documents.

You use the <classification> META tag to specify categories for your document.

This tag uses the standard <META name="classification" content="  "> format.

To specify a category:

  1. Enter a <META name="classification" content="  "> between the <head> and </head> tags at the top of your document.
  2. Enter the category name between the quotation marks of the content field. Separate parent and subcategories with colons.

For example, to specify that your document should be listed in the Compass category, which is a subcategory of New Products, which is a subcategory of iPlanet, your document would look like this:

<head>
<title>This is the Title</title>
<META name="classification" content="iPlanet:New Products:Compass">
</head>

You can specify more than one category, up to a maximum of three by default. (Your site administrator can change this maximum.)

To specify more than one category, separate the different categories with semicolons. For example, to specify both iPlanet:New Products:Compass and iPlanet:Market Share, your tag would look like this:
<META name="classification" content="iPlanet:New Products:Compass;iPlanet:Market Share">
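Before publishing, you could check a classification value with a Python sketch like the one below. It splits categories at semicolons and levels at colons, keeps names exactly as written (they must match your administrator's spelling), and treats the first path as primary. Rejecting more than three categories with an error is this checker's own choice; the guide does not say how the server handles extra categories. The function name is hypothetical.

```python
MAX_CATEGORIES = 3  # default maximum; your administrator can change it


def parse_classification(content, max_categories=MAX_CATEGORIES):
    """Split a classification value into category paths.

    Semicolons separate categories and colons separate a parent category
    from its subcategories. Names are kept exactly as written. The first
    path is the primary category.
    """
    paths = [entry.split(":") for entry in content.split(";") if entry]
    if len(paths) > max_categories:
        raise ValueError(f"{len(paths)} categories given; the maximum is {max_categories}")
    return paths


paths = parse_classification("iPlanet:New Products:Compass;iPlanet:Market Share")
print(paths[0])  # ['iPlanet', 'New Products', 'Compass'] (primary)
print(paths[1])  # ['iPlanet', 'Market Share']
```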

Keep in mind the following points:

  1. Accuracy. You must enter the category and subcategory names exactly as they have been established by your site administrator. This includes capitalization and the spaces between words.
  2. Primary category. Documents displayed in a search results list have a link back to the document's primary category. If you do not specify a category with a <classification> tag, the server automatically determines which category is primary. If you use a <classification> tag to specify a category, that becomes the primary category. If you specify multiple categories, the first one you specify becomes the primary category.

    See Categorizing Your Documents for additional information.


Adding a Compass Search Box to Your Documents

You can easily add an iPlanet Compass Server search box to any of your HTML web documents. This allows anyone who is viewing your document to conduct a search from your page.

You add a search box by inserting HTML code into your document at the place where you want the search box to appear, as shown below. (Note: In the example below, replace http://your.host.com with the URL of your Compass Server. Ask your site administrator if you are unsure what URL to use.)

This produces a search box in your document that looks like this:

Using JavaScript, you can also:

See the Compass Server Developer's Guide for information on enhancing search boxes with JavaScript.


Removing Your Documents from the Index

Note: The file containing your document is stored on a file server. iPlanet Compass Server cannot remove a document from a file server. To remove a document from a file server, you must use whatever tool, application, or command is appropriate.

To remove a document's index listing, specify an expiration date with an <Expire> META tag. Once the expiration date you specify has passed, your document will be removed from the index the next time your administrator purges expired documents.

If you did not specify an expiration date when you originally created your document, you can edit it later to add one. You can also edit your document to change an expiration date. Note, however, that a new or altered expiration date will not take effect until the next time the server indexes the site where the document is stored. (Your site administrator controls when and how often sites are indexed.)


Copyright © 2001 Sun Microsystems, Inc. Some preexisting portions Copyright © 2001 Netscape Communications Corp. All rights reserved.