LinkOut and Non-Bibliographic Resources

Updated: September 23, 2004

Introduction
Frequently Asked Questions
Step 1: Preliminary Contact
Step 2: File Preparation
     The Identity File
          Identity File Example
          Identity File Prolog
          Identity File Elements
     The Resource File
          Simple Resource File Example
          Resource File Prolog
          Resource File Elements
          Additional Information on Creating a Resource File
          Selecting Entrez Records in a Resource File
          Query
          ObjId
          Specifying URLs to Access the Provider's Resources
          Base
          Rule
          Putting it All Together
          Complex Resource File Example
Step 3: File Transfer
Step 4: Activate Provider's Resources in Entrez
Allowable Rule Keywords
Supply Links using a Simple Text File
Announcement Mailing Lists
For More Assistance

Introduction

LinkOut is a feature of Entrez in which third parties provide information to link specific Entrez records to relevant web-accessible online resources, such as full-text publications, molecular biology databases (e.g., organism-specific, taxonomy, structure), catalogs of research materials (clones, cell cultures, primers, etc.), funding sources, medical resources, research groups, and others. This document explains how non-bibliographic resource providers can participate in LinkOut by supplying NCBI with the information needed to create links from Entrez records to the providers' resources.

A list of Frequently Asked Questions and answers is available to address questions that a link provider may have.

Step 1: Preliminary Contact

Providers should first email NCBI at linkout@ncbi.nlm.nih.gov, indicating interest in creating links from Entrez records to the providers' online resources. Please include the name, email address and phone number of an individual who will act as the designated contact person. The email should also include a LinkOut Identity File (providerinfo.xml) based on the file specifications provided in Step 2 below.

NCBI will establish a ProviderId, an FTP account and a name abbreviation (NameAbbr) for each provider, and will send this information to the designated contact person.

Step 2: File Preparation

Two types of files are necessary to participate in LinkOut. The files must be in XML format and conform to the LinkOut Document Type Definition (LinkOut DTD). XML tags are case sensitive.

The first is the identity file, "providerinfo.xml", that contains information about an online resource provider.

The second is the resource file (or files), typically named "resources.xml", that describes the online resources being made available through LinkOut.

The Identity File: providerinfo.xml

The identity file stores information about the online resource provider. This file should be sent via email with the preliminary contact initiated by the provider. The ProviderId field may be left blank. NCBI will change the provider's NameAbbr if it is not unique in LinkOut. Please consult the current list of LinkOut providers and their name abbreviations (NameAbbr) when choosing a NameAbbr. A list of current non-bibliographic LinkOut resources is also available.

Identity File Example
A providerinfo.xml file for a hypothetical LinkOut participant, WebDatabase Co., which has the ProviderId "777" and NameAbbr "WebDB":

<?xml version="1.0"?>
<!DOCTYPE Provider PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "http://www.ncbi.nlm.nih.gov/entrez/linkout/doc/LinkOut.dtd">
<Provider>
    <ProviderId>777</ProviderId>
    <Name>WebDatabase Co.</Name>
    <NameAbbr>WebDB</NameAbbr>
    <SubjectType>gene/protein/disease-specific</SubjectType>
    <Attribute>registration required</Attribute>
    <Url>http://www.webdatabase.com</Url>
    <IconUrl>http://www.webdatabase.com/images/webdb.gif</IconUrl>
    <Brief>On-line publisher of biomedical databases and other web resources</Brief>
</Provider>

Identity File Prolog

XML Declaration - <?xml version="1.0"?>
(optional)

Document Type Declaration -
<!DOCTYPE Provider PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "http://www.ncbi.nlm.nih.gov/entrez/linkout/doc/LinkOut.dtd">
(required)

Identity File Elements

Provider - root element of the file.
(required)

ProviderId - unique ID assigned by NCBI.
(required)

Name - full name of the resource provider.
(required)

NameAbbr - short, one-word name of the provider. It may include only alphabetic and numeric characters; spaces and special characters such as hyphens are not allowed.
(required)

SubjectType, Attribute - descriptions of the resources and of the relationship of the provider to the resources listed in the resource file. SubjectType and Attribute values appearing in the identity file will apply to all the resources listed by that provider. See LinkOut SubjectTypes, Attributes and UrlName for the list and description of these elements.
(optional, repeatable)

Url - URL of the provider's web site, used in the LinkOut Providers list in Cubby.
(optional, repeatable)

IconUrl - logo of the provider, used to display the link from Entrez records. The icon should be no larger than 100 pixels wide by 25 pixels high and should look like a button. Icons with white or transparent backgrounds, or without borders, are not recommended. Note: The Url and IconUrl here, and in the resource file(s), may be different for different languages; see the LNG attribute in the LinkOut.DTD.
(optional, repeatable - currently not being displayed)

Brief - short (up to 256 characters) description of the provider.
(optional - currently not being displayed)

The providerinfo.xml file is specified in the LinkOut.DTD.
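
The element constraints above can be checked before the file is emailed. The following is a minimal Python sketch, not an NCBI tool: the function name build_provider_xml is hypothetical, only a subset of the optional elements is written, and "alpha and numeric" is assumed to mean ASCII letters and digits.

```python
import re
import xml.etree.ElementTree as ET

# Document type declaration copied from the Identity File Prolog section.
DOCTYPE = (
    '<!DOCTYPE Provider PUBLIC "-//NLM//DTD LinkOut 1.0//EN" '
    '"http://www.ncbi.nlm.nih.gov/entrez/linkout/doc/LinkOut.dtd">'
)


def build_provider_xml(provider_id, name, name_abbr, url=None, brief=None):
    """Return providerinfo.xml text after checking two documented constraints."""
    # NameAbbr: letters and digits only (assumed ASCII), no spaces or hyphens.
    if not re.fullmatch(r"[A-Za-z0-9]+", name_abbr):
        raise ValueError("NameAbbr must contain only letters and digits")
    # Brief: up to 256 characters.
    if brief is not None and len(brief) > 256:
        raise ValueError("Brief must be 256 characters or fewer")

    root = ET.Element("Provider")
    ET.SubElement(root, "ProviderId").text = str(provider_id)
    ET.SubElement(root, "Name").text = name
    ET.SubElement(root, "NameAbbr").text = name_abbr
    if url is not None:
        ET.SubElement(root, "Url").text = url
    if brief is not None:
        ET.SubElement(root, "Brief").text = brief
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0"?>\n' + DOCTYPE + "\n" + body


xml_text = build_provider_xml(
    777, "WebDatabase Co.", "WebDB", url="http://www.webdatabase.com"
)
print(xml_text)
```

The output should still be run through the LinkOut File Validation utility mentioned in Step 3, since this sketch does not validate against the DTD.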

The Resource File: resources.xml

The resource file or files contain information about the provider's online resources that will be linked from Entrez records. Typically, this file is named "resources.xml".

Simple Resource File Example:
A simple resource file for WebDatabase Co., ProviderId 777, describing links from NCBI's Nucleotide database to its C. elegans sequence database, "Elegans". Note: This example is not functional, but intended only to demonstrate syntax.

<?xml version="1.0"?>
<!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "http://www.ncbi.nlm.nih.gov/entrez/linkout/doc/LinkOut.dtd"
[ <!ENTITY icon.url "http://www.webdatabase.com/images/webdb.gif">
<!ENTITY base.url "http://www.webdatabase.com/cgi-bin/elegans?">]>
<LinkSet>
    <Link>
        <LinkId>1</LinkId>
        <ProviderId>777</ProviderId>
        <IconUrl>&icon.url;</IconUrl>
        <ObjectSelector>
            <Database>Nucleotide</Database>
            <ObjectList>
                <Query>Caenorhabditis elegans [orgn]</Query>
            </ObjectList>
        </ObjectSelector>
        <ObjectUrl>
            <Base>&base.url;</Base>
            <Rule>an_lookup=&lo.pacc;</Rule>
            <UrlName>Caenorhabditis elegans</UrlName>
            <SubjectType>organism-specific</SubjectType>
        </ObjectUrl>
    </Link>
</LinkSet>

Resource File Prolog

XML Declaration - <?xml version="1.0"?>
(optional)

Document Type Declaration and Entity Declaration -
<!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "http://www.ncbi.nlm.nih.gov/entrez/linkout/doc/LinkOut.dtd"
[<!ENTITY icon.url "http://www.webdatabase.com/images/webdb.gif">
<!ENTITY base.url "http://www.webdatabase.com/cgi-bin/elegans?">]>

The Document Type Declaration: <!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "http://www.ncbi.nlm.nih.gov/entrez/linkout/doc/LinkOut.dtd"> is required.  The Entity Declaration is optional. Providers may specify entities that will be used repeatedly in the body of the file. In this example, the entities icon.url and base.url were defined as "http://www.webdatabase.com/images/webdb.gif" and "http://www.webdatabase.com/cgi-bin/elegans?" respectively.

Once an entity is defined in the prolog, it can be used in the resource file by placing the entity name between an ampersand (&) and a semicolon (;). This removes the need to repeat long text. In the above example, '&icon.url;' and '&base.url;' stand for the two URLs defined in the prolog.
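
To see how entity references expand, the following Python sketch parses a cut-down resource file with the standard library. It is only an illustration: the external DTD reference is deliberately omitted so that the parser sees just the internal entity declarations, and the fragment leaves out most required elements.

```python
import xml.etree.ElementTree as ET

# Cut-down fragment: internal entity declarations only, no external DTD.
DOC = """<!DOCTYPE LinkSet [
<!ENTITY icon.url "http://www.webdatabase.com/images/webdb.gif">
<!ENTITY base.url "http://www.webdatabase.com/cgi-bin/elegans?">
]>
<LinkSet>
  <Link>
    <IconUrl>&icon.url;</IconUrl>
    <ObjectUrl><Base>&base.url;</Base></ObjectUrl>
  </Link>
</LinkSet>"""

root = ET.fromstring(DOC)

# The parser replaces each reference with the declared text.
icon = root.findtext("Link/IconUrl")
base = root.findtext("Link/ObjectUrl/Base")
print(icon)
print(base)
```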

Resource File Elements

LinkSet - the root element of the resource file (one LinkSet per resource file).
(required)

Link - an element that describes a specific set of resources grouped together by access characteristics or for convenience. A resource file may have multiple Link elements.
(required, repeatable)

LinkId - an identifier assigned by the provider for its own reference. It may be any character string. Each Link should have a unique LinkId within each LinkSet or file.
(required)

ProviderId - the identifier number assigned to the provider by NCBI and listed in the providerinfo.xml file.
(required)

IconUrl - the URL of the icon that will be displayed on the Entrez/PubMed Citation and Abstract displays. The icon should be no larger than 100 pixels wide by 25 pixels high, and should look like a button. Icons with white or transparent backgrounds, or without borders, are not recommended.

The Cubby feature allows users to activate a resource provider's icon on the Citation and Abstract display formats. A provider's icon can also be activated by searching PubMed with a holding parameter.
(required, repeatable)

ObjectSelector - an element whose sub-elements specify which Entrez records a <Link> element links from.
(required)

Database - a sub-element of <ObjectSelector>. Databases available for linking include: PubMed, Protein, Nucleotide, Genome, Structure, PopSet, Taxonomy, OMIM, Gene, GEO, SNP, UniGene, UniSTS, NLMCatalog.
(required)

ObjectList - a sub-element of <ObjectSelector> containing either the <Query> or <ObjId> that specifies the Entrez records from which the resource will be linked.
(required, repeatable)

Query - a sub-element of <ObjectList> that contains any valid Entrez search, used to select the Entrez records being linked from. Note: Do not use the search field tag [filter] in Query; filters are generated after the LinkOut files are processed.
(required unless ObjId is specified, repeatable)

ObjId - a sub-element of <ObjectList> that contains an Entrez record unique identifier (GI or PMID).
(required unless Query is specified, repeatable)

ObjectUrl - an element that contains the necessary information for the Entrez system to construct URLs to link to the provider's resources.
(required)

Base - a sub-element of <ObjectUrl> that is the base of the URL for the provider's records.
(required)

Rule - a sub-element of <ObjectUrl> that specifies the construction of the remainder of the URL, based upon the specifications of the systems where the resources reside.
(required)

UrlName - a short (two or three words) description of the link. This may be used when multiple links are available for a single Entrez record. It may also be used if the allowed terms in SubjectType and Attribute cannot meet the needs of a provider.
(optional)

SubjectType, Attribute - sub-elements of <ObjectUrl>, used to describe the subject(s) of the provider's resources, barriers (if any) to using the resources, and the relationship of the provider to the resources listed in the resource file. The SubjectType(s) and Attribute(s) will be applied to all the resources provided within a <Link>. See LinkOut SubjectTypes, Attributes and UrlName for the list and description of these elements.
(optional, repeatable)

The resource file is specified in the LinkOut DTD.

Additional Information on Creating a Resource File

The resource file contains a <LinkSet> which may contain one or more <Link> elements. Each <Link> element selects an Entrez record or range of records to be linked from using a particular URL generation <Rule>. The <Rule> will be used to generate valid URLs for links from Entrez records. Providers should examine the resources to be linked, determine how they will be accessed through LinkOut, and group the <Query> and <ObjId> elements of those that can be accessed via a single URL <Rule> in one <Link>.

Providers may choose to put their <Query> and <ObjId> elements in different <Link> elements even if the same URL <Rule> applies to all of them. Similarly, providers may supply multiple resource files to aid file management, if desired.

Selecting Entrez Records in a Resource File

The element <ObjectList> is used to specify Entrez records for linking. Within this element, providers may use either <Query> or <ObjId> to specify the Entrez records from which they are providing links.

<Query> may contain any valid search for the <Database>. Please consult Entrez Help for information on constructing Entrez queries, and on Entrez field tags. Additionally, more than one <Query> may be used within <ObjectList> to select a range of Entrez records.

Examples:

               <Query>Caenorhabditis elegans [orgn] AND 1996:1999 [pdat]</Query>

will select records with the organism "Caenorhabditis elegans" published from 1996 to 1999.

               <Query>Caenorhabditis elegans [orgn] AND smith j [auth]</Query>

will select records with the organism Caenorhabditis elegans published by J. Smith.

Additional Query Rules

  1. Ranging is not allowed in Volume, Issue, Page, or PMID searches.
  2. Truncation using the asterisk should not be used in Query search statements.
  3. If search field tags are used, enclose them in square brackets, e.g., smith j [au]
  4. Use either MEDLINE Title Abbreviations [ta] or ISSN numbers in journal searches. MEDLINE Title Abbreviations should be entered in double quotes, e.g., "J Mol Dis" [ta].
  5. Boolean operators AND, OR, NOT must be in upper case.
  6. Do not use the search field tag [filter] in Query; filters are generated after the LinkOut files are processed.

In some instances, records may be retrieved by multiple queries, such as the two records that would be retrieved in the Protein database by the queries above. If the queries are in different <Link> elements, the <Link> element appearing first in the resource file will have priority, and its <Rule> will be used to construct the link, unless a different UrlName is used in each <Link>.
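
Some of the Additional Query Rules above can be checked mechanically before a file is submitted. The Python sketch below covers only rules 2, 5 and 6; the function name lint_query is hypothetical, and the Boolean-operator check is a heuristic that will also flag a lower-case "and", "or" or "not" used as an ordinary search word.

```python
import re


def lint_query(query):
    """Return a list of problems found in the text of a LinkOut <Query>."""
    problems = []
    # Rule 2: truncation with the asterisk should not be used.
    if "*" in query:
        problems.append("truncation with '*' should not be used")
    # Rule 6: the [filter] search field tag is not allowed.
    if re.search(r"\[\s*filter\s*\]", query, flags=re.IGNORECASE):
        problems.append("the [filter] field tag is not allowed")
    # Rule 5: Boolean operators must be in upper case (heuristic check).
    for word in re.findall(r"\b(?:and|or|not)\b", query, flags=re.IGNORECASE):
        if word != word.upper():
            problems.append(f"Boolean operator '{word}' must be upper case")
    return problems


good = lint_query("Caenorhabditis elegans [orgn] AND 1996:1999 [pdat]")
bad = lint_query("elegans* and smith j [au]")
print(good)
print(bad)
```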

<ObjId> is the unique identifier (PMID, GI) for a record in a particular database, and may be used in place of the <Query> element. It will select only one record, rather than a range of records, and more than one <ObjId> can be used in an <ObjectList>.

Example:

       <ObjId>6016240</ObjId>

will select the record with ID 6016240.

Specifying URLs to Access the Provider's Resources

The URLs of the provider's resources to be linked to are specified using the <ObjectUrl> element. In this element, the provider includes instructions for generating the URLs for the resources being linked to, and any additional information about the resources.

Entrez uses a rule-based mechanism to generate URLs to link records to the provider's resources. Typically, two elements are required to generate these URLs: <Base> and <Rule>.

<Base> is the base of the URL for retrieving the provider's resources. This may be the URL of a provider's Web site, or of a database's CGI program.

<Rule> is the remainder of the URL, to be generated based on the provider's resource file instructions. Information from a record can be used to generate the rule via a list of supported keywords (entities) which can be found at the end of this document or in the LinkOut.DTD.

Entrez will replace the keywords with actual values from the records retrieved. Providers may also add any additional information needed for the LinkOut URLs in the <Rule> element. <Base> is then concatenated with <Rule> to generate the full URL that will link to a specific provider resource.

Examples:

        <Base>http://www.webdatabase.com/cgi-bin/elegans?</Base>
        <Rule>an_lookup=&lo.pacc;</Rule>

Using this rule, the URL constructed for the record with the accession number "AL032671" would be: http://www.webdatabase.com/cgi-bin/elegans?an_lookup=AL032671

        <Base>http://www.webdatabase.com/cgi-bin</Base>
        <Rule>/db=elegans&amp;id_lookup=&lo.id;&amp;view=text</Rule>

In this case, the URL generated for the record with the unique ID "6016240" would be: http://www.webdatabase.com/cgi-bin/db=elegans&id_lookup=6016240&view=text
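
The concatenation described above can be sketched in Python as follows. This is an illustration of the documented behaviour, not NCBI's implementation: build_url and the record dictionary are hypothetical names, and no URL-encoding of the substituted values is attempted because the text does not say whether Entrez applies any.

```python
import re


def build_url(base, rule, record):
    """Expand a <Rule> as written in the XML file and append it to <Base>."""

    def keyword_value(match):
        # match.group(1) is the keyword name, e.g. "lo.pacc" or "lo.id".
        return str(record[match.group(1)])

    # Replace each &lo.xxx; keyword with the value from the Entrez record.
    expanded = re.sub(r"&(lo\.[a-z]+);", keyword_value, rule)
    # Resolve the predefined XML entities; &amp; goes last.
    expanded = expanded.replace("&lt;", "<").replace("&gt;", ">")
    expanded = expanded.replace("&amp;", "&")
    return base + expanded


url_1 = build_url(
    "http://www.webdatabase.com/cgi-bin/elegans?",
    "an_lookup=&lo.pacc;",
    {"lo.pacc": "AL032671"},
)
url_2 = build_url(
    "http://www.webdatabase.com/cgi-bin",
    "/db=elegans&amp;id_lookup=&lo.id;&amp;view=text",
    {"lo.id": 6016240},
)
print(url_1)
print(url_2)
```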

The following entities are supported and required for these symbols:

"&" - ENTITY amp
"<" - ENTITY lt
">" - ENTITY gt
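
When a Rule is generated by a script, the three symbols above need to be escaped while the LinkOut keywords are left alone. A minimal Python sketch follows; the helper name escape_rule is hypothetical, and keyword names are assumed to consist of "lo." followed by lower-case letters.

```python
import re
from xml.sax.saxutils import escape


def escape_rule(raw_rule):
    """Escape &, < and > in a Rule but keep &lo.xxx; keywords untouched."""
    # re.split with a capturing group keeps the keywords at odd indices.
    parts = re.split(r"(&lo\.[a-z]+;)", raw_rule)
    return "".join(
        part if index % 2 else escape(part) for index, part in enumerate(parts)
    )


escaped = escape_rule("/db=elegans&id_lookup=&lo.id;&view=text")
print(escaped)
```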

The list of supported keywords (entities) in the <Rule> can be found at the end of this document or in the LinkOut.DTD.

Putting it All Together

The following hypothetical complex resource file (resources.xml) includes all of the elements described so far.

LinkId 1 describes links from particular Nucleotide records, specified by unique ID (GI), to a single record in Webdatabase Co.'s C. elegans sequence database.

LinkId 2 describes links from all Nucleotide records on C. elegans published by J. Smith from 1997 to 1999 to a set of Webdatabase Co.'s C. elegans records in PDF format. LinkId 2 also uses a different icon.

LinkId 3 provides links from all Nucleotide records on C. elegans to Webdatabase Co.'s C. elegans records.

Since both LinkId 1 and LinkId 2 describe specific requirements, they should be listed before the more general LinkId 3 in the LinkSet. If a particular record is selected by more than one Link, the information in the ObjectUrl of the first Link will be used to generate the URL of the link to the provider's record.
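
The ordering rule can be pictured with a small Python sketch in which each Link is reduced to its LinkId and the record IDs it selects. This is a simplified model rather than the actual Entrez processing; assign_links is a hypothetical name, and the record ID 999 is made up to stand for a record selected only by the general Link.

```python
def assign_links(links):
    """Map each record ID to the LinkId of the first Link that selects it.

    links is a list of (link_id, record_ids) pairs in file order.
    """
    chosen = {}
    for link_id, record_ids in links:
        for record_id in record_ids:
            # setdefault keeps the earlier Link if the record is already mapped.
            chosen.setdefault(record_id, link_id)
    return chosen


links_in_file_order = [
    ("1", [3810674, 1217583, 1181594]),  # specific records listed by ObjId
    ("3", [3810674, 999]),               # general query; 999 is hypothetical
]
chosen = assign_links(links_in_file_order)
print(chosen)
```

Listing the general Link first would map record 3810674 to LinkId 3 instead, which is why the specific Links come first.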

Complex Resource File Example

<?xml version="1.0"?>
<!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "http://www.ncbi.nlm.nih.gov/entrez/linkout/doc/LinkOut.dtd"
[ <!ENTITY icon.url "http://www.webdatabase.com/images/webdb.gif">
<!ENTITY base.url "http://www.webdatabase.com/cgi-bin/elegans?">]>
<LinkSet>
<Link>
    <LinkId>1</LinkId>
    <ProviderId>777</ProviderId>
    <IconUrl>&icon.url;</IconUrl>
    <ObjectSelector>
        <Database>Nucleotide</Database>
        <ObjectList>
            <ObjId>3810674</ObjId>
            <ObjId>1217583</ObjId>
            <ObjId>1181594</ObjId>
        </ObjectList>
    </ObjectSelector>
    <ObjectUrl>
        <Base>&base.url;</Base>
        <Rule>db=special&amp;ID=A594E</Rule>
    </ObjectUrl>
</Link>

<Link>
    <LinkId>2</LinkId>
    <ProviderId>777</ProviderId>
    <IconUrl>http://www.webdatabase.com/images/smith.gif</IconUrl>
    <ObjectSelector>
        <Database>Nucleotide</Database>
        <ObjectList>
            <Query>Caenorhabditis elegans [orgn] AND 1997:1999 [pdat] AND smith j [auth]</Query>
        </ObjectList>
    </ObjectSelector>
    <ObjectUrl>
        <Base>&base.url;</Base>
        <Rule>auth_lookup=j-smith&amp;view=pdf</Rule>
        <Attribute>full-text PDF</Attribute>
    </ObjectUrl>
</Link>

<Link>
    <LinkId>3</LinkId>
    <ProviderId>777</ProviderId>
    <IconUrl>&icon.url;</IconUrl>
    <ObjectSelector>
        <Database>Nucleotide</Database>
        <ObjectList>
            <Query>Caenorhabditis elegans [orgn]</Query>
        </ObjectList>
    </ObjectSelector>
    <ObjectUrl>
        <Base>&base.url;</Base>
        <Rule>an_lookup=&lo.pacc;&amp;view=full</Rule>
    </ObjectUrl>
</Link>
</LinkSet>

Step 3: File Transfer

Before submitting files to NCBI, use the LinkOut File Validation utility to validate all LinkOut files against the LinkOut DTD. Then transfer both the providerinfo.xml and the resource files via FTP to the host ftp-private.ncbi.nih.gov. The files must be in plain text format. Place the files in the directory "holdings" in the FTP account set up by NCBI for each provider. Do not create subdirectories in the holdings directory.

There will be a test period for all new LinkOut participants. During this period, the provider should notify NCBI of all file submissions and updates, so that NCBI staff can check the accuracy and validity of the files.

When the submitted files are consistently error-free, NCBI will end the test period. From that time on, submitted files will be processed automatically every weekday morning (except federal holidays).

Providers may transfer new versions of current files, or add new resource files, at their own discretion. It is the responsibility of providers to keep their files current and valid. Links in Entrez databases are regenerated each day based on the files in each provider's directory; providers must therefore delete obsolete files from their holdings directory.
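
The transfer can be scripted. The following Python sketch uses the standard ftplib module and assumes plain FTP with the account name and password supplied by NCBI; upload_holdings and remote_name are hypothetical helper names, and the sketch performs no validation of the files.

```python
import os
from ftplib import FTP

HOST = "ftp-private.ncbi.nih.gov"


def remote_name(local_path):
    """Keep only the base file name: holdings must not contain subdirectories."""
    return os.path.basename(local_path)


def upload_holdings(user, password, local_paths):
    """Upload each file into the provider's holdings directory."""
    with FTP(HOST) as ftp:
        ftp.login(user=user, passwd=password)
        ftp.cwd("holdings")
        for path in local_paths:
            with open(path, "rb") as handle:
                ftp.storbinary("STOR " + remote_name(path), handle)


print(remote_name("out/resources.xml"))
```

A call such as upload_holdings("account", "password", ["providerinfo.xml", "resources.xml"]) would replace any files of the same names already in the directory; obsolete files still have to be deleted separately.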

Step 4: Activate Provider's Resources in Entrez

Once a provider's LinkOut files are processed, the provider's icons can be displayed on Entrez records by adding the parameter holding=NameAbbr to the basic Entrez URL. Currently, only PubMed displays providers' icons.

This example URL illustrates how to display icons for records that link to WebDatabase Co.'s records:

        http://ncbi.nlm.nih.gov/entrez/query.fcgi?holding=WebDB

Multiple NameAbbr values, separated by commas, may be used in a URL to activate more than one icon. For example, to display icons for both WebDB and MyDB:

    http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?holding=WebDB,MyDB
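
A short Python sketch of how such a URL is assembled is given below. The function name holding_url is hypothetical; the base URL is the one used in the example above.

```python
def holding_url(name_abbrs, base="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi"):
    """Build an Entrez URL that activates the icons of the given providers."""
    # NameAbbr values are joined with commas in a single holding parameter.
    return base + "?holding=" + ",".join(name_abbrs)


single = holding_url(["WebDB"])
both = holding_url(["WebDB", "MyDB"])
print(single)
print(both)
```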

A provider's icon can also be activated if a user selects the provider from the LinkOut Preferences in Cubby.

All access restrictions will still apply. For example, if access to a database is limited by user IP address, users will only have access via computers within an approved IP range; if access is password-protected, users must still enter the password. 

Allowable Rule Keywords

Below are the allowable Rule keywords (entities) as specified in the LinkOut.DTD.

lo.id - Unique identifier (PMID, GI, TaxID, etc.)
For PubMed only:
lo.pii - Publisher Item Identifier. Must be submitted by the publisher.
lo.doi - Article DOI
lo.issn - Journal ISSN code
lo.issnl - Journal ISSN code with the dash stripped
lo.jtit - Journal title (MEDLINE abbreviation)
lo.msrc - MEDLINE source. Example: Exp Brain Res 1998 Oct; 122(3):339-350
lo.vol - Volume
lo.iss - Issue
lo.page - First page
lo.year - Four-digit year of the publication date. Example: 1998
lo.yr - Last two digits of the year of the publication date. Example: 98
lo.yl - Last digit of the year of the publication date. Example: 9 (for 1999)
lo.month - The month of the publication date. Example: September
lo.mon - Three-letter month abbreviation of the publication date. Example: Sep
lo.mo - Two-digit month number of the publication date. Example: 01
lo.day - Two-digit day of the publication date. Example: 01
lo.otit - Article title
lo.auth - First author. Example: Smith JE
lo.authln - First author's last name. Example: Smith
For Sequence databases (Nucleotide, Protein, Structure, Genome):
lo.pacc - Primary accession for sequences
For Taxonomy only:
lo.name - Scientific name. Example: "Homo sapiens neanderthalensis"
lo.genus - Genus name. Example: "Homo"
lo.species - Species epithet. Example: "sapiens"
lo.subsp - Sub-species epithet. Example: "neanderthalensis"
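
To make the date-related keywords concrete, the following Python sketch derives their values from a single publication date. It only mirrors the examples in the list above; how Entrez itself computes these values is not specified here, and date_keywords is a hypothetical name.

```python
from datetime import date

MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def date_keywords(pub_date):
    """Return the date-related Rule keyword values for a publication date."""
    year = f"{pub_date.year:04d}"
    month = MONTHS[pub_date.month - 1]
    return {
        "lo.year": year,                    # four-digit year
        "lo.yr": year[-2:],                 # last two digits of the year
        "lo.yl": year[-1],                  # last digit of the year
        "lo.month": month,                  # full month name
        "lo.mon": month[:3],                # three-letter abbreviation
        "lo.mo": f"{pub_date.month:02d}",   # two-digit month number
        "lo.day": f"{pub_date.day:02d}",    # two-digit day
    }


keywords = date_keywords(date(1998, 9, 1))
print(keywords)
```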

Supply Links using a Simple Text File

Link providers can also supply links to NCBI in a simple text file instead of XML. Please see the document Supply Links using a Simple Text File for details.
 

Announcement Mailing Lists

For general announcements regarding LinkOut, you may subscribe to the linkout-news announcement mailing list. Please disregard the notice you receive about posting messages to the list. This mailing list is an announcement list only; individual subscribers may not send mail. The list of subscribers is private. To subscribe, send an e-mail message with subscribe in the Subject to:
linkout-news-request@ncbi.nlm.nih.gov

You may also subscribe on the web at:
http://www.ncbi.nlm.nih.gov/mailman/listinfo/linkout-news

Providers who maintain sets of LinkOut links to their taxonomic resources may subscribe to the tax-linkout announcement mailing list as well. To subscribe, send an e-mail message with subscribe in the Subject to:
tax-linkout-request@ncbi.nlm.nih.gov

You may also subscribe on the web at:
http://www.ncbi.nlm.nih.gov/mailman/listinfo/tax-linkout
 

For More Assistance

Questions on constructing the provider and resource files may be sent to linkout@ncbi.nlm.nih.gov.
 
 
