LinkOut and Non-Bibliographic Resources
Updated: September 23, 2004
Introduction
Frequently Asked Questions
Step 1: Preliminary Contact
Step 2: File Preparation
The Identity File
Identity File Example
Identity File Prolog
Identity File Elements
The Resource File
Simple Resource File Example
Resource File Prolog
Resource File Elements
Additional Information on Creating a Resource File
Selecting Entrez Records in a Resource File
Query
ObjId
Specifying URLs to Access the Provider's Resources
Base
Rule
Putting it All Together
Complex Resource File Example
Step 3: File Transfer
Step 4: Activate Provider's Resources in Entrez
Allowable Rule Keywords
Supply Links using a Simple Text File
Announcement Mailing Lists
For More Assistance
Introduction
LinkOut is a feature of Entrez in which third parties provide information to link specific Entrez records to relevant web-accessible online resources, such as full-text publications, molecular biology databases (e.g., organism-specific, taxonomy, structure), catalogs of research materials (clones, cell cultures, primers, etc.), funding sources, medical resources, research groups, and others. This document explains how non-bibliographic resource providers can participate in LinkOut by supplying NCBI with the information needed to create links from Entrez records to the providers' resources.
Frequently Asked Questions
A list of Frequently Asked Questions and answers is available to address questions that a link provider may have.
Step 1: Preliminary Contact
Providers should first email NCBI at linkout@ncbi.nlm.nih.gov, indicating interest in creating links from Entrez records to the providers' online resources. Please include the name, email address and phone number of an individual who will act as the designated contact person. The email should also include a LinkOut Identity File (providerinfo.xml) based on the file specifications provided in Step 2 below.
NCBI will establish a ProviderId, an FTP account and a name abbreviation (NameAbbr) for each provider, and will send this information to the designated contact person.
Step 2: File Preparation
Two types of files are necessary to participate in LinkOut. The files must be in XML format and conform to the LinkOut Document Type Definition (LinkOut DTD). XML tags are case-sensitive.
The first is the identity file, "providerinfo.xml", that contains information about an online resource provider.
The second is the resource file (or files), typically named "resources.xml", that describes the online resources being made available through LinkOut.
The Identity File: providerinfo.xml
The identity file stores information about the online resource provider. This file should be sent via email with the preliminary contact initiated by the provider. The ProviderId field may be left blank. NCBI will change the provider's NameAbbr if it is not unique in LinkOut. Please consult the current list of LinkOut providers and their name abbreviations (NameAbbr) when choosing a NameAbbr. A list of current non-bibliographic LinkOut resources is also available.
Identity File Example
A providerinfo.xml file for a hypothetical LinkOut participant, WebDatabase Co., which has the ProviderId "777" and NameAbbr "WebDB":
<?xml version="1.0"?>
<!DOCTYPE Provider PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "http://www.ncbi.nlm.nih.gov:80/entrez/linkout/doc/LinkOut.dtd">
<Provider>
<ProviderId>777</ProviderId>
<Name>WebDatabase Co.</Name>
<NameAbbr>WebDB</NameAbbr>
<SubjectType>gene/protein/disease-specific</SubjectType>
<Attribute>registration required</Attribute>
<Url>http://www.webdatabase.com</Url>
<IconUrl>http://www.webdatabase.com/images/webdb.gif</IconUrl>
<Brief>On-line publisher of biomedical databases
and other web resources</Brief>
</Provider>
Identity File Prolog
XML Declaration - <?xml version="1.0"?> (optional)
Document Type Declaration - <!DOCTYPE Provider PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "http://www.ncbi.nlm.nih.gov/entrez/linkout/doc/LinkOut.dtd"> (required)
Identity File Elements
Provider - root element of the file. (required)
ProviderId - unique ID assigned by NCBI. (required)
Name - full name of the resource provider. (required)
NameAbbr - short, one-word name of the provider. It may contain only letters and digits; spaces and special characters such as hyphens are not allowed. (required)
SubjectType, Attribute - descriptions of the resources and of the provider's relationship to the resources listed in the resource file. SubjectType and Attribute values appearing in the identity file apply to all the resources listed by that provider. See LinkOut SubjectTypes, Attributes and UrlName for the list and description of these elements. (optional, repeatable)
Url - URL of the provider's web site, used in the LinkOut Providers list in Cubby. (optional, repeatable)
IconUrl - URL of the provider's logo, used to display the link from Entrez records. The icon should be no larger than 100 pixels wide and 25 pixels high, and should look like a button. Icons with a white or transparent background, or without borders, are not recommended. Note: The Url and IconUrl here, and in the resource file(s), may be different for different languages; see the LNG attribute in the LinkOut.DTD. (optional, repeatable - currently not being displayed)
Brief - short description of the provider (up to 256 characters). (optional - currently not being displayed)
The providerinfo.xml file is specified in the LinkOut.DTD.
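Some of the constraints on the identity file can be checked locally before the file is emailed. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name check_identity and the particular checks are illustrative assumptions, not an NCBI tool, and the sketch does not validate against the LinkOut DTD.

```python
import xml.etree.ElementTree as ET

# Elements marked "(required)" in the identity file description.
REQUIRED_TAGS = ("ProviderId", "Name", "NameAbbr")

# Trimmed copy of the WebDatabase Co. example from this document.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE Provider PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "http://www.ncbi.nlm.nih.gov/entrez/linkout/doc/LinkOut.dtd">
<Provider>
<ProviderId>777</ProviderId>
<Name>WebDatabase Co.</Name>
<NameAbbr>WebDB</NameAbbr>
<Brief>On-line publisher of biomedical databases and other web resources</Brief>
</Provider>"""


def check_identity(xml_text):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    root = ET.fromstring(xml_text)  # the external DTD is not fetched
    if root.tag != "Provider":
        problems.append("root element must be <Provider>")
    for tag in REQUIRED_TAGS:
        if root.find(tag) is None:
            problems.append(f"missing <{tag}>")
    abbr = (root.findtext("NameAbbr") or "").strip()
    if abbr and not (abbr.isascii() and abbr.isalnum()):
        problems.append("NameAbbr may contain only letters and digits")
    brief = (root.findtext("Brief") or "").strip()
    if len(brief) > 256:
        problems.append("Brief is longer than 256 characters")
    return problems


print(check_identity(SAMPLE))
```

An empty ProviderId element passes this check, which matches the note that the field may be left blank in the first submission.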
The Resource File: resources.xml
The resource file or files contain information about the provider's online resources that will be linked from Entrez records. Typically, this file is named "resources.xml".
Simple Resource File Example:
A simple resource file for WebDatabase Co., ProviderId 777, describing links from NCBI's Nucleotide database to its C. elegans sequence database, "Elegans". Note: This example is not functional, but intended only to demonstrate syntax.
<?xml version="1.0"?>
<!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN"
"http://www.ncbi.nlm.nih.gov/entrez/linkout/doc/LinkOut.dtd"
[ <!ENTITY icon.url "http://www.webdatabase.com/images/webdb.gif">
<!ENTITY base.url "http://www.webdatabase.com/cgi-bin/elegans?">]>
<LinkSet>
<Link>
<LinkId>1</LinkId>
<ProviderId>777</ProviderId>
<IconUrl>&icon.url;</IconUrl>
<ObjectSelector>
<Database>Nucleotide</Database>
<ObjectList>
<Query>Caenorhabditis elegans [orgn]</Query>
</ObjectList>
</ObjectSelector>
<ObjectUrl>
<Base>&base.url;</Base>
<Rule>an_lookup=&lo.pacc;</Rule>
<UrlName>Caenorhabditis elegans</UrlName>
<SubjectType>organism-specific</SubjectType>
</ObjectUrl>
</Link>
</LinkSet>
Resource File Prolog
XML Declaration - <?xml version="1.0"?> (optional)
Document Type Declaration and Entity Declaration -
<!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN"
"http://www.ncbi.nlm.nih.gov/entrez/linkout/doc/LinkOut.dtd"
[ <!ENTITY icon.url "http://www.webdatabase.com/images/webdb.gif">
<!ENTITY base.url "http://www.webdatabase.com/cgi-bin/elegans?">]>
The Document Type Declaration: <!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "http://www.ncbi.nlm.nih.gov/entrez/linkout/doc/LinkOut.dtd"> is required. The Entity Declaration is optional. Providers may specify entities that will be used repeatedly in the body of the file. In this example, the entities icon.url and base.url were defined as "http://www.webdatabase.com/images/webdb.gif" and "http://www.webdatabase.com/cgi-bin/elegans?" respectively.
Once an entity is defined in the prolog, it can be used in the resource file by placing the entity name between an ampersand (&) and a semicolon (;). This removes the need to repeat long strings throughout the file. In the above example, '&icon.url;' and '&base.url;' stand for the icon URL and the base URL respectively.
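To see an entity from the prolog being expanded, the sketch below parses a cut-down resource file with Python's standard xml.etree.ElementTree module. The sample deliberately omits <Rule>: keywords such as &lo.pacc; are defined in the external LinkOut DTD, which this parser does not load. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Cut-down resource file: same prolog as the simple example, no <Rule>.
RESOURCE = """<!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN"
"http://www.ncbi.nlm.nih.gov/entrez/linkout/doc/LinkOut.dtd"
[ <!ENTITY icon.url "http://www.webdatabase.com/images/webdb.gif">
<!ENTITY base.url "http://www.webdatabase.com/cgi-bin/elegans?">]>
<LinkSet>
<Link>
<IconUrl>&icon.url;</IconUrl>
<ObjectUrl>
<Base>&base.url;</Base>
</ObjectUrl>
</Link>
</LinkSet>"""

# Entities declared in the internal subset are expanded during parsing.
root = ET.fromstring(RESOURCE)
icon_url = root.findtext("Link/IconUrl")
base_url = root.findtext("Link/ObjectUrl/Base")
print(icon_url)
print(base_url)
```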
Resource File Elements
LinkSet - the root element of the resource file (one LinkSet per resource file). (required)
Link - an element that describes a specific set of resources grouped together by access characteristics or for convenience. A resource file may have multiple Link elements. (required, repeatable)
LinkId - an identifier assigned by the provider for its own reference. It may be any character string. Each Link should have a unique LinkId within each LinkSet or file. (required)
ProviderId - the identifier number assigned to the provider by NCBI and listed in the providerinfo.xml file. (required)
IconUrl - the URL of the icon that will be displayed on the Entrez/PubMed Citation and Abstract displays. The icon should be no larger than 100 pixels wide and 25 pixels high, and should look like a button. Icons with white or transparent backgrounds, or without borders, are not recommended. The Cubby feature allows users to activate a resource provider's icon on the Citation and Abstract display formats. A provider's icon can also be activated by searching PubMed with a holding parameter. (required, repeatable)
ObjectSelector - an element whose sub-elements specify which Entrez records a <Link> element links from. (required)
Database - a sub-element of <ObjectSelector>. Databases available for linking include: PubMed, Protein, Nucleotide, Genome, Structure, PopSet, Taxonomy, OMIM, Gene, GEO, SNP, UniGene, UniSTS, NLMCatalog. (required)
ObjectList - a sub-element of <ObjectSelector> containing the <Query> or <ObjId> elements that specify the Entrez records from which the resource will be linked. (required, repeatable)
Query - a sub-element of <ObjectList> that contains any valid Entrez search, used to select the Entrez records being linked from. Note: Do not use the search field tag [filter] in Query; filters are generated after the LinkOut files are processed. (required unless ObjId is specified, repeatable)
ObjId - a sub-element of <ObjectList> that contains an Entrez record unique identifier (GI or PMID). (required unless Query is specified, repeatable)
ObjectUrl - an element that contains the information the Entrez system needs to construct URLs that link to the provider's resources. (required)
Base - a sub-element of <ObjectUrl> that is the base of the URL for the provider's records. (required)
Rule - a sub-element of <ObjectUrl> that specifies how the remainder of the URL is constructed, based on the specification of the system where the resources reside. (required)
UrlName - a short (two or three words) description of the link. This may be used when multiple links are available for a single Entrez record, or when the allowed terms in SubjectType and Attribute cannot meet the needs of a provider. (optional)
SubjectType, Attribute - sub-elements of <ObjectUrl>, used to describe the subject(s) of the provider's resources, barriers (if any) to using the resources, and the relationship of the provider to the resources listed in the resource file. The SubjectType(s) and Attribute(s) are applied to all the resources provided within a <Link>. See LinkOut SubjectTypes, Attributes and UrlName for the list and description of these elements. (optional, repeatable)
The resource file is specified in the LinkOut DTD.
Additional Information on Creating a Resource File
The resource file contains a <LinkSet> which may contain one or more <Link> elements. Each <Link> element selects an Entrez record or range of records to be linked from using a particular URL generation <Rule>. The <Rule> will be used to generate valid URLs for links from Entrez records. Providers should examine the resources to be linked, determine how they will be accessed through LinkOut, and group the <Query> and <ObjId> elements that can be served by a single URL <Rule> in one <Link>.
Providers may choose to put their <Query> and <ObjId> elements in different <Link> elements even if the same URL <Rule> applies to all of them. Similarly, providers may supply multiple resource files to aid file management, if desired.
Selecting Entrez Records in a Resource File
The element <ObjectList> is used to specify Entrez records for linking. Within this element, providers may use either <Query> or <ObjId> to specify the Entrez records from which they are providing links.
<Query> may contain any valid search for the <Database>. Please consult Entrez Help for information on constructing Entrez queries, and on Entrez field tags. Additionally, more than one <Query> may be used within <ObjectList> to select a range of Entrez records.
Examples:
<Query>Caenorhabditis elegans [orgn] AND 1996:1999 [pdat]</Query>
will select records with the organism "Caenorhabditis elegans" published from 1996 to 1999.
<Query>Caenorhabditis elegans [orgn] AND smith j [auth]</Query>
will select records with the organism "Caenorhabditis elegans" authored by J. Smith.
Additional Query Rules
<ObjId> is the unique identifier (PMID, GI) for a record in a particular database, and may be used in place of the <Query> element. It will select only one record, rather than a range of records, and more than one <ObjId> can be used in an <ObjectList>.
Example:
<ObjId>6016240</ObjId>
will select the record with ID 6016240.
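When the records to link from are already known by ID, the <ObjectList> can be generated rather than typed by hand. The following is a minimal sketch, assuming the IDs are available as a Python list; the helper name build_object_list is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_object_list(ids):
    """Serialize one <ObjectList> holding an <ObjId> per Entrez ID."""
    object_list = ET.Element("ObjectList")
    for entrez_id in ids:
        ET.SubElement(object_list, "ObjId").text = str(entrez_id)
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(object_list, encoding="unicode")


fragment = build_object_list([3810674, 1217583, 1181594])
print(fragment)
```

The resulting fragment is a single line with no whitespace between elements, and can be pasted inside an <ObjectSelector> after the <Database> element.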
Specifying URLs to Access the Provider's Resources
The URLs of the provider's resources to be linked to are specified using the <ObjectUrl> element. In this element, the provider includes instructions for generating the URLs for the resources being linked to, and any additional information about the resources.
Entrez uses a rule-based mechanism to generate URLs to link records to the provider's resources. Typically, two elements are required to generate these URLs: <Base> and <Rule>.
<Base> is the base of the URL for retrieving the provider's resources. This may be the URL of a provider's Web site, or of a database's CGI program.
<Rule> is the remainder of the URL, to be generated based on the provider's resource file instructions. Information from a record can be used to generate the rule via a list of supported keywords (entities) which can be found at the end of this document or in the LinkOut.DTD.
Entrez will replace the keywords with actual values from the records retrieved. Providers may also add any additional information needed for the LinkOut URLs in the <Rule> element. <Base> is then concatenated with <Rule> to generate the full URL that will link to a specific provider resource.
Examples:
<Base>http://www.webdatabase.com/cgi-bin/elegans?</Base>
<Rule>an_lookup=&lo.pacc;</Rule>
Using this rule, the URL constructed for the record with the accession number "AL032671" would be: http://www.webdatabase.com/cgi-bin/elegans?an_lookup=AL032671
<Base>http://www.webdatabase.com/cgi-bin</Base>
<Rule>/db=elegans&amp;id_lookup=&lo.id;&amp;view=text</Rule>
In this case, the URL generated for the record with the unique ID "6016240" would be: http://www.webdatabase.com/cgi-bin/db=elegans&id_lookup=6016240&view=text
The following entities are supported, and must be used in place of these symbols in the file:
"&" - ENTITY amp (written &amp;)
"<" - ENTITY lt (written &lt;)
">" - ENTITY gt (written &gt;)
The list of supported keywords (entities) in the <Rule> can be found at the end of this document or in the LinkOut.DTD.
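The way <Base> and <Rule> combine can be illustrated with a short Python sketch. This is not how Entrez is implemented; it only mimics the substitution and concatenation described above. The function name build_url and the record dictionary keyed by keyword name are assumptions made for illustration, and only the keywords that appear in this document's examples (lo.pacc, lo.id) are exercised.

```python
import re

# Matches rule keywords such as &lo.pacc; or &lo.id;
KEYWORD = re.compile(r"&(lo\.[a-z]+);")


def build_url(base, rule, record):
    """Expand keywords in `rule` from `record`, then append it to `base`."""
    expanded = KEYWORD.sub(lambda m: str(record[m.group(1)]), rule)
    # Decode the XML entities; "&amp;" goes last so it is not decoded twice.
    for entity, char in (("&lt;", "<"), ("&gt;", ">"), ("&amp;", "&")):
        expanded = expanded.replace(entity, char)
    return base + expanded


url_by_accession = build_url(
    "http://www.webdatabase.com/cgi-bin/elegans?",
    "an_lookup=&lo.pacc;",
    {"lo.pacc": "AL032671"},
)
url_by_id = build_url(
    "http://www.webdatabase.com/cgi-bin",
    "/db=elegans&amp;id_lookup=&lo.id;&amp;view=text",
    {"lo.id": "6016240"},
)
print(url_by_accession)
print(url_by_id)
```

The two calls reproduce the URLs given in the examples for accession number "AL032671" and unique ID "6016240".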
Putting it All Together
Complex Resource File Example
The following hypothetical complex resource file (resources.xml) includes all of the elements described so far.
LinkId 1 describes links from particular Nucleotide records, specified by unique ID (GI), to a single record in WebDatabase Co.'s C. elegans sequence database.
LinkId 2 describes links from all Nucleotide records on C. elegans published by J. Smith from 1997 to 1999 to a set of WebDatabase Co.'s C. elegans records in PDF format. LinkId 2 also uses a different icon.
LinkId 3 provides links from all Nucleotide records on C. elegans to WebDatabase Co.'s C. elegans records.
Since both LinkId 1 and LinkId 2 describe specific requirements, they should be listed before the more general LinkId 3 in the LinkSet. If a particular record is selected by more than one Link, the information in the ObjectList of the first Link will be used to generate the URL of the link to the provider's record.
<?xml version="1.0"?>
<!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "http://www.ncbi.nlm.nih.gov/entrez/linkout/doc/LinkOut.dtd"
[ <!ENTITY icon.url "http://www.webdatabase.com/images/webdb.gif">
<!ENTITY base.url "http://www.webdatabase.com/cgi-bin/elegans?">]>
<LinkSet>
<Link>
<LinkId>1</LinkId>
<ProviderId>777</ProviderId>
<IconUrl>&icon.url;</IconUrl>
<ObjectSelector>
<Database>Nucleotide</Database>
<ObjectList>
<ObjId>3810674</ObjId>
<ObjId>1217583</ObjId>
<ObjId>1181594</ObjId>
</ObjectList>
</ObjectSelector>
<ObjectUrl>
<Base>&base.url;</Base>
<Rule>db=special&amp;ID=A594E</Rule>
</ObjectUrl>
</Link>
<Link>
<LinkId>2</LinkId>
<ProviderId>777</ProviderId>
<IconUrl>http://www.webdatabase.com/images/smith.gif</IconUrl>
<ObjectSelector>
<Database>Nucleotide</Database>
<ObjectList>
<Query>Caenorhabditis elegans [orgn] AND 1997:1999 [pdat] AND smith j [auth]</Query>
</ObjectList>
</ObjectSelector>
<ObjectUrl>
<Base>&base.url;</Base>
<Rule>auth_lookup=j-smith&amp;view=pdf</Rule>
<Attribute>full-text PDF</Attribute>
</ObjectUrl>
</Link>
<Link>
<LinkId>3</LinkId>
<ProviderId>777</ProviderId>
<IconUrl>&icon.url;</IconUrl>
<ObjectSelector>
<Database>Nucleotide</Database>
<ObjectList>
<Query>Caenorhabditis elegans [orgn]</Query>
</ObjectList>
</ObjectSelector>
<ObjectUrl>
<Base>&base.url;</Base>
<Rule>an_lookup=&lo.pacc;&amp;view=full</Rule>
</ObjectUrl>
</Link>
</LinkSet>
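The ordering rule stated before the example (when several Links select the same record, the first one in the file is used) can be sketched in Python. Everything here is a hypothetical stand-in: the record fields (gi, organism, authors, year) and the selector functions only imitate what the three ObjectLists would match in Entrez.

```python
def is_elegans(rec):
    return rec["organism"] == "Caenorhabditis elegans"


# One (LinkId, selector) pair per Link, in the same order as the file.
LINKS = [
    ("1", lambda rec: rec["gi"] in {3810674, 1217583, 1181594}),
    ("2", lambda rec: is_elegans(rec)
        and "smith j" in rec["authors"]
        and 1997 <= rec["year"] <= 1999),
    ("3", is_elegans),
]


def first_matching_link(record, links=LINKS):
    """Return the LinkId of the first Link that selects the record, or None."""
    for link_id, selects in links:
        if selects(record):
            return link_id
    return None


record = {"gi": 3810674, "organism": "Caenorhabditis elegans",
          "authors": ["smith j"], "year": 1998}
print(first_matching_link(record))
```

The sample record satisfies all three selectors, yet only LinkId 1 is returned because it comes first. Moving the general LinkId 3 to the top would hide the two more specific Links.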
Step 3: File Transfer
Transfer both the providerinfo.xml and the resource files via ftp to the host ftp-private.ncbi.nih.gov. These files must be in plain text format. Use the LinkOut File Validation utility to validate all your LinkOut files against the LinkOut DTD before submitting them to NCBI. Place the files under the directory "holdings" in the FTP account set up by NCBI for each provider. No subdirectories should be created in the holdings directory.
There will be a test period for all new LinkOut participants. During this period, the provider should notify NCBI of all file submissions and updates, so NCBI staff can check the accuracy and validity of the files.
When the submitted files are consistently error-free, NCBI will end the test period. From that time on, submitted files will be processed automatically every weekday morning (except federal holidays).
Providers may transfer new versions of current files, or add new resource files, at their own discretion. It is the responsibility of providers to keep their files current and valid. Links in Entrez databases are regenerated each day based on the files in each provider's directory; providers must therefore delete obsolete files from their holdings directory.
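The transfer can be scripted with Python's standard ftplib module. The following is a minimal sketch, not an NCBI-supplied tool: the function names are hypothetical, and USER and PASSWORD are placeholders for the account details that NCBI sends to the designated contact person.

```python
import ftplib
from pathlib import Path


def upload_linkout_files(ftp, paths, remote_dir="holdings"):
    """Upload each file into `remote_dir` and return the uploaded names."""
    ftp.cwd(remote_dir)
    uploaded = []
    for path in map(Path, paths):
        with path.open("rb") as handle:
            # Store under the bare file name: no subdirectories in holdings.
            ftp.storbinary(f"STOR {path.name}", handle)
        uploaded.append(path.name)
    return uploaded


def main():
    # Passing user and password to FTP() logs in as part of connecting.
    with ftplib.FTP("ftp-private.ncbi.nih.gov", "USER", "PASSWORD") as ftp:
        print(upload_linkout_files(ftp, ["providerinfo.xml", "resources.xml"]))
```

Calling main() performs the upload. The helper takes the connection as an argument so that it can be exercised without network access, for instance with a stand-in object that records the cwd and storbinary calls.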
Step 4: Activate Provider Resources in Entrez
Once a provider's LinkOut files are processed, the provider's icons can be displayed on Entrez records by adding the parameter holding=NameAbbr to the basic Entrez URL. Currently, only PubMed will display a provider's icon.
This example URL illustrates how to display icons for records that link to WebDatabase Co.'s records:
http://ncbi.nlm.nih.gov/entrez/query.fcgi?holding=WebDB
Multiple NameAbbr values may be used in a URL to activate more than one icon.
For example, to display icons for both WebDB and MyDB:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?holding=WebDB,MyDB
A provider's icon can also be activated if a user selects the provider from the LinkOut Preferences in Cubby.
All access restrictions will still apply. For example, if access to a database is limited by user IP address, users will only have access via computers within an approved IP range; if access is password-protected, users must still enter the password.
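Building the holding URL is simple string assembly, sketched below in Python. The function name holding_url is hypothetical, and the alphanumeric check simply reuses the NameAbbr rule from the identity file description.

```python
ENTREZ_QUERY = "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi"


def holding_url(name_abbrs):
    """Build an Entrez URL that activates the icons of the given providers."""
    for abbr in name_abbrs:
        if not (abbr.isascii() and abbr.isalnum()):
            raise ValueError(f"NameAbbr must be alphanumeric: {abbr!r}")
    # Several providers are joined with commas in one holding parameter.
    return f"{ENTREZ_QUERY}?holding={','.join(name_abbrs)}"


print(holding_url(["WebDB", "MyDB"]))
```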
Allowable Rule Keywords
The allowable Rule keywords (entities) are specified in the LinkOut.DTD.
Supply Links using a Simple Text File
Link providers can also supply links to NCBI in a simple text file instead of XML. Please see the document Supply Links using a Simple Text File for details.
Announcement Mailing Lists
For general announcements regarding LinkOut, you may subscribe to the linkout-news announcement mailing list. Please disregard the notice you receive about posting messages to the list. This mailing list is an announcement list only; individual subscribers may not send mail. The list of subscribers is private. To subscribe, send an e-mail message with subscribe in the Subject to:
linkout-news-request@ncbi.nlm.nih.gov
You may also subscribe on the web at:
http://www.ncbi.nlm.nih.gov/mailman/listinfo/linkout-news
Providers who maintain sets of LinkOut links to their taxonomic resources may subscribe to the tax-linkout announcement mailing list as well. To subscribe, send an e-mail message with subscribe in the Subject to:
tax-linkout-request@ncbi.nlm.nih.gov
You may also subscribe on the web at:
http://www.ncbi.nlm.nih.gov/mailman/listinfo/tax-linkout
For More Assistance
Questions on constructing the provider and resource files may be sent to linkout@ncbi.nlm.nih.gov.