United States National Library of Medicine National Institutes of Health

Sample NLM® Data

Information for prospective licensees of NLM data is at http://www.nlm.nih.gov/databases/leased.html and information for current licensees is at http://www.nlm.nih.gov/bsd/licensee.html.

  1. MEDLINE®/PubMed®
    NLM distributes MEDLINE data in XML format. The current 2004 NLMMEDLINE DTD (*see note below) is available at http://www.nlm.nih.gov/databases/dtd/nlmmedline_031101.dtd. This DTD references the NLMMedlineCitation DTD at http://www.nlm.nih.gov/databases/dtd/nlmmedlinecitation_031101.dtd, which in turn references the new NLMSharedCatCit DTD at http://www.nlm.nih.gov/databases/dtd/nlmsharedcatcit_031101.dtd, which in turn references the NLMCommon DTD at http://www.nlm.nih.gov/databases/dtd/nlmcommon_031101.dtd.

    There are five sample files. One, at http://www.nlm.nih.gov/databases/dtd/medsamp2004.xml, contains 61 representative records. Another, at http://www.nlm.nih.gov/databases/dtd/medsamp2004a.xml, contains five revamped records that have been edited to include the new attributes defined for the Journal Issue and ElectronicPubDate elements for 2004 (these five records are not true MEDLINE records). Three large files, each containing 30,000 records and named medsampv5a.xml, medsampv5b.xml, and medsampv5c.xml, are available on the ftp server (see access instructions below). **See note below.

    *Note: The 2004 version of the MEDLINE suite of DTDs currently in effect for distribution of MEDLINE, OLDMEDLINE, and other records in PubMed will be replaced by new DTDs dated November 1, 2004 that will be used for creating the 2005 version of the baseline databases and for subsequent update files during the 2005 production year.

    **Note: A small sample file of 68 records mocked up to reflect use of the forthcoming DTD for 2005 is available at http://www.nlm.nih.gov/databases/dtd/medsamp2005.xml. After NLM produces the 2005 baseline files, this sample file will be replaced and additional sample files of 2005 data containing many more records will also become available.

    MEDLINE data element descriptions are available at http://www.nlm.nih.gov/bsd/licensee/data_elements_doc.html. This document is not yet updated for 2005.

  2. OLDMEDLINE
    Beginning with NLM's 2004 production year, OLDMEDLINE data are generated using the suite of MEDLINE DTDs. Sample records are available on the web at http://www.nlm.nih.gov/databases/dtd/oldmedsamp2004.xml. The same file of 621 representative records is also available on the ftp server (see access instructions below).

    Beginning with NLM's 2005 production year, OLDMEDLINE records will use the 2005 version of the MEDLINE DTDs and will be distributed to all MEDLINE/PubMed licensees as part of the MEDLINE/PubMed distribution. Sample OLDMEDLINE records using the 2005 DTDs are not yet available; however, sample 2005 MEDLINE/PubMed records are available (see above).

  3. CCRIS, GENE-TOX and HSDB®
    Sample CCRIS, GENE-TOX and HSDB data in an abbreviated XML format are available for ftp. See instructions below for obtaining the abbreviated DTDs, sample records in XML format, and two files of documentation for each database from NLM's ftp server. The two documentation files are a .readme file containing definitions of the elements using legacy format element names and a conversion table showing conversion of data element names from legacy format to new XML element names.

  4. TOXLINE® Special
    Sample TOXLINE Special data in XML format are available for ftp. See instructions below for obtaining sample records and DTDs from NLM's ftp server. Multiple DTDs and sample files are available for TOXLINE Special: toxspec.dtd defines the XML for the entire TOXLINE Special and archival.dtd defines the XML for the archival subfiles only. (Note that licensees must have special arrangements with BIOSIS and IPA before NLM will distribute their data.) Other DTDs and sample files are present for each individual subfile of the database. Updates for the various subfiles comprising this database, if available, will be placed on the NLM server for licensees at the end of each month. The frequency of updates will be irregular because NLM depends on outside suppliers whose schedules are not fixed. Each update file will be a complete replacement for that specific subfile.

  5. ChemIDplus and DIRLINE®
    Sample ChemIDplus and DIRLINE data in XML format are available for ftp. See instructions below for obtaining the DTDs and sample records in XML format from NLM's ftp server. Note that licensees must contact U.S. Pharmacopeia Convention, Inc. (USP), for possible special arrangements before NLM will distribute ChemIDplus.

  6. CatfilePlus in XML
    CatfilePlus in XML is defined by three NLM DTDs:
    The current 2004 NLMCatalogRecord DTD is available at http://www.nlm.nih.gov/databases/dtd/nlmcatalogrecord_031101.dtd. This DTD references the NLMSharedCatCit DTD at http://www.nlm.nih.gov/databases/dtd/nlmsharedcatcit_031101.dtd, which in turn references the NLMCommon DTD at http://www.nlm.nih.gov/databases/dtd/nlmcommon_031101.dtd.

    A sample file of 150 CatfilePlus in XML records is at http://www.nlm.nih.gov/databases/dtd/catplussamp2004.xml.

    A file containing 4,485 CatfilePlus in XML records, one representative month's output, is named catplussamp2004a.xml and is available on the ftp server (see access instructions below).

    Data element descriptions applicable to CatfilePlus in XML are available at http://www.nlm.nih.gov/bsd/licensee/catrecordxml_element_desc2.html.

    Sample files of MARC 21-formatted products are available upon request.

  7. Serfile in XML
    Serfile in XML is defined by three NLM DTDs:
    The current 2004 NLMCatalogRecord DTD is available at http://www.nlm.nih.gov/databases/dtd/nlmcatalogrecord_031101.dtd. This DTD references the NLMSharedCatCit DTD at http://www.nlm.nih.gov/databases/dtd/nlmsharedcatcit_031101.dtd, which in turn references the NLMCommon DTD at http://www.nlm.nih.gov/databases/dtd/nlmcommon_031101.dtd.

    A small sample file of 60 Serfile in XML records is at http://www.nlm.nih.gov/databases/dtd/sersamp2004.xml.

    A file containing 1,000 Serfile in XML records, one representative month's output, is named serfilesamp2004a.xml and is available on the ftp server (see access instructions below).

    Data element descriptions applicable to Serfile in XML are available at http://www.nlm.nih.gov/bsd/licensee/catrecordxml_element_desc2.html.

    Sample files of MARC 21-formatted products are available upon request.
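
Each sample file listed above is described with a record count (for example 61, 150, or 60 records). One way to check a downloaded copy is to count its record elements. The following is a minimal Python sketch, not an NLM tool. The element names MedlineCitationSet, MedlineCitation, and PMID are assumptions about the MEDLINE DTDs rather than names given on this page, and the inline document is a made-up stand-in for a real sample file. The sketch only counts elements; it does not validate against the DTDs.

```python
import io
import xml.etree.ElementTree as ET


def count_records(source, record_tag):
    """Count elements named record_tag in an XML file path or file-like object."""
    count = 0
    # iterparse streams the document, so a 30,000-record file need not fit in memory.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            count += 1
            elem.clear()  # drop the finished record's children to save memory
    return count


# Made-up stand-in document; element names are assumptions (see lead-in).
SAMPLE = b"""<MedlineCitationSet>
  <MedlineCitation><PMID>1</PMID></MedlineCitation>
  <MedlineCitation><PMID>2</PMID></MedlineCitation>
</MedlineCitationSet>"""

print(count_records(io.BytesIO(SAMPLE), "MedlineCitation"))  # prints 2
```

For a downloaded file, pass its path instead of the in-memory stand-in, for example count_records("medsamp2004.xml", "MedlineCitation"), and compare the result with the count stated above. For CatfilePlus or Serfile samples, the record element name would differ and should be taken from the NLMCatalogRecord DTD.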

INSTRUCTIONS FOR FTP OF SAMPLE RECORDS AND DTDs
Ftp to NLM's anonymous ftp server: ftp://ftp.nlm.nih.gov/nlmdata/sample/
(login as a non-fee/anonymous user; use your e-mail address as password)
You will see a directory for each NLM database. Go to the directory you want and get the desired files.
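
The ftp steps above can also be scripted. The following is a minimal sketch using Python's standard ftplib module. The host and root path are taken from the ftp URL above; the function names and the directory name passed in are hypothetical, since this page does not list the directory names on the server.

```python
from ftplib import FTP

HOST = "ftp.nlm.nih.gov"         # host part of the ftp URL above
SAMPLE_ROOT = "/nlmdata/sample"  # path part of the ftp URL above


def sample_dir(database):
    """Build the server path for one database directory (name supplied by the caller)."""
    return f"{SAMPLE_ROOT}/{database.strip('/')}"


def fetch_sample(database, filename, email, dest=None):
    """Log in anonymously and download one file from a database directory."""
    dest = dest or filename
    with FTP(HOST) as ftp:
        # Anonymous login, with the e-mail address as the password.
        ftp.login(user="anonymous", passwd=email)
        ftp.cwd(sample_dir(database))
        with open(dest, "wb") as out:
            ftp.retrbinary(f"RETR {filename}", out.write)
    return dest
```

A call might look like fetch_sample("medline", "medsampv5a.xml", "you@example.org"), where "medline" is a guessed directory name and the address is a placeholder. List the server's directories first to find the actual name.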



Last updated: 21 October 2004
First published: 01 January 1999
Permanence level: Permanence Not Guaranteed