Data.gov Catalog Update

The heart of Data.gov is the catalog. The Data.gov catalog brings together datasets from hundreds of agency sources across the federal government, as well as from 50 non-federal sources. Metadata for publicly available datasets is maintained by the agencies that contribute to Data.gov, and the catalog harvests these agency metadata sources to present a unified, constantly updated catalog.

The total number of datasets shown on the catalog front page changes regularly as datasets are added to or deleted from the agency metadata sources. The total also includes many collections, such as a time series made up of periodic datasets spanning a number of years. A collection counts as “1” dataset in the overall total, even if it contains thousands of datasets.

You may have noticed significant changes recently in the total number of datasets on the Data.gov catalog front page. This fluctuation is due to a technical problem in the harvesting process that is creating duplicate datasets. The Data.gov team is working to carefully delete the duplicates and restore full functionality of the harvester. In the meantime, we are updating the catalog by initiating harvests of individual agency sources on a case-by-case basis while we work to restore automatic daily harvesting for all sources.

We are also working on plans to upgrade the software that powers the Data.gov catalog, which is currently based on an older version of CKAN, the open-source platform used by many national data catalogs. We will continue to provide updates on the status of the harvester and on our migration of the catalog to the current version of CKAN. If you have any questions, please feel free to contact us.
