FAQ
-
How accurate is the harvest?
The accuracy of each harvest was affected by these factors:
- The completeness of URL source lists,
- Whether URLs resolved successfully, and
- The capabilities of the crawler tools used (see Heritrix at http://crawler.archive.org/) and of the server environment being crawled. See a report on limitations of capabilities.
NARA has made every reasonable effort to ensure that websites' code and programming were captured accurately. NARA is not responsible for any website's compliance with Federal laws, regulations, and requirements. NARA is responsible for providing public access to these copied websites but is not responsible for maintaining code such as links, accessibility features, search or site maps, or other functionality that may have worked on the sites before they were copied.
-
What does "harvested" mean?
Web harvesting is the process of automatically copying and organizing unstructured information from pages and data on the World Wide Web. It is also known as web mining, web scraping, and web crawling. Websites are identified with a "seed list" of URLs, which are "harvested" so that content within, or linked to, an identified site is captured and copied.
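The seed-list idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how Heritrix or the Internet Archive actually performed the harvest: the `harvest` function, the `max_depth` limit, and the `TOY_WEB` link map with its example.gov URLs are all hypothetical, and the "web" is an in-memory dictionary rather than live sites.

```python
from collections import deque


def harvest(seeds, links, max_depth=2):
    """Breadth-first walk from the seed URLs over a link map.

    `links` maps each resolvable URL to the URLs it links to.
    A URL missing from `links` is treated as one that did not resolve.
    """
    captured = []
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        if url not in links:
            continue  # URL did not resolve, so nothing is captured
        captured.append(url)
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links[url]:
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return captured


# Hypothetical toy "web" used only for this illustration.
TOY_WEB = {
    "https://example.gov/": [
        "https://example.gov/about",
        "https://example.gov/news",
    ],
    "https://example.gov/about": ["https://example.gov/"],
    "https://example.gov/news": [
        "https://example.gov/news/2018",
        "https://example.gov/missing",
    ],
    "https://example.gov/news/2018": [],
    "https://example.gov/orphan": [],  # exists, but nothing links to it
}

captured = harvest(["https://example.gov/"], TOY_WEB)
```

In this toy run, the page that nothing links to and the URL that does not resolve are both absent from `captured`, while everything reachable from the seed is included.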
-
Who conducted the harvest?
NARA contracted with the Internet Archive (IA), a San Francisco nonprofit, to perform the harvest.
-
How large is the collection?
The harvest collection includes 121 terabytes of archived websites. The most recent harvest from the 115th Congress preserved over 470,000,000 URIs totaling over 61 terabytes of web data.
-
Why doesn't form input or streaming video work in the collection?
A harvest engine is not able to read and use forms, video, or complex JavaScript. As a result, forms and databases are not active in the harvest, and files that can only be streamed from a website have not been harvested.
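As a rough illustration of why script-driven content can be missed, the sketch below parses a page statically with Python's standard `html.parser` module. It is a simplified stand-in, not the harvest engine's actual logic: the `LinkExtractor` class and the sample `PAGE` markup are hypothetical. The parser sees the ordinary link and the form's target, but it never runs the script, so the link that the script would write into the page is not discovered, and the form itself is only recorded, not submitted.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href values of <a> tags and action values of <form> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form":
            # The form target is noted, but no query is ever submitted.
            self.forms.append(attrs.get("action", ""))


# Hypothetical page: one static link, one form, one script-generated link.
PAGE = """
<html><body>
<a href="/members.html">Members</a>
<form action="/search" method="post"><input name="q"></form>
<script>
  document.write('<a href="/hearings.html">Hearings</a>');
</script>
</body></html>
"""

extractor = LinkExtractor()
extractor.feed(PAGE)
static_links = extractor.links
form_targets = extractor.forms
```

Here `static_links` contains only `/members.html`; the `/hearings.html` link exists only as text inside the script and would appear in a browser only after the script runs.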
-
Can I search the archive?
Yes, by:
- Entering a search term which searches the combined House and Senate harvests, or
- Browsing from the House or Senate home pages.
-
Why isn't the site I'm looking for in the archive?
Sites were not harvested if:
- they were not linked to one of the supplied URLs,
- they were password protected, or
- the harvest engine could not find or access them.
(Note: Harvest engines do not capture dynamic web content. See a report on limitations of capabilities.)
-
Does webharvest.gov track usage statistics?
This website uses Google Analytics Premium. Please refer to the following policies on Google's website for more information: