
FAQs
For the curious surfer, we've gathered the following commonly asked questions. For the supremely curious, we recommend contacting us directly at [email protected]

 Questions

The Wayback Machine
1. What is the Wayback Machine?
2. Can I link to old pages on the Wayback Machine?
3. Are other sites available in the Wayback Machine?
4. What does UTC mean?
5. What does it mean when a site's archive date has "updated" next to it?
The Archived Collection of Sites
6. What is an Internet Library?
7. What is the Election 2000 Internet Library?
8. Who was involved in the Election 2000 Internet Library?
9. How was the Election 2000 Internet Library made?
10. How large is the Election 2000 Internet Library?
11. Can I search the Election 2000 collection?
12. What type of machinery is used in this Internet Library?
13. How do you archive dynamic pages?
14. Why are some sites harder to archive than others?
15. Some sites are not available because of Robots.txt or other exclusions. What does that mean?
16. How can I get my site included in the Archive?
17. How can I help?

 

 Answers


  1. What is the Wayback Machine?
    The Wayback Machine is a service that allows people to visit archived versions of websites. Visitors to the Wayback Machine can type in a URL, select a date, and then begin surfing an archived version of the web. Imagine surfing circa 1999 and looking at all the Y2K hype, or revisiting an older copy of your favorite website. The Wayback Machine makes all of this possible. See the Press Release.

  2. Can I link to old pages on the Wayback Machine?
    Yes! Alexa Internet has built the Wayback Machine so that it can be used and referenced by anybody and everybody. If you find an archived page that you would like to reference on your web page or in an article, you can copy the URL and share it with others. You can even use fuzzy URL matching and date specifications... but that's a bit more advanced.
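
    For illustration, archive URLs embed a 14-digit UTC timestamp (YYYYMMDDhhmmss) in front of the original address, so a link can point at a specific capture. Here is a minimal Python sketch; the exact host and URL pattern shown are assumptions for illustration:

        # Sketch: build a link to a specific archived capture. The
        # "web.archive.org/web/<timestamp>/<url>" pattern is an
        # assumption here, used to illustrate the date-stamped URL idea.
        from datetime import datetime, timezone

        def wayback_url(target, when):
            stamp = when.strftime("%Y%m%d%H%M%S")   # 14-digit UTC timestamp
            return "http://web.archive.org/web/%s/%s" % (stamp, target)

        print(wayback_url("http://www.example.com/",
                          datetime(2000, 11, 7, tzinfo=timezone.utc)))
        # -> http://web.archive.org/web/20001107000000/http://www.example.com/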

  3. Are other sites available in the Wayback Machine?
    Not yet. Stay tuned for more information.

  4. What does UTC mean?
    Formerly and still widely called Greenwich Mean Time (GMT) and also World Time, UTC (Coordinated Universal Time) serves to coordinate the timekeeping differences that arise between atomic time (which is derived from atomic clocks) and solar time (which is derived from astronomical measurements of the Earth's rotation on its axis relative to the Sun). UTC is widely broadcast by precisely coordinated radio signals; these radio time signals ultimately furnish the basis for the setting of all public and private clocks.
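
    As a quick sketch, most programming languages can report time directly in UTC; a minimal Python example:

        # Print the current time in UTC, the timescale used for the
        # archive's date stamps.
        from datetime import datetime, timezone

        print(datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC"))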

  5. What does it mean when a site's archive date has "updated" next to it?
    This means that the content on the page has changed since the previous time we archived it. If you don't see the word "updated" next to the archived page date, the content is the same.
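
    As an illustrative sketch only (not a description of Alexa's actual pipeline), one way to detect such a change is to compare a checksum of each crawl's content against the checksum from the previous crawl:

        # Flag a page as "updated" when its checksum differs from the
        # previous crawl's. The page contents are made up for the example.
        import hashlib

        def digest(content):
            return hashlib.md5(content).hexdigest()

        previous = digest(b"<html>Recount pending in Florida</html>")
        current  = digest(b"<html>Recount complete in Florida</html>")

        print("updated" if current != previous else "unchanged")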

  6. What is an Internet Library?
    An Internet Library is a repository of digital materials that have been preserved. Historically, libraries have stored books, periodicals, newspapers, maps, photographs, sound, and broadcast media materials that may be of future historical interest. With the dawn of the digital era, libraries are now evolving to store digital content. The Election 2000 Internet Library is the first of its kind -- an Internet library that allows people to browse numerous copies of stored historical digital materials collected from the web. More information is available in Brewster Kahle's white paper Public Access to Digital Materials. (If it prompts you for a password, you can just hit cancel, and it should work fine...)

  7. What is the Election 2000 Internet Library?
    The Election 2000 Internet Library is the first archive available on the Wayback Machine. It is a collection of 797 sites archived repeatedly from 8/1/2000 to 1/21/2001, covering the controversial Election of 2000. The Wayback Machine makes it possible to revisit these sites.

  8. Who was involved in the Election 2000 Internet Library?
    The Library of Congress commissioned the Election 2000 Internet Library. Compaq Computer Corporation crawled and stored all the digital materials collected from the Election 2000 web sites. The Internet Archive provided project coordination and quality assurance. Alexa Internet built the Wayback Machine, which provides access to all the sites in the collection.

  9. How was the Election 2000 Internet Library made?
    Several lists of sites known to carry election content were compiled, and those sites were then crawled by Compaq Computer Corporation's crawlers. Compaq continued to crawl, adding more sites to the list as requested, through the fall and winter of 2000 and into January of 2001. By the end of January, Compaq had gathered approximately two terabytes of content. The resulting content was placed on a large-capacity server at Alexa Internet, where it was indexed and catalogued.

  10. How large is the Election 2000 Internet Library?
    The total size of the collection before compression was approximately 2 terabytes, or about 1.5 million floppy disks. Stack those disks one on top of the other, and you'll gasp for air as you climb two miles high.
    The Election 2000 Internet Library was gathered between the months of August 2000 and January 2001. 797 sites containing election content were crawled several times per week and stored on disk.
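
    A rough back-of-the-envelope check of those figures, assuming standard 1.44 MB floppies about 3.3 mm thick (both numbers are assumptions):

        # Sanity-check the floppy-disk comparison above.
        bytes_total  = 2 * 10**12        # ~2 terabytes
        floppy_bytes = 1_440_000         # nominal 1.44 MB per disk
        disks = bytes_total // floppy_bytes
        height_miles = disks * 0.0033 / 1609.34   # ~3.3 mm per disk
        print(disks, round(height_miles, 1))
        # -> about 1.4 million disks, a stack a couple of miles high

    which lands in the same ballpark as the figures quoted above.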

  11. Can I search the Election 2000 collection?
    Using the Wayback Machine, it is possible to search for the names of sites contained in the collection and to specify date ranges for your search. However, we do not yet have an indexed text search of these documents; the collection is a bit too large and complicated for that. We continue to work on it and should have a full text search soon.

  12. What type of machinery is used in this Internet Library?
    The Election 2000 Internet Library is stored on a single computer. The computer runs the FreeBSD operating system. It has a total of twenty 75-gigabyte IDE disks and 512 MB of memory. Storage of these files was made possible by compressing the files and removing duplicate copies of identical files.
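
    As an illustrative sketch only (not the actual system), compression plus duplicate removal can be as simple as compressing each file and keying it by a content checksum, so identical files are stored once:

        # Store files gzip-compressed under a content checksum; a second
        # copy of identical content is detected and skipped.
        import gzip, hashlib, os

        STORE = "/tmp/archive-store"     # hypothetical storage directory
        os.makedirs(STORE, exist_ok=True)

        def store(content):
            key = hashlib.sha1(content).hexdigest()
            path = os.path.join(STORE, key + ".gz")
            if not os.path.exists(path):          # duplicate: skip write
                with gzip.open(path, "wb") as f:
                    f.write(content)
            return key

        a = store(b"<html>identical page</html>")
        b = store(b"<html>identical page</html>")
        assert a == b    # deduplicated: stored only once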

  13. How do you archive dynamic pages?
    There are many different kinds of dynamic pages, some of which are easily stored in an archive and some of which fall apart completely. When a dynamic page renders standard html, the archive works beautifully. When a dynamic page contains forms, JavaScript, or other elements that require interaction with the originating host, the archive will not accurately reflect the original site's functionality.
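
    A minimal sketch of that distinction, using a made-up page: scan a capture for elements that need the originating server and flag them as likely to break on replay:

        # Heuristic only: forms and scripts may need the original host,
        # so they are the usual suspects when an archived page breaks.
        import re

        def replay_risks(html):
            risks = []
            if re.search(r"<form\b", html, re.I):
                risks.append("form posts to a server that won't answer")
            if re.search(r"<script\b", html, re.I):
                risks.append("script may call back to the original host")
            return risks

        page = '<html><form action="/cgi-bin/vote-tally"></form></html>'
        for risk in replay_risks(page):
            print("warning:", risk)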

  14. Why are some sites harder to archive than others?
    If you look at our collection of archived sites, you will find some broken pages, missing graphics, and some sites that aren't archived at all. We have tried to create a complete archive, but have had difficulties with some sites. Here are some things that make it difficult to archive a web site:

    • Robots.txt -- If our robot crawler is forbidden from visiting a site, we can't archive it.
    • Javascript -- JavaScript elements are often hard for us to archive, especially if they generate links without having the full name in the page. Also, if JavaScript needs to contact the originating server in order to work, it will fail when archived (see the sketch after this list).
    • Server side image maps -- Like any functionality on the web, if it needs to contact the originating server in order to work, it will fail when archived.
    • Unknown sites -- If Alexa doesn't know about your site, it won't be archived. Use the Alexa service, and we will know about your page. Or you can visit our Archive Your Site page.
    • Orphan pages -- If there are no links to your pages, our robot won't find them (our robots don't enter queries in search boxes).

    As a general rule of thumb, simple html is the easiest to archive.
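
    The JavaScript and orphan-page problems come down to the same thing: a crawler only discovers URLs that appear as plain links in the HTML it fetches. A minimal sketch with Python's standard html.parser (the sample page is made up):

        # Extract the links a crawler can see. The scripted link, and any
        # page nothing links to, are invisible to it.
        from html.parser import HTMLParser

        class LinkExtractor(HTMLParser):
            def __init__(self):
                super().__init__()
                self.links = []
            def handle_starttag(self, tag, attrs):
                if tag == "a":
                    for name, value in attrs:
                        if name == "href" and value:
                            self.links.append(value)

        p = LinkExtractor()
        p.feed('<a href="/results.html">Results</a>'
               '<script>go("/hidden.html")</script>')
        print(p.links)   # ['/results.html'] -- the scripted link is missed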

  15. Some sites are not available because of Robots.txt or other exclusions. What does that mean?

    The Standard for Robot Exclusion (SRE) is a means by which web site owners can instruct automated systems not to crawl their sites. Web site owners can specify files or directories that are allowed or disallowed from a crawl, and they can even create specific rules for different automated crawlers. All of this information is contained in a file called robots.txt. While robots.txt has been adopted as the universal standard for robot exclusion, compliance with it is strictly voluntary. In fact, most web sites do not have a robots.txt file, and many web crawlers are not programmed to obey the instructions anyway. However, Alexa does respect robots.txt instructions, and even does so retroactively: if a web site owner ever decides he or she prefers not to have a web crawler visiting those files and sets up robots.txt on the site, the Alexa crawlers will stop visiting the files and will mark all files previously gathered as unavailable.

    This means that sometimes, while using the Internet Archive Wayback Machine, you may find a site that is unavailable due to robots.txt or other exclusions. Other exclusions? Yes, sometimes a web site owner will contact us directly and ask us to stop crawling or archiving a site. We comply with these requests.
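
    For the curious, a robots.txt file looks like this (a hypothetical example excluding one directory from Alexa's crawler, which identifies itself as ia_archiver):

        User-agent: ia_archiver
        Disallow: /private/

    And here is a sketch using Python's standard urllib.robotparser to check whether a given page may be crawled (note that it fetches robots.txt over the network):

        # Check a site's robots.txt rules for a particular crawler.
        from urllib.robotparser import RobotFileParser

        rp = RobotFileParser()
        rp.set_url("http://www.example.com/robots.txt")
        rp.read()    # fetch and parse the site's robots.txt

        if rp.can_fetch("ia_archiver", "http://www.example.com/private/"):
            print("allowed to crawl")
        else:
            print("excluded by robots.txt")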

  16. How can I get my site included in the Archive?
    Alexa Internet has been crawling the web since 1996, which has resulted in a massive archive. If you have a web site, and you would like to ensure that it is saved for posterity in the Alexa Archive, chances are that it's already there. We make every effort to crawl the entire publicly available web. However, if you wish to take extra measures to ensure that we archive your site, you can visit the Alexa "Archive Your Site" page.

  17. How can I help?
    The Internet Archive actively seeks donations of digital materials for preservation. Alexa Internet provides access to a web-wide crawl that contains copies of the publicly accessible web. If you have digital materials that may be of interest to future generations, let us know. The Internet Archive is also seeking additional funding to continue this important mission. Please contact us if you wish to make a contribution.



The Internet Archive Wayback Machine is a service created by Alexa to enable people to surf an ongoing archive of the web.