-
What is the
Wayback Machine?
The Wayback Machine is a service that allows people to visit
archived versions of websites. Visitors to the
Wayback Machine can type in a URL, select a date, and then
begin surfing an archived version of the web. Imagine
surfing circa 1999 and looking at all the Y2K hype, or revisiting
an older copy of your favorite website. The Wayback Machine
can make all of this possible. See the Press
Release.
-
Can I link to old
pages on the Wayback Machine?
Yes! Alexa Internet has built the Wayback Machine so that it
can be used and referenced by anybody and everybody. If you
find an archived page that you would like to reference on your
web page or in an article, you can copy the URL and share it
with others. You can even use fuzzy URL matching and date specifications...
but that's a bit more
advanced.
-
Are other sites
available in the Wayback Machine?
Not yet. Stay tuned for more information.
-
What does UTC mean?
Formerly and still widely called Greenwich Mean Time (GMT) and
also World Time, UTC (Coordinated Universal Time) serves to
coordinate the timekeeping differences that arise between atomic
time (which is derived from atomic clocks) and solar time (which
is derived from astronomical measurements of the Earth's rotation
on its axis relative to the Sun). UTC is widely broadcast by
precisely coordinated radio signals; these radio time signals
ultimately furnish the basis for the setting of all public and
private clocks.
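To make the definition concrete, here is a small sketch in Python (the
language choice is ours, not the document's) that takes the current moment
in UTC and formats it as a compact date stamp of the kind an archive can
sort and compare unambiguously, regardless of the viewer's local time zone:

```python
from datetime import datetime, timezone

# Take the current moment as an explicit UTC timestamp, independent
# of whatever local zone the machine happens to be set to.
now_utc = datetime.now(timezone.utc)

# Format it as a compact YYYYMMDDhhmmss stamp; because it is UTC,
# two machines anywhere in the world produce comparable stamps.
stamp = now_utc.strftime("%Y%m%d%H%M%S")
print(stamp)
```

Displaying archive dates in UTC is what lets a single timestamp mean the
same instant to every visitor.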
-
What does it mean
when a site's archive date has "updated" next to it?
This means that the content on the page has changed from the
previous day we archived. If you don't see the word "updated"
next to the archived page date, then the content is the same.
-
What is an
Internet Library?
An Internet Library is a repository of digital materials that
have been preserved. Historically, libraries have stored books,
periodicals, newspapers, maps, photographs, sound, and broadcast
media materials that may be of future historical interest. With
the dawn of the digital era, libraries are now evolving to store
digital content. The Election 2000 Internet Library is the first
of its kind -- an Internet library that allows people to browse
numerous copies of stored historical digital materials collected
from the web. More information is available in Brewster Kahle's
white paper Public
Access to Digital Materials. (If it prompts you for a password,
you can just hit cancel, and it should work fine...)
-
What is the Election
2000 Internet Library?
The Election 2000 Internet Library is the first archive available
on the Wayback Machine. It is a collection of 797 sites
archived repeatedly from 8/1/2000 to 1/21/2001, covering the
controversial Election of 2000. The Wayback Machine makes
it possible to revisit these sites.
-
Who was involved
in the Election 2000 Internet Library?
The Library of Congress commissioned the Election 2000 Internet
Library. Compaq Computer Corporation crawled and stored all
the digital materials collected from the Election 2000 web sites.
The Internet Archive provided project coordination and quality
assurance. Alexa Internet built the Wayback Machine, which provides
access to all the sites in the collection.
-
How was the Election
2000 Internet Library made?
Several lists of sites known to carry election content were
identified and later crawled by Compaq Computer Corporation
crawlers. Compaq continued to crawl, adding more sites to the
list as requested, through the Fall and Winter of 2000 and into
January of 2001. By the end of January, Compaq had gathered
approximately two terabytes of content. The collected content
was then placed on a large-capacity server at Alexa Internet,
where it was indexed and catalogued.
-
How large is the Election
2000 Internet Library?
The total size of the collection before compression was approximately
2 terabytes, or about 1.5 million floppy disks. Stack those
disks one on top of the other, and you'll gasp for air as you
climb two miles high.
The Election 2000 Internet Library was gathered between the
months of August 2000 and January 2001. 797 sites containing
election content were crawled several times per week and stored
on disk.
-
Can I search the
Election 2000 collection?
Using the Wayback Machine, it is possible to search for the
names of sites contained in the collection and to specify date
ranges for your search. However, we do not yet have an indexed
text search of these documents. The collection is a bit too
large and complicated for that. We continue to work on it and
should have a full text search soon.
-
What type of
machinery is used in this Internet Library?
The Election 2000 Internet Library is stored on a single computer.
The computer runs on the FreeBSD operating system. It has a
total of twenty 75-gigabyte IDE disks and 512 MB of memory. Storage
of these files was made possible by compressing the files and
removing duplicate copies of identical files.
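The duplicate-removal step mentioned above can be illustrated by content
hashing: files whose bytes are identical collapse to a single stored copy.
This is only a sketch of the general technique (the hash choice and the
file names are our own illustration), not a description of Alexa's actual
pipeline:

```python
import hashlib

def dedup(files):
    """Map content hash -> stored bytes; identical files collapse to one copy."""
    store = {}
    for name, content in files:
        digest = hashlib.sha1(content).hexdigest()
        store.setdefault(digest, content)  # keep only the first copy seen
    return store

# Two crawls of an unchanged page cost only one stored copy.
crawls = [
    ("index.html@2000-11-07", b"<html>Election results</html>"),
    ("index.html@2000-11-08", b"<html>Election results</html>"),
    ("recount.html@2000-11-09", b"<html>Recount ordered</html>"),
]
stored = dedup(crawls)
print(len(stored))  # prints 2
```

For a collection crawled several times per week, pages that did not change
between crawls dominate, so deduplication saves a large fraction of the
raw two terabytes.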
-
How do you
archive dynamic pages?
There are many different kinds of dynamic pages, some of which
are easily stored in an archive and some of which fall apart
completely. When a dynamic page renders standard HTML, the archive
works beautifully. When a dynamic page contains forms, JavaScript,
or other elements that require interaction with the originating
host, the archive will not accurately reflect the original site's
functionality.
-
Why are
some sites harder to archive than others?
If you look at our collection of archived sites, you will find
some broken pages, missing graphics, and some sites that aren't
archived at all. We have tried to create a complete archive,
but have had difficulties with some sites. Here are some things
that make it difficult to archive a web site:
- Robots.txt -- If our robot crawler is forbidden
from visiting a site, we can't archive it.
- JavaScript -- JavaScript elements are often
hard for us to archive, especially when they generate links
without including the full URL in the page. Also, if JavaScript
needs to contact the originating server in order to
work, it will fail when archived.
- Server side image maps -- Like any functionality
on the web, if it needs to contact the originating server
in order to work, it will fail when archived.
- Unknown sites -- If Alexa doesn't know about
your site, it won't be archived. Use the Alexa service, and
we will know about your page. Or you can visit our Archive
Your Site page.
- Orphan pages -- If there are no links
to your pages, our robot won't find them (our robots don't enter
queries in search boxes.)
As a general rule of thumb, simple HTML is
the easiest to archive.
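The JavaScript problem in the list above can be seen with a toy link
extractor. A crawler that only reads literal href attributes (a deliberate
simplification of how real crawlers work) finds static links but never
sees a link that a script assembles at runtime:

```python
import re

# A much-simplified crawler's link extractor: it only sees literal
# href="..." attributes in the page source.
HREF = re.compile(r'href="([^"]+)"', re.IGNORECASE)

static_page = '<a href="/results.html">Results</a>'
dynamic_page = '<script>var u = "/" + state + ".html"; location = u;</script>'

print(HREF.findall(static_page))   # ['/results.html'] -- archivable
print(HREF.findall(dynamic_page))  # [] -- the runtime-built link is invisible
```

The second page's link only exists after a browser executes the script, so
a crawler reading the raw source never discovers the target page.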
-
Some sites are
not available because of Robots.txt or other exclusions.
What does that mean?
The Standard for Robot Exclusion (SRE) is a means by which web
site owners can instruct automated systems not to crawl their
sites. Web site owners can specify files or directories that
are allowed or disallowed from a crawl, and they can even create
specific rules for different automated crawlers. All of this
information is contained in a file called robots.txt. While
robots.txt has been adopted as the universal standard for robot
exclusion, compliance with robots.txt is strictly voluntary.
In fact, most web sites do not have a robots.txt file, and many
web crawlers are not programmed to obey the instructions anyway.
However, Alexa does respect robots.txt instructions, and even
does so retroactively. If a web site owner decides they
prefer not to have a web crawler visiting their files
and sets up robots.txt on the site, the Alexa crawlers will
stop visiting those files and mark all files previously gathered
as unavailable.
as unavailable. This means that sometimes, while using the Internet Archive Wayback Machine, you may find a site that is unavailable due
to robots.txt or other exclusions. Other exclusions? Yes, sometimes
a web site owner will contact us directly and ask us to stop
crawling or archiving a site. We comply with these requests.
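The exclusion rules described above live in a short text file at the root
of the site. The example below (the paths are hypothetical; "ia_archiver"
was the user-agent name Alexa's crawler announced) shows a robots.txt that
shuts one crawler out of a directory, checked with Python's
standard-library robots parser:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: forbid ia_archiver from /private/,
# place no restrictions on any other crawler.
robots_txt = """\
User-agent: ia_archiver
Disallow: /private/

User-agent: *
Disallow:
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("ia_archiver", "http://example.com/private/ballot.html"))
print(rp.can_fetch("ia_archiver", "http://example.com/index.html"))
```

A compliant crawler runs exactly this kind of check before each fetch and
simply skips any URL the file disallows for its user-agent.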
-
How can I
get my site included in the Archive?
Alexa Internet has been crawling the web since 1996, which has
resulted in a massive archive. If you have a web site, and you
would like to ensure that it is saved for posterity in the Alexa
Archive, chances are that it's already there. We make every
effort to crawl the entire publicly available web. However,
if you wish to take extra measures to ensure that we archive
your site, you can visit the Alexa "Archive
Your Site" page.
-
How can I help?
The Internet Archive actively seeks donations of digital materials
for preservation. Alexa Internet provides access to a web-wide
crawl that contains copies of the publicly accessible web. If
you have digital materials that may be of interest to future
generations, let
us know. The Internet Archive is also seeking additional
funding to continue this important mission. Please contact
us if you wish to make a contribution.