The IT Law Wiki

Overview

Founded in 1996, the Internet Archive is a nonprofit organization designed to build an Internet library and make it available to the public.[1] Unlike the Library of Congress’s web collection efforts (e.g., Web Capture), which focus on particular topics, the Internet Archive seeks to create a comprehensive record of web content for use by scholars and researchers. It has been archiving web pages for almost twelve years, and archives approximately two billion web pages per month. It makes this material available over its website after a delay ranging from one week to six months after collection.

How it works

The Internet Archive relies on a protocol known as the “Oakland Archive Policy” in collecting and providing access to web content.[2] Website owners can opt out of having their content copied, or “harvested.” This can be done mechanically by placing a robots.txt file on the site; the Internet Archive's web crawling utility will respond to the file and bypass the site. Upon notification, the Internet Archive will also block access to previously collected website material.[3] The ability to opt out arguably protects website owners that derive financial and other benefits from making older material available, and minimizes the risk of a copyright infringement lawsuit against the Internet Archive.
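
The mechanical opt-out described above can be illustrated with a short sketch. It uses Python's standard urllib.robotparser module to show how a crawler that honors robots.txt would decide whether to harvest a page. This is not the Internet Archive's actual crawler code. The user-agent name "ia_archiver", the second crawler name "other-bot", and the example.org URLs are assumptions chosen for illustration and do not come from the sources cited here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that a site owner might publish.
# "ia_archiver" is an assumed user-agent name used for illustration only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_harvest(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's contents as an iterable of lines,
    # so no network request is made in this sketch.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    urls = (
        "https://example.org/page.html",
        "https://example.org/private/x.html",
    )
    for agent in ("ia_archiver", "other-bot"):
        for url in urls:
            verdict = "harvest" if may_harvest(ROBOTS_TXT, agent, url) else "bypass"
            print(f"{agent:12} {url:40} {verdict}")
```

In this sketch the first rule group excludes one named crawler from the whole site, while the second group restricts all other crawlers only from the /private/ path. A robots.txt file affects only what a compliant crawler collects going forward; blocking access to previously collected material is handled separately, upon notification.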

References

  1. See Michele Kimpton, "Written Response to Section 4, Section 108" (Apr. 7, 2006) (full-text).
  2. See Comments submitted by the Internet Archive to the 108 Study Group 4-6 (Apr. 7, 2006) (full-text).
  3. The Internet Archive has also developed a web application to allow “noncommercial ‘memory’ institutions” around the world that lack technical resources to archive content they regard as important. Statement of Michele Kimpton, supra (full-text).