The IT Law Wiki

Robot Exclusion Standard


Definition

The Robot Exclusion Standard, also known as the Robots Exclusion Protocol or robots.txt protocol, is a convention used to limit the impact of automatic web crawlers (spiders) on a web server. Well-behaved web page retrieval software will only visit pages permitted by the robots.txt file.
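As a minimal sketch of what "well-behaved" means in practice, the following example uses Python's standard `urllib.robotparser` module to check whether a crawler may fetch a given path. The rules are parsed from an inline string rather than fetched from a live server, and the bot name and paths are hypothetical:

```python
# Sketch of a well-behaved crawler consulting robots.txt rules before
# fetching pages. Uses Python's standard urllib.robotparser; the rules,
# bot name, and paths below are hypothetical examples.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("MyBot", "/index.html"))      # True  (permitted)
print(parser.can_fetch("MyBot", "/private/a.html"))  # False (excluded)
```

In a real crawler, `set_url()` and `read()` would typically be used to retrieve the site's actual robots.txt before any page requests are made.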

Overview

Web administrators who wish to limit bots’ actions on their Web server need to create a plain text file named “robots.txt.” The file must always have this name, and it must reside in the Web server’s root document directory. In addition, only one file is allowed per Web site. Note that the robots.txt file is a standard that is voluntarily supported by bot programmers, so malicious bots . . . often ignore this file.

The robots.txt file is a simple text file that contains some keywords and file specifications. Each line of the file is either blank or consists of a single keyword and its related information. The keywords are used to tell robots which portions of a Web site are excluded.[1]
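A short illustration of the format described above, with hypothetical paths and bot names (each record pairs a User-agent line with one or more Disallow lines, separated by blank lines):

```
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/

User-agent: BadBot
Disallow: /
```

Here the first record excludes all bots from the two listed directories, while the second excludes a bot identifying itself as "BadBot" from the entire site.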

References

  1. NIST Special Publication 800-44, Guidelines on Securing Public Web Servers, at 5-7.
