Definition

The Robot Exclusion Standard, also known as the Robots Exclusion Protocol or robots.txt protocol, is a convention used to limit the impact of automatic web crawlers (spiders) on a web server. Well-behaved web page retrieval software will only visit pages permitted by the robots.txt file.
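To illustrate how well-behaved retrieval software honors the protocol, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The rules and URLs are hypothetical, chosen only for demonstration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice a crawler would
# fetch this from the site's root document directory.
rules = """
User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A compliant crawler checks each URL before fetching it.
print(parser.can_fetch("*", "https://example.com/private/page.html"))  # False
print(parser.can_fetch("*", "https://example.com/public/page.html"))   # True
```

A compliant crawler calls `can_fetch()` for every candidate URL and simply skips those the file disallows; nothing in the protocol enforces this, which is why it remains a voluntary convention.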

Overview

Web administrators who wish to limit bots’ actions on their Web server need to create a plain text file named “robots.txt.” The file must always have this name, and it must reside in the Web server’s root document directory. In addition, only one file is allowed per Web site. Note that the robots.txt file is a standard that is voluntarily supported by bot programmers, so malicious bots often ignore this file.

The robots.txt file is a simple text file that contains some keywords and file specifications. Each line of the file is either blank or consists of a single keyword and its related information. The keywords are used to tell robots which portions of a Web site are excluded.[1]
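As a sketch of the format described above, a minimal robots.txt file might look like the following (the paths and crawler name are illustrative, not taken from the source):

```
# Rules for one specific crawler
User-agent: ExampleBot
Disallow: /

# Rules for all other crawlers
User-agent: *
Disallow: /private/
Disallow: /tmp/
```

Each record begins with a `User-agent` line naming the bot it applies to (`*` matches any bot), followed by one or more `Disallow` lines listing path prefixes that bot should not visit; blank lines separate records.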

References

  1. NIST Special Publication 800-44, at 5-7.
