Robots.txt Survey

By Deane Barker on July 18, 2005

Robots.txt, The Big Crawl: These guys grabbed 75,000 robots.txt files and found a few problems:

[…] we found a wide array of problems with peoples robots.txt files. We found more than 5% of the robots.txt used bad style and up to 2% were so badly formed that they would not be recognized by any spider.

One of the most common mistakes is backwards syntax […] A large number of people had multiple directories per line […] Another common mistake, is editing your robots.txt in DOS mode

Not only do they list the problems they found, they also explain how various spiders would interpret each one. Some of the “problems” are actually valid per the spec, but spiders don’t always follow the spec…
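
If you want to check your own file for the three mistakes quoted above, here is a rough Python sketch. It is not the survey's method, and it rests on a few assumptions of mine: "backwards syntax" is read as a Disallow line that shows up before any User-agent line in its record, "multiple directories per line" as whitespace-separated paths after Disallow, and "DOS mode" as CRLF line endings. The function name lint_robots is made up for illustration.

```python
from typing import List


def lint_robots(raw: bytes) -> List[str]:
    """Return warnings for a robots.txt payload (hypothetical helper).

    Checks only the three mistakes named in the quote, under the
    assumptions stated in the text above; it is not a full validator.
    """
    warnings = []

    # Assumption: "DOS mode" means the file was saved with CRLF line endings.
    if b"\r\n" in raw:
        warnings.append("CRLF (DOS) line endings")

    text = raw.decode("utf-8", errors="replace")
    seen_user_agent = False

    for number, raw_line in enumerate(text.splitlines(), start=1):
        # A blank line ends the current record.
        if not raw_line.strip():
            seen_user_agent = False
            continue

        # Drop trailing comments; skip comment-only or malformed lines.
        line = raw_line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue

        field, value = line.split(":", 1)
        field = field.strip().lower()
        value = value.strip()

        if field == "user-agent":
            seen_user_agent = True
        elif field == "disallow":
            # Assumption: "backwards syntax" = Disallow before User-agent.
            if not seen_user_agent:
                warnings.append(f"line {number}: Disallow before any User-agent")
            # More than one whitespace-separated token suggests several paths.
            if len(value.split()) > 1:
                warnings.append(f"line {number}: multiple paths on one Disallow line")

    return warnings


if __name__ == "__main__":
    sample = b"Disallow: /tmp /cgi-bin\r\nUser-agent: *\r\n"
    for warning in lint_robots(sample):
        print(warning)
```

Since some of these "problems" are legal per the spec, treat the output as hints about what a picky spider might choke on, not as hard errors.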