Something fairly obvious hit me in the face yesterday: robots.txt files can be a cracker’s best friend.
We knew of someone who kept a directory on their site filled with the install files and license keys for all their software, so it would all be easy to find. In a cursory nod to security, they added a “disallow” rule for this folder to their robots.txt file to ensure it wasn’t indexed. In doing so, however, they simply provided a handy record, in a standardized location, for anyone looking for exactly the things they were trying to hide.
How often does this happen, I wonder, and what does your robots.txt file reveal about your site? Yes, you can prevent search engines from indexing something (those that respect the file, anyway), but you’re also announcing to the world that there’s something there you don’t want anyone poking around in. (Remember when the White House tried this?) You may as well put out a “Start Hacking Here” sign.
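To make the problem concrete, here’s a minimal robots.txt of the kind described above (the directory name is hypothetical):

```
# robots.txt sits at a well-known URL: https://example.com/robots.txt
# The Disallow rule keeps well-behaved crawlers out -- but it also tells
# every human who fetches this file exactly where the goodies are.
User-agent: *
Disallow: /license-keys/
```

Anyone can fetch this file directly; no crawling required. The very line meant to hide the folder is the line that advertises it.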
If you have a secure area on your site, perhaps you’d do better with a robots META tag?
Same effect, but the “don’t index me” instruction is embedded in the page itself, which means an attacker has to find the page before learning you wanted it hidden.
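For reference, the tag in question goes in the head of the sensitive page itself, something like:

```html
<!-- Inside the <head> of the page you don't want indexed.
     Unlike robots.txt, there's no central, predictable file to read:
     you only see this directive once you've already found the page. -->
<meta name="robots" content="noindex, nofollow">
```

Of course, this only affects indexing, not access; anything truly sensitive still belongs behind real authentication, not behind obscurity.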
Perhaps we should all go check our robots.txt files right now to see if there’s anything incriminating in them? Mine’s cool.