Improvements to the Robots.txt Protocol

By Deane Barker on June 4, 2008

One Standard Fits All: Robots Exclusion Protocol for Yahoo, Google and Microsoft: Google, Yahoo, and Microsoft have gotten together and actually agreed on extensions to the REP — the Robots Exclusion Protocol, otherwise known as your robots.txt file.

For instance, they’re going to allow a new value for the robots META tag: NOSNIPPET:

Tells a crawler not to display snippets in the search results for a given page.

How about NOARCHIVE:

Tells a search engine not to show a “cached” link for a given page.
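To make the two directives concrete, here is a minimal sketch of how a crawler might read them off a page. It assumes the directives arrive as a comma-separated list in a `<meta name="robots" content="...">` tag; the class and function names are my own, not anything the search engines have published.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names but leaves values untouched.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of lowercase robots directives found in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="NOARCHIVE, NOSNIPPET"></head></html>'
directives = robots_directives(page)

# Hypothetical decisions a results page could make from the directives.
show_snippet = "nosnippet" not in directives
show_cached_link = "noarchive" not in directives
```

With the sample page above, both `show_snippet` and `show_cached_link` come out `False`.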

Plus, you can have wildcards in URL patterns in robots.txt now, which is something people have been after for years.
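Here is a rough sketch of what wildcard matching could look like on the crawler side. I'm assuming the commonly documented semantics, where `*` matches any run of characters and a trailing `$` anchors the end of the URL; the function names and the sample rules are made up for illustration.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, and everything else is a literal prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with ".*" for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path, disallow_patterns):
    """True if the path matches any of the Disallow patterns."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns)


# Hypothetical Disallow rules, just to exercise the matcher.
rules = ["/private*/", "/*.pdf$", "/*?sessionid="]
```

With these rules, `/docs/manual.pdf` and `/shop/item?sessionid=42` are blocked, while `/docs/manual.pdf?x=1` and `/public/index.html` are not.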

And Yahoo has taken it one step further with something I absolutely think every search engine needs to do. You can put a CSS class called “robots-nocontent” on any element, and Yahoo will strip that element’s content before indexing.

[…] webmasters can now mark parts of a page with a ‘robots-nocontent’ tag which will indicate to our crawler what parts of a page are unrelated to the main content and are only useful for visitors. We won’t use the terms contained in these special tagged sections as information for finding the page or for the abstract in the search results.
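For the curious, here is a minimal sketch of how an indexer might drop those sections before collecting terms. This is my own guess at the mechanics, not Yahoo's implementation: it assumes reasonably well-formed HTML (every non-void element is closed), and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class IndexableTextExtractor(HTMLParser):
    """Collects page text, skipping anything inside a robots-nocontent element."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a robots-nocontent element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Start skipping at a tagged element; keep counting nested children.
        if self.skip_depth or "robots-nocontent" in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def indexable_text(html):
    """Return the text an indexer would keep, joined with single spaces."""
    parser = IndexableTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    '<div class="nav robots-nocontent"><ul><li>Home</li><li>Widgets<br></li></ul></div>'
    '<div id="main"><p>Blue widgets on sale.</p></div>'
    '<p class="robots-nocontent">Copyright</p>'
)
kept = indexable_text(page)
```

On the sample page, only the main paragraph survives; the navigation and the copyright line are dropped.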

This has been available in localized search systems for years. I’ve used it with both Swish-E and the Google Mini. It’s a great way to make sure that search engines don’t hit on irrelevant content, but instead focus on the core content of the page.

But, Joe brought up this point: shouldn’t navigation count? If I have a term in my nav, shouldn’t I get credit for this? If this is true — which it probably is — where do you draw the line? How do you decide what parts of the page are not “index worthy”?

Additionally (and cynically), why would you care about public search? The most basic SEO strategy is one of selfishness: you want every search hit, regardless of how relevant it is. Excluding content on your page out of altruism, or out of a desire to make general search results better, is just not likely.

What Yahoo needs to do is provide a benefit for doing this. If they explained that doing this increases your keyword density, by removing garbage words and thus making your keywords a larger proportion of total words, perhaps that would help. But there has to be an SEO advantage or no one is going to do this.
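The proportion argument is easy to see with a toy calculation. The sample text and the `keyword_share` helper below are invented for illustration, and whether any search engine actually rewards this ratio is an open question; this only shows the arithmetic.

```python
def keyword_share(words, keyword):
    """Fraction of the words that equal the keyword (case-insensitive)."""
    if not words:
        return 0.0
    hits = sum(1 for w in words if w.lower() == keyword.lower())
    return hits / len(words)


# 8 words of real content, 2 of which are the keyword.
main_content = "widgets are great and our widgets ship fast".split()
# 8 words of navigation and footer text that could be marked robots-nocontent.
boilerplate = "home about contact login privacy terms sitemap help".split()

before = keyword_share(main_content + boilerplate, "widgets")  # 2 of 16 words
after = keyword_share(main_content, "widgets")                 # 2 of 8 words
```

Here the keyword goes from 12.5% of the indexed words to 25% once the boilerplate is excluded.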

Via David Gammel