Sitemap Protocol: Google just released this today. It’s like robots.txt, except that it shows search engines (well, just Google right now, but others will follow…) how to get to URLs on your site that are not linked from other pages. Actually, you can put all your URLs in this file, if you want: each Sitemap file can hold up to 50,000 URLs and be up to 10MB uncompressed, and you can gzip it to save bandwidth.
The Sitemap Protocol allows you to inform search engine crawlers about URLs on your Web sites that are available for crawling. A Sitemap consists of a list of URLs and may also contain additional information about those URLs, such as when they were last modified, how frequently they change, etc.
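To make that concrete, here is a minimal Sitemap file. The URL and dates are placeholders, and the namespace shown is the 0.84 schema Google published at launch; everything inside each <url> element except <loc> is optional:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <!-- loc is the only required child element -->
    <loc>http://www.example.com/</loc>
    <!-- optional hints for the crawler -->
    <lastmod>2005-06-03</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

The changefreq and priority values are hints, not commands; crawlers may weigh or ignore them as they see fit.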
Now the rush is on for content management systems to generate this file automatically. I predict Movable Type will be first, since it’s just another index template. Someone could probably write the template in a couple of minutes.
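As a sketch of what such a Movable Type index template might look like, using MT’s standard <MTEntries> and <$MTEntryPermalink$> tags (the entry limit and date format here are my assumptions, not anything from the Sitemap announcement):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
<MTEntries lastn="50000">
  <url>
    <loc><$MTEntryPermalink$></loc>
    <lastmod><$MTEntryDate format="%Y-%m-%d"$></lastmod>
  </url>
</MTEntries>
</urlset>
```

Save it as an index template that outputs to something like sitemap.xml and it would rebuild along with the rest of the site on each publish.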
See the comments of this post for a discussion about the pros and cons of the “use-it-if-you-find-it” theory behind robots.txt-like files (such as this one).