Google’s Sitemap Protocol

By Deane Barker on June 3, 2005

Sitemap Protocol: Google just released this today. It’s like robots.txt, except that it shows search engines (well, just Google right now, but others will follow…) how to get to URLs on your site that are not linked from other pages. Actually, you can put all your URLs in this file if you want: a single Sitemap can list up to 50,000 URLs and run up to 10MB uncompressed, and you can gzip it to cut the transfer size.

The Sitemap Protocol allows you to inform search engine crawlers about URLs on your Web sites that are available for crawling. A Sitemap consists of a list of URLs and may also contain additional information about those URLs, such as when they were last modified, how frequently they change, etc.
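For a sense of what one of these files looks like, here is a minimal Sitemap sketched against the initial 0.84 spec (the URL and the per-URL values are placeholders; `lastmod`, `changefreq`, and `priority` are all optional):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-06-03</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

You drop a file like this somewhere on your site and tell Google where it lives; only `loc` is required for each URL.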

Now the rush comes for content management systems to include the automatic generation of this file as a feature. I predict Movable Type will be first, since it’s just another index template. Someone could probably write the template in a couple of minutes.
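To show how little work that template is, here is a rough sketch in Python of what a CMS plugin might do: walk a list of pages and emit the Sitemap XML. The page list, tuple shape, and `build_sitemap` name are all made up for illustration; only the XML structure comes from the spec.

```python
# Sketch: generate a Sitemap file from a CMS's list of pages.
# Assumes pages arrive as (url, last-modified date, change frequency) tuples.
import xml.etree.ElementTree as ET
from datetime import date

# Namespace from Google's initial 0.84 version of the protocol.
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"

def build_sitemap(pages):
    """Return Sitemap XML for an iterable of (loc, lastmod, changefreq) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc           # required
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()  # optional
        ET.SubElement(url, "changefreq").text = changefreq        # optional
    return ET.tostring(urlset, encoding="unicode")

print(build_sitemap([
    ("http://www.example.com/", date(2005, 6, 3), "daily"),
]))
```

A weblog tool could run something like this on every rebuild, exactly the way it regenerates its index pages.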

See the comments of this post for a discussion about the pros and cons of the “use-it-if-you-find-it” theory behind robots.txt-like files (such as this one).



  1. Yes, the Google Sitemap Protocol is an excellent idea. I believe that it is a step toward pushing content to search engines instead of the search engines pulling it.

    This, I think, will move toward something more like television.

    Take care

