The Googlebot and Subscription Sites

By Deane Barker on September 11, 2008

Registration/subscription sites: You are apparently allowed to detect the Googlebot and let it bypass your login page to crawl the content behind it. This means that subscription content will appear in the Google index, but people will be prompted to log in when they click a search result.

Google News does include sites that require users to set up usernames and passwords to access content. Since crawlers can’t currently fill out registration or subscription forms, nor do they support cookies, we need to be able to circumvent those pages in order to successfully crawl those sites. The easiest way to do this is to configure your servers to not serve the registration or subscription page to our crawlers (when the User-Agent is “Googlebot”).
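To make the quoted advice concrete, here is a minimal sketch of what such a server-side check might look like. The function names (`is_googlebot`, `choose_page`) and the page labels are hypothetical, invented for illustration; nothing here is taken from Google's documentation or from any particular site's code.

```python
def is_googlebot(user_agent: str) -> bool:
    """Naive check: trusts whatever the client put in its User-Agent header."""
    return "googlebot" in user_agent.lower()


def choose_page(user_agent: str, logged_in: bool) -> str:
    """Decide which page to serve (hypothetical labels: 'article' or 'login')."""
    if logged_in or is_googlebot(user_agent):
        return "article"  # full content, no registration wall
    return "login"        # everyone else gets the registration page


# The crawler identifies itself with a string like this one.
crawler_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser_ua = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/3.0"

page_for_crawler = choose_page(crawler_ua, logged_in=False)
page_for_browser = choose_page(browser_ua, logged_in=False)
```

Note that the check looks only at a header the client controls, so it cannot tell the real crawler from anything else that sends the same string.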

It also means, of course, that you could change your User Agent string and bypass the login page for any site that does this.

Via a good discussion on Reddit about Experts Exchange and how they do some funky things with Google.

Just for giggles, I downloaded the User Agent Switcher Firefox extension and confirmed that when you change your User Agent to impersonate the Googlebot, all the answers appear uncloaked on Experts Exchange pages.
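The browser extension isn't doing anything magical: the User Agent is just a request header, and any HTTP client can set it. As a rough sketch using Python's standard library, with a placeholder URL (`example.com` is an assumption, not a site from this post) and without actually sending anything:

```python
import urllib.request

# The User Agent string the Googlebot sends.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Build (but do not send) a request that claims to come from the Googlebot.
# The URL is a placeholder for illustration only.
request = urllib.request.Request(
    "http://example.com/some-article",
    headers={"User-Agent": GOOGLEBOT_UA},
)

# urllib normalizes header names to "User-agent" internally.
claimed_agent = request.get_header("User-agent")
```

A site that wants to close this hole would have to verify the crawler by something other than the header, such as the address the request comes from.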

