By Deane Barker | February 21, 2009 | 1 Comment
Jeff Dean keynote at WSDM 2009: A Google engineer gave a talk at a conference where he revealed some crazy stats about Google’s architecture:
Google now detects many web page changes nearly immediately, computes an approximation of the static rank of that page, and rolls out an index update. For many pages, search results now change within minutes of the page changing.
[…] Their performance gains are also impressive, now serving pages in under 200ms. Jeff credited the vast majority of that to their switch to holding indexes completely in memory a few years back. […] that now means that a thousand machines need to handle each query rather than just a couple dozen […]
So my query hits a thousand machines? Maybe the “Google kills trees” argument from a couple months ago wasn’t so far off base?
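To make that "thousand machines per query" picture concrete, here's a minimal scatter-gather sketch: the index is split into shards, each shard is searched in parallel entirely in memory, and the partial results are merged by score. The shard data, term names, and scores are all invented for illustration; this is a toy version of the fan-out idea, not Google's actual serving system.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory index shards: each "machine" holds a slice of the
# posting lists (term -> list of (url, score)) and answers only for its slice.
SHARDS = [
    {"cms": [("example.com/a", 0.9)], "google": [("example.com/a", 0.7)]},
    {"cms": [("example.org/b", 0.8)]},
    {"google": [("example.net/c", 0.95)]},
]

def search_shard(shard, term):
    # Each shard scans only its own slice of the index, entirely in RAM.
    return shard.get(term, [])

def search(term, top_k=10):
    # Scatter the query to every shard in parallel, then gather and
    # merge the partial results by score -- the fan-out/fan-in pattern.
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(lambda s: search_shard(s, term), SHARDS)
    hits = [hit for partial in partials for hit in partial]
    return sorted(hits, key=lambda h: h[1], reverse=True)[:top_k]

print(search("google"))  # highest-scoring hits merged across all shards
```

The point of the pattern is that latency is set by the slowest shard, not the sum of them, which is how a thousand machines can still answer in under 200 ms.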
Google’s tweaking went all the way down to where the data was physically located on disk:
[…] Jeff said they paid attention to where their data was laid out on disk, keeping the data they needed to stream over quickly always on the faster outer edge of the disk, leaving the inside for cold data or short reads.
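Why the outer edge is faster: the platter spins at a constant RPM, and bit density along a track is roughly constant (zoned bit recording), so sequential throughput scales with track radius. A back-of-the-envelope calculation, using my own assumed numbers for a 3.5" 7200 RPM drive (not figures from the talk):

```python
import math

rpm = 7200
rev_per_sec = rpm / 60          # 120 revolutions per second

bits_per_mm = 25_000            # assumed linear bit density along a track
outer_radius_mm = 46            # roughly a 3.5" platter's outer track
inner_radius_mm = 20            # roughly its innermost track

def throughput_mb_s(radius_mm):
    # Bits passing under the head per second = track length x density x RPM.
    track_bits = 2 * math.pi * radius_mm * bits_per_mm
    return track_bits * rev_per_sec / 8 / 1e6  # bits/s -> MB/s

outer = throughput_mb_s(outer_radius_mm)
inner = throughput_mb_s(inner_radius_mm)
print(f"outer: {outer:.0f} MB/s, inner: {inner:.0f} MB/s, "
      f"ratio: {outer / inner:.1f}x")
# With these assumptions: outer ~108 MB/s, inner ~47 MB/s, about 2.3x
```

So under these assumptions, simply keeping hot, streamed data on the outer tracks buys you roughly double the sequential read rate for free.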
The last paragraph suggests a Google search traverses an index measured in thousands of terabytes. But even if a request spans a thousand machines, I don’t think they’re all doing the same thing and racing each other, as the external “Google kills trees” article suggested; it’s massively parallel processing, with a thousand CPUs each searching its own slice of the index. And Google can still do its job in 200 ms, roughly the duration of a car crash, before a human operator has even come to terms with what happened.
Seriously though, if there were no internet, billions of computers wouldn’t be left running 24/7 and the world would be greener. Imagine that.