Jeff Dean keynote at WSDM 2009
Google engineer Jeff Dean gave the keynote at WSDM 2009, where he revealed some crazy stats about Google’s architecture:
Google now detects many web page changes nearly immediately, computes an approximation of the static rank of that page, and rolls out an index update. For many pages, search results now change within minutes of the page changing.
[…] Their performance gains are also impressive, now serving pages in under 200ms. Jeff credited the vast majority of that to their switch to holding indexes completely in memory a few years back. […] that now means that a thousand machines need to handle each query rather than just a couple dozen […]
So my query hits a thousand machines? Maybe the “Google kills trees” argument from a couple months ago wasn’t so far off base?
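Why would an in-memory index mean more machines per query? RAM per machine is much smaller than disk, so keeping the whole index in memory means cutting it into many more shards, and each query has to visit every shard. The usual way to do that is scatter-gather. Below is a minimal sketch, assuming a document-partitioned index (each document and its full score live on exactly one shard) and a toy additive scoring scheme. The shard layout, the scoring, and the names `search_shard` and `search` are hypothetical illustrations, not Google's actual design.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shard: an in-memory inverted index, term -> {doc_id: score}.
Shard = dict[str, dict[str, float]]

def search_shard(shard: Shard, terms: list[str], k: int) -> list[tuple[float, str]]:
    """Score documents on one shard by summing per-term scores; return local top-k."""
    scores: dict[str, float] = {}
    for term in terms:
        for doc_id, s in shard.get(term, {}).items():
            scores[doc_id] = scores.get(doc_id, 0.0) + s
    return heapq.nlargest(k, ((s, d) for d, s in scores.items()))

def search(shards: list[Shard], query: str, k: int = 10) -> list[tuple[float, str]]:
    """Scatter the query to every shard concurrently, then gather and merge.

    Threads stand in for the RPCs a real system would send to shard servers.
    Because each document lives on one shard, the global top-k is always
    contained in the union of the per-shard top-k lists.
    """
    terms = query.lower().split()
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = pool.map(lambda sh: search_shard(sh, terms, k), shards)
        return heapq.nlargest(k, (hit for part in partials for hit in part))

# Usage with two tiny hypothetical shards.
shards = [
    {"google": {"a": 1.0, "b": 0.5}, "index": {"a": 0.3}},
    {"google": {"c": 2.0}, "memory": {"c": 0.1, "d": 0.9}},
]
result = search(shards, "Google index", k=2)
print(result)
```

With a thousand shards instead of two, the query latency is set by the slowest shard, which is one reason serving in under 200ms at that fan-out is notable.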
Google’s tweaking went all the way down to where the data was physically located on disk:
[…] Jeff said they paid attention to where their data was laid out on disk, keeping the data they needed to stream over quickly always on the faster outer edge of the disk, leaving the inside for cold data or short reads.
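Why is the outer edge faster? Modern drives use zoned bit recording: linear bit density is roughly constant, so outer tracks (larger circumference) hold more sectors, and at a fixed spindle speed more data passes under the head per revolution. A rough back-of-the-envelope sketch follows; the radii, RPM, and density are hypothetical values picked only for illustration, and real drives group tracks into discrete zones, so the actual curve is stepped rather than smooth.

```python
import math

def sequential_mb_per_s(radius_mm: float, rpm: float, bytes_per_mm: float) -> float:
    """Approximate sequential read rate at a given track radius.

    Assumes a constant linear bit density, so a track's capacity scales with
    its circumference (2 * pi * r), and the disk spins at a constant RPM.
    """
    track_bytes = 2 * math.pi * radius_mm * bytes_per_mm
    revs_per_s = rpm / 60
    return track_bytes * revs_per_s / 1e6

# Hypothetical drive parameters, for illustration only.
RPM = 7200
BYTES_PER_MM = 3_000
outer = sequential_mb_per_s(46.0, RPM, BYTES_PER_MM)  # near the outer edge
inner = sequential_mb_per_s(20.0, RPM, BYTES_PER_MM)  # near the spindle
print(f"outer {outer:.0f} MB/s, inner {inner:.0f} MB/s, ratio {outer / inner:.2f}x")
```

Under these assumptions the throughput ratio is simply the ratio of the radii, so data that is streamed sequentially gains the most from sitting on the outer tracks, while short random reads are dominated by seek time and lose little by living on the inner tracks.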