March 9, 2009

Readability – An Arc90 Lab Experiment: This is kind of a cool little bookmarklet.

Readability is a simple tool that makes reading on the Web more enjoyable by removing the clutter around what you’re reading.

Now, what’s remarkable is that it works on almost any site. CNN, the New York Times, USA Today, your blog, etc. — click the bookmark and everything is stripped away, leaving the clean, uncluttered text of the main content. But how does it figure out what the main content is?

I dug into the code a bit. The bookmarklet just loads a script file from their site. I went through that script, and it appears to find the DIV containing the most P tags (after first replacing BR tags with Ps to normalize the markup). It keeps that DIV and tosses the rest.
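Here's a rough Python sketch of that heuristic as I understand it. To be clear, this is not Arc90's code (theirs is JavaScript running in the browser), and the names `DivParagraphCounter` and `extract_main_text` are my own. A few details are guesses on my part: each run of BR tags becomes the start of a new paragraph, each P is credited to its innermost enclosing DIV, and the result is plain text rather than markup.

```python
import re
from html.parser import HTMLParser


class DivParagraphCounter(HTMLParser):
    """Hypothetical helper: counts P tags per DIV and collects each DIV's text."""

    def __init__(self):
        super().__init__()
        self.open_divs = []  # indices of DIVs currently open, innermost last
        self.p_counts = []   # p_counts[i] = paragraphs whose nearest DIV is i
        self.texts = []      # texts[i] = text fragments found inside DIV i

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.p_counts.append(0)
            self.texts.append([])
            self.open_divs.append(len(self.p_counts) - 1)
        elif tag == "p" and self.open_divs:
            # Assumption: credit the paragraph to its innermost enclosing DIV.
            self.p_counts[self.open_divs[-1]] += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.open_divs:
            self.open_divs.pop()

    def handle_data(self, data):
        # Text belongs to every DIV that is currently open (nested or not).
        for i in self.open_divs:
            self.texts[i].append(data)


def extract_main_text(html):
    """Return the text of the DIV with the most P tags, or None if none has any."""
    # Normalize: assume each run of <br> tags starts a new paragraph.
    normalized = re.sub(r"(?:<br\s*/?>\s*)+", "<p>", html, flags=re.IGNORECASE)
    parser = DivParagraphCounter()
    parser.feed(normalized)
    parser.close()
    if not parser.p_counts or max(parser.p_counts) == 0:
        return None
    # Pick the winning DIV (first one wins a tie) and discard everything else.
    best = max(range(len(parser.p_counts)), key=parser.p_counts.__getitem__)
    # Join fragments with spaces, then collapse runs of whitespace.
    return " ".join(" ".join(parser.texts[best]).split())


if __name__ == "__main__":
    page = """
<div id="nav"><p>Home</p><p>About</p></div>
<div id="story">
  First paragraph.<br><br>Second paragraph.<br>Third paragraph.
  <p>Fourth paragraph.</p>
</div>
<div id="footer"><p>Copyright</p></div>
"""
    print(extract_main_text(page))
    # prints: First paragraph. Second paragraph. Third paragraph. Fourth paragraph.
```

The "story" DIV wins with three paragraphs (two created from BRs, one explicit) against two in the nav and one in the footer. Joining text fragments with spaces is a simplification: it would split a word that straddles an inline tag like `<b>`, which a real implementation would want to handle more carefully.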




