Using Bayesian Analysis to Filter Blog Posts

By Deane Barker on November 23, 2003

Working with Bayesian Categorizers: Jon Udell tests a novel theory: if SpamBayes can effectively determine what I think is spam and what I don’t, then why couldn’t it be used to determine which blog posts I want to read and which I don’t, given a big enough sample of both?

There’s been some discussion in the blog world about using a Bayesian categorizer to enable a person to discriminate along various interest/non-interest axes. I took a run at this recently and, although my experiments haven’t been wildly successful, I want to report them because I think the idea may have merit.

Long article, with a lot of benchmarking and tests. The results are inconclusive, but given the effectiveness of SpamBayes and the “in-vogueness” of Bayesian analysis lately, I think it’s not a matter of “if” but a matter of “when.”
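To make the idea concrete, here is a rough sketch of what such a filter might look like. It is a plain multinomial naive Bayes classifier with add-one smoothing, not SpamBayes’s own scoring and not the setup from Udell’s article; the PostFilter class, the "read"/"skip" labels, and the sample posts are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class PostFilter:
    """Toy two-class naive Bayes model: posts to 'read' vs. posts to 'skip'.

    Hypothetical illustration only; a real filter would need far more
    training data and better tokenization.
    """

    LABELS = ("read", "skip")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.post_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        """Record one example post under the given label."""
        self.word_counts[label].update(tokenize(text))
        self.post_counts[label] += 1

    def _log_score(self, tokens, label):
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        vocabulary = set()
        for other in self.LABELS:
            vocabulary.update(self.word_counts[other])
        # Smoothed prior, so an untrained class never produces log(0).
        total_posts = sum(self.post_counts.values())
        score = math.log(
            (self.post_counts[label] + 1) / (total_posts + len(self.LABELS))
        )
        # Add-one smoothing, with one extra slot reserved for unseen words.
        denominator = total_words + len(vocabulary) + 1
        for token in tokens:
            score += math.log((counts[token] + 1) / denominator)
        return score

    def probability_read(self, text):
        """Return the estimated probability that a post belongs to 'read'."""
        tokens = tokenize(text)
        read = self._log_score(tokens, "read")
        skip = self._log_score(tokens, "skip")
        # Normalize in log space to avoid underflow on long posts.
        top = max(read, skip)
        read_weight = math.exp(read - top)
        skip_weight = math.exp(skip - top)
        return read_weight / (read_weight + skip_weight)


post_filter = PostFilter()
post_filter.train("bayesian filtering of rss feeds with python", "read")
post_filter.train("spam classifier statistics and python", "read")
post_filter.train("celebrity gossip and fashion news", "skip")
post_filter.train("football scores and celebrity news", "skip")

wanted = post_filter.probability_read("python bayesian classifier")
unwanted = post_filter.probability_read("celebrity football news")
```

With only four training posts this proves nothing, of course. The “big enough sample of both” is the hard part, and the boundary between interesting and uninteresting posts is presumably much fuzzier than the one between mail and spam.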

This brings back the Nirvana of digital content: a computer that knows what I want to read and delivers it to me, rather than me having to go find it. This vision has been with us for decades now. Recently, I saw fragments of it in the promotional video for Apple’s Knowledge Navigator from the late 80s.

Even before then, you’d read about computers that would predict what you wanted and do things for you. The computer would just know that you’d want to go to that concert, so it would get the tickets, and so on. Is text analysis the way to do this? Is that avenue in opposition to the lofty goals of The Semantic Web? Or are they opposing paths to the same goal?