Diluted Spam Messages

By Deane Barker on November 27, 2003

Looks like spammers are trying to fool Bayesian filters by diluting their text. I got a spam today with two lines at the top advertising “cash freedom” or something, and I noticed that the message scrolled quite a bit. After about a hundred line breaks, I found this:

I can now see that I have not been and opposed slavery as a denial of that unity, have also won; but with large and protuberant noses, very furry or very bristly hair,

I did a Google search, and it looks like it’s an excerpt from “The Island of Doctor Moreau.”

It would appear that spammers are trying to “hide” the text of their message amidst “legitimate” text to trick Bayesian filters into letting them through. It worked in this case — Thunderbird did not mark it as spam.

I forwarded it to an account I monitor with Outlook equipped with SpamBayes. It was tagged as “Unsure” with a score of 20%, perilously close to the 15% it needs to be categorized as legitimate (although, to be fair, it was sent from one of my own accounts, and SpamBayes may have incorporated that fact into the score).

I think the key here is the number of spammish tokens compared to the number of legitimate tokens. If they can load a message up with legitimate text — hidden with line breaks or some other cloaking scheme — Bayesian filters will get confused enough to let it through.
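Here's a rough sketch of that dilution effect in Python. It is not how Thunderbird or SpamBayes actually score mail (real filters combine token evidence in more elaborate ways); it just multiplies per-token odds together, naive-Bayes style. The token probabilities, the `UNKNOWN_PROB` default, and the `spam_score` function are all made up for illustration.

```python
import math

# Hypothetical per-token spam probabilities, as a trained filter might hold.
# These numbers are invented for illustration only.
TOKEN_SPAM_PROB = {
    "cash": 0.95,
    "freedom": 0.90,
    "slavery": 0.20,
    "unity": 0.15,
    "noses": 0.25,
    "hair": 0.30,
}
UNKNOWN_PROB = 0.4  # assumed score for tokens the filter has never seen


def spam_score(tokens):
    """Combine per-token probabilities naive-Bayes style, in log-odds space."""
    log_odds = 0.0
    for tok in tokens:
        p = TOKEN_SPAM_PROB.get(tok.lower(), UNKNOWN_PROB)
        log_odds += math.log(p) - math.log(1.0 - p)
    # Convert the summed log-odds back into a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-log_odds))


pitch = ["cash", "freedom"]                      # the two-line sales pitch
padding = ["slavery", "unity", "noses", "hair"]  # words borrowed from a novel

plain = spam_score(pitch)
diluted = spam_score(pitch + padding)
print(f"pitch alone:  {plain:.2f}")
print(f"with padding: {diluted:.2f}")
```

With these invented numbers, the pitch alone scores as near-certain spam, while the padded version falls back toward the middle of the range, which is the "unsure" territory a filter might let through.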

My only comment to the spammers about this is: send me a different chapter next time. In fact, I’ll make you a deal. If you put a couple of lines of spam at the top of a message, then, under that, print a complete chapter of “The Island of Doctor Moreau,” I will read your message. In fact, send me 23 messages, one for each chapter. After reading about penis enlargement 23 times, maybe I’ll consider it.