SpamPoison

By Deane Barker on January 26, 2004

Anti-Spam – Fight Back Against Spammers…: I don’t know how technically feasible this method actually is, but it’s an awfully funny concept.

All you have to do is link to this page so that whenever a spammer’s robot scans your page, they will be sucked into this one. To link to this page, just use this simple code… [link deleted]. E-mail collecting robots will be sent in an infinite loop and will get dynamically generated fake e-mail addresses, adding enormous quantities of bogus data to the databases of the spammers, thus polluting those files so badly that they become essentially useless.
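
The trap is easy to sketch. Here's a rough illustration of the idea in Python: a page that serves a random batch of bogus addresses at spammer-owned domains and links to endlessly varying subdomain URLs, so a harvester never runs out of "new" pages. The domain names and layout below are placeholders, not SpamPoison's actual code.

```python
#!/usr/bin/env python3
"""Rough sketch of a spam-harvester honeypot page, in the spirit of the
description above. Domains and layout are placeholders, not SpamPoison's code."""

import random
import string

# Hypothetical trap domains (SpamPoison reportedly uses domains owned by
# spammers on the Spamhaus ROKSO list).
TRAP_DOMAINS = ["example-spammer-one.com", "example-spammer-two.net"]

def random_token(length=8):
    return "".join(random.choices(string.ascii_lowercase, k=length))

def fake_addresses(n):
    # Generate n bogus mailbox names at the trap domains.
    return [f"{random_token()}@{random.choice(TRAP_DOMAINS)}" for _ in range(n)]

def honeypot_page(base_domain="trap.example.org"):
    # A variable number of addresses per page, plus links to "new" pages on
    # randomly named subdomains, so the crawl never terminates.
    addresses = fake_addresses(random.randint(5, 20))
    links = [f"http://{random_token()}.{base_domain}/{random_token()}.html"
             for _ in range(random.randint(2, 5))]
    body = "\n".join(f'<a href="mailto:{a}">{a}</a><br>' for a in addresses)
    body += "\n" + "\n".join(f'<a href="{u}">more contacts</a><br>' for u in links)
    return f"<html><body>\n{body}\n</body></html>"

if __name__ == "__main__":
    print(honeypot_page())
```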

The fake pages are pretty funny. This reminds me of that brilliant idea from a couple of months ago: Let’s All Respond to Spam.


Comments

  1. Hi,

    Many thanks for writing about SpamPoison!

    I am posting some answers to questions raised in the Mamboserver forum, which quoted Gadgetopia in the following topic:

    http://forum.mamboserver.com/viewtopic.php?t=6054

    “I have seen stuff like that before, and in theory it could cause the spammers some delays…but…”

    If the domain name exists, the spammer’s mail server has to establish a TCP connection to discover whether the e-mail address exists, using the SMTP protocol, and that takes a lot of time for each address (a rough sketch of this check appears at the end of this comment). Older applications used to generate the domain names as well. That doesn’t work anymore, because before sending messages spammers run DNS checkers, which are very fast, dropping nonexistent/inactive/deleted domains from the database. In fact, using generated domains is innocuous to them.

    “For one…that method they sugest isn’t perfect. It randomly generated email address, but those could be valid email addresses!!! So, people will still get spammed by them…just not you. I don’t like that”

    All the domains used in the addresses are owned by spammers and were taken from the “Top 10 ROKSO Spammers” list (Spamhaus Project http://www.spamhaus.org/rokso/index.lasso). If a generated e-mail address happens to exist, the spam message will be delivered to one of the top spammers! It’s really fighting back!

    “One way that i thought was cool was where you have a script in a directory called something like EMAILADDRESSES that basically runs the bot around in circles with redirects until it hopefully times out.”

    Modern spammers’ bots are configured to extract at most X e-mail addresses from a page, visit at most Y pages on a site, and follow at most Z links from a site. Anything beyond that is ignored. This way, it’s not possible to loop a bot using a simple script. SpamSaver/SpamPoison uses a lot of domains and infinite subdomains, generates a variable number of e-mail addresses for each page, and a few links to other sites, all of which are trap sites as well. At the moment, the link on spamsaver.com shows a URL using the spamsaver.com domain, but we will vary the domain all the time, as well as the page content. I think it will be hard for a spammer to detect the trap.

    Francisco Brazil

    PS: Please excuse my sorry English
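
    Here is roughly what that per-address SMTP check looks like in code (a sketch; the probe host and MX host names are hypothetical, and the MX lookup is omitted):

    ```python
    import smtplib

    def address_probably_exists(address, mx_host):
        """Ask the recipient's mail server whether it accepts `address`.
        Every check costs a TCP connection plus an SMTP dialogue, which is
        why verifying millions of bogus addresses is slow for a spammer."""
        try:
            with smtplib.SMTP(mx_host, 25, timeout=10) as server:
                server.helo("probe.example.org")             # hypothetical probe host
                server.mail("postmaster@probe.example.org")  # hypothetical sender
                code, _ = server.rcpt(address)
                return code == 250                           # 250 = recipient accepted
        except (smtplib.SMTPException, OSError):
            return False
    ```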

  2. Somehow, that site ended up in my referrer logs.

    I imagine it was simple referrer spam, which, coming from a company that claims to be anti-spam, strikes me as somewhat hypocritical.

    Unless of course, I just failed to find the link to my site from their front page….

  3. Sorry for the inconvenience. Please accept my apologies.

    We are doing research: running a commercial e-mail collector bot, visiting 1-2 pages on each of 25,000 selected sites, and checking how many pages contain e-mail addresses. It’s a kind of little Netcraft survey, and we intend to publish the results.

    I think the term “referrer spam” is a little strong, because this bot checks only 1-2 pages of an entire site. Until now, I believed the bot’s visits were buried in the logs and ignored by users, because the number of hits is insignificant compared with a site’s daily traffic.

    Again, please excuse my sorry English.

    Francisco

  4. Uhhhhh, not very smart! Infinite domains and subdomains but the same IP addresses. Duh!!!!! Think like a spammer before you come up with a way to fight back, smart asses. Wasted my legit crawler’s bandwidth and two hours of my time changing the navigation model.

  5. @ken

    First, we have over 30 servers around the world running honeypots and sending info to reputable anti-spam organizations.

    Second, since December 2003 we have had a robots.txt file, and each link carries “nofollow”. You spent bandwidth and time fixing your faulty crawler because of your robot’s bad behavior: it disregarded the exclusion standards (see the snippet at the end of this comment).

    From Google’s site:

    “Block or remove pages using a robots.txt

    A robots.txt file restricts access to your site by search engine robots that crawl the web. These bots are automated, and before they access pages of a site, they check to see if a robots.txt file exists that prevents them from accessing certain pages. (All respectable robots will respect the directives in a robots.txt file, although some may interpret them differently. However, a robots.txt is not enforceable, and some spammers and other troublemakers may ignore it.)”

    Simple as that.

    Thanks

    Best Regards,

    Francisco
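
    For reference, the exclusion mechanisms mentioned above look roughly like this (illustrative values only, not the actual SpamPoison files):

    ```
    # robots.txt at the site root: asks well-behaved crawlers to stay out
    User-agent: *
    Disallow: /
    ```

    And each generated link carries the “nofollow” relation:

    ```html
    <a href="http://xyz.trap.example.org/" rel="nofollow">more contacts</a>
    ```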

  6. BTW, in shared hosting it has always been usual (and is nowadays an ARIN/RIPE/APNIC/LACNIC requirement) to use just one IP to host several thousand users and domains. And, of course, one domain can host unlimited sites as subdomains. Real-life example: blogs hosted as subdomains on WordPress, Blogspot, Blogger. A well-written crawler does not restrict the number of collected pages based on the number of sites under a domain or IP; it can’t afford to ignore huge amounts of data. Writing a crawler requires at least basic knowledge about Web hosting and Internet standards. Ken’s post makes it evident that he doesn’t know Jack about either.

Comments are closed. If you have something you really want to say, email editors@gadgetopia.com and we’ll get it added for you.