Scraper Sites

By Deane Barker on August 3, 2007

Please Don’t Steal This Web Content: Remember those people who reprinted my content a few weeks back? Apparently this phenomenon has a name.

VanFossen isn’t referring to the kind of plagiarism in which a lazy college student copies sections of a book or another paper. This is automated digital plagiarism in which software bots can copy thousands of blog posts per hour and publish them verbatim onto Web sites on which contextual ads next to them can generate money for the site owner.

Such Web sites are known among Web publishers as “scraper sites” because they effectively scrape the content off blogs, usually through RSS (Really Simple Syndication) and other feeds on which those blogs are sent.


  1. So what we’re really dealin’ with here is RSSSS

    Real Simple Syndication Scraper Syndrome

    Hmmmm … now that we have a pithy acronym, think we can get government funding?

    BTW, seen that happen to my content as well. Emailed a cease and desist and the perp played the victim card on me.

  2. One thing is that these (bottom) scrapers don’t bother to duplicate any images you may have in a post, just the text, and link to the images on your server. So swap out the linked images for something not so flattering, and you’ve got a little bit of revenge.

    The dirtbags.

  3. I’ve seen at least one site that (automatically) puts copyright notices in all of its articles (I think the notices only show up in the RSS feed, not in the actual articles on the site, but I may be wrong). I have no idea if that does any good or not, but it may be something to consider.

    Do you think that would stop visitors to those sites from frequenting the scraper sites? I guess you would have to try and make it clear in the notice that the content was stolen, which would probably be problematic (especially if you were allowing legitimate aggregators to use your content). If your copyright notice said, “Copyright 2007 This content was stolen from”, it might have some deterrence value.

    Just a thought. Again, not sure if it would do any good or not, but ….

  4. Screen scraping has been a common term since long before it showed up on Lorelle’s radar, especially among plugin developers.

    There are plenty of plugins that deal with the problem, from things that simply add a copyright (no, it doesn’t usually deter them) to those that detect when something has been reposted via an embedded trigger image. Some plugins will allow you to switch your content to something else especially for those scraping programs, so that they not only don’t get your content, but you can have them post text about how despicable they are while they think they’re scraping you.
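
For anyone curious what two of the ideas in the comments might look like in practice, here is a rough Python sketch, not anything a commenter or plugin actually uses. It assumes your server lets you pick an image based on the HTTP Referer header, and that you can post-process feed items before they go out. The host names, file paths, and function names are all made up for illustration.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical values for illustration; none of these come from the post.
OWN_HOSTS = {"example.com", "www.example.com"}
REAL_IMAGE = "images/photo.jpg"
DECOY_IMAGE = "images/stolen-content.jpg"


def image_for_request(referer: Optional[str]) -> str:
    """Pick which image file to serve, based on the Referer header."""
    if not referer:
        # Many feed readers and privacy tools send no Referer at all,
        # so a missing header gets the real image.
        return REAL_IMAGE
    host = urlparse(referer).hostname or ""
    # Pages on our own hosts get the real image; any other page that
    # hotlinks the file gets the decoy.
    return REAL_IMAGE if host in OWN_HOSTS else DECOY_IMAGE


def add_feed_notice(item_html: str, site_name: str, year: int) -> str:
    """Append a copyright notice to the feed copy of a post only."""
    notice = (
        f"<p>Copyright {year} {site_name}. If you are reading this "
        f"anywhere else, it was copied from {site_name}.</p>"
    )
    return item_html + "\n" + notice
```

One caveat, echoing the third comment: a Referer check cannot tell a scraper site from a legitimate web-based aggregator, so both would see the decoy image unless their hosts are added to the allowed set.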
