Scraper sites – one more word in my web vocabulary


I heard about FairShare through one of the technology blogs, and registered Monastic Musings Too just to see what would happen. FairShare is a free service that checks to see if large chunks of one’s postings appear elsewhere on the web. In academia, we have TurnItIn to detect plagiarism in student papers; FairShare is a blog equivalent.
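How FairShare does its checking internally is not described here, so what follows is only a minimal sketch of one common approach, not the service's actual method. It assumes that "large chunks" can be approximated by runs of five consecutive words (often called shingles), and the function names `shingles` and `copied_fraction` are hypothetical, made up for this illustration.

```python
import re

def shingles(text, size=5):
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # A text shorter than `size` words yields an empty set.
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def copied_fraction(original, suspect, size=5):
    """Fraction of the original's word sequences that also appear in the suspect text."""
    source = shingles(original, size)
    if not source:
        return 0.0
    return len(source & shingles(suspect, size)) / len(source)
```

A result near 1.0 would suggest that nearly the whole post was lifted, while a result near 0.0 would suggest little or no copying. A legitimate republication, such as one made under a standing agreement, would score just as high, so a number like this can only flag a page for a human to look at.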

There are legitimate reasons that 100% of one’s post might appear elsewhere.  I’m really pleased each time something I write is picked up by the English edition of Il Sussidiario – with whom I have a standing agreement.  My writing is properly attributed and gets a wider audience – it’s a win-win situation.

Scrapers. After subscribing to FairShare, however, I discovered the existence of scraper sites. Each has the appearance of a topical web site, but 100% of the content is stolen, often without attribution. If you go to the web site Business Insurance, for instance, you’ll find my recent post on health care reform – word for word, even including the picture.

A little digging around reveals that this is just one of a fleet of sites all hosted out of; all of them seem to be scraper sites.  Based on a previous experience with being scraped, there seem to be bots that pounce on new postings that include particular key words – and then lift them wholesale and paste them elsewhere.  In the previous instance, the bot at least kept the name and source: but the article was re-posted within 2 minutes of the original posting.

What to do? It’s futile to do much protesting, at least without developing better web detective skills. In the recent incident, the perpetrator seems to own the domain name, and has no contact information. I can chuckle, I guess, about the poor soul who was looking for information on business insurance and ended up with my philosophical reflections on health care reform. Perhaps I should start inserting a few scraper-enticing words into each post for that effect. (not)

  • I do notice the authorship of posts more carefully
  • I’ve increased my diligence about including names of writers in the post along with anything I quote from them, and a link to the original wherever possible.  If my post of their work gets scraped, at least the attribution to them will go along with it!
  • I continue to use FairShare just to be aware of the extent of the problem
  • Benedictines practice common ownership of property; in an aggressive and underhanded way, scraping is simply a reminder that my writing is dependent on the information and writing of others who are sources and part of a larger conversation.

Don’t look for a post on the spirituality of scraper sites any time soon, though.
