How to Stop Content Scraping (Or At Least Slow It Down)


This happens all the time.

As soon as a marketing method reaches maturity and is fully accepted by mainstream marketers, it develops a dark side. Like black hat SEOs for search engine optimization or spammers for email marketing.

Now that content marketing is a recognized marketing strategy for connecting with customers and appeasing search engines, it’s being hit by lazy marketers who want to enjoy the benefits without doing any of the work.

I’m talking about content scrapers.

It blows me away that a website’s owners believe they can republish another brand’s blog articles within minutes of publication, verbatim and with no credit whatsoever, and still look credible.

More irritating is the fact that copyright infringement and plagiarism are now accepted business practice. There is a way to copy with integrity, but site scraping isn’t it.


The problem with site scrapers

It’s a little contradictory, I know. If a reputable website emails and asks permission to reprint an article, I’m usually okay with it. I just ask that a few technical things be done so we don’t lose any of the article’s SEO value.

Ask, and I’m okay. But scrape? No way!

I’m offended that another site can build its entire blog with my blog’s articles. Some of them even list themselves as the author. Most of them have no contact information anywhere on their site, so it’s nearly impossible to track them down.

I suppose it’s a matter of principle. Working with other bloggers is one thing. Having to protect your rights as a content creator is another.

I know I’m not alone in this. Here’s a quote from Breaking Media’s CEO, Jonah Bloom, in a 2010 Reuters article:

We’re a small company with limited resources, and I got fed up wasting valuable time trying to track down these parasites who aren’t only benefiting from our editors’ hard graft but also potentially messing with our search engine results by creating duplicates of our content on other sites.

Others aren’t bothered at all. This comes from Felix Salmon, who wrote the above-mentioned article:

In general, I’m flattered by scrapers, just as I’m flattered by people who send my blog entries around by email. It’s all good in the long run.

I can sort of see his point, but like I said, it’s the principle of the thing.

So what’s a blogger to do?

How do you stop content scraping? I’m not sure it’s actually possible to stop it entirely. But there are a few things you can do to protect yourself. Here are four simple (non-technical) ideas that might make a difference.

First, use Yoast’s WordPress SEO plugin on your blog.

Yoast automatically prints this sentence at the bottom of each post in your RSS feed:

The post Customer Engagement, A Lesson from Zappos appeared first on kathrynaragon.com.

It links to your article and cites your website, so scraped content references you as the original publisher.
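If you don’t use WordPress, or you’re just curious what a footer like this involves, here’s a minimal Python sketch of the idea. It is not Yoast’s code. The function name `add_feed_footer` and its parameters are my own, and I’m assuming you have access to each feed item’s HTML before it’s published.

```python
from html import escape


def add_feed_footer(content_html, post_title, post_url, site_name, site_url):
    """Append an attribution line to one feed item's HTML.

    Hypothetical helper for illustration only; the wording mirrors the
    sentence shown above, but this is not the plugin's implementation.
    """
    footer = (
        '<p>The post <a href="{post_url}">{title}</a> '
        'appeared first on <a href="{site_url}">{site}</a>.</p>'
    ).format(
        post_url=escape(post_url, quote=True),  # safe inside an attribute
        title=escape(post_title),
        site_url=escape(site_url, quote=True),
        site=escape(site_name),
    )
    # The footer travels with the item, so a scraper that copies the feed
    # verbatim also copies the links back to the original.
    return content_html + "\n" + footer
```

Use absolute URLs for both links. A relative link would point at the scraper’s domain once the content is copied there.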

Second, always cite the author in your article.

On Crazy Egg, I recently implemented an author box that automatically inserts the author’s bio at the bottom of each article. (A quick shoutout to AuthorSure, which is the best plugin I know for doing this, and developers Russell and Liz Jamieson, who provide great customer support.)

Prior to that, our authors added an “About the Author” blurb to the bottom of the article—so if an article was scraped, at least the author got credit. Unfortunately, after I moved to the author box, our writers could have their articles pop up on other sites with no byline.

So I started adding a sentence, centered at the bottom of every article. Something like this:

Read other Crazy Egg articles by Sharon Hurley Hall.

This mentions Crazy Egg again, so there’s no doubt about where the article originated, but it also links to the writer’s author page (also created by AuthorSure). That way, even if an article is scraped, my writers get credit for their work.

Side note: This was a recent change, so I haven’t been able to observe a difference in the number of articles scraped. My hope is that fewer scrapers will want our articles with this blatant reference to our site—but I’m not holding my breath.

Third, include internal links in all blog posts.

You ought to be doing this anyway. A strong internal linking strategy is good for SEO and keeps people on your blog longer.

In every article, make sure you link to three or four other articles or pages on your site.

Google recommends this as a way to avoid duplicate content on your site. Instead of explaining a concept that you cover in another article, simply link to it.

I recommend that you always make those links open in the same window. (External links should open in another window.) That way, when readers click on links in scraped content, your site replaces the scraped site on that tab. (Yes, I know they can hit the back button, but I want people to experience my site and I’m hopeful they’ll stick around.)
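For anyone who generates their own HTML, the same-window rule can be sketched in a few lines of Python. This is only an illustration: `link_target` and `render_link` are names I made up, and the sketch assumes you know your own hostname and build each link through a helper like this.

```python
from html import escape
from urllib.parse import urlparse


def link_target(href, site_host):
    """Return "_self" for links to our own site, "_blank" for everything else."""
    host = urlparse(href).netloc.lower()
    own = site_host.lower()
    # Links with no host (relative paths) and links to our host or one of
    # its subdomains count as internal.
    if host == "" or host == own or host.endswith("." + own):
        return "_self"
    return "_blank"


def render_link(href, text, site_host):
    """Build an <a> tag with the target chosen by link_target()."""
    target = link_target(href, site_host)
    # rel="noopener" is a common precaution for links opened in a new tab.
    rel = ' rel="noopener"' if target == "_blank" else ""
    return '<a href="{}" target="{}"{}>{}</a>'.format(
        escape(href, quote=True), target, rel, escape(text)
    )
```

One caveat: for this to help with scraped copies, internal links need to be written as absolute URLs. A relative link resolves against whatever domain the copy is sitting on.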

Of course, this isn’t fool-proof. Content scrapers will sometimes strip out all your links, which means you lose any traffic value there might have been in having a site republish your articles.

Fourth, use excerpts in your RSS feed.

This is a growing trend among content publishers. The Wall Street Journal does it. So do FT, The New York Times, Mashable and Convince and Convert.
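Most blogging platforms have a setting for this, so you shouldn’t need code. But if you build your feed yourself, an excerpt can be as simple as the first few dozen words plus a link back. Here’s a rough Python sketch under that assumption. The name `make_excerpt` and the 50-word default are mine, and stripping tags with a regular expression is crude, so treat it as an illustration rather than production code.

```python
import re
from html import unescape


def make_excerpt(content_html, post_url, max_words=50):
    """Reduce a post to a plain-text excerpt that points back to the original."""
    # Crude tag removal: fine for a sketch, not for untrusted or messy HTML.
    text = unescape(re.sub(r"<[^>]+>", " ", content_html))
    words = text.split()
    excerpt = " ".join(words[:max_words])
    if len(words) > max_words:
        excerpt += "..."
    # The link is the whole point: anyone who wants the rest has to visit.
    return f"{excerpt} Read the full article: {post_url}"
```

A scraper that republishes this feed gets only the teaser and a link to the original.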

Just for fun, I emailed Convince and Convert to ask if content scraping was behind their decision to use only excerpts in their RSS—and whether it was working for them. Here’s the answer:

Hi Katie. We do this for three reasons:

- traffic

- scraping

- our email goes through Feedblitz, and we use the RSS to create the email itself, and we only want the excerpt in the email

Does it work? Traffic from email is definitely up quite a bit this year, but that’s mostly a factor of list growth.

On Crazy Egg, I’ve considered limiting the RSS to an excerpt, but despite Convince and Convert’s answer, I’ve read that other bloggers have lost subscribers over this issue. RSS readers don’t want to click through to your article. That’s the point of an RSS reader, after all—to be able to read all the blogs you follow in one spot.

Truncated RSS feeds make sense from a business standpoint. You’d think they would increase click-through, which would increase traffic and ad income. But experts assert that you have more traffic, not less, with a full RSS feed. And if you have to deal with site scrapers, so be it.

So the challenge remains. How do you deal with content scrapers without damaging your traffic and/or reputation? There’s no definitive answer.

The real question is this: Is it worth the battle?

Thoughts? Opinions?

Additional reading:

How to Keep Scrapers from Ruining Your Content Strategy, by Neil Patel

Google’s Matt Cutts: That Scraper Isn’t Hurting Your Mom’s Site, by Barry Schwartz

7 Tips and Tools to Stop Content Thieves in their Tracks, by Chris Dyson

About 

Kathryn Aragon is an award-winning copywriter, blogger, marketing consultant and product creator specializing in social content and digital marketing. Connect with her on Twitter and Google+.


Comments

  1. says

    Thanks for this, Kathryn. I think all reputable bloggers have to deal with this issue. I think adding the author link makes it easier for authors to find the people scraping the content, though I don’t know if it stops it. As a writer, I can understand why people use excerpts. As a reader, I must say that I prefer full RSS feeds; when I don’t get them, I usually unsubscribe eventually, because it means lots of clicks to read an article. (However, Feedly mobile has a built-in browser, which means you can still read the article on your mobile device without leaving the app.)

  2. says

    Excellent post, Kathryn, and very timely, as I am racking my brain on how to keep the scrapers from “stealing” my content. In my case I’ve put together the most comprehensive database of funding for grad school. This took me months of hard work and thousands of $. My objective is to have it available for everyone to use. But now I have no choice but to ask folks to sign up to view the info. Still, I’m not sure this will “protect” the content. Will definitely try the “author” strategy and we will see!! Thx a million for the info and all the best! Sarah

    • says

      Glad you found it helpful, Sarah. After putting so much work into putting your database together, it makes sense to do everything you can to protect it. Let us know how the sign-up helps. Fingers crossed for you!

  3. says

    I usually go after content scrapers pretty aggressively. I’ve been dealing with one of the more persistent ones lately, but I haven’t had the time to go after him full force yet.

    Here’s what I usually do, and so far it hasn’t failed:

    1. I send the scraper a 48 hour takedown notice.

    2. I report all scraped content to the major search engines via DMCA notices to request that they de-index the stolen content. This prevents any risk of the scraper outranking you (which shouldn’t happen anyway if Google does its job).

    3. If they’re using any ad networks, I contact them next with DMCA notices, pointing out that the scraper is (usually) violating their TOS as well by posting their ads on copyright-infringing material. I’ve had scrapers get their ad network accounts banned over this in the past (which can affect their ability to monetize not only the site in question but also any other sites they own). If they only use private advertisers, I reach out to those advertisers publicly alerting them to the fact that they’re paying to promote their company on illegally-published content. You’ll get less of a response from them, but some will pull their ads, again hurting the scraper.

    4. Only after I send DMCA notices to those two sources do I send one to their host. In most cases the hosts are responsive. Even if they don’t have to abide by DMCA requests because of the country they’re located in, publishing scraped content will often violate their own TOS, meaning they’ll sometimes still take action.

    The idea is to hit the scrapers where it hurts — traffic and revenue sources. If you use letter templates for this where you only have to plug in the links to the stolen content and the scraping site’s information, it doesn’t take more than a few minutes.

    I added the RSS feed notices through WordPress SEO on my current blog being scraped, and so far it hasn’t helped. My message thanks subscribers and then says if they’re reading it anywhere but their feed reader the content is being scraped and published illegally. The scraper keeps publishing the content anyway, including the message that he’s a content thief. I suppose not everyone cares if they’re outed for being slime.

    • says

      Wow! Thanks for sharing your process, Jenn. You’ve got it down to a fine art.

      Like you, I haven’t found the RSS feed notices to make a difference. The scrapers probably don’t read the stuff they’re scraping, which is all the more offensive.

    • says

      Jenn, I really admire your tenacity, and I’ve saved this post (thanks Kathryn) and your comment in case I ever have a severe scraping problem.

      It’s a sad commentary on our world when people so blatantly try to profit from someone else’s work.
