This happens all the time.
As soon as a marketing method reaches maturity and is fully accepted by mainstream marketers, it develops a dark side. Like black hat SEOs for search engine optimization or spammers for email marketing.
Now that content marketing is a recognized marketing strategy for connecting with customers and appeasing search engines, it’s being hit by lazy marketers who want to enjoy the benefits without doing any of the work.
I’m talking about content scrapers.
It blows me away that a website’s owners believe they can republish another brand’s blog articles within minutes of the original article’s publication, verbatim and with no credit whatsoever, and that doing so makes them look credible.
More irritating is the fact that copyright infringement and plagiarism are now accepted business practice. There is a way to copy with integrity, but site scraping isn’t it.
The problem with site scrapers
It’s a little contradictory, I know. If a reputable website emails and asks permission to reprint an article, I’m usually okay with it. I just ask that a few technical things be done so we don’t lose any SEO value to the article.
Ask, and I’m okay. But scrape? No way!
I’m offended that another site can build its entire blog with my blog’s articles. Some of them even list themselves as the author. Most of them have no contact information anywhere on their site, so it’s nearly impossible to track them down.
I suppose it’s a matter of principle. Working with other bloggers is one thing. Having to protect your rights as a content creator is another.
I know I’m not alone in this. Here’s a quote from Breaking Media’s CEO, Jonah Bloom, in a 2010 Reuters article:
We’re a small company with limited resources, and I got fed up wasting valuable time trying to track down these parasites who aren’t only benefiting from our editors’ hard graft but also potentially messing with our search engine results by creating duplicates of our content on other sites.
Others aren’t bothered at all. This comes from Felix Salmon, who wrote the above-mentioned article:
In general, I’m flattered by scrapers, just as I’m flattered by people who send my blog entries around by email. It’s all good in the long run.
I can sort of see his point, but like I said, it’s the principle of the thing.
So what’s a blogger to do?
How do you stop content scraping? I’m not sure it’s actually possible to stop them. But there are a few things you can do to protect yourself. Here are four simple (non-technical) ideas that might make a difference.
First, use Yoast’s WordPress SEO plugin on your blog.
Yoast automatically prints this sentence at the bottom of your RSS feed:
It links to your article and cites your website, so scraped content references you as the original publisher.
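For anyone who builds their feed by hand, here is a minimal Python sketch of the same idea: appending an attribution line to each feed item. It is not Yoast's code, and the function name, the wording of the footer, and the example.com URLs are all assumptions made up for illustration.

```python
from html import escape


def add_attribution(item_html, post_title, post_url, site_name, site_url):
    """Append a footer that links to the original post and names the source site.

    Hypothetical helper for illustration only; the footer wording is an
    assumption, not the sentence any particular plugin prints.
    """
    footer = (
        f'<p>The post <a href="{escape(post_url)}">{escape(post_title)}</a> '
        f'first appeared on <a href="{escape(site_url)}">{escape(site_name)}</a>.</p>'
    )
    # Keep the original item content untouched and add the footer after it.
    return item_html + "\n" + footer


if __name__ == "__main__":
    print(add_attribution(
        "<p>Article body goes here.</p>",
        "My Post",
        "https://www.example.com/my-post",
        "Example Blog",
        "https://www.example.com/",
    ))
```

Because the footer travels inside the feed content, a scraper that republishes the feed verbatim also republishes the link back to you.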
Second, always cite the author in your article.
On Crazy Egg, I recently implemented an author box that automatically inserts the author’s bio at the bottom of each article. (A quick shoutout to AuthorSure, which is the best plugin I know for doing this, and developers Russell and Liz Jamieson, who provide great customer support.)
Prior to that, our authors added an “About the Author” blurb to the bottom of the article—so if an article was scraped, at least the author got credit. Unfortunately, after I moved to the author box, our writers could have their articles pop up on other sites with no byline.
So I started adding a sentence, centered at the bottom of every article. Something like this:
Read other Crazy Egg articles by Sharon Hurley Hall.
This mentions Crazy Egg again, so there’s no doubt about where the article originated, but it also links to the writer’s author page (also created by AuthorSure). That way, even if an article is scraped, my writers get credit for their work.
Side note: This was a recent change, so I haven’t been able to observe a difference in the number of articles scraped. My hope is that fewer scrapers will want our articles with this blatant reference to our site—but I’m not holding my breath.
Third, include internal links in all blog posts.
You ought to be doing this anyway. A strong internal linking strategy is good for SEO and keeps people on your blog longer.
In every article, make sure you link to three or four other articles or pages on your site.
Google recommends this as a way to avoid duplicate content on your site. Instead of explaining a concept that you cover in another article, simply link to it.
I recommend that you always make those links open in the same window. (External links should open in another window.) That way, when readers click on links in scraped content, your site replaces the scraped site on that tab. (Yes, I know they can hit the back button, but I want people to experience my site and I’m hopeful they’ll stick around.)
Of course, this isn’t fool-proof. Content scrapers will sometimes strip out all your links, which means you lose any traffic value there might have been in having a site republish your articles.
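If you write your links by hand, the same-window rule for internal links can be sketched in a few lines of Python. This is only an illustration: the host name www.example.com and the helper names are hypothetical. The sketch also turns relative links into absolute ones, since a relative link in scraped content would resolve against the scraper's domain instead of yours.

```python
from html import escape
from urllib.parse import urljoin, urlparse

SITE_URL = "https://www.example.com/"  # hypothetical: your own site


def is_internal(href, site_url=SITE_URL):
    """Treat relative links and links to our own host as internal."""
    host = urlparse(href).netloc.lower()
    return host == "" or host == urlparse(site_url).netloc.lower()


def link_tag(href, text, site_url=SITE_URL):
    """Build an anchor tag: internal links stay in the same tab,
    external links open in a new one."""
    if is_internal(href, site_url):
        # Make the URL absolute so it still points at our site after scraping.
        absolute = urljoin(site_url, href)
        return f'<a href="{escape(absolute)}">{escape(text)}</a>'
    return (
        f'<a href="{escape(href)}" target="_blank" rel="noopener">'
        f"{escape(text)}</a>"
    )


if __name__ == "__main__":
    print(link_tag("/blog/internal-linking", "internal linking"))
    print(link_tag("https://other.example.org/study", "an outside study"))
```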
Fourth, use excerpts in your RSS feed.
This is a growing trend among content publishers. The Wall Street Journal does it. So do FT, NYTimes, Mashable and Convince and Convert.
Just for fun, I emailed Convince and Convert to ask if content scraping was behind their decision to use only excerpts in their RSS—and whether it was working for them. Here’s the answer:
Hi Katie. We do this for three reasons:
- our email goes through Feedblitz, and we use the RSS to create the email itself, and we only want the excerpt in the email
Does it work? Traffic from email is definitely up quite a bit this year, but that’s mostly a factor of list growth.
On Crazy Egg, I’ve considered limiting the RSS to an excerpt, but despite Convince and Convert’s answer, I’ve read that other bloggers have lost subscribers over this issue. RSS readers don’t want to click through to your article. That’s the point of an RSS reader, after all—to be able to read all the blogs you follow in one spot.
Truncated RSS feeds make sense from a business standpoint. You’d think they would increase click-through, which would increase traffic and ad income. But experts assert that you have more traffic, not less, with a full RSS feed. And if you have to deal with site scrapers, so be it.
So the challenge remains. How do you deal with content scrapers without damaging your traffic and/or reputation? There’s no definitive answer.
The real question is this: Is it worth the battle?