How to Identify Content Thieves and Hit Them Where It Hurts


Kathryn recently shared some ideas about how writers can deter content scrapers—the people and bots that republish your content without your permission, often using your RSS feed.

In the comments on that post I shared a few examples of what I do in that situation to hit content thieves where it hurts—their traffic and income.


What happens if your content thief is anonymous, though? Can you identify them? Who do you send a takedown notice to? How do you find out who their hosting company is?

I did a bit of digging to help Kathryn identify her own scraper and which company hosts the scraper’s website so she could try to have the infringing content removed. In this post I’ll give you some tips and tools to help you do the same. But first:

Why You Should Pursue Content Scrapers

Many bloggers ignore content theft. The most common excuse I’ve heard is that they would rather just keep the links back to their own website that come with the scraped content. But the truth is, content scrapers can do real damage. Here are some things to consider:

  • Scraped content might outrank your content. Yes, it has been known to happen.
  • Low quality “unnatural” links can hurt your site’s search engine rankings (don’t assume the links they give you are a good thing).
  • If you let these bad links accumulate, you might end up with a huge mess to clean up later using Google’s disavow tool. Remember, Google has a history of getting stricter about what kinds of links can hurt you, not getting more lenient. While they usually identify and discount links from big scrapers, there is no guarantee they’ll catch smaller or newer ones (or manual content thieves).
  • Failing to go after content thieves could hurt your ability to license your content to other publications later, even in print. Even if you don’t do this now, do you really want to rule this option out for the future? Remember, attribution and a link back in no way change the fact that a scraper is violating your intellectual property rights, and those rights have real value.
  • While this won’t be a concern for all bloggers, if you happen to work as a freelance writer, failing to pursue content thieves could devalue your work. Why should clients pay you for unique material if they see you let others profit from your work for free by simply taking it?
  • It doesn’t have to take a lot of time to get stolen content removed if you use standard templates for your takedown notices.
  • Frankly, sometimes it just feels good.

Now that you know why it can be important to pursue content scrapers, let’s talk about how you can go about it.

How to Identify Content Thieves

You know a site is stealing your content. But you don’t know who to contact to request that the stolen content be removed.

You’ve tried the obvious only to find out they have no contact page and there are no names or email addresses anywhere on the site. Who do you contact? If you can’t contact someone from the site, how do you find out who their host is so you can contact them?

Here are some tips and tools that might help you identify content thieves:

WHOIS Records

Check WHOIS records for domain registrant information if the site is hosted on its own domain.

If you’re lucky, you’ll find a name, email address, mailing address, and phone number. If you’re unlucky, you’ll only find information for a domain privacy company blocking the registrant’s contact info. You might also find their site’s nameservers (more on that below).
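Web-based WHOIS tools are the easy route, but the protocol underneath is simple enough to query yourself. Below is a minimal sketch, assuming Python 3 and the plain-text WHOIS protocol (RFC 3912: one query line over TCP port 43). It asks IANA's server which registry handles the domain's extension, then queries that registry. The function names are hypothetical, and many registries now redact registrant details, so expect the same privacy-service results you would see on a website.

```python
import socket
from typing import Optional

# RFC 3912: WHOIS is a plain-text query sent over TCP port 43.
WHOIS_PORT = 43


def whois_query(server: str, query: str, timeout: float = 10.0) -> str:
    """Send one WHOIS query to `server` and return the raw text reply."""
    with socket.create_connection((server, WHOIS_PORT), timeout=timeout) as sock:
        sock.sendall(query.encode("utf-8") + b"\r\n")  # the query line ends with CRLF
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def find_field(reply: str, field: str) -> Optional[str]:
    """Return the first value of a 'Field: value' line (case-insensitive), if present."""
    wanted = field.lower()
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == wanted and value.strip():
            return value.strip()
    return None


def lookup_domain(domain: str) -> str:
    """Ask IANA which registry handles the TLD, then query that registry."""
    tld = domain.rstrip(".").rsplit(".", 1)[-1]
    registry = find_field(whois_query("whois.iana.org", tld), "refer")
    if registry is None:
        raise LookupError(f"IANA returned no WHOIS referral for .{tld}")
    return whois_query(registry, domain)
```

For some extensions (.com, for example) the registry's reply contains only a "Registrar WHOIS Server:" line pointing to the registrar, which holds the contact details; `find_field(reply, "Registrar WHOIS Server")` gives you the server for a second `whois_query` call.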

The Scraper’s Own Website

Browse around the scraper’s site a bit. See if they’ve responded to any blog comments. If so, their name might be there. See if they have a forum or other features on the site that might offer their name.

This is how I found the handle Kathryn’s scraper was using. She was the administrator listed for the site’s forum, which only had two users. Then I could easily tie her to the site through other sources like social media profiles where she promoted the site.

Social Media Accounts

On that note, search popular social networks for links to the scraper’s site. If there is only one person linking to it in their profile or updates, chances are good you’ve found the site’s owner. You might even be able to contact them this way with an informal request to remove your content.

Your Blog Comments

Check your blog comments from your blog’s admin area. Scraper sites often leave trackbacks there, and if they did, you might find an IP address associated with their site. Try to grab this information before deleting the trackbacks. I usually move them to my trash folder so I still have them for reference, and then I delete them when the site finally complies with the takedown notice.
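If you have many trackbacks to sort through, it can help to group them by the IP address that sent them. Here is a small sketch, assuming you have already pulled your comments into Python as dictionaries keyed by WordPress's `wp_comments` column names (`comment_type`, `comment_author_url`, `comment_author_IP`). How you get them there (a database query, an export) is up to you, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping

# Comment types WordPress uses for automated link notifications.
LINKBACK_TYPES = {"trackback", "pingback"}


def linkback_ips(comments: Iterable[Mapping[str, str]]) -> Dict[str, List[str]]:
    """Map each sender IP to the source URLs of the trackbacks/pingbacks it sent.

    Each comment is assumed to be a dict keyed by wp_comments column names.
    """
    by_ip = defaultdict(set)
    for c in comments:
        if c.get("comment_type") not in LINKBACK_TYPES:
            continue  # skip ordinary human comments
        ip = (c.get("comment_author_IP") or "").strip()
        if ip:
            by_ip[ip].add(c.get("comment_author_url") or "")
    return {ip: sorted(urls) for ip, urls in by_ip.items()}
```

An IP that shows up behind trackbacks from several scraped posts is the one worth feeding into the reverse IP lookup below.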

Reverse IP Lookups

Use a reverse IP lookup tool. Enter the scraper’s IP address or domain name. You’ll get a list of other sites hosted on the same IP address.

If there are a lot of sites, they’re likely using a shared host, so you won’t learn much from this alone. But if there are just a few sites hosted on that IP address, it’s more likely these sites are owned by the same person. Try running them through a WHOIS search to see if there’s a common owner.

This can be especially helpful if a scraper has legitimate sites in addition to their scraper site. They sometimes only think to use domain privacy services for the scraper blog, meaning you can find their contact information through one of the other sites they own.
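Python's standard library can't list every site on an IP address (reverse IP tools rely on their own databases for that), but it can check whether domains you already suspect, such as sites linked from the scraper's social profiles, resolve to the same address. This is a minimal sketch under that assumption; it uses `socket.gethostbyname`, which returns a single IPv4 address, and the resolver is a parameter so you can swap it out. The function names are hypothetical.

```python
import socket
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_ip(domains: Iterable[str],
                resolve: Callable[[str], str] = socket.gethostbyname) -> Dict[str, List[str]]:
    """Resolve each domain to an IPv4 address and group domains that share one."""
    groups = defaultdict(list)
    for domain in domains:
        try:
            ip = resolve(domain)
        except OSError:  # socket.gaierror (failed lookup) subclasses OSError
            continue
        groups[ip].append(domain)
    return {ip: sorted(names) for ip, names in groups.items()}


def shared_addresses(groups: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only the IPs that host more than one of the candidate domains."""
    return {ip: names for ip, names in groups.items() if len(names) > 1}
```

As with the reverse IP tools, a shared address on a big shared host proves little; it is the small groups that are worth running through WHOIS.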

WhoIsHostingThis.com

Next, visit WhoIsHostingThis.com to help you identify the scraper’s hosting company.

Sometimes you’ll be able to find the host you need to contact right away. Other times you might get a reseller or a datacenter instead of the actual hosting company. In that case, pay attention to the domain name listed within the “Name Servers” section.

For example, with a scraper I’ve recently been dealing with, CyrusOne was listed as the host. But CyrusOne runs datacenters, which is a little different from what we’re looking for: the host that leases server space to the scraper.

Looking beyond that, WebsiteWelcome.com was listed in the name servers section. So I visited WebsiteWelcome.com and was immediately provided with an abuse-related email address.
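If you already have a domain's raw WHOIS text, the same name server clue can be pulled out programmatically. This is a rough sketch, assuming WHOIS replies use "Name Server: value" lines (common, but not guaranteed across registries). The function names are hypothetical, and the domain reduction is deliberately naive.

```python
from typing import List


def whois_values(whois_text: str, field: str) -> List[str]:
    """Collect every value of a 'Field: value' WHOIS line, matching the field case-insensitively."""
    wanted = field.lower()
    values = []
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == wanted and value.strip():
            values.append(value.strip())
    return values


def nameserver_domains(whois_text: str) -> List[str]:
    """Reduce name servers such as NS1.WEBSITEWELCOME.COM to websitewelcome.com.

    Naive on purpose: it keeps the last two labels, which is wrong for
    multi-part suffixes such as .co.uk (a public-suffix list handles those).
    """
    domains = set()
    for ns in whois_values(whois_text, "Name Server"):
        labels = ns.split()[0].lower().rstrip(".").split(".")  # drop any trailing IP
        if len(labels) >= 2:
            domains.add(".".join(labels[-2:]))
    return sorted(domains)
```

The resulting domain is the one to visit for an abuse contact. `whois_values` also works for comparing fields such as "Registrant Organization" across the sites you found with a reverse IP lookup.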

I contacted them with a takedown notice and received a very prompt reply from HostGator, the actual host on record. They informed me they were giving the site owner 48 hours to resolve the issue, and if they failed to do so HostGator would “disable” the content in question. This is the end result you’re after.

Chances are good that these tactics will help you identify your scraper, or at least their hosting company. At that point, it’s time to take action. While this looks like a lot of work, it really isn’t. Most of these tools only need the scraper’s domain name, so running through them takes a few minutes at most. And you probably won’t need all of them.

How to Stop Content Thieves

I take a multi-tier approach to dealing with content thieves, and so far it’s never failed. While I have some go-to lawyers I would use if necessary, it has never had to go that far. So don’t assume the only way to get results is to spend a lot of money taking someone to court. It’s not.

Here’s the basic progression I follow and one I recommend:

1. The “Nice” Approach

This is when you contact the site owner directly with a polite yet firm takedown request, giving them an opportunity to remove the content without you taking any further action against them. (I find that my level of “politeness” is inversely proportional to the amount of my content they’ve stolen.)

I officially give them 48 hours. I’ve occasionally given them a bit longer if they contact me saying it will be taken care of soon, or if it slips my mind. It happens.
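If you track several requests at once, the 48-hour rule is easy to encode. Here is a minimal sketch, assuming you record when each request was sent as a timezone-aware UTC `datetime`. The function names are hypothetical, and the optional extension covers the cases where you give someone a bit longer.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The default window for the "nice" request described above.
GRACE_PERIOD = timedelta(hours=48)


def response_deadline(sent_at: datetime, extension: timedelta = timedelta(0)) -> datetime:
    """When the site owner's time to remove the content runs out."""
    return sent_at + GRACE_PERIOD + extension


def ready_to_escalate(sent_at: datetime, now: Optional[datetime] = None,
                      extension: timedelta = timedelta(0)) -> bool:
    """True once the deadline (plus any extension you granted) has passed.

    Use timezone-aware datetimes; comparing naive and aware values raises TypeError.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= response_deadline(sent_at, extension)
```

A request sent at 09:00 UTC on January 1 becomes eligible for the next step at 09:00 UTC on January 3.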

This is as far as it usually has to go, and it weeds out the truly ignorant who believe there is nothing wrong or illegal with what they’re doing.

2. Hit ’Em Where It Hurts

If they ignore my takedown request, the gloves come off. Hey, they were warned.

Unlike many people, I don’t go straight to the content thief’s hosting company to have the material removed. Why? They’re usually stealing from others as well, and most of the time my DMCA notice to the host will only affect my stolen articles. Plus, it feels good to strike back once in a while.

Instead, I start by hitting them where it really hurts—their traffic and revenue sources.

Getting the Stolen Content De-indexed from Search Engines

First, I contact the major search engines with DMCA requests to have the infringing material removed from their search results. This way, if the scraper happens to be getting any search traffic, that can be shut down.

Stripping Their Ad Revenue

If the site has any advertisements, the fun part comes next. Make a note of every advertiser they work with. This will usually be an ad network, but sometimes there are individual advertisers.

If the site owner is using an ad network to serve ads on the infringing content, report them to the network. They’re almost guaranteed to be in violation of the ad network’s terms. After all, the advertisers paying that ad network don’t want their ads running alongside illegally published material. So the network is in a position to take action.

The real perk of this is that scrapers frequently have more than one website, and they can usually associate multiple sites with one ad network account. If that account gets banned, it not only prevents them from running ads on the site stealing your content, but it can also strip their ad revenue from every site they own.

If they have private advertisers, you could also reach out to them. Chances are they’ll discontinue their ad contracts if they find out their company is associating with a site owner who openly breaks the law. Some won’t. And it won’t always be worth your time if there are many private advertisers.

3. Go Above Their Head

If, after all of this, you still can’t get your content removed (or if you don’t care about going through the second step at all, which is fine), go to the host.

If the host is in the U.S., they will very likely act quickly. With my most recent scraper, I actually received an email while writing this guest post letting me know that the infringing content had been removed. In fact, the entire site scraping my content was taken down.

This site was also hotlinking my images (instead of hosting a copy themselves, they were loading them directly from my server onto their site, which steals your bandwidth). Here is what one stolen article looked like after I had a bit of fun and redirected my image files to an “I steal content” image whenever they were loaded from his site.

[Image: a stolen article on the scraper’s site, with the hotlinked images replaced by an “I steal content” image]
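A redirect like this is normally configured in the web server itself (Apache's mod_rewrite checking the Referer header is a common approach) rather than in application code. The Python sketch below only illustrates the decision such a rule makes. `BLOCKED_HOSTS`, `REPLACEMENT_IMAGE`, and `image_to_serve` are hypothetical names. Matching only the scraper's own domain, rather than every site that isn't yours, limits side effects on feed readers and image search, and the Referer header can be missing or faked, so treat this as best-effort.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit

# Hypothetical values: replace with the scraper's real domain and your own image.
BLOCKED_HOSTS = {"scraper.example"}
REPLACEMENT_IMAGE = "/images/i-steal-content.png"


def image_to_serve(requested_path: str, referer: Optional[str],
                   blocked_hosts: Iterable[str] = BLOCKED_HOSTS) -> str:
    """Return the path to serve for an image request, based on its Referer header."""
    if not referer:
        return requested_path  # no Referer (direct visit, many feed readers): serve normally
    host = urlsplit(referer).hostname or ""  # .hostname is already lower-cased
    for blocked in blocked_hosts:
        if host == blocked or host.endswith("." + blocked):  # also catches www.scraper.example
            return REPLACEMENT_IMAGE
    return requested_path
```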

Some hosts will have a DMCA request form for you to fill out. But usually you can send them a DMCA request via email. Here is a cease and desist letter template and DMCA notice templates you can use.
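Filling in a template is the part that takes the most repetitive typing, and it is easy to script. This is a minimal sketch using Python's `string.Template`; the wording is placeholder text loosely covering the points such notices generally include (the original work, the infringing copies, a good-faith statement, an accuracy statement, and your contact details). It is not a vetted legal template, and `build_notice` and its parameters are hypothetical.

```python
from string import Template
from typing import Iterable, List, Tuple

# Placeholder wording only: start from a vetted DMCA template for real notices.
NOTICE = Template("""\
To: $recipient

I am the copyright owner of the following original work:
$original_urls

It is reproduced without my permission on $infringing_site at:
$infringing_urls

I have a good faith belief that this use is not authorized by the copyright
owner, its agent, or the law. The information in this notice is accurate, and
under penalty of perjury, I am the copyright owner or authorized to act on
the owner's behalf.

$name
$email
""")


def bullet_list(urls: Iterable[str]) -> str:
    """Format URLs as an indented list, one per line."""
    return "\n".join(f"  - {url}" for url in urls)


def build_notice(recipient: str, infringing_site: str,
                 pairs: List[Tuple[str, str]], name: str, email: str) -> str:
    """Fill the template; pairs holds (original_url, infringing_url) tuples."""
    return NOTICE.substitute(
        recipient=recipient,
        infringing_site=infringing_site,
        original_urls=bullet_list(original for original, _ in pairs),
        infringing_urls=bullet_list(copy for _, copy in pairs),
        name=name,
        email=email,
    )
```

For a whole-site scraper, a single pair pointing at your domain and theirs is often enough.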

Even if the content thief isn’t located in the U.S. or another country that honors your copyright, the site might be hosted there. And even if a host isn’t located somewhere a DMCA request is valid, they likely have their own terms of service with their customers, and publishing stolen content very likely violates those terms. Sometimes you can get the host to act on the TOS violation even when a DMCA notice technically doesn’t apply.

On rare occasions you might come across your content being published on an “untouchable” site where both the site owner and the host are out of reach and uncooperative. At that point you can always opt to pursue it legally if the issue is causing enough harm. But for most bloggers, that probably won’t be worthwhile.

In that case, your best bet is to stick to getting them de-indexed from search engines and targeting their revenue streams. Sometimes they’ll remove your content just to get back in the good graces of Google and their advertisers, or to get you out of their hair.

How do you usually deal with scraping and other content theft?

About the Author: Jennifer Mattern is a professional blogger, freelance business writer, and owner of All Indie Writers where she writes about freelance writing, indie publishing, and blogging. Find her on Twitter (@AllIndieWriters) or on Google Plus (+JenniferMattern).

Image Credit: Canva


Comments

  1. says

    In the freelance writing world, it’s tough sometimes to “out” one of our own for swiping content, as you well know. In one case, I’ve been dying to tell this particular blogger to stop ripping off my ideas, rewriting my posts, and making money off my original ideas. But that starts a flame war. Frankly, I don’t know what else to do beyond warning my close chums off that blogger and posting veiled references on my site to let the person know I’m on to the thievery.

    I did contact Google at one point regarding this stolen/reworked content, but to no avail. Unless it’s word-for-word, they don’t seem to care (or respond).

    • says

      It’s also a problem when the thief is operating out of another country. Apparently, they don’t believe copyright laws (American or European) cross borders. Have you tried alerting the advertisers/sponsors on the blogger’s site?

    • says

      Unfortunately I know exactly who you’re talking about. Not only have they stolen from me as well, but I’ve had at least two other colleagues do the same. Worse actually. The other two both stole my professional website’s copy outright (one also stealing the design, thinking she could get away with changing a few images and a few words here and there). The last one you know was taken down. I called them out on it. They actually had the nerve to post asking for help in my writing forums, about their website no less. I have to wonder if they forgot exactly who they stole it from. People never cease to amaze.

      This is a tougher situation than dealing with blatant scrapers. If it’s still fairly clear theft, you can still file a DMCA notice with the host. Even if Google doesn’t respond, the host has to look into it or risk losing their own protection under the law. They must notify the other site owner and give them a chance to either take down the content or file a counter-notice explaining why they don’t believe they’re infringing your rights. Of course, just seeing the content in question would make it pretty clear who complained. So you can’t avoid all confrontation.

      Some people are worth confronting privately, like those who naively believe what they’re doing is fair use or that you’ll find it flattering. Others are worth going the official DMCA route. Some are better suited to a public outing (like the gal I outed on my forum because she was blatantly showing off the infringing site there). And some are better ignored. Or perhaps I should say some are better dealt with quietly. You know as well as anyone that word tends to get around the freelance writing community when it comes to folks like these. People will always talk. And no matter how obnoxious they might be right now, they usually fade away. Think of it as waiting for the inevitable. :)

  2. says

    Ha! I can’t believe you do that with your images. That’s hilarious and definitely what they deserve.

    Unfortunately a lot of writers aren’t willing to go through all this hassle. I have yet to see someone steal my content. I have seen small blurbs that then link to the full post, which I don’t really mind since they don’t republish the entire thing.

    Cheers!

    • says

      That’s the thing. It’s not nearly the hassle some writers think it is. A DMCA notice is very easy to send out. There are a lot of templates available. You plug in the two sites (yours and the scraper’s), and you plug in the infringing links. In the case of whole-site scraping, just the domain is usually good enough. Hit send. Done! That portion takes 5 minutes tops in a typical case.

      Sure, if you struggle to find contact info it takes a little longer. But really it’s one or two simple searches: the host lookup and the reverse IP lookup. Bookmark those sites, and just paste in an infringer’s domain when something is found. In 90+% of cases, I’ve found that’s enough to find the host.

      You can make it an even quicker process by sending out requests in batches (once weekly or monthly for example) rather than doing them as you find them. :)

      The small blurbs are technically considered content curation, and there isn’t much we can do about that anyway. They’d make a fair use argument if it got to court, and that’s further than most folks want to take it. It’s the aggregation that’s a bigger problem — when a scraper pulls together full content from a number of blogs to populate their own site which has little to no original content.

      Even the image thing didn’t take too long. I created that one from scratch, which took some time, but others just put up an obscene image or something (hopefully one they have rights to use!). The thing is, it’s usually set to show for everyone hotlinking, which can affect feed readers and Google Images. I tried setting it up to only affect my scraper-friend. It didn’t affect other websites, but for some reason a few feed readers were ignoring the site-specific command (they’d show the real images only on reload, which was very strange). So I wouldn’t recommend that in a typical case. I was just extra ticked at this guy for how long he ignored the takedown request. And I’m usually the wrong person to tick off. ;)
