Whenever I publish a blog post, I manually search Google to make sure the post is indexed. And yeah, Google used to index my latest blog posts in real time until a few months ago. Nowadays there’s a short delay in indexing, but nonetheless the crawling speed is not bad.
Now what’s annoying is content scrapers. I noticed that another blog was copying all my latest posts by fetching them from my RSS feed. And what’s even uglier is that it’s ranking higher than me, even though Google indexes my articles before the spammer’s blog. It’s really embarrassing to see the original articles showing up as supplemental results (omitted results) on SERPs.
And when I checked the content scraper’s website, I noticed that 100% of its content was copied from other blogs via RSS syndication. What’s interesting is that the spammer copies articles word for word and even links to the original source (in my case) within the blog post, as well as at the end of the article, since I have enabled RSS credits at the end of each feed item.
That’s not all! That website was even showing Google AdSense ads. AdSense’s policy clearly says that we’re not allowed to place Google ads on websites distributing copyrighted materials, and that includes sites hosting copyrighted content as well as sites linking to other sites containing copyrighted materials.
Today, even websites with unique content find it difficult to get into AdSense, yet here was a Made For AdSense (MFA) blog with 100% copied content showing Google ads.
It’s really a shame that Google can’t determine the original author of an article, especially for a search engine that claims to be so advanced and smart after its recent algorithmic updates – Panda, Penguin, EMD, etc.
Dear Google, you’re STILL far away from that “perfect” search engine definition.
Here’s What I Did
You can check out the webmaster discussion What to do if someone copied your article and is now ranking higher than you? which I started at the Google Webmaster Forum for more insights. Since Google didn’t respond directly (other users were helpful, though), I decided to contact the webmaster directly, and this post gave additional insights (and even an email template, which I used to send the DMCA notice) on what to do if someone steals our content. Finally, the webmaster did respond to my email and removed all the copied blog posts, and the site is no longer showing AdSense ads, as I reported the website to Google AdSense as well.
How To Detect And Beat Content Scrapers
The quickest way, as I mentioned earlier, is to search Google for the titles of your latest blog posts in quotes. As long as your titles are unique, this surfaces exact copies of your articles.
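This exact-match lookup is easy to script if you check many titles regularly. Here’s a minimal Python sketch (the helper name and sample title are my own, for illustration) that builds the quoted Google search URL for a post title:

```python
from urllib.parse import quote_plus

def exact_match_query_url(title: str) -> str:
    """Build a Google search URL for an exact-match (quoted) title search."""
    return "https://www.google.com/search?q=" + quote_plus(f'"{title}"')

# Example: look for verbatim copies of a post title.
url = exact_match_query_url("How To Detect And Beat Content Scrapers")
```

Open the resulting URL in a browser; any result that isn’t your own site is a candidate scraper.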
Copyscape is another tool for finding copies of your webpages. You can enter a URL and it will find duplicates of that page across the web.
Another way is by using FeedBurner. Go to FeedBurner > Your Feed > Analyze > View > Uncommon Uses.
FeedBurner: What are Uncommon Uses?
As you probably know, FeedBurner manages thousands of feeds daily, so it can identify potential abuse of your content, which it calls “Uncommon Uses”. A flagged site can be a credible website or a spam blog (content scraper). The spammer I’ve mentioned in this post was listed under “REFERRERS” when I first noticed it.
What Should You Do If Someone Copied Your Content And Is Ranking Higher?
The first step should be to get in touch with the webmaster of the website. Use a Whois Lookup to find the address of the domain registrant and notify them about the copyright infringement.
If that doesn’t work, you should report the content scraper’s site to Google as spam. Use the Report webspam form to submit the URL(s). Please note that “webspam” refers to webpages that try to manipulate Google’s SERPs; use this form to report other issues such as paid links, malware, phishing, legal issues, etc.
The next step should be to file a DMCA (Digital Millennium Copyright Act) complaint to Google (and other search engines). Use the Removing Content From Google form to submit a DMCA complaint to Google and use this form to submit a copyright or trademark infringement notice to Bing.
If the content scraper’s blog or the blog in question is hosted by Blogger.com then you can select “Blogger/Blogspot” under “What Google product does your request relate to?” or else you should select the option “Web Search”. The status of your DMCA Notice can be tracked at the Removal Dashboard and you can file a fresh complaint by using the “Create a new notice” button.
Additionally, you should contact the domain registrar and hosting provider to report the copyright infringement. I contacted the domain registrar, but their response didn’t help, as they told me: “We are only the domain provider. We have no control over the website content. In this case, you will need to contact the hosting provider of this website”.
So I contacted HostGator (as they were the host of that domain) and their support was friendly; they told me that they require DMCA notices to be filed via fax or letter.
And if the copyright-infringing website is spamming, you can also report it to its hosting company, as almost all web hosting companies have zero tolerance for spam.
How You Can Build Links Using Content Scrapers
You can’t really stop a content scraper from copying content from your blog, but you can make sure they link back to you; that way you get a few backlinks, which can even send you some free traffic.
If you’re using the Yoast WordPress SEO plugin, you can navigate to SEO > RSS to change the content of your RSS feed.
This setting appends content to the beginning or end of your RSS feed items, so content scrapers automatically include backlinks to your blog and search engines can identify the original author of an article. If you don’t use the Yoast WordPress SEO plugin, the RSS Footer plugin by Yoast does the same job.
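If your site isn’t on WordPress, you can apply the same idea by hand wherever you generate your feed. A minimal Python sketch (the function and parameter names are my own, not from any plugin) that appends an attribution footer to a feed item:

```python
def append_feed_credit(item_html: str, post_url: str, blog_name: str) -> str:
    """Append a source-attribution footer to a feed item's HTML content,
    so scrapers that republish the feed verbatim carry a backlink."""
    credit = (
        f'<p>This post appeared first on '
        f'<a href="{post_url}">{blog_name}</a>.</p>'
    )
    return item_html + credit

# Example usage with placeholder values:
html = append_feed_credit("<p>Article body</p>",
                          "https://example.com/my-post/", "Example Blog")
```

Any scraper that republishes the feed unmodified now ships a link pointing back at the original post.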
Now How To Beat Content Scrapers Using PubSubHubbub
Today, Google indexes blogs (almost) in real time. Gone are the days when we submitted URLs manually and waited weeks or months to get indexed.
But it’s still possible for content scrapers to copy your article and get it indexed before yours. The solution is to implement PubSubHubbub (PuSH) on your website; it’s different from traditional server pinging.
PubSubHubbub (also known as PuSH) provides a way for services to subscribe to new entries in feeds (RSS & Atom) when they are published. This allows for new content to be distributed in a rapid fashion and reduces the need to regularly poll feeds to check for new content.
Googlebot checks your website to see if there’s any new content to index. PubSubHubbub (PuSH) distributes the new content to all the subscribers via a hub.
Of course, you can ping various services to tell them you’ve updated your content. But that’s still not very effective, as those services’ bots have to crawl your website to fetch the new content.
Now if you have enabled PubSubHubbub then your hub fetches your new content and then it pushes the same to all subscribers.
What is PubSubHubbub?
Blogger.com, WordPress.com, Posterous, Tumblr, etc. are already PubSubHubbub-enabled, so you don’t have to do anything. If you have a self-hosted WordPress.org blog, you can use the PuSHPress or PubSubHubbub plugin.
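Under the hood, the PubSubHubbub spec defines the publisher’s side of the protocol as a simple POST to the hub with `hub.mode=publish` and the feed’s URL. A minimal Python sketch of that notification (the hub URL shown is Google’s public reference hub; swap in whichever hub your feed declares):

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_publish_ping(hub_url: str, feed_url: str) -> Request:
    """Build the POST request that tells a PuSH hub new content is available
    (hub.mode=publish plus the topic/feed URL, form-encoded)."""
    data = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode()
    return Request(hub_url, data=data, method="POST")

req = build_publish_ping("https://pubsubhubbub.appspot.com/",
                         "https://example.com/feed/")
# urlopen(req)  # actually send the ping (network call, so left commented out)
```

On success the hub replies 204 No Content, then fetches your feed once and pushes the new entries to every subscriber.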
And apart from that you should Ping the following servers:
…and should also enable FeedBurner > Publicize > PingShot.
(PingShot is a quick notification service that enables your feed to be updated in the widest variety of places as quickly as you add new content.)
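For ping servers that don’t speak PuSH, the classic mechanism is a `weblogUpdates.ping` call over XML-RPC. Here’s a minimal Python sketch that builds the XML-RPC request body (the blog name and URL are placeholders; Ping-O-Matic’s endpoint is mentioned only as a common example):

```python
import xmlrpc.client

def build_ping_payload(blog_name: str, blog_url: str) -> str:
    """Build the XML-RPC request body for a standard weblogUpdates.ping call,
    which announces that the blog has new content."""
    return xmlrpc.client.dumps((blog_name, blog_url),
                               methodname="weblogUpdates.ping")

body = build_ping_payload("Example Blog", "https://example.com/")
# POST `body` (Content-Type: text/xml) to a ping endpoint,
# e.g. http://rpc.pingomatic.com/
```

In practice you’d let `xmlrpc.client.ServerProxy` handle the POST for you; the payload is shown here just to make the protocol visible.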
Here’s a Google Webmaster Help Video where a user asked:
Google crawls site A every hour and site B once in a day. Site B writes an article, site A copies it changing time stamp. Site A gets crawled first by Googlebot. Whose content is original in Google’s eyes and will rank highly? If it’s A, then how does that do justice to site B?
And Matt Cutts’ response was:
In a nutshell: he suggested sharing our latest webpages on social media (Facebook, Twitter, etc.) so that Google may crawl those links faster, and he also recommends PubSubHubbub. He further recommends filing a DMCA notice against the content scraper, and a spam report if it’s an auto-generated website.
Image Credit: Free Digital Photos