Last month Google released another webmaster video about duplicate content. This time, Matt Cutts discussed how Google treats quotes or block quotes copied from another website or blog. It’s an interesting video for me, as I quote and reference other blogs and sites in most of my blog posts. So I decided to write a blog post consolidating Google’s view on several aspects of duplicate content.
What Is Duplicate Content?
Duplicate content usually refers to similar content within a single domain or across multiple domains. It can be exactly the same content or similar content (also known as spinning), and it is categorized as malicious or non-malicious duplicate content. Malicious duplicate content is otherwise known as web spam, as it refers to content that’s written to manipulate search engines. Non-malicious duplicate content generally refers to variations of the same web page, for example printer-friendly and mobile-friendly pages hosted on another domain, subdomain or subdirectory.
The Google Panda update was probably the first move by Google to penalize content farms, low-quality web pages and websites that duplicated content on a massive scale to manipulate search engine rankings.
What You Should Do To Avoid a Duplicate Content Penalty
Non-malicious duplicate content is not really an issue, but it’s better to avoid it if you can. There are several methods to avoid duplicate content within a single domain name.
1. Use 301 Redirects & Set Your Preferred Domain Name
If your web pages can be accessed both with and without WWW in the URL, search engines treat the two versions as separate URLs. For example, http://www.minterest.com/web-directory/ and http://minterest.com/web-directory/ are two separate pages to search engines unless you use a 301 redirect from the non-preferred URL to the preferred one. My preferred domain is http://www.minterest.com/, so I redirect all non-WWW URLs to their WWW versions.
Once you set a 301 redirect, make sure that you choose your preferred domain in Google Webmaster Tools so that Google shows your preferred URLs in search engine results pages (SERPs).
Go to Google Webmaster Tools > Configuration > Settings > Preferred domain
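Here’s a minimal Python sketch of the redirect decision described above. It only illustrates the logic and is not how my server is actually configured. The function name `redirect_for` is made up for this example, and it assumes the WWW host is the preferred one, as in this post.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the WWW host is the preferred one, as described in the post.
PREFERRED_HOST = "www.minterest.com"
NON_PREFERRED_HOST = "minterest.com"


def redirect_for(url):
    """Return (301, preferred_url) for a non-preferred URL, else None."""
    parts = urlsplit(url)
    if parts.hostname != NON_PREFERRED_HOST:
        return None  # already on the preferred host, or a different site
    # Rebuild the URL with the preferred host, keeping path and query intact.
    preferred = urlunsplit(
        (parts.scheme, PREFERRED_HOST, parts.path, parts.query, parts.fragment)
    )
    return 301, preferred


print(redirect_for("http://minterest.com/web-directory/"))
```

In practice a redirect like this is usually set up in the web server or CMS rather than in application code. The point is simply that every non-preferred URL maps to exactly one preferred URL with a permanent (301) status.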
2. Always Use Your Preferred URLs
Now that you have set a preferred domain with a 301 redirect, make sure that you use consistent URLs for all links, whether for internal linking or for your link building campaigns. For example, if your URLs contain a trailing slash, then include it every time you write them.
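If you want to check a list of links for consistency, a rough Python sketch could look like this. The helper name `inconsistent_links` is hypothetical, and the rule that directory-style URLs must end with a slash (while URLs ending in a file name such as `.html` need not) is an assumption based on how my own URLs look.

```python
from urllib.parse import urlsplit

# Assumptions: the WWW host is preferred and directory-style URLs end in "/".
PREFERRED_HOST = "www.minterest.com"
SITE_HOSTS = {"minterest.com", "www.minterest.com"}


def inconsistent_links(urls):
    """Return the site links that use the wrong host or drop the trailing slash."""
    problems = []
    for url in urls:
        parts = urlsplit(url)
        if parts.hostname not in SITE_HOSTS:
            continue  # external or relative link, not checked here
        last_segment = parts.path.rsplit("/", 1)[-1]
        looks_like_file = "." in last_segment  # e.g. "page.html"
        wrong_host = parts.hostname != PREFERRED_HOST
        missing_slash = not looks_like_file and not parts.path.endswith("/")
        if wrong_host or missing_slash:
            problems.append(url)
    return problems


links = [
    "http://www.minterest.com/web-directory/",
    "http://www.minterest.com/web-directory",
    "http://minterest.com/web-directory/",
]
print(inconsistent_links(links))
```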
3. Use Meta Tags Effectively
If your website contains many empty or under-construction pages, use the noindex meta tag to block search engines from indexing them.
4. Noindex Metatags – noindex, nofollow and noindex, follow
<meta name="robots" content="noindex, nofollow" />
<meta name="robots" content="noindex, follow" />
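To confirm which robots directives a page actually carries, something like the following Python sketch can help. It uses the standard library’s `html.parser`. The class and function names are made up for this example, and it only looks at `<meta name="robots">` tags, not at HTTP headers or robots.txt.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta" and (attr_map.get("name") or "").lower() == "robots":
            content = attr_map.get("content") or ""
            # Split "noindex, follow" into individual lowercase directives.
            self.directives.update(
                token.strip().lower() for token in content.split(",") if token.strip()
            )


def robots_directives(html_text):
    """Return the set of robots directives declared in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives


page = '<head><meta name="robots" content="noindex, follow" /></head>'
print(robots_directives(page))
```

Note that the attribute values must use plain straight quotes in the page source. Typographic quotes pasted from a word processor will not be read as attribute delimiters.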
5. Canonicalization Of URLs
Canonicalization is the best way to manage the duplicate content within your own website.
What is a canonical page?
“A canonical page is the preferred version of a set of pages with highly similar content.”
Let’s say I offer a service via a sales page that targets the U.S., the U.K. and India, and I drive targeted traffic to it using Search Engine Marketing. Since these countries use different currencies, I create 3 landing pages with exactly the same content; the only difference is the currency in which my services are priced.
If I allow search engines to index those pages, they treat them as duplicate content. So I can either add a noindex, nofollow meta tag to the U.K. and India pages or make use of canonicalization.
Assume that my primary market is U.S. and I want search engines to show my sales page targeting U.S. customers on SERPs. What I do in this case is specify a canonical link for each version of my sales page.
Let’s say my preferred sales page is:
and I created two copies of the sales page targeting UK and India:
Now, to specify the canonical link to search engines I add the link tag
<link rel="canonical" href="http://www.minterest.com/services/internet-marketing-usa.html">
to the <head> section of each duplicate sales page, which tells search engines that all those duplicate pages refer to the canonical page given in the href attribute.
That said, if it’s just a few sales pages then there’s no need to block those pages as search engines are smart enough to figure it out.
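If you generate your landing pages from a template, the canonical tag can be produced and inserted programmatically. The Python sketch below is only an illustration: the function names are hypothetical, and it assumes each duplicate page contains a plain `<head>` opening tag with no attributes.

```python
from html import escape


def canonical_link_tag(canonical_url):
    """Build the <link rel="canonical"> tag for the preferred page."""
    return '<link rel="canonical" href="{}">'.format(escape(canonical_url, quote=True))


def add_canonical(page_html, canonical_url):
    """Insert the canonical tag right after the first <head> tag.

    Assumption: the page has a bare "<head>" tag; anything more
    elaborate would need a real HTML parser.
    """
    tag = canonical_link_tag(canonical_url)
    return page_html.replace("<head>", "<head>\n" + tag, 1)


canonical = "http://www.minterest.com/services/internet-marketing-usa.html"
duplicate_page = "<html><head><title>Services</title></head><body></body></html>"
print(add_canonical(duplicate_page, canonical))
```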
Duplicate Content – Google SEO Guidelines
Coming back to the point, the purpose of this blog post was to highlight Google’s view on the block quotations we bloggers use. We use block quotes when we take an excerpt from another website and want to distinguish it from our own content.
So, what is the best way to quote from another website without getting penalized for so-called duplicate content?
Google’s recommendation is that if we’re just quoting someone or highlighting something said by another blogger, we should put it in a block quote with a link back to the original source.
Now, if you simply copy an entire article from another website or multiple websites and mark it as a block quote without adding your own insights or views then it can raise a red flag.
That said, if you quote excerpts from multiple articles and then add your own comments or views, with a link back to each original source, then it is absolutely fine.
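For anyone who builds posts programmatically, here is a small Python sketch of a block quote with a link back to its source. The function name, the example URL and the blog name are all placeholders I made up. The markup is just one reasonable way to do it, not a format Google prescribes.

```python
from html import escape


def quote_with_attribution(excerpt, source_url, source_name):
    """Wrap an excerpt in a <blockquote> and add a link back to the source."""
    safe_url = escape(source_url, quote=True)
    return (
        '<blockquote cite="{url}">\n'
        "  <p>{text}</p>\n"
        "</blockquote>\n"
        '<p>Source: <a href="{url}">{name}</a></p>'
    ).format(url=safe_url, text=escape(excerpt), name=escape(source_name))


# Placeholder values for illustration only.
print(quote_with_attribution(
    "A short excerpt from the original article.",
    "http://example.com/original-post",
    "Example Blog",
))
```

The excerpt itself should stay short, with your own commentary written around it.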
Ask Yourself: What Is Compelling About Your Website?
I know tons of tech blogs out there that do nothing but retell, in their own words, the same stories that appear on Mashable, TechCrunch or similar sites. In that case, what is compelling about that website? Nothing!
Google understands that “You can’t make up news!” but its recommendation for news websites is that you write your own version of the original story rather than rewriting what’s written on some other site. That means adding value to your version with some insights of your own.
And that exactly is the reason why I DON’T publish news posts.
Image Credit: Free Digital Photos