Google Webmaster Tools Alerts about Duplicate Content
Last night Google added a really useful alert to Google Webmaster Tools: a notification about duplicate content issues across domains. In other words, if your pages are removed from the SERPs because Google considers them duplicates of pages elsewhere on the internet, Google will let you know about it.
In some cases our pages have disappeared from search and we couldn't find a reason for it. The alert will help us understand Google better, but it doesn't mean we will agree with Google every time. Their post, as well as the webmaster help topic, explains how Google determines which is the original and which is the duplicate content. We will get to that in a second, but the basic problem is that Google may well make a mistake in figuring out which version is the original, or even remove merely similar content because it judges it to be duplicate. On a massive web that keeps growing, this can pose a real problem for most websites.
In most cases we will be at fault: we have multiple domains, or we simply used someone else's content without letting Google know that our version is not the original, even though we have tools to tell Google which version is canonical. With the new alert, Google will tell us as soon as it selects a page that is not on our website as the version to show. In other words, sometimes Google will not agree with our point of view on which is the original and which is the duplicate.
One of the most common reasons we find ourselves dealing with duplicate content issues, according to Google, is our own lack of discipline. A simple rel="canonical" tag or 301 redirect lets Google know the preferred URL for our content, that is, the original version. The problem usually arises with multi-domain ownership, as webmasters tend to use similar content across their websites. Multilingual websites are another case: there the rel="alternate" tag helps Google distinguish which content is intended for which region or language, along with the use of rel="canonical", of course.
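To make the tags above concrete, here is a minimal Python sketch that builds the two kinds of <link> elements for a page's <HEAD>. The helper names (canonical_tag, hreflang_tags) and the example.com URLs are made up for illustration; only the rel="canonical" and rel="alternate" hreflang attributes themselves come from Google's documentation.

```python
from html import escape


def canonical_tag(preferred_url: str) -> str:
    """Return a <link rel="canonical"> element naming the preferred URL."""
    # escape() keeps characters such as & and " from breaking the attribute.
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}" />'


def hreflang_tags(alternates: dict) -> list:
    """Return one <link rel="alternate" hreflang="..."> element per entry.

    `alternates` maps a language or region code to the URL of that version.
    Entries are sorted by code so the output order is stable.
    """
    return [
        f'<link rel="alternate" hreflang="{escape(code, quote=True)}" '
        f'href="{escape(url, quote=True)}" />'
        for code, url in sorted(alternates.items())
    ]


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    print(canonical_tag("http://www.example.com/widgets"))
    for tag in hreflang_tags({
        "en": "http://www.example.com/widgets",
        "es": "http://es.example.com/widgets",
    }):
        print(tag)
```

The printed lines would go inside the <HEAD> of each version of the page. A 301 redirect, by contrast, is set up on the server and is not shown here.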
Other factors that can influence how Google discovers and identifies duplicate content include configuration mistakes and malicious website attacks. Configuration mistakes include faulty canonicalisation and misconfigured servers. Malicious attacks include cases where a 301 redirect is forced onto your website, or a cross-domain rel="canonical" tag is injected into its <HEAD> section, pointing to an external, unrelated domain. In that case Google suggests going through a cleanup process for your website.
And when Google does make a mistake, removing your URL, which is the original, and leaving the syndicated or copied one, they suggest you follow their DMCA policy and submit a request to have the copied URL removed and your URL reinstated.
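One way to spot an injected cross-domain canonical is to scan your own pages for rel="canonical" links that point at a host you do not own. The following is a rough sketch using only Python's standard library. The names CanonicalFinder and external_canonicals, and the idea of passing in a set of your own hostnames, are assumptions made for this example and not part of any Google tool.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> element in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively.
        rel_tokens = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "canonical" in rel_tokens and href:
            self.canonicals.append(href)


def external_canonicals(html, own_hosts):
    """Return canonical URLs whose host is not in own_hosts (lowercase names).

    Relative URLs have no host and are treated as pointing to the same site.
    This simple version does not check that the tag sits inside <head>.
    """
    finder = CanonicalFinder()
    finder.feed(html)
    flagged = []
    for href in finder.canonicals:
        host = urlparse(href).hostname  # already lowercased by urlparse
        if host is not None and host not in own_hosts:
            flagged.append(href)
    return flagged


if __name__ == "__main__":
    # Hypothetical page source and hostnames, for illustration only.
    page = (
        '<html><head>'
        '<link rel="canonical" href="http://unrelated.example.net/page" />'
        '</head><body></body></html>'
    )
    print(external_canonicals(page, {"www.example.com", "example.com"}))
```

A flagged URL is not proof of an attack, since a legitimate cross-domain canonical looks the same; it only tells you which pages deserve a closer look. Forced 301 redirects would need a separate check of the server's responses.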
1. Raising awareness of cross-domain URL selections – http://googlewebmastercentral.blogspot.com/2011/10/raising-awareness-of-cross-domain-url.html
2. Cross-domain URL selection – http://www.google.com/support/webmasters/bin/answer.py?answer=1716747&topic=20985
3. Multi-regional and multilingual sites – http://www.google.com/support/webmasters/bin/answer.py?answer=182192
4. About rel=”alternate” hreflang=”X” – http://www.google.com/support/webmasters/bin/answer.py?answer=189077
5. Cleaning your site – http://www.google.com/support/webmasters/bin/answer.py?answer=163634
6. Google’s DMCA policy – http://www.google.com/support/bin/answer.py?answer=1386831