Following the Google Panda updates, website owners have been forced to acknowledge SEO issues that they had so far been able to conveniently bypass or ignore. One of the most common of these issues is duplicate content.
Truth be told, duplicate content has always been an SEO problem, but since the Google Panda updates, Google has a much better handle on the issue, and the way the search engine treats duplicate content only gets more dramatic and complicated with every algorithm update.
So let’s talk about duplicate content.
First off, what is duplicate content?
By definition, duplicate content is the same content appearing on two or more pages on the web. To the search engine crawlers, a page is basically any unique URL that has been created. Intentionally or unintentionally, website owners sometimes create two or more URLs that lead to the same content, and this is what gives rise to duplicate content issues.
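To illustrate (a purely hypothetical example using the placeholder domain example.com), all of the following URLs could serve exactly the same content, yet each one counts as a separate page to a crawler:

http://example.com/product
http://www.example.com/product
http://example.com/product?sessionid=12345
http://example.com/product/print/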
So why is it such a big problem if you have duplicate content?
1) About Google’s supplemental index
Back in the day, indexing pages on the web was a massive task for the giant search engine, Google. To deal with the issue of duplicate content, pages on the web that were thought to contain duplicate content were stored in what was known as Google’s supplemental index. Once pages landed in this index, they automatically lost the ability to rank.
In 2006, pages in the supplemental index were unified with the main index, but pages that contain duplicate content are still automatically omitted from the SERPs, with serious consequences when it comes to search engine optimization.
2) About Google’s crawl limit
When it comes to Google’s crawl budget, there is no absolute number for how many pages Google will crawl on a given website. What you need to know, though, is this: there is a limit to how many pages Google will crawl on your site before it gives up.
So what happens when Google’s crawl limit is reached?
The best-case scenario: Your pages get indexed a lot less often.
The worst-case scenario: Your pages don’t get crawled at all.
To make sure your pages are crawled and indexed, you want to keep your content unique and well organized.
3) About Google’s Panda update
Following the much-talked-about Google Panda updates, there has been a drastic change in the way the search engine handles duplicate content on the web. In the past, pages that contained duplicate content were simply filtered out of the results or caused crawling issues.
Since the Google Panda updates, duplicate content on one page can affect the entire website. Once you get penalized by Panda, even your pages with non-duplicate content will lose the ability to rank in the SERPs.
How to Fix Your Duplicate Content Issues
1) Delete the page
The most straightforward fix for a duplicate content issue is to simply take that content off your site by deleting the page. People trying to access the page will be shown a ‘404 Not Found’ error (you may want to customize that error page so it points visitors to your home page and other popular pages).
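For example, if your site runs on Apache (an assumption; other web servers have their own equivalents, and the file name below is a placeholder), a custom 404 page can be configured with a single line in your .htaccess file:

ErrorDocument 404 /custom-404.html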
2) Redirect your visitors to another page
A second way to deal with duplicate content is to permanently redirect your visitors to another page within your site through a 301 redirect. The cool thing about using a 301 redirect instead of returning a 404 error is that the new page benefits from the link juice of the old page; the links pointing to the old page are not “lost” in terms of SEO.
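As a sketch, again assuming an Apache server (the paths and domain below are placeholders), a 301 redirect can be added to your .htaccess file like this:

Redirect 301 /old-duplicate-page/ http://www.example.com/preferred-page/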
3) Robots.txt
Another option that website owners can resort to for fixing duplicate content issues is to leave the duplicate content up for the human visitors of the site but hide it from the search engine crawlers. One way to do this is to use robots.txt.
The benefit of using robots.txt is that entire folders can be blocked from crawling with a single rule.
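For example (the /print/ folder below is a placeholder), the following robots.txt rules would tell all crawlers to stay out of a folder of printer-friendly duplicates:

User-agent: *
Disallow: /print/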
The drawback is that this duplicate content fix is not entirely reliable. In addition, robots.txt only comes in handy for hiding content that has not yet been crawled; it does nothing for content that has already been indexed.
4) Meta Robots
Just like the meta title and meta description tell the search engines what your page is about, the meta robots tag gives the search bots instructions. You can use the following tag in the HEAD section of your page to tell them not to index the page while still following the links on it:
<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">
Not indexing the page means that the search engines won't "see" that there is a duplicate page on your site.
5) Rel="canonical"
There are a few legitimate reasons to have duplicate content on a website (for example, a printer-friendly version and a mobile version of a desktop page, which adds up to three different versions of the same content).
Since it would be unfair to penalize those websites, the search engines have made a fairly new tag available to us to let them know which version is our "preferred" version.
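In practice (the URL below is just a placeholder), you add a canonical link element to the HEAD section of each duplicate version, pointing to the preferred URL:

<link rel="canonical" href="http://www.example.com/preferred-page/" />

Search engines then consolidate ranking signals on the preferred page instead of splitting them across the duplicates.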
In-depth information on how to use this tag can be found here:
Image courtesy of Stuart Miles/FreeDigitalPhotos.net
About the Author – This article has been written by Sameer Panjwani, founder of Directory Maximizer, a premier online marketing company offering services like manual directory submission, paid submissions, guest blogging, content writing, social bookmarking etc. To get links from relevant niche sites and rank high in SERPs, don’t forget to check out Directory Maximizer’s niche directory submission service today.