Although there is some debate as to whether the search engines will penalise your site directly for duplicate content, in certain circumstances it can still affect your website's rankings.
There is a vast number of pages for the search engine bots to crawl, so to make things more efficient each bot arrives at a website with a "crawl budget". If you have a large number of duplicate pages on your site, this budget can get used up before the entire site is crawled, and as a result fewer pages may end up getting indexed. Incidentally, this is another reason it's worth keeping JavaScript in external files and using external CSS style sheets rather than embedding the code in the page itself.
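As a simple illustration, the bulky code can be moved out of each page and referenced instead. The file names below (styles.css and site.js) are just placeholders:

    <!-- In the <head>: link to an external style sheet instead of a <style> block -->
    <link rel="stylesheet" href="/css/styles.css">

    <!-- Before </body>: reference an external script instead of inline JavaScript -->
    <script src="/js/site.js"></script>

This keeps the HTML the bots actually download small, so more of the crawl budget goes on real content.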
Another issue to consider is how a search engine decides which page to show in the rankings when it comes across duplicate pages.
For example, many companies choose to have both the .com and .co.uk versions of their domain name. Unfortunately this often results in duplicate content issues, as the search engines end up seeing the two domains as separate, but identical, websites.
Identical pages from both sites can get indexed, but there is no control over which page ends up being ranked. For instance, the .com/products.html page may carry more link authority (because the .com domain has more links pointing to it), yet for a particular keyword the search engine chooses to show the lower-ranked .co.uk version instead. A lower ranking means less traffic.
Fortunately this scenario can be easily rectified by using an .htaccess file to 301 redirect the .co.uk domain to the .com domain. From then on the search engines will only see one version of the website.
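As a rough sketch, on an Apache server with mod_rewrite enabled the .htaccess rule might look something like this (example.co.uk and example.com stand in for your own domains):

    RewriteEngine On
    # Catch any request arriving on the .co.uk domain...
    RewriteCond %{HTTP_HOST} ^(www\.)?example\.co\.uk$ [NC]
    # ...and send a permanent (301) redirect to the same path on the .com domain
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

Using a 301 (permanent) redirect rather than a 302 tells the search engines that the .co.uk pages have moved for good, so the .com version is the one that gets indexed.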
Sometimes duplicate content on a website is impossible to avoid. For instance, you may have a print version of a particular page as well as the web version. In this case it is simply a matter of telling the search engine bots not to index the duplicate: link to the print version with nofollow links and add a meta noindex tag to the print page itself.
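For example, if the print version were a page called products-print.html (a hypothetical name), it would carry a noindex tag in its head, and the link pointing to it would be marked nofollow:

    <!-- In the <head> of products-print.html: keep the page out of the index -->
    <meta name="robots" content="noindex, follow">

    <!-- On the web version: link to the print page without passing it on to the bots -->
    <a href="/products-print.html" rel="nofollow">Print this page</a>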