

Tricks For Avoiding Duplicate Content Penalties

I’ve picked up quite a bit of experience handling duplicate content problems (see A Little Google DeIndexing Puts Things In Perspective, An Accident Discovers The Cause Of My Google Deindexing, and Site Pages Reindexed) and from running WM Media, so here are a few tips for bloggers (and other website owners) to keep Google from assigning a duplicate content penalty to their blogs.

www vs non www - If your website is accessible via both www.domain.com and domain.com, then Google will end up indexing both at some point. To fix it, you should redirect everything (using a 301 redirect) to www.domain.com. Just modify the .htaccess file of your public_html folder and use the following code:

RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} !^www\.yourdomain\.com [NC]
RewriteRule ^(.*) http://www.yourdomain.com/$1 [L,R=301]

slash vs no slash - If any of your URLs is accessible both with and without a trailing slash (for example, www.whatithinkabout.com/do-you-get-money and www.whatithinkabout.com/do-you-get-money/), Google also treats those as two separate webpages. Again, you’ll have to redirect the no-slash version to the slash version. For WordPress, simply use the redirect plugin.
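If you’d rather not rely on a plugin, the same redirect can probably be done in .htaccess too. This is just a rough sketch, assuming Apache with mod_rewrite enabled and that the rule is placed above WordPress’s own rewrite block; yourdomain.com is a placeholder, same as in the other snippets in this post:

```apache
RewriteEngine On
RewriteBase /
# Leave real files alone (images, CSS, robots.txt, index.php, and so on)
RewriteCond %{REQUEST_FILENAME} !-f
# Only act on URLs that do not already end in a slash
RewriteCond %{REQUEST_URI} !/$
# Send the no-slash URL to the slash version with a permanent redirect
RewriteRule ^(.*)$ http://www.yourdomain.com/$1/ [L,R=301]
```

With this in place, a request for /do-you-get-money should get a 301 to /do-you-get-money/, while requests for actual files are left untouched.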

index.php vs no index.php - Another way the same page can be indexed in Google is if your webpage is available via the index file also. For example, www.whatithinkabout.com and www.whatithinkabout.com/index.php. In this case, you should redirect index.php to www.whatithinkabout.com via the following snippet in your .htaccess file:

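RewriteEngine On
RewriteBase /
# Check the URL the visitor actually requested, so WordPress's own internal rewrite to index.php doesn't cause a redirect loop
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.php
RewriteRule ^index\.php http://www.yourdomain.com/? [R=301,L]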

Categories & Archives - Make sure you add a noindex tag to these. Since these pages are all snippets of your other pages, it’s all just duplicate content organized differently. Therefore, you don’t want Google to index these pages and possibly devalue your article page(s)! Here’s another handy plugin for this: No index plugin

Search Box - The search box is another dangerous place where duplicate content may be indexed. For example, www.whatithinkabout.com/?s=development would be indexed if someone linked to this search result from another blog. Therefore, I recommend that you disable search altogether (by redirecting anything with parameters) and just put in a Google search box (you make more money from that anyway!)

Oh, so to disable it, you can probably just redirect everything with parameters (since you should have permalinks for your posts anyway):

RewriteEngine On
RewriteBase /
RewriteCond %{QUERY_STRING} .
RewriteCond %{REQUEST_URI} !^/wp
RewriteRule .* http://www.yourdomain.com/? [R=301,L]

Title And Description Tags - If all your pages have the same title and description, then you’re making Google work that much harder to decipher what your pages are about. It would be wise to add a different description tag to every one of your pages, just in case the content seems similar. Besides, search engine visitors need to find your description compelling to click through to your blog! If you don’t have one, Google may not be so great at picking one for you! To do this, just install the description plugin!

Why Avoid Duplicate Content?

There are three good reasons to avoid duplicate content.

1. The ranking power of your page is diluted when it is spread across several URLs. For example, incoming links may be split between www.whatithinkabout.com and www.whatithinkabout.com/index.php when they could all have been pointing to the same page. That means neither URL ranks as highly on Google as a single one would.

2. Possible duplicate content penalties. Since Google’s algorithm isn’t completely open and they need to prevent spam, it’s logical to assume that they may assign some sort of penalty to your website if there’s too much duplicate content. Imagine if tons of your blog’s search pages got indexed (such as www.whatithinkabout.com/?s=blog), in which case you may have thousands of pages of duplicate content!

3. The number of indexed pages is capped. Google has some limits on how many pages of a website can appear in their index. For example, site:www.google.com only shows 30 million pages or so indexed, whereas it probably has trillions of pages. Therefore, if you have 10 duplicate pages indexed, then those might just crowd out all the important pages you spent hours writing!

The Wrath Of Duplicate Content

That’s about it for duplicate content. It might not seem like a big deal, but it actually makes the difference between your blog being virtually unnoticed in search engines vs. your blog being one of the best ranked ones out there!

Just think about all the points above! Let’s say you don’t do any of them. Then Google may index all of these URLs:
http://www.whatithinkabout.com
http://www.whatithinkabout.com/
http://www.whatithinkabout.com/index.php
http://www.whatithinkabout.com/index.php/
http://whatithinkabout.com
http://whatithinkabout.com/
http://whatithinkabout.com/index.php
http://whatithinkabout.com/index.php/

Additionally, let’s say you have one category and 5 pages of content. You’ll also have these pages indexed:
http://www.whatithinkabout.com/category
http://www.whatithinkabout.com/category/
http://www.whatithinkabout.com/post-1
http://www.whatithinkabout.com/post-1/
http://www.whatithinkabout.com/post-2
http://www.whatithinkabout.com/post-2/
http://www.whatithinkabout.com/post-3
http://www.whatithinkabout.com/post-3/
http://www.whatithinkabout.com/post-4
http://www.whatithinkabout.com/post-4/
http://www.whatithinkabout.com/post-5
http://www.whatithinkabout.com/post-5/
http://www.whatithinkabout.com/2008/02/post-1
Plus other archive links

Plus the non www versions of these

Not to mention that if people link to search results such as www.whatithinkabout.com/?s=post-1, that could easily add another 10 or 20 URLs.

Yet, how much content do you actually have here? Just 5 posts! What if Google indexes, say, only 90% of your website? With this setup, it’s possible that Google indexes the duplicate versions of your pages and the URLs you actually want to rank get left out! Compare that to five pages ranking really well!

It seems like such a small thing, but it’s such a huge difference!


Comments

2 Responses to “How To Avoid Google’s Duplicate Content Penalty (For Bloggers)”

  1. Horacio on March 10th, 2008 12:36 pm

    Hi An INTJ,

    Just a question. How much should I change the content so as not to be considered duplicate. Would a 30% change be enough?

    Thanks,

    Horacio

  2. Warren on March 10th, 2008 12:49 pm

    Hey Horacio,

    Yeah, probably. Although, best to check with Copyscape.

    Thanks,
    Warren
