WordPress and Duplicate Content: How to Channel Traffic to the Right URLs

This is mainly about Google indexing, but it covers some good general SEO too. If you’ve read anything about duplicate content penalties for WordPress index-type pages (archives, the home page, category indexes, etc.), you’ve probably come across advice to add noindex, noarchive, and nofollow rules to the header.php file of your WP theme. This article gives a quick reminder of the few lines of header code that add noindex tags to everything except full-content posts and pages and the home page. It then goes into how to use Google Webmaster Tools to make some simple, effective tweaks to the way Google sees and ranks your site.

The key thing to remember is that good content is the number one way to build a following and traffic. Manipulating search engines may boost traffic for a short time, but in the end the only thing that keeps people interested is solid content. This can’t be overemphasized. Search engines like Google aim to judge a site’s usefulness as closely as possible to the way a human reader would. On the SEO / webmaster side, this also means that as time goes on, fewer tweaks are needed, because search engines can now work out many things they couldn’t before.

With this in mind, I’ve found that a solid ‘duplicate content solution’ can be as simple as 1) taking a few simple steps to help Google know how to sort its index of your site, and 2) using Google Webmaster Tools to handle the rest, concentrating the ‘rank’ of each content page into a single, distinct URL.

noarchive, noindex, nofollow tags

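The goal of these tags is to concentrate the ranking of each post or page on its full-content URL, instead of having it dispersed among archives and other index pages. In WordPress, adding the lines below to the <head> section of your theme’s header.php file tells search engines to index only single posts, static pages, and the main blog listing. That listing is what is_home() tests for, and it is the front page unless you have set a static front page. Every other page, such as date archives, category and tag listings, and search results, gets noindex,follow. Those pages stay out of the index, while crawlers still follow their links through to your posts: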

<?php // Index single posts, static pages and the blog home; keep every other view out of the index. ?>
<?php if ( is_single() || is_page() || is_home() ) { ?>
<meta name="googlebot" content="index,follow" />
<meta name="robots" content="index,follow" />
<meta name="msnbot" content="index,follow" />
<?php } else { ?>
<meta name="googlebot" content="noindex,follow" />
<meta name="robots" content="noindex,follow" />
<meta name="msnbot" content="noindex,follow" />
<?php } ?>
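To sanity-check which kinds of pages end up indexed, you can model the template’s decision as a plain function. This is a minimal Python sketch, not WordPress code. The page-type labels ("single", "page", "home", and so on) are hypothetical stand-ins for WordPress’s is_single(), is_page() and is_home() conditionals, and robots_content() and robots_meta_tags() are illustrative names.

```python
# Hypothetical page-type labels standing in for WordPress conditionals:
# "single" ~ is_single(), "page" ~ is_page(), "home" ~ is_home().
INDEXABLE_TYPES = {"single", "page", "home"}


def robots_content(page_type: str) -> str:
    """Return the robots meta content the header.php snippet would emit."""
    if page_type in INDEXABLE_TYPES:
        return "index,follow"
    # Archives, category/tag listings, search results, etc. stay out of the
    # index, but crawlers still follow their links through to the posts.
    return "noindex,follow"


def robots_meta_tags(page_type: str) -> list:
    """Build the three meta tags the template prints for one page."""
    content = robots_content(page_type)
    return [
        f'<meta name="{bot}" content="{content}" />'
        for bot in ("googlebot", "robots", "msnbot")
    ]


if __name__ == "__main__":
    for page_type in ("single", "home", "category", "date-archive"):
        print(f"{page_type:>12}: {robots_content(page_type)}")
```

Only three page types are ever indexed. Anything the template does not recognize falls through to noindex,follow, the same way the PHP else branch does.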

www versus non-www, .htaccess and WordPress sites

For the most part, Google and other search engines figure a lot of this out, but I was surprised recently when I checked on some newer sites. Though I had always entered the non-www version of their URLs, searches for the site titles still returned some www URLs. Search engines can’t simply assume that www.yoursite.com and yoursite.com are the same site, because not every site serves the same content at both addresses. So I decided it’s worth taking a minute to add a few lines to the .htaccess file of my WordPress sites. If you want to help Google decide which format to use for your site, you can tell Google Webmaster Tools which version you want associated with your site. If you want to go a step further, you can add a redirect rule to the .htaccess file in the root of your site.

The Google Webmaster Tools step is described below, but first, here is some code for the .htaccess file in your site root. By choosing between the www and non-www versions of your blog, you direct all incoming traffic (including search engines) to one root URL. Be sure to download a copy of .htaccess from your site before adding these lines. WordPress creates this file and makes changes to it, and you may have to select ‘force show hidden files’ in your FTP program to see it. WordPress only rewrites the lines between its # BEGIN WordPress and # END WordPress markers, so keep your rules outside that block. Put them above it: WordPress’s rules hand every permalink request to index.php, so a redirect placed below the block would never run for those URLs.
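Because WordPress rewrites its own marked block, it’s worth confirming that your edit left that block untouched before you upload the file again. This is a minimal Python sketch, assuming you have the downloaded original and your edited copy as text. The helper names wordpress_block() and wordpress_block_unchanged() are hypothetical. The marker lines are the ones WordPress writes into .htaccess.

```python
from typing import Optional

BEGIN_MARKER = "# BEGIN WordPress"
END_MARKER = "# END WordPress"


def wordpress_block(htaccess_text: str) -> Optional[str]:
    """Return the WordPress-managed lines (markers included), or None if absent."""
    lines = htaccess_text.splitlines()
    stripped = [line.strip() for line in lines]  # tolerate stray whitespace
    try:
        start = stripped.index(BEGIN_MARKER)
        end = stripped.index(END_MARKER, start)
    except ValueError:
        return None  # one or both markers missing
    return "\n".join(lines[start : end + 1])


def wordpress_block_unchanged(original: str, edited: str) -> bool:
    """True if the edited file still contains WordPress's block exactly as before."""
    before = wordpress_block(original)
    return before is not None and before == wordpress_block(edited)
```

If this returns False, compare the two copies by hand. WordPress may overwrite or duplicate a damaged block the next time it saves permalink settings.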

Replace yoursite.com with your own domain in the rules below.

Removing www

# Send www.yoursite.com requests to yoursite.com with a permanent (301) redirect
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.yoursite\.com$ [NC]
RewriteRule ^(.*)$ http://yoursite.com/$1 [R=301,L]

Always use www

# Send yoursite.com requests to www.yoursite.com with a permanent (301) redirect
RewriteEngine On
Options +FollowSymLinks
RewriteCond %{HTTP_HOST} ^yoursite\.com$ [NC]
RewriteRule ^(.*)$ http://www.yoursite.com/$1 [R=301,L]
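To see what these rules do to individual URLs, here is a small Python model of the redirect. It is a sketch of the rules’ effect, not Apache itself. canonical_redirect() is a hypothetical helper, and yoursite.com is the same placeholder domain. The model assumes the plain http:// scheme used in the rules, and it keeps the query string, which mod_rewrite appends by default when the target has none of its own.

```python
from typing import Optional
from urllib.parse import urlsplit


def canonical_redirect(url: str, domain: str, prefer_www: bool) -> Optional[str]:
    """Return where the .htaccess rules would 301-redirect `url`, or None.

    `domain` is the bare domain, e.g. "yoursite.com" (placeholder).
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit lowercases it, like the [NC] flag
    wanted = f"www.{domain}" if prefer_www else domain
    other = domain if prefer_www else f"www.{domain}"
    if host != other:
        return None  # already canonical, or some other host: no redirect
    target = f"http://{wanted}{parts.path or '/'}"
    if parts.query:
        target += f"?{parts.query}"  # original query string is carried over
    return target


if __name__ == "__main__":
    print(canonical_redirect("http://www.yoursite.com/about/", "yoursite.com", False))
```

Every non-preferred URL maps to exactly one preferred URL with the same path. That one-to-one mapping is what lets search engines fold both versions into a single ranked page.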

Google Webmaster Tools and SEO

Though many people use Google Analytics to check traffic stats, I run into a lot of people who aren’t using the Webmaster Tools section of their Google account. When logged into Google, click ‘My Account’ and then find Webmaster Tools, or go to Google Webmaster Tools directly.

The two main areas to visit on a quick trip are the Sitemaps and Settings sections for each domain you have registered. There are other useful stats that can help you spot and diagnose traffic patterns if you have the time. For a basic SEO tune-up, though, the following steps will help you make the most of Google Webmaster Tools for your site.


adding a site

First you have to add your site on the main dashboard. After you submit your site’s URL, a verify link takes you through the site ownership verification process. You can choose ‘add a meta tag’ or ‘upload HTML file’ to verify your site. If you use the meta tag option, copy the code Google gives you and paste it into the <head> section of your theme’s header.php file using the wp-admin theme editor (Appearance > Editor). For the HTML file option, create a file with the exact file name Google gives you (with no .txt on the end), add any content Google asks for, and upload it to your site’s root using FTP.
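If you use the meta tag option, you can check a saved copy of your home page’s HTML to make sure the tag ended up inside <head>, where Google looks for it. This is a minimal sketch using Python’s standard html.parser. It assumes the tag Google gives you uses name="google-site-verification", so compare it with the exact tag you copied. find_verification_token() is a hypothetical helper.

```python
from html.parser import HTMLParser
from typing import Optional


class _VerificationFinder(HTMLParser):
    """Record the first google-site-verification meta tag and whether it is in <head>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.token: Optional[str] = None
        self.token_in_head = False

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.token is None:
            attributes = dict(attrs)
            if attributes.get("name") == "google-site-verification":
                self.token = attributes.get("content")
                self.token_in_head = self.in_head

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_verification_token(html: str) -> Optional[str]:
    """Return the verification token if its meta tag is inside <head>, else None."""
    finder = _VerificationFinder()
    finder.feed(html)
    finder.close()
    return finder.token if finder.token_in_head else None
```

A None result usually means the tag was pasted into the wrong part of header.php, or into a template file your home page doesn’t use.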

Google Webmaster Tools > Settings to set your preferred domain

From the main Webmaster page, click one of your domain names, then click Settings. Choose the format (www or non-www) you want as your preferred domain. If you added an .htaccess redirect, pick the same version it points to.

Google Webmaster Tools > Sitemaps

From the main Webmaster page, click one of your domain names to see the Sitemaps, Settings and other sections. Click Sitemaps and follow the instructions to add your sitemap. Come back to the Sitemaps section after a couple of days to see how Google is handling the crawl of your XML sitemap.

Don’t have a Sitemap?

If you don’t have an XML sitemap, use a plugin like Google XML Sitemaps. Install it like any other plugin, in the wp-content/plugins/ directory of your site, and activate it. Then be sure to visit the plugin’s settings page in your wp-admin panel to build the sitemap for the first time.

There are other settings you can configure, but for the most part they are for when you run into trouble; the default settings are good for nearly all installations. If your sitemap starts timing out because you have so many posts, use the plugin settings to define a longer timeout and limit the number of articles in the sitemap to fewer than 5,000. I would go even lower (500). The main purpose of an XML sitemap connected to Google Webmaster Tools is to tell Google (and other search engines that read XML sitemaps) about your new articles quickly, so it does not need to list every URL on your site. Leaving URLs out of the sitemap has no negative impact: Google keeps track of the rest of your site through its usual crawling. Connecting your sitemap in the Sitemaps section means Google hears about each new article soon after you publish it.
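The plugin builds the sitemap for you, but a capped sitemap is simple enough to sketch. This Python example uses the standard xml.etree.ElementTree and follows the sitemaps.org format (a urlset of url entries with loc and lastmod). The (url, last_modified) post list is hypothetical sample data, build_sitemap() is an illustrative name, and the default limit of 500 mirrors the suggestion above.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Iterable, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(posts: Iterable[Tuple[str, date]], limit: int = 500) -> str:
    """Return sitemap XML listing only the `limit` most recently modified posts.

    `posts` is a hypothetical iterable of (url, last_modified) pairs.
    """
    newest = sorted(posts, key=lambda post: post[1], reverse=True)[:limit]
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, modified in newest:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes & and <
        ET.SubElement(entry, "lastmod").text = modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Sorting by modification date and slicing keeps the newest articles in the file, which is the part Google most needs to hear about. Older posts are still found through normal crawling.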

Other Google Webmaster Tools

Several sections of Google Webmaster Tools show statistics about your site and traffic, so you might want to look around a bit. One area well worth mentioning is Tools > Remove URLs. If there are URLs in Google’s index that you don’t want there, this is the easiest way to get them removed. Keep in mind that a removal there is temporary unless the page is also deleted or carries a noindex tag, like the ones in the header.php code earlier.

If you would like a duplicate content tune-up done on your site, feel free to contact us for more information.

Erin Bruce