Having the same content on multiple pages can get you into trouble with the search engines.
Duplicate content is a hot, if not overdiscussed, topic in the search engine marketing community. In a nutshell, if you have basically the same content on more than one page, the search engines will choose the one they think is the most important and rank it and only it. If you do too much of this, you could get some demerits from the search engines. If you really overdo it with mirror sites and the like, you could get penalized or even banned.
Most web sites will have at least some duplication of content. For instance, an e-commerce site will probably use a single template for every product page. The trick is to give each product page a unique title and description tag as well as some unique content on the page in the form of descriptive text and images. As a rule, something as simple as this can keep you out of trouble.
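To make the template idea concrete, here is a minimal sketch in Python of building a distinct title and description tag for each product page from its own data. The function name, the "Example Store" site name, and the title pattern are all assumptions for illustration, not a prescribed format.

```python
from html import escape

SITE_NAME = "Example Store"  # hypothetical site name, for illustration only


def product_head(name, summary):
    """Return a <title> and a description meta tag built from one product's data.

    Because both tags come from the product's own name and summary,
    two different products never share the same head tags.
    """
    title = f"{name} | {SITE_NAME}"
    return (
        f"<title>{escape(title)}</title>\n"
        f'<meta name="description" content="{escape(summary.strip())}">'
    )
```

Calling `product_head("Blue Widget", "A sturdy blue widget.")` gives the head tags for that one page. The descriptive text and images in the page body still need to be written per product.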
Here are some other tips that might come in handy.
1. Use tools like Copyscape.com to find stolen content. If you are concerned that other sites are stealing or scraping your content to use on their own pages, use this free web site to find out. Simply paste your URL into the form and do a search. What you will get is a list of pages with text that is similar to yours. If you see anything blatant, contact the offender and ask for your content to be removed. If they won’t, report them to the search engines.
2. Use analytics software. Your web analytics can tell you which pages are converting. Put those into your sitemap. Pages that aren’t converting and could be considered duplicates can be excluded in your robots.txt file or with the “noindex” meta tag on the individual pages. That tells the spiders which pages you want indexed, and there’s less risk that an engine will decide one of the non-converting pages is the most important and index it instead.
3. Choose one domain for branding. Don’t go overboard and put up a bunch of domains with similar content on them. Focus on one. If you’ve already got several domains up, consolidate them into one and 301 redirect the others to it.
4. Test domains should be invisible. If you are using a domain simply for testing new designs, functions, etc., be sure it is not accessible to spiders or users, who will both be confused about which domain is the real thing.
5. Choose www or no www. Most search engines can figure this out these days, but it is still wise to choose whether your site is http://www.yourdomain.com or http://yourdomain.com. In the past, these have been seen as two separate sites and could cause duplication problems. Best to decide on one and 301 redirect the other to it.
6. Don’t do server load balancing with visible hostnames. You’ve probably noticed sites whose URL will be something like http://www1.domain.com and then maybe http://www2.domain.com the next time. That’s server load balancing that exposes each server under its own hostname. Problem is, the search engines will see www, www1, www2, etc. as duplicate copies of the site. That’s asking for duplicate content problems. If you need to balance load, do it behind a single hostname.
7. Use absolute URLs. An absolute URL is http://www.yourdomain.com/yourpage.html, as opposed to a relative URL like yourpage.html. This is especially important if you use secure pages (https). If a visitor lands on an https page and then leaves by way of a relative link, the browser stays in https, so every page after that will be https://www.yourdomain.com/page.html instead of http://www.yourdomain.com/page.html. Not only is it a pain for the user, who will get secure page notification pop-ups, but the spiders will see all the https pages as a duplicate site. Besides, https pages generally should NOT be indexed.
8. Session IDs can be a nightmare. Yes, this problem will take some advanced technical help. The problem with session IDs is that on these dynamically created (database created, for you newbies) pages you can have exactly the same content on a multitude (thousands or even hundreds of thousands) of pages with completely different session IDs. Dump the session ID info into a cookie for all users or identify the spiders and strip the session IDs for them only. For more info, see Google Webmaster Guidelines.
9. WordPress Canonical URL Plugin. If you are using WordPress (as I do), install this plugin to take care of duplicate pages that can occur after you do permalink customization. Basically, this redirects posts in default WordPress URLs to your new URL structure.
10. Top level domains in different countries should not be a problem. If you have basically the same site up in different countries using country-specific domains, you will probably not have duplicate content issues. It’s best to host the sites in the appropriate countries and customize the language and keywords for each, though.
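Tips 5, 6, and 8 above all come down to the same thing: many URLs pointing at one page. As a rough illustration, the Python sketch below maps any variant to a single canonical URL and reports where a 301 redirect should go. The host names and the list of session ID parameter names are assumptions; substitute whatever your own site uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# All three values below are assumptions for illustration.
CANONICAL_HOST = "www.yourdomain.com"
ALIAS_HOSTS = {"yourdomain.com", "www1.yourdomain.com", "www2.yourdomain.com"}
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def canonical_url(url):
    """Map alias hosts to the canonical host and drop session ID parameters."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host in ALIAS_HOSTS:
        host = CANONICAL_HOST
    # Keep every query parameter except the ones that carry a session ID.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, host, parts.path or "/", urlencode(query), parts.fragment)
    )


def redirect_target(url):
    """Return the URL to 301 redirect to, or None if the URL is already canonical."""
    target = canonical_url(url)
    return None if target == url else target
```

For example, `redirect_target("http://yourdomain.com/page.html?PHPSESSID=abc&color=red")` points at the www version of the page with only the `color` parameter left. In practice this logic usually lives in the web server's rewrite rules rather than in application code, but the decision it makes is the same.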
Bonus tip: Want to see what Google has indexed from your domain (or any domain for that matter) during, say, the past seven days? Just point your browser to http://www.google.com/search?q=site:yourdomain.com&as_qdr=d7. Simply change the “yourdomain.com” to your actual domain name and alter the “d7” to be whatever number of days you are looking for (d5, d10, etc.). Or, change the “d” to “w” for weeks or “y” for years.
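If you check several domains regularly, a small helper can build that URL for you. This is a sketch only: the function name is made up, and it simply reproduces the URL pattern described above without checking whether Google still honors the `as_qdr` parameter.

```python
from urllib.parse import urlencode


def recent_index_url(domain, amount=7, unit="d"):
    """Build the search URL from the tip above.

    unit is "d" for days, "w" for weeks, or "y" for years.
    """
    if unit not in ("d", "w", "y"):
        raise ValueError("unit must be 'd', 'w', or 'y'")
    if amount < 1:
        raise ValueError("amount must be at least 1")
    params = {"q": f"site:{domain}", "as_qdr": f"{unit}{amount}"}
    # safe=":" keeps the colon in "site:" readable instead of encoding it as %3A.
    return "http://www.google.com/search?" + urlencode(params, safe=":")
```

Calling `recent_index_url("yourdomain.com")` returns the seven-day URL shown in the tip, and `recent_index_url("yourdomain.com", 2, "w")` asks for the past two weeks.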
As always, these are just a few tips to help you avoid duplicate content issues, and they are by no means the only ways to deal with them.
This article is intended as a companion piece to SEO 101 and will be updated periodically.
I go into more detail in my SEO 101 workshop, offered to web site owners and small businesses. Check my blog at http://www.weboptimist.com for more information or contact me to set up a custom workshop for your business group of five or more people in the Palm Springs area of Southern California. Travel is possible for large groups.