Can I Do This Myself?
Index Duplication Examples
There are many reasons why your website will create duplicate pages. It is important you go through each of the examples below to see if they apply to your site. The age and size of a website have a huge effect on index bloat, especially if you have not been maintaining your website regularly.
This can run out of control over time, and most websites see a huge 76% of all indexed pages being duplicated. Having this amount of duplicated web pages will cause huge ranking issues.
If you have a blog and contribute to it regularly, it’s more than likely you have split your posts into categories, which is the right thing to do. This helps users browse through all the posts that are relevant to what they are looking for.
The problem with blog categories is that they create their own sets of pages and URLs, which Google will then index.
You also have the same posts (in a different order) on the pagination of your blog, e.g. ‘View Page 2’.
Depending on the number of blog posts and categories your site has, your content keywords will be dramatically altered. If Google indexes, say, 600 of these pages and you only have, say, 300 landing pages and blog posts, the main content keyword Google will see for your site will be ‘Blog’.
Blog Post Archives
This one is the worst! With WordPress all your blog posts get categorised into date archives (yearly, monthly and daily archives). This creates loads of unwanted pages that Google will index, especially if you have been blogging for many years. The more you blog, the more damage you are doing to your index bloat!
If you have your website built in WordPress you will know a little bit about Tags. These allow you to tag your pages and posts to help with internal searching. Tags work in the same way as categories (mentioned above).
Most people apply a lot of tags to pages and posts, mainly just throwing loads of keywords for tags as they believe it will help with SEO. It doesn’t! Every tag you create will create a new page. If you have a total of 1000 tags over your posts, this means you have over 1000 pointless web pages created diluting your content even more.
Personally I have never seen the point in using tags. Never used them, never will. Including tags in your posts is the biggest cause of index bloat. We spend many hours a day sorting out index bloat for clients, with tags being the main cause.
Another huge cause of index bloat is eCommerce websites. Just like blog categories and tags, your product categories and tags create their own sets of pages. Although categories are great for helping users find what they are looking for, they’re a pain in the arse when it comes to search engines!
We advise our clients with shop categories to build real landing pages that only display products within that category. This allows more freedom of content and design as opposed to the standard automatically created shop categories.
Applying these methods for your shop will achieve higher search engine rankings and reduce index bloat significantly.
Listing Order Types
If you have a large website with lots of blog posts, with loads of categories & tags, and also have a large shop… you are in the shit if you haven’t kept on top of it! Not only have all your categories and tags for both your blog and shop created additional web pages, there are even more variants of all those pages!
All category and tag pages also allow you to change the order of the items to be displayed:
- Order by date ascending.
- Order by date descending.
- Order by price lowest.
- Order by price highest.
- Order by reviews.
These are the most common but there can be a lot more! Let’s say you have 10 categories. This is a very small number, most eCommerce websites have hundreds. So…
10 categories x 7 URL variants each (the default listing, the archive, and the five sort orders: date ascending, date descending, price lowest, price highest, reviews) = 70 TOTAL PAGES CREATED
If you have pagination on these category pages, e.g. page 1, 2, 3 etc., these 70 pages will grow significantly. That’s 70 near-identical pages all causing duplication and content keyword problems for your rankings.
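To make the arithmetic concrete, here is a rough Python sketch of one way to arrive at 70. It assumes each category has a default listing, an archive view and the five sort orders, giving seven URL variants per category. The category slugs and query parameters are made up for illustration; your theme or shop plugin will use its own.

```python
from itertools import product

# Hypothetical category slugs and URL parameters, for illustration only.
categories = [f"category-{n}" for n in range(1, 11)]
variants = [
    "",                           # default category listing
    "?view=archive",              # archive view (assumed parameter name)
    "?orderby=date&order=asc",    # order by date ascending
    "?orderby=date&order=desc",   # order by date descending
    "?orderby=price&order=asc",   # order by price lowest
    "?orderby=price&order=desc",  # order by price highest
    "?orderby=reviews",           # order by reviews
]

# Every category combined with every variant is a separate indexable URL.
urls = [f"/shop/{slug}/{variant}" for slug, variant in product(categories, variants)]
print(len(urls))  # 70


def with_pagination(url_count, pages_per_listing):
    """Each listing variant is repeated once per pagination page."""
    return url_count * pages_per_listing


print(with_pagination(len(urls), 3))  # 210
```

With just three pages of pagination per listing, those 70 URLs become 210.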
Dead Pages and Products
If the above hasn’t scared you, then this will! Every company will drop a product or service in its lifetime. If these product/service pages have just been deleted without being redirected correctly, they will probably still be indexed in Google.
Not only will this cause unwanted index bloat, but imagine if a user finds that deleted page on Google, clicks on it and the page/product is missing. Doesn’t look very professional, does it?
Steps To Fix Index Bloat
Before today you were probably completely unaware of the effects that index bloat can cause. You may have thought that your website’s SEO is far more important than your website maintenance. The fact is, all the link building and quality content in the world will not get you ranked if you haven’t managed your website correctly.
Not to worry though! The below steps can help you fix Google index bloat. If you have a very large website and do not have time to go through the below steps please take a look at our website maintenance services to help you get back on track.
Honestly, I never used to be a fan of Yoast until recently. With more features available and other SEO plugins falling behind, I believe that Yoast is the best SEO plugin available for WordPress.
You can easily change your meta robots settings to fix most of your index bloat problems.
Once installed go to SEO > TITLES AND METAS in your WordPress Dashboard.
Once there select TAXONOMIES
You will be able to see multiple sections with all of them having an option under META ROBOTS (index and noindex). Make sure you have the below settings. Depending on your theme you might see additional options such as, Portfolio, Testimonials, Team Members etc. Apply the same settings to these sections as well.
Now click on ARCHIVES on the top menu. Make sure you apply the below settings.
Finally, click the last tab marked OTHER and apply the below settings.
Applying all the above settings is the first step towards solving your category, archive and tag indexing problems.
Meta Robot Testing
The next step is to test these web pages to ensure that Google will not index them. Go to SEO Review Tools and input an archive URL that we have just set to noindex. If all is working correctly you will receive the below.
It is important you see noindex under the meta robots column. This will tell Google not to index this page when they come to crawl it, which is what we want.
Repeat this step with a few other category and tag pages to ensure that everything we want to mark as noindex is working properly.
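If you have a lot of pages to test, the same check can be scripted. The Python sketch below is a minimal example, not what any particular tool does internally: it parses a page’s HTML and reports whether a meta robots tag containing noindex is present. The names `RobotsMetaParser` and `is_noindex` are made up for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def is_noindex(html):
    """Return True if the page's meta robots tag includes noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


sample = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
print(is_noindex(sample))  # True
```

To run it against a live page you could fetch the HTML first, for example with `urllib.request.urlopen(url).read().decode()`, and pass the result to `is_noindex`.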
Create A List of URLs
Now here comes the really time consuming part! We now have to go through all of our pages that are indexed in Google and list all the ones we don’t want in TWO Excel sheets. Obviously we don’t want the archives, categories, tags, dead pages and dead products indexing.
Search Google like below (site:yourdomainhere.com), but replace our domain name with yours. You don’t need to include http:// or www.
Currently for our site we have around 536 web pages indexed by Google, which is about right. You will probably see a lot more. Go through the results listed on every page, as far as you can go, and copy the dead URLs into one spreadsheet and all the category, archive, tag etc. pages into another spreadsheet.
Remember to save the spreadsheets every couple of minutes otherwise you will have to start from scratch!
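Once you have copied the URLs out of Google, a small script can help sort them into the two lists. The Python sketch below is only a starting point: it assumes the default WordPress URL bases (`/category/`, `/tag/`, date archives like `/2016/05/` and pagination like `/page/2/`), and it assumes you already know which URLs are dead. Adjust the patterns if your permalinks are set up differently.

```python
import re
from urllib.parse import urlparse

# Assumed URL patterns, based on default WordPress permalink bases.
TAXONOMY_PATTERNS = [
    re.compile(r"^/category/"),             # category pages
    re.compile(r"^/tag/"),                  # tag pages
    re.compile(r"^/\d{4}(/\d{2}){0,2}/?$"), # yearly, monthly, daily archives
    re.compile(r"/page/\d+/?$"),            # pagination
]


def classify(url, dead_urls):
    """Label a URL as 'dead', 'taxonomy' (category/tag/archive) or 'keep'."""
    if url in dead_urls:
        return "dead"
    path = urlparse(url).path
    if any(pattern.search(path) for pattern in TAXONOMY_PATTERNS):
        return "taxonomy"
    return "keep"


def split_urls(urls, dead_urls):
    """Return the two lists described above: dead URLs and taxonomy URLs."""
    dead = [u for u in urls if classify(u, dead_urls) == "dead"]
    taxonomy = [u for u in urls if classify(u, dead_urls) == "taxonomy"]
    return dead, taxonomy
```

Each returned list can then be pasted into its own spreadsheet. It is still worth eyeballing the results, as a pattern match will never know your site as well as you do.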
Now you have spent what seems like half of your life creating these spreadsheets, it’s time to put them to use!
Get your list of dead URLs. What we need to do with these is redirect them to an active page. Instead of just removing them from the search results altogether, it’s best to redirect the dead pages to an active page. You never know what social shares or backlinks you may have pointing to these dead pages! Also, the Google Search Console step won’t work if this hasn’t been done.
You have 2 options to perform 301 page redirects. 1) redirect at server level, or 2) use a redirect plugin (if using WordPress).
Personally I use a redirect plugin called Eggplant 301 Redirects which allows you to upload .csv files, like the ones we just created.
Before we upload we need to prep our spreadsheet. We need 3 columns. The 1st column should have 301 listed against all URLs, the 2nd column needs to contain the URL of the dead page and the 3rd column needs to be the redirect URL. Look at the below, then save as a .csv. There is a demo file you can download if you need help.
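If your dead URLs are already in a list, you can generate the three-column file with a few lines of Python instead of typing it out. This is a sketch of the layout described above (301, dead URL, redirect URL). The function name and the example paths are made up, and it is worth comparing the output with the plugin’s demo file before uploading.

```python
import csv
import io


def build_redirect_csv(redirects):
    """Build CSV text with one row per redirect: 301, dead URL, redirect URL."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for dead_url, target_url in redirects:
        writer.writerow(["301", dead_url, target_url])
    return buffer.getvalue()


# Hypothetical example paths.
rows = [
    ("/old-product/", "/shop/"),
    ("/discontinued-service/", "/services/"),
]
print(build_redirect_csv(rows))
```

Save the returned text to a file ending in .csv and it is ready for the upload step.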
Once you have got your .csv, it’s time to upload these redirects. Go to SETTINGS > EPS REDIRECTS then in the top sub menu, go to IMPORT / EXPORT. Here you can upload your redirect CSV.
Click on REDIRECTS on the top sub menu to make sure you haven’t redirected any main pages by mistake.
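You can also run a quick sanity check on the redirect list itself before or after uploading. The Python sketch below flags three common mistakes: redirecting a main page, redirecting a URL to itself, and redirecting to a URL that is itself being redirected. The function name and the idea of keeping a set of main pages are assumptions for this example, not features of the plugin.

```python
def audit_redirects(redirects, main_pages):
    """Return a list of (source URL, problem) pairs found in the redirect list."""
    sources = {source for source, _ in redirects}
    problems = []
    for source, target in redirects:
        if source in main_pages:
            problems.append((source, "main page is being redirected"))
        if source == target:
            problems.append((source, "redirects to itself"))
        elif target in sources:
            problems.append((source, "target is itself redirected"))
    return problems


# Hypothetical example data.
redirects = [
    ("/", "/home/"),
    ("/old/", "/older/"),
    ("/older/", "/shop/"),
    ("/loop/", "/loop/"),
]
for source, problem in audit_redirects(redirects, main_pages={"/", "/contact/"}):
    print(source, "->", problem)
```

An empty result does not prove the list is perfect, but anything it does flag is worth fixing before Google recrawls the site.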
Google Search Console
Now it’s time to tell Google we don’t want these pages indexed in their results anymore. Log in to your Google Search Console (formerly Webmaster Tools). If you haven’t got a Google Search Console account you can set one up here, with a setup tutorial here if needed.
Make sure you have your website verified.
In your Google Search Console click on your website and go to GOOGLE INDEX > REMOVE URLS on the left menu.
Open up both spreadsheets containing the dead URLs and all the category, tags, archives etc URLs. You now have to submit each URL one at a time on your lists!
Click the TEMPORARILY HIDE button then click the SUBMIT REQUEST button to request that a URL is removed from Google’s index.
Google are pretty quick when it comes to URL removals and they will usually be removed within 24 hours. You can check back the next day to view the status; if it says removed then it’s gone.
After 24-48 hours, perform the site:yourdomainhere.com search again in Google and see how many results you have. This should have dropped significantly. Go through the pages again and list any new URLs still indexed. Make sure they have a 301 redirect if they are dead pages, otherwise they will keep coming back.
If you are using Yoast you can have sitemaps created for you automatically. In your WordPress Dashboard go to SEO > XML SITEMAPS and follow the steps.
If you are not using Yoast you can have a sitemap created for you using XML Sitemaps.
Once you have your sitemaps ready, log in to your Google Search Console, click on your website and go to CRAWL > SITEMAPS on the left menu.
If you are using Yoast, you can go to the XML SITEMAP sub page and click the xml sitemap button. Here you can copy and paste the file destinations for each sitemap. There should be at least 2 links. Right click over each link and copy the link locations and save them in a safe place.
If you are using a sitemap generator, you need to upload the sitemap.xml to your server. Make a note of the URL path (e.g. sixtymarketing.com/sitemap.xml).
In your Google Search Console click the red ADD / TEST SITEMAP button and paste in the last part of the URLs you just saved. Click SUBMIT.
This will now tell Google which pages you want indexed in the search results.
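If you are not using Yoast and would rather build the sitemap yourself than use a generator, the file format is simple enough to script. The Python sketch below builds a bare-bones sitemap using the standard sitemaps.org namespace, from a list of the pages you want indexed. The function name and example URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML text with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical example URLs: list only the pages you want indexed.
pages = ["https://example.com/", "https://example.com/services/"]
print(build_sitemap(pages))
```

Save the output as sitemap.xml, upload it to your server and submit it in Google Search Console as described above. Leave out the category, tag and archive URLs you have just set to noindex.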
Regular Website Maintenance & Monitoring
All the above steps need to be managed on a regular basis and should be a major part of your website maintenance. If you don’t keep on top of them, you will soon be back in the same situation when it comes to index bloat.
After a short period of time you will see an improvement in your search engine rankings.
It is important you never just focus on content and SEO… managing your website efficiently is probably more important… as you can now see!