Below I will explain how to remove some of your pages from Google’s index. The process itself is very simple, but you should pay close attention to the small details to get the exact results you need.
What’s the purpose of having a page removed?
The main reason to remove pages from Google’s index is to avoid having duplicate low-quality content. This is especially important for sites that have hundreds of pages.
Take a critical look at all the pages on your site and identify those that you don’t want indexed by Google. Some typical candidates to be permanently removed are:
Keep in mind that a modest site with no more than 100 pages will usually rank better than a huge site with thousands of redundant pages. Because better ranking translates to more traffic, it is important to know exactly which pages you want indexed!
Scan for low-quality content within your website
This process is pretty simple: just use Google’s ‘site:’ command. For example, site:
The number of results is the number of pages that have been indexed by Google. With this number in mind, ask yourself the following questions:
Is that number almost equal to the number of entries you have posted? If so, well done!
Is that number about two or three times the number of entries you have posted? You should find the lower-quality pages that are being indexed.
Is that number at least 10 times the number of entries you have posted? Something needs to be done immediately!
If you fit into one of the last two categories, you might want to reschedule your plans, because you’re going to need some time to revamp your site. It will be worth it, though!
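If you like to keep this kind of check in a script, here is a minimal sketch of the three questions above. The function name and the exact cutoffs (1.5 and 10) are my own assumptions for illustration; the article only speaks of "almost equal", "two or three times" and "at least 10 times", so adjust them to taste.

```python
def index_health(indexed_pages: int, posted_entries: int) -> str:
    """Compare the 'site:' result count with the number of entries posted.

    The 1.5x and 10x cutoffs are illustrative assumptions, not official limits.
    """
    if posted_entries <= 0:
        raise ValueError("posted_entries must be a positive number")

    ratio = indexed_pages / posted_entries
    if ratio >= 10:
        return "act immediately"
    if ratio >= 1.5:
        return "find low-quality pages"
    return "well done"


if __name__ == "__main__":
    # Hypothetical numbers: 250 indexed pages for 100 posted entries.
    print(index_health(250, 100))
```

Keep in mind that the count Google reports for a ‘site:’ search is an estimate, so treat the result as a rough signal rather than an exact measurement.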
Ok, now that you have identified your redundant indexed pages, it’s time to get rid of them for good!
Should I start by blocking the pages in robots.txt?
Short answer: No
At some point in this process you will come to this step, but by all means avoid starting with it.
When you block a page in the robots.txt file, you are not telling Google to remove it from the index; you are only saying that you don’t want that page crawled again.
A common result of blocking with robots.txt is having a very old page showing and collecting dust for eternity. You definitely don’t want that!
Block search indexing with meta tags
Ok, so the first thing you should do is use the robots meta tag to specifically instruct the crawler not to index that page. At the same time, you might want it to follow the links on that same page. How do you do that?
Fortunately, this is a rather simple step; just add this tag in the <head> section of your page:
<meta name="robots" content="noindex">
Add this tag to some of the pages you want removed and then search for them on Google. If done correctly, those pages should display an error message. It will be the first time that a “The requested URL was not found on this server” message makes you happy.
However, this tag does not necessarily mean that your page has been deleted from the index right away. You can read more on how to block access to your content here.
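Before waiting for Google to recrawl, it can help to confirm that the tag really made it into the page. The snippet below is a minimal sketch that uses only Python’s standard library; the class and function names are hypothetical, and it only inspects an HTML string you give it (fetching the page is left to you).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head></html>'
    print(has_noindex(page))
```

Note that this only tells you the tag is present in the markup. Whether the page has actually dropped out of the index still depends on Google recrawling it.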
Time to go to the next step!
Speed up the process: Use Webmaster Tools
Speed is king when it comes to search results. If you find that your pages still show up in the search results even after you added the meta tag, it’s because Google’s spiders have not recrawled your site yet.
If you want to delete a certain page from the index, enter the URL of that page under the Google Index -> Remove URLs menu, then select ‘Remove page from search results and cache’.
This process is also extremely useful when it comes to removing entire directories. In order to do so, follow these steps:
Now you should have all those redundant pages or directories removed from Google’s index. However, to ensure that they won’t be reindexed, there’s one final step.
Make sure the removed pages don’t come back
Now is the right time to block the pages in robots.txt.
Before you perform this step, make sure that you have successfully removed the pages you want gone. Otherwise, you will face problems like the one I described in the first step. Use the ‘site:’ command to confirm that those pages are gone from the index.
Go to your robots.txt file and add this rule:
Disallow: /(directory you want uncrawled)
Make sure everything is working as it should by checking the blocked URLs in Google Webmaster Tools: Crawl -> Blocked URLs
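If you want a quick local check as well, Python’s standard urllib.robotparser module can tell you whether a rule blocks a given URL. This is only a sketch: the /old-archive/ directory and the example.com URLs are made-up placeholders, and I am assuming the Disallow rule sits under a "User-agent: *" line, since a rule needs a user-agent group to apply. Python’s parser also does not follow exactly the same matching rules as Googlebot, so treat the result as a sanity check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; replace the directory with your own.
ROBOTS_TXT = """\
User-agent: *
Disallow: /old-archive/
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules disallow crawling the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/old-archive/page.html",
        "https://example.com/blog/post.html",
    ):
        print(url, "blocked" if is_blocked(ROBOTS_TXT, url) else "allowed")
```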
Congratulations! You have removed your pages from the index
The steps I mentioned are all you need in order to remove those redundant or uninteresting pages from the index. It is not that complicated, right?
If you have any questions, or if you know of another way to do this, I’d be more than happy to read about it in the comments. Don’t forget to subscribe if you want to receive up-to-date information on how to improve your site’s ranking.