It’s easy to assume that the noindex Meta tag and the robots.txt disallow command do the same thing: stop a page being listed on search engines. But there’s more to it than that.
Robots.txt, disallow:
- A search engine bot will recognise that the page exists but will not index its content, so the page URL may still appear in search results with no other content displayed.
- A search engine bot will not crawl the page, so it will not find any content that you might want it to, eg other Meta tags or links out of the website.
- Page can still accrue PageRank.
Format:
User-agent: *
Disallow: /page.php
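To see how a well-behaved crawler would read these two rules, here is a minimal Python sketch using the standard library's urllib.robotparser. The example.com URLs are placeholders rather than anything from this article, and real search engine bots may interpret robots.txt in ways this parser does not model.

```python
from urllib.robotparser import RobotFileParser

# The two rules from the format above, supplied as lines
# instead of being fetched from a live site.
rules = [
    "User-agent: *",
    "Disallow: /page.php",
]

parser = RobotFileParser()
parser.parse(rules)

# example.com is a placeholder domain used only for illustration.
blocked = parser.can_fetch("*", "http://example.com/page.php")
allowed = parser.can_fetch("*", "http://example.com/other.php")

print("May crawl /page.php:", blocked)
print("May crawl /other.php:", allowed)
```

This only answers whether a compliant bot may crawl a URL. As noted above, a disallowed URL can still appear in search results.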
Noindex Meta tag:
- A search engine bot will not index the page and the page will not show up in the search engine results.
- A search engine bot still crawls the page and finds its content.
- Page can still accrue PageRank.
Format:
<meta name="robots" content="noindex, follow"> = the page will not be in search results but Google can still follow the outbound links in the page.
<meta name="robots" content="noindex, nofollow"> = the page will not be in search results and Google will not follow the outbound links in the page.
<meta name="robots" content="none"> = the page will be ignored ie the same as <meta name="robots" content="noindex, nofollow">
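If you want to check which directives a page is actually sending, the following is a rough Python sketch built on the standard library's html.parser. The class and function names (RobotsMetaParser, robots_directives) are made up for this example, and it treats "none" as shorthand for "noindex, nofollow" as described above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for token in content.split(","):
                token = token.strip().lower()
                if token == "none":
                    # "none" is shorthand for "noindex, nofollow".
                    self.directives.update({"noindex", "nofollow"})
                elif token:
                    self.directives.add(token)


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


print(robots_directives('<meta name="robots" content="noindex, follow">'))
print(robots_directives('<meta name="robots" content="none">'))
```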
Author: Ashley Bryan
Webstrategies
www.webstrategies.co.nz

I’m currently caught in this very dilemma. In fact, I just did a Google search for “meta tag noindex google” and your article popped up. I’m about to add about 10,000 pages to my site, but they will all be stock pages. However, once my customers input data into their control panels, these pages will be full of original content. My worry is that I don’t want Google to know about the pages UNTIL customers enter data, yet I have to create them all in advance (it’s done dynamically). It is easy for me to programmatically insert a “noindex” Meta tag on any page that does not contain data, but I am worried that may not be enough. The last thing I want is for our already excellent search rankings to be hurt. Is a robots.txt file superior to the Meta tag? I’m still not clear.
Thanks for the question.
The Robots disallow will still allow the URL to display in the SERPs, although there will be no page content displayed there. As the URL is available, people can still go to the URL and see the pages. So in this scenario Google will not consider the page content and you should not have a duplicate content issue; however, people can see the URL and follow it.
The meta noindex tag will keep everything out of Google, including the URL, thus making it a bit more hidden from humans as well (but not fully, of course!)
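For the scenario in the question, one possible approach is to emit the noindex tag only while a page still holds stock content. The Python sketch below is an illustration under assumptions: the function names and the has_customer_data flag are hypothetical, and your own templating system will look different. Keep in mind that a bot has to be able to crawl the page to see the tag, so these pages should not also be disallowed in robots.txt.

```python
import html


def robots_meta_tag(has_customer_data):
    """Return a noindex tag for stock pages, or an empty string for pages with data."""
    if has_customer_data:
        return ""
    return '<meta name="robots" content="noindex, follow">'


def render_head(title, has_customer_data):
    """Build a minimal <head> section for one page (hypothetical helper)."""
    parts = ["<head>", f"<title>{html.escape(title)}</title>"]
    tag = robots_meta_tag(has_customer_data)
    if tag:
        parts.append(tag)
    parts.append("</head>")
    return "\n".join(parts)


# A stock page gets the tag; a page with customer data does not.
print(render_head("Stock page", has_customer_data=False))
print(render_head("Customer page", has_customer_data=True))
```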