We are starting to clean up and revamp TimesToCome. The first step is to redo the robots.txt file to mark off the directories and pages that we don’t want to show up in search engines.
To keep pages on your site from being indexed by robots, you can use the robots META tag or you can list those pages in robots.txt.
Put robots.txt in the root directory of your website. List in it any pages you only intend to use in iframes, along with any other pages that are not meant to be viewed and indexed on their own.
robots.txt
User-agent: *
Disallow: /directory/
Disallow: /private.html
Disallow: /directory/donotinclude.html
Disallow: /cgi-bin/
You can name a specific search engine robot on the User-agent line to block only that one, or you can use ‘*’ to block all of them.
Disallow: tells robots not to visit that file or directory. Not all robots are well behaved, and some will ignore the robots.txt rules, but the main ones you know and love will follow them.
/directory/ will block off a specific directory
/file.html will block off a specific file
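To check how a well-behaved robot would read the rules above, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed and the test paths are made up for illustration, and the parser only approximates what any particular search engine does.

```python
from urllib.robotparser import RobotFileParser

# The same rules as the robots.txt example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /directory/
Disallow: /private.html
Disallow: /directory/donotinclude.html
Disallow: /cgi-bin/
"""


def is_allowed(path, user_agent="*"):
    """Return True if the rules let `user_agent` fetch `path`.

    `is_allowed` is a hypothetical helper for this post, not part of
    any library.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/index.html", "/private.html",
                 "/directory/page.html", "/cgi-bin/script.pl"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Rules are matched as path prefixes, so everything under /directory/ is blocked by the first Disallow line alone.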
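Alternatively, you can include this tag in the head section of any document you do not want indexed. A robot has to fetch the page to see the tag, so it only takes effect on pages that robots.txt does not block.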
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
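If you want to confirm that a page really carries the tag, a rough sketch with Python's standard html.parser module could look like this. The class RobotsMetaParser and the function robots_directives are names invented for this example, and the HTML strings are sample input, not pages from this site.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any robots META tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = ('<html><head><META NAME="ROBOTS" '
            'CONTENT="NOINDEX, NOFOLLOW"></head><body></body></html>')
    print(sorted(robots_directives(page)))
```

An empty result means the page has no robots META tag, so robots fall back to their default behavior.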
More information:
The Web Robots Pages
Google Webmaster Blog: Speaking the language of robots