Creating a Robots.txt file

What is a robots.txt file, and what does it do?

It is simply a text file that you place in the root folder of your web server. It tells the search engine spiders crawling your site which folders to crawl for information and which folders to avoid. As simple as that.

What are the benefits of having a robots.txt file?

All the major search engine spiders look for the robots.txt file when they crawl your site, so having one is like opening your doors to welcome them. So make your robots.txt file and open the door to the spiders.

Another advantage of a robots.txt file is that you can tell the spiders to avoid particular folders, especially the folders for your development site, most probably your cgi-bin folder, and any part of the site which you do not want the spiders to crawl. Do bear in mind, though, that this is not a security file: well-behaved spiders follow it, but nothing forces them to, and anyone can read the file itself. If you have any important or secret folders on your web server, you still need other settings in place to secure them.

How to create the robots.txt file?

Well, first of all you need Notepad (or any other plain text editor), and then remember these few commands:

User-agent:
Disallow: /folders/
Allow: /folders/

User-agent refers to the search engine spider which you want to give instructions to. For a full list, you can refer to this site, which has the whole list. For example, Google’s spider is googlebot and the Lycos spider is Lycos_Spider_(T-Rex).

Disallow simply means: do not crawl this folder.
Allow is the opposite of Disallow: the spider may crawl this folder.

So take a look at the folders of your site or blog and determine which folders you want the spiders to crawl and which you don’t.
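Once you have your list of folders, you can also let a small script write the file for you instead of typing it by hand. The following is only a minimal sketch in Python: the function name build_robots_txt and the /dev/ folder are made up for illustration, so swap in your own folders.

```python
def build_robots_txt(user_agent, disallow=(), allow=()):
    """Return the text of one robots.txt group for the given spider.

    Hypothetical helper for illustration; it is not part of any library.
    """
    lines = [f"User-agent: {user_agent}"]
    # Allow lines are written before Disallow lines.
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    return "\n".join(lines) + "\n"


# Example folders only: /cgi-bin/ and an assumed /dev/ development folder.
robots_text = build_robots_txt("*", disallow=["/cgi-bin/", "/dev/"])
print(robots_text)
```

You would then save the printed text as robots.txt in the root folder of your web server.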

For example, to deny access to all spiders, your robots.txt file will look as follows.

User-agent: *
Disallow: /

For example, to allow access to all spiders, your robots.txt file will look as follows.

User-agent: *
Allow: /
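If you want to check the deny-all and allow-all files before uploading them, Python's standard library has a robots.txt parser that you can try them against. This is just a sketch: the spider name AnyBot and the page URL are made-up values, and real spiders may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, agent, url):
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


DENY_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nAllow: /\n"

# Made-up page URL and spider name, for illustration only.
page = "http://yoursite.com/index.html"
print(is_allowed(DENY_ALL, "AnyBot", page))   # False
print(is_allowed(ALLOW_ALL, "AnyBot", page))  # True
```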

For example, to deny access to Google’s spider, your robots.txt file will look as follows. This will stop Googlebot from crawling your site.

User-agent: googlebot
Disallow: /
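To see that this file only affects Google's spider, you can run it through the robots.txt parser in Python's standard library. Treat this as a rough sketch: SomeOtherBot and the page URL are invented for the example, and the result shows how this parser reads the file, not a guarantee about any real spider.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Made-up page URL; "SomeOtherBot" is a hypothetical spider name.
page = "http://yoursite.com/index.html"
googlebot_ok = parser.can_fetch("Googlebot", page)
other_bot_ok = parser.can_fetch("SomeOtherBot", page)

print(googlebot_ok)  # False: the googlebot group disallows everything
print(other_bot_ok)  # True: no group applies to this spider
```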

For example, to allow all spiders access to a particular file inside a disallowed folder, your robots.txt file will look as follows. Do bear in mind that if you are doing it this way, you should always write the Allow command first, because some spiders apply the first rule that matches.

User-agent: *
Allow: /folder/file.html
Disallow: /folder/
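The effect of the Allow and Disallow pair above can be tried out with the robots.txt parser from Python's standard library. This is a small sketch under assumptions: AnyBot is a made-up spider name, and other.html and index.html are example pages that are not part of the file above.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Allow: /folder/file.html
Disallow: /folder/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path):
    """Check a path on the example site for the made-up spider AnyBot."""
    return parser.can_fetch("AnyBot", "http://yoursite.com" + path)


print(allowed("/folder/file.html"))   # True: matched by the Allow line
print(allowed("/folder/other.html"))  # False: matched by the Disallow line
print(allowed("/index.html"))         # True: no rule matches this path
```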

Sitemap inside robots.txt file

Most spiders are able to pick up the Sitemap command inside the robots.txt file. Therefore, to notify the spiders of your sitemap file, just type the following command inside your robots.txt file after all the Allow and Disallow commands.

Sitemap: http://yoursite.com/sitemap.xml
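To confirm that the Sitemap line is readable, you can parse the file with Python's standard library and ask for the sitemaps it found. This is a sketch that assumes Python 3.8 or newer (where site_maps() is available); the /cgi-bin/ rule is only there to make the example file look realistic.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Sitemap: http://yoursite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```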

For sitemaps, Google still recommends that you do a manual submission through Google Webmaster Tools first and then add the sitemap to your robots.txt file.

If you do need any help or have any suggestions, please feel free to leave a comment and let me know.
