Brief about Robots Exclusion Standard and Guidelines to Create Robots.txt File

Recently one of my friends emailed me to say that he did not understand the concept of the robots exclusion standard (also called the robots exclusion protocol) and wanted to know how to create a robots.txt file. This is just one example; many SEO people are not familiar with the robots exclusion standard or the robots.txt file.

So, considering all the queries about the robots exclusion standard and the robots.txt file, in this article Seogdk sorts out the information about the robots exclusion protocol and gives guidelines for creating the robots.txt file.

First of all, don’t be confused by the terms robots exclusion standard, robots exclusion protocol, and robots.txt file: the first two are names for the same set of guidelines for keeping crawlers in line, and the robots.txt file is how you apply them. The robots.txt file is simply the file used to tell robots and crawlers what not to crawl on your website, and it is the actual component that you will work with. It is a plain text document that should be placed in the root of your domain, and it contains directions for any crawler that comes to your website about what it is and is not allowed to crawl.

Every search engine has its own crawler with a specific name. If you want to see the names of the crawlers that visit you, just check your web server log; you will probably find them there. Below is a list of search engines with their crawler names:

-Google – Googlebot
-Bing – Bingbot
-Yahoo Search – Yahoo! Slurp
-MSN – Msnbot
-Baidu – Baiduspider
-Yandex – Yandexbot
-Alexa – ia_archiver
-Ask – Teoma
-Searchsight – SearchSight
-AltaVista – Scooter
-Guruji – GurujiBot
-Goo – Ichiro
-LookSmart – FurlBot
-FyberSearch – FyberSpider
-SiteSell – SBIder
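
To check your own server log for these names, a small script can help. The sketch below is only an illustration: the log lines are made up, the function name count_crawlers is hypothetical, and it assumes the crawler name appears somewhere in each log line (as it usually does in the user-agent field).

```python
from collections import Counter

# A few crawler names from the list above; matching is case-insensitive.
CRAWLER_NAMES = ["Googlebot", "Bingbot", "Slurp", "Baiduspider", "Yandexbot"]


def count_crawlers(log_lines, names=CRAWLER_NAMES):
    """Count how many log lines mention each crawler name."""
    counts = Counter()
    for line in log_lines:
        lowered = line.lower()
        for name in names:
            if name.lower() in lowered:
                counts[name] += 1
    return counts


# Hypothetical log lines, invented for this example only.
sample_log = [
    '203.0.113.5 - - "GET / HTTP/1.1" 200 "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '203.0.113.9 - - "GET /robots.txt HTTP/1.1" 200 "Mozilla/5.0 (compatible; bingbot/2.0)"',
    '198.51.100.7 - - "GET /about HTTP/1.1" 200 "Mozilla/5.0 (Windows NT 10.0)"',
]

crawler_counts = count_crawlers(sample_log)
print(dict(crawler_counts))
```

In practice you would pass the lines of your real log file instead of sample_log.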

Guidelines to Create Robots.txt File

1. To communicate with a crawler you need a syntax that it can understand. The basic form of the syntax is shown below:

User-agent: *
Disallow: /

Both kinds of line are required when you create a robots.txt file: a User-agent line followed by at least one rule such as Disallow.

2. The first line, User-agent:, tells crawlers which user agent you are commanding. The asterisk (*) denotes that all crawlers are covered, but you can also name a single crawler or several crawlers.

3. The second line, Disallow:, tells the crawler what it is not allowed to access. The slash (/) matches every path on the site, that is, “all directories.” So in the previous code example, the robots.txt file is saying that “all crawlers are to ignore all directories.”

4. When you create a robots.txt file, always remember to include a colon (:) after the ‘User-agent’ indicator and after the ‘Disallow’ indicator. The colon separates the field name from its value and tells the crawler that the information it should act on follows.

5. If you want all crawlers to ignore a specific directory, simply give that directory's name as below:

User-agent: *
Disallow: /private/

You can also go one step further and tell all crawlers to ignore multiple directories and files as below:

User-agent: *
Disallow: /private/
Disallow: /public/
Disallow: /program/links.html

This tells every crawler to ignore everything in the ‘private’ directory, everything in the ‘public’ directory, and the single file links.html in the ‘program’ directory. Other files in the ‘program’ directory can still be crawled.
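
If you want to try these rules before publishing them, Python ships a robots.txt parser in its standard library (urllib.robotparser). The following is a minimal sketch, assuming that parser behaves like the crawlers you care about; the agent name "AnyBot" and the test paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the example above.
rules = """\
User-agent: *
Disallow: /private/
Disallow: /public/
Disallow: /program/links.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def allowed(path, agent="AnyBot"):
    """Return True if the (hypothetical) agent may fetch the given path."""
    return parser.can_fetch(agent, path)


# Hypothetical paths used only to exercise the rules.
for path in ["/private/page.html", "/public/page.html",
             "/program/links.html", "/program/other.html", "/index.html"]:
    print(path, "allowed" if allowed(path) else "blocked")
```

Note that only links.html inside the program directory is blocked; the other file in that directory stays crawlable.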

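6. Always keep in mind how crawlers decide which rules apply to them. Crawlers that follow the current standard, such as Googlebot, obey the group whose User-agent line matches them most specifically, wherever that group appears in the file. Some older or simpler crawlers, however, read the robots.txt file from top to bottom, stop as soon as they find a guideline that applies to them, and begin crawling your website. So be careful about the order of your groups when you are commanding multiple crawlers with your robots.txt file.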

7. The following ordering is a risky way to write a robots.txt file:

User-agent: *
Disallow: /private/

User-agent: CrawlerName
Disallow: /private/
Disallow: /program/links.html

First, this text tells all crawlers to ignore the ‘private’ directory. You have also told a particular crawler, denoted by ‘CrawlerName’, to ignore both the ‘private’ directory and the links.html file in the ‘program’ directory. The problem is that a crawler which stops reading at the first group that applies to it may never get that second message, because it has already read that all crawlers should ignore the ‘private’ directory.

8. When you want to command multiple crawlers, it is safest to begin by naming the specific crawlers you want to control. Only after they have been named should you leave your instructions for all crawlers. Written this way, the previous code looks like this:

User-agent: CrawlerName
Disallow: /private/
Disallow: /program/links.html

User-agent: *
Disallow: /private/

9. You can view the robots.txt file of any website that has one by adding /robots.txt to the base URL of the website. Entering that address in a browser displays the text file guiding robots for that website.
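
Building that address can be sketched in a couple of lines of Python. The function name robots_url and the example.com address are placeholders chosen for illustration, not part of any real site discussed here.

```python
from urllib.parse import urljoin


def robots_url(site_url):
    """Return the robots.txt address for the site that serves site_url."""
    # An absolute path replaces whatever path the original URL had.
    return urljoin(site_url, "/robots.txt")


# Placeholder URL used only as an example.
print(robots_url("https://www.example.com/blog/post.html"))
```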

10. If you use a blank robots.txt file, crawlers treat the empty file as having no restrictions, so your whole website may be crawled. A blank file is therefore not a way to keep crawlers out. To ask all crawlers to stay away from your website, use the two-line form from the first example (User-agent: * followed by Disallow: /).
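
How an empty file is treated can be checked with Python's standard urllib.robotparser. This is a minimal sketch of how that one parser behaves, with a made-up agent name ("AnyBot") and path; it compares an empty file with a file that disallows everything.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_text, path="/index.html", agent="AnyBot"):
    """Parse robots.txt text and report whether the agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, path)


empty_file_allows = can_crawl("")                              # blank robots.txt
block_all_allows = can_crawl("User-agent: *\nDisallow: /\n")   # disallow everything

print("empty file allows crawling:", empty_file_allows)
print("'Disallow: /' allows crawling:", block_all_allows)
```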


With the information above, you can easily create a robots.txt file, and if you have certain pages or links that you want crawlers to ignore, you can achieve this without causing them to ignore the whole website. Additionally, you can find a complete list of robots, along with the text of the robots exclusion standard document, on the Web Robots Pages. So friends, share your feedback about this article through your comments and emails. Till then, enjoy your life!

Gangadhar Kulkarni
Gangadhar Kulkarni is an Internet Marketing Professional with extensive experience in digital marketing. He is also the founder of Seogdk and Director at DigiTechMantra Solutions, a one-stop shop for all that your website needs. It provides cost-effective and efficient content writing and digital marketing services. For more information, catch him on Facebook | Twitter | LinkedIn | G+ | Pinterest


  1. To be honest, the robots.txt file and its features are among the practices I still find difficult to master. It's good to have the explanations shared in this post.

    At least, some good explanations have been proffered on the User-Agent and Disallow syntaxes.

    I have left this comment in - the content syndication and social marketing platform for Internet marketers, where this post was shared.

    Sunday - contributor

    1. Thanks Sunday,

      Actually, the idea behind writing this article was that I realized many of us are not aware of the robots exclusion standard or the robots.txt file, so I focused on some key points that are important for exploring this concept.