What Is Robots.txt?
Introduction to Robots.txt
A robots.txt is a very simple text file that is placed in your site's root directory. An example
would be www.yourdomain.com/robots.txt. This file tells search engines and other robots
which areas of your site they are allowed to visit and index.
In other words, robots.txt is a text file, written according to the robots exclusion protocol (REP),
that webmasters create to instruct robots (typically search engine robots) how to crawl and index
pages on their website.
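Because the file always sits in the root directory, a crawler can work out where to look for it from any page URL on the site. The following is a minimal sketch of that step using Python's standard library; the function name robots_txt_url and the sample page path are illustrative assumptions, not part of any standard.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the path is always /robots.txt at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URL on the example domain used in the text.
print(robots_txt_url("http://www.yourdomain.com/products/item.html?id=7"))
```

Whatever page the crawler starts from, the path and query string are discarded and only the host is kept.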
The Robots Exclusion Protocol (REP) is a group of web standards that regulate web robot
behavior and search engine indexing. The REP consists of the following:
- The original REP from 1994, extended 1997, defining crawler directives for robots.txt. Some
  search engines support extensions like URI patterns (wild cards).
- Its extension from 1996 defining indexer directives (REP tags) for use in the robots meta
  element, also known as "robots meta tag." Meanwhile, search engines support additional
  REP tags with an X-Robots-Tag. Webmasters can apply REP tags in the HTTP header of
  non-HTML resources like PDF documents or images.
- The Microformat rel-nofollow from 2005 defining how search engines should handle links
  where the A Element's REL attribute contains the value "nofollow."
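To make the indexer-side parts of the REP more concrete, here is a rough sketch of how a program might pick out a robots meta tag and rel="nofollow" links from a page, using Python's standard html.parser module. The class name RepTagParser, the sample HTML, and the directive values in it are assumptions chosen for illustration; real search engines apply their own, more elaborate rules.

```python
from html.parser import HTMLParser


class RepTagParser(HTMLParser):
    """Collect robots meta directives and nofollow link targets from HTML."""

    def __init__(self):
        super().__init__()
        self.meta_directives = []  # values found in <meta name="robots">
        self.nofollow_links = []   # href values of links marked rel="nofollow"

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None (e.g. <a rel>), so normalise to "".
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.meta_directives.append(directive.strip().lower())
        elif tag == "a" and "nofollow" in attr.get("rel", "").lower().split():
            self.nofollow_links.append(attr.get("href", ""))


# Hypothetical page used only for this illustration.
SAMPLE_HTML = """
<html><head><meta name="robots" content="noindex, follow"></head>
<body>
<a href="/about.html">About</a>
<a href="http://example.com/ad" rel="nofollow">Sponsored</a>
</body></html>
"""

rep = RepTagParser()
rep.feed(SAMPLE_HTML)
print(rep.meta_directives)
print(rep.nofollow_links)
```

For the sample page, the sketch collects the two meta directives and the single link marked nofollow; the plain "About" link is ignored.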
Structure of a Robots.txt File
The structure of a robots.txt is pretty simple (and barely flexible): it is a list, of any length, of
user agents and disallowed files and directories. Basically, the syntax is as follows:
User-agent:
Disallow:
“User-agent:” names the search engine crawler a record applies to, and “Disallow:” lists the files
and directories to be excluded from indexing. In addition to “User-agent:” and “Disallow:” entries,
you can include comment lines; just put the # sign at the beginning of the line:
# All user agents are disallowed to see the /temp directory.
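A record matching that comment can be tried out with Python's standard urllib.robotparser module. This is only a sketch: the two rule lines under the comment, the crawler name ExampleBot, and the two paths are assumptions made for the demonstration.

```python
from urllib.robotparser import RobotFileParser

# A comment line followed by one record: every agent (*) is kept out of /temp/.
ROBOTS_TXT = """\
# All user agents are disallowed to see the /temp directory.
User-agent: *
Disallow: /temp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# ExampleBot is a made-up crawler name; any name matches the * record.
for path in ("/index.html", "/temp/draft.html"):
    print(path, parser.can_fetch("ExampleBot", path))
```

The comment line is skipped by the parser, /index.html is reported as fetchable, and /temp/draft.html is not.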
The Traps of a Robots.txt File
When you start making complicated files (i.e. you decide to allow different user agents
access to different directories), problems can start if you do not pay special attention to the
traps of a robots.txt file. Common mistakes include typos and contradicting directives.
Typos include misspelled user agents or directories, missing colons after User-agent and
Disallow, and so on. Typos can be tricky to find, but in some cases validation tools help.
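As a rough idea of what such a validation tool checks, the sketch below flags the two kinds of typo mentioned above: a missing colon and a misspelled field name. It is a simplified assumption of how a checker might work, not any particular tool; the function name lint_robots_txt is hypothetical, and only the two fields discussed in this text are treated as known.

```python
# Only the two fields discussed in this text; real validators know more.
KNOWN_FIELDS = {"user-agent", "disallow"}


def lint_robots_txt(text):
    """Return (line_number, message) pairs for lines that look like typos."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # blank lines and comment-only lines are fine
        if ":" not in line:
            problems.append((number, "missing colon"))
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS:
            problems.append((number, f"unknown field '{field}'"))
    return problems


# Hypothetical file with a missing colon (line 1) and a misspelling (line 2).
SAMPLE = """\
User-agent *
Disalow: /temp/
User-agent: *
Disallow: /temp/
"""

for number, message in lint_robots_txt(SAMPLE):
    print(f"line {number}: {message}")
```

A check like this catches spelling and punctuation slips only; it cannot tell whether the rules say what you meant, which is the subject of the logical errors discussed next.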
The more serious problem is with logical errors. For instance:
The above example is from a robots.txt that allows all agents to access everything on the
site except the /temp directory.
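To see how records for different user agents can interact in unexpected ways, the sketch below adds a second, crawler-specific record after a general one and asks Python's standard urllib.robotparser what each crawler may fetch. The file contents, the /images/ directory, and the crawler name OtherBot are illustrative assumptions, and the result shows only how this one parser resolves the records; other robots may read the same file differently.

```python
from urllib.robotparser import RobotFileParser

# General record first, then a separate record written only for Googlebot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /temp/

User-agent: Googlebot
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("OtherBot", "Googlebot"):
    for path in ("/temp/draft.html", "/images/logo.png"):
        print(agent, path, parser.can_fetch(agent, path))
```

With this parser, the Googlebot record replaces the general record instead of adding to it, so Googlebot is kept out of /images/ but is allowed into /temp/, which is probably not what the author of such a file intended. Repeating the /temp/ line inside the Googlebot record would remove the ambiguity.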