Robots Text File - robots.txt

The robots.txt file is a set of instructions for visiting robots (spiders) that index the content of your web site pages. For those spiders that obey the file, it provides a map of what they can and cannot index. The file must reside in the root directory of your web site, so the URL path (web address) of your robots.txt file should look like this:

/robots.txt

A minimal robots.txt file opened in Notepad might look like this:

User-agent: *
Disallow:

Definition of the above robots.txt file:

User-agent: *
The asterisk (*), or wildcard, is a special value that means any robot.

Disallow:
The Disallow: line without a / (forward slash) tells the robots that they can index the entire site.

An empty value indicates that all URLs can be retrieved. At least one Disallow field must be present in each record; leaving its value empty (no forward slash), as shown above, blocks nothing.

The presence of an empty "/robots.txt" file has no explicit semantics; it will be treated as if it were not present, i.e. all robots will consider themselves welcome.

A Disallow: line without the trailing slash (/) tells all robots to index everything. If instead you have a line that looks like this:

Disallow: /private/

It tells the robot that it cannot index the contents of that /private/ directory.
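You can check the effect of such a rule programmatically. Python's standard urllib.robotparser module implements the Robots Exclusion Protocol; the sketch below (the bot name and www.example.com URLs are illustrative) confirms that the rule blocks only the /private/ directory:

```python
from urllib import robotparser

# A robots.txt with the single Disallow rule discussed above.
rules = """\
User-agent: *
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Anything under /private/ is blocked for every robot...
print(rp.can_fetch("AnyBot", "https://www.example.com/private/notes.htm"))  # False

# ...but the rest of the site remains crawlable.
print(rp.can_fetch("AnyBot", "https://www.example.com/index.htm"))  # True
```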

Summarizing the Robots Exclusion Protocol - robots.txt file

To allow all robots complete access:

User-agent: *
Disallow:


To exclude all robots from the server:

User-agent: *
Disallow: /

To exclude all robots from parts of a server:

User-agent: *
Disallow: /private/
Disallow: /images-saved/
Disallow: /images-working/

To exclude a single robot from the server:

User-agent: NamedBot
Disallow: /

To exclude a single robot from parts of a server:

User-agent: NamedBot
Disallow: /private/
Disallow: /images-saved/
Disallow: /images-working/
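These record patterns can be verified with Python's standard urllib.robotparser module. The sketch below (the crawler name NamedBot and the www.example.com URL are illustrative) confirms that a record naming a single robot excludes only that robot:

```python
from urllib import robotparser

# A record that excludes one named robot from the entire server.
rules = """\
User-agent: NamedBot
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The named robot is shut out completely...
print(rp.can_fetch("NamedBot", "https://www.example.com/"))  # False

# ...while every other robot remains welcome.
print(rp.can_fetch("OtherBot", "https://www.example.com/"))  # True
```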

Note: The asterisk (*), or wildcard, in the User-agent field is a special value meaning "any robot" and is therefore the only one you need until you fully understand how to set up records for different User-agents.

If you want to disallow a particular file within a directory, your Disallow: line might look like this:

Disallow: /private/top-secret-stuff.htm

Keep in mind that the above example excludes only the specified page (top-secret-stuff.htm); it does not exclude the rest of the /private/ directory.
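This distinction is easy to confirm with Python's standard urllib.robotparser module. The sketch below (file names and www.example.com URLs are illustrative) shows that only the named page is blocked, not its siblings:

```python
from urllib import robotparser

# Disallow a single file rather than a whole directory.
rules = """\
User-agent: *
Disallow: /private/top-secret-stuff.htm
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The named page is blocked...
print(rp.can_fetch("AnyBot",
                   "https://www.example.com/private/top-secret-stuff.htm"))  # False

# ...but other pages in /private/ are still crawlable.
print(rp.can_fetch("AnyBot",
                   "https://www.example.com/private/other-page.htm"))  # True
```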

You should validate your robots.txt file with a robots.txt validator by entering the full URI to the file on your server. Remember that the robots.txt file always resides at the root level of your web site.

Example of robots.txt - Whitelisting Method

The SEO Consultants Directory currently uses the whitelisting method via the robots.txt file: all bots are blocked except those named in the User-agent lines.

# robots.txt for http://www.SEOConsultants.com/
# Last modified: 2008-12-22T07:50:00-0800

User-agent: googlebot
User-agent: slurp
User-agent: msnbot
User-agent: teoma
User-agent: WDG_SiteValidator
Disallow: /js/
Disallow: /webservices/

User-agent: Mediapartners-Google*
Disallow:

User-agent: *
Disallow: /
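You can check how such a whitelist behaves with Python's standard urllib.robotparser module. The sketch below parses a simplified version of the file above (the Mediapartners-Google record is omitted because its trailing * is nonstandard; the check URLs are illustrative):

```python
from urllib import robotparser

# Simplified version of the whitelist above.
rules = """\
User-agent: googlebot
User-agent: slurp
User-agent: msnbot
User-agent: teoma
User-agent: WDG_SiteValidator
Disallow: /js/
Disallow: /webservices/

User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Whitelisted bots may crawl everything except the two listed directories.
print(rp.can_fetch("googlebot", "https://www.example.com/index.htm"))  # True
print(rp.can_fetch("googlebot", "https://www.example.com/js/app.js"))  # False

# Every other bot falls through to the final record and is blocked entirely.
print(rp.can_fetch("SomeOtherBot", "https://www.example.com/index.htm"))  # False
```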

robots.txt References

Here are a few good online references for information on the Robots Exclusion Protocol.

  1. WebmasterWorld - robots.txt Validation
  2. WebmasterWorld - robots.txt Forum
  3. The Web Robots Pages - Robots Exclusion Protocol
  4. Three Easy Ways to Reduce robots.txt Code Bloat
  5. The Web Robots Pages - Database of Web Robots
