Managing indexing


Getting your site indexed is only half the job; it is far more important to learn to manage indexing competently. Think about which pages of your site you want to appear in search results: which of them will be useful to the user, and which carry no meaningful content and serve only as technical information, for example. It is advisable to close from indexing the administrative section of the site and the /images/ directory (if that is what it is called) where graphics are stored. Owners of online stores should close service pages, for example the pages through which the actual purchase of a product is made. Having taken these measures, you will, first, be sure that robots index the information that actually matters and, second, lighten the robots' work, since they will not have to visit every page of the site.


1. Managing indexing with the robots.txt file

The robots.txt file is the most popular tool for managing the indexing of your site effectively. It is extremely simple to use and requires no special skills. By and large, all it is needed for is to forbid particular search engines from indexing particular pages or sections of the site.


2. Basic information about the robots.txt file

The /robots.txt file is intended to tell all search robots how to index the information on the server.

The file's syntax lets you define areas closed to indexing, either for all robots or for specific ones.

The robots.txt file must meet certain requirements; failing to meet them can cause a search engine's robot to misread the information or make the file unusable altogether.

The basic requirements:

• all letters in the file name must be lowercase: robots.txt is correct; Robots.txt or ROBOTS.TXT is not;

• the robots.txt file must be created as plain text. When copying the file to the site, the FTP client must be set to text transfer mode;

• the robots.txt file must be placed in the root directory of the site.
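
The naming and location rules above can be sketched in a few lines of Python. This is only an illustration: the helper name robots_url and the page address are assumptions made for the example.

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt address for the site serving page_url."""
    # The leading slash makes urljoin drop the page's own path, so the
    # result always points at the root of the site. The file name is
    # written in lowercase, as the requirements demand.
    return urljoin(page_url, "/robots.txt")


print(robots_url("http://www.site.ru/dir/page2.htm"))
```

Whatever page the address refers to, the file is looked up at the root of the same host.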


2.1. Contents of the robots.txt file

A robots.txt file must include two directives: "User-agent" and "Disallow". Some search engines also support additional records. For example, the Yandex search engine uses the "Host" directive to identify the main mirror of a site.

Each record has its own purpose and can occur several times, depending on the number of pages and/or directories closed from indexing and on the number of robots you address.

A completely empty robots.txt file is equivalent to having no file at all, which implies permission to index the whole site.

The "User-agent" directive

The "User-agent" record must contain the name of the search robot. An example of a "User-agent" record that addresses all search engines without exception, using the "*" symbol:

User-agent: *

An example of a "User-agent" record that addresses only the robot of the Yandex search engine:

User-agent: Yandex
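
To see how a robot might pick the group addressed to it, here is a minimal sketch using Python's standard urllib.robotparser module. The rules, the /private/ directory and the page address are hypothetical; real search robots use their own parsers, which may behave differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for Yandex, one for every other robot.
RULES = """\
User-agent: Yandex
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

page = "http://www.site.ru/private/page.htm"
# The Yandex group applies to the robot named "Yandex" ...
yandex_allowed = parser.can_fetch("Yandex", page)
# ... while any other robot falls back to the "*" group.
other_allowed = parser.can_fetch("googlebot", page)
print(yandex_allowed, other_allowed)
```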

Each search engine's robot has its own name. There are two main ways to find out these names:

1. Many search engines' sites have a dedicated "help for webmasters" section (Yandex has one too: http://webmaster.yandex.ru/faq.xml), which often lists the names of the search robots.

2. When reviewing the web server logs, specifically the requests for the robots.txt file, you can see many names that contain the names of search engines or parts of them. You then only need to pick the right name and enter it in the robots.txt file.

Names of the main robots of popular search engines:

Google - "googlebot";

Yandex - "Yandex";

Rambler - "StackRambler";

Yahoo! - "Yahoo! Slurp";

MSN - "msnbot".

The "Disallow" directive

The "Disallow" directive must contain instructions that tell the search robot named in the "User-agent" record which files and/or directories it is forbidden to index.

Let's look at various examples of the "Disallow" record.

Example 1. The site is completely open for indexing (the "Disallow" value is left empty; by contrast, "Disallow: /" would close the whole site):

Disallow:
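
The difference between an empty "Disallow" value and a single slash is easy to get wrong, so here is a small check written with Python's standard urllib.robotparser module. The helper name allowed and the page address are assumptions made for this sketch.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, url: str, agent: str = "*") -> bool:
    """Parse robots.txt text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# An empty value forbids nothing: the whole site stays open.
OPEN_SITE = "User-agent: *\nDisallow:"
# A single slash matches every path: the whole site is closed.
CLOSED_SITE = "User-agent: *\nDisallow: /"

page = "http://www.site.ru/index.htm"
print(allowed(OPEN_SITE, page), allowed(CLOSED_SITE, page))
```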

Example 2. Indexing is forbidden for the file "page.htm" located in the root and for the file "page2.htm" located in the directory "dir":

Disallow: /page.htm

Disallow: /dir/page2.htm

Example 3. Indexing is forbidden for the directories "cgi-bin" and "forum" and, consequently, for all of their contents:

Disallow: /cgi-bin/

Disallow: /forum/
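
As a rough illustration of Examples 2 and 3 taken together, the sketch below filters a list of paths down to those that remain open for indexing. It relies on Python's standard urllib.robotparser module; the extra paths such as /dir/page3.htm and /index.htm are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The records from Examples 2 and 3, addressed to all robots.
RULES = """\
User-agent: *
Disallow: /page.htm
Disallow: /dir/page2.htm
Disallow: /cgi-bin/
Disallow: /forum/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Hypothetical site paths to check against the rules.
paths = [
    "/page.htm",
    "/dir/page2.htm",
    "/dir/page3.htm",
    "/cgi-bin/search.cgi",
    "/forum/topic1.htm",
    "/index.htm",
]
indexable = [path for path in paths if parser.can_fetch("*", path)]
print(indexable)
```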

Several documents and/or directories whose names begin with the same characters can be closed from indexing with a single "Disallow" record. To do this, write the shared leading characters without a trailing slash.

Example 4. Indexing is forbidden for the directory "dir" and for all files and directories whose names begin with the letters "dir", i.e. the files "dir.htm" and "direct.htm", the directories "dir", "directory1", "directory2", and so on:

Disallow: /dir
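
The reason one record covers all of these names is that a "Disallow" value is compared with the beginning of the requested path. A minimal sketch of that prefix comparison, assuming a hypothetical helper is_disallowed that is not taken from any real robot:

```python
def is_disallowed(path: str, disallow_values: list[str]) -> bool:
    """Return True if path begins with any non-empty Disallow value."""
    # An empty value forbids nothing, so it is skipped.
    return any(value and path.startswith(value) for value in disallow_values)


# Without the trailing slash, "/dir" also covers dir.htm, direct.htm, directory1 ...
print(is_disallowed("/direct.htm", ["/dir"]))
# ... while "/dir/" covers only the contents of the directory itself.
print(is_disallowed("/direct.htm", ["/dir/"]))
```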

Some search engines allow simple pattern characters, a limited form of regular expressions, in the "Disallow" record. For example, the Google search engine supports the symbols "*" (any sequence of characters) and "$" (end of the line) in "Disallow" records. This makes it possible to forbid indexing of a particular type of file.

Example 5. Forbidding the indexing of files with the extension "htm":

Disallow: *.htm$
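
Python's standard robots.txt parser does not interpret these two symbols, so the sketch below translates such a pattern into a regular expression by hand. It is only an approximation of the behaviour described above, and the function names compile_rule and rule_matches are assumptions made for the example.

```python
import re


def compile_rule(pattern: str) -> re.Pattern:
    """Translate a Disallow pattern with "*" and "$" into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let "*" stand for any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # A trailing "$" pins the match to the end of the path.
    return re.compile(body + (r"\Z" if anchored else ""))


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if path falls under the given Disallow pattern."""
    # re.match anchors at the start, which mirrors prefix matching.
    return compile_rule(pattern).match(path) is not None


print(rule_matches("*.htm$", "/dir/page2.htm"))
print(rule_matches("*.htm$", "/page.html"))
```

Without the trailing "$" the same pattern would also cover "/page.html", because the rule then only has to match the beginning of the path.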

The "Host" directive

The "Host" directive is needed to identify the main mirror of a site: if a site has mirrors, the "Host" directive lets you choose the URL under which your site will be indexed. Otherwise the search engine will choose the main mirror on its own, and the other names will be excluded from indexing.

For compatibility with search robots that do not understand the "Host" directive when processing robots.txt, it should be added directly after the "Disallow" records.

Example 6. www.site.ru is the main mirror:

Host: www.site.ru
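
For checking your own file, the following sketch pulls the "Host" value out of robots.txt text. The function name main_mirror and the sample file are assumptions made for illustration; this is not how any particular search engine reads the directive.

```python
from typing import Optional


def main_mirror(robots_txt: str) -> Optional[str]:
    """Return the value of the first Host record, or None if there is none."""
    for raw_line in robots_txt.splitlines():
        # Split "Name: value" on the first colon only, so a value such as
        # "www.site.ru:8080" would survive intact.
        name, separator, value = raw_line.strip().partition(":")
        if separator and name.strip().lower() == "host":
            return value.strip() or None
    return None


# Hypothetical file: Host is placed directly after the Disallow records.
SAMPLE = """\
User-agent: Yandex
Disallow: /cgi-bin/
Host: www.site.ru
"""
print(main_mirror(SAMPLE))
```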

Writing comments in the robots.txt file

Any line in robots.txt that begins with the "#" symbol is treated as a comment. Comments may be placed at the end of lines containing directives, but some robots may fail to parse such a line correctly.

Example 7. A comment on the same line as a directive:

Disallow: /cgi-bin/ # comment

It is better to place a comment on a separate line.
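
A tolerant reader of the file would typically drop everything from the "#" symbol to the end of the line before interpreting a record. A minimal sketch of that step, with a hypothetical helper name directive_lines and made-up sample text:

```python
def directive_lines(robots_txt: str) -> list[str]:
    """Return the records of a robots.txt file with comments removed."""
    records = []
    for raw_line in robots_txt.splitlines():
        # Everything from "#" to the end of the line is a comment.
        line = raw_line.split("#", 1)[0].strip()
        if line:
            records.append(line)
    return records


# Hypothetical file with a full-line comment and a trailing comment.
SAMPLE = """\
# closed for all robots
User-agent: *
Disallow: /cgi-bin/ # scripts
"""
print(directive_lines(SAMPLE))
```

A robot that skips this step would read the trailing comment as part of the path, which is why a separate comment line is the safer choice.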


2.2. Managing indexing with meta tags

Indexing of a site's pages can also be managed with meta tags. Meta tags must be placed in the head of the HTML document (between the <head> and </head> tags).