SEO and Indexes - Getting indexed

The three leading search engines, Microsoft, Google and Yahoo!, use crawlers to find pages for their algorithmic search results. Pages that are linked from pages already in a search engine's index are found automatically and do not need to be submitted. Some search engines, notably Yahoo!, operate a paid submission service that guarantees crawling for either a set fee or a cost per click. These programs normally guarantee inclusion in the database, but they do not guarantee a specific ranking position within the search results. Yahoo's paid inclusion program has attracted criticism from both competitors and advertisers. Two major directories, the Open Directory Project and the Yahoo Directory, both require manual submission and human editorial review. Google offers Google Webmaster Tools, through which an XML Sitemap feed can be created and submitted for free to ensure that all pages are found, especially pages that cannot normally be discovered by automatically following links.

Search engine crawlers may take several factors into account when crawling a site, and search engines do not necessarily index every page. A page's distance from the site's root directory can also affect whether it is crawled.

Preventing crawling and indexing

To keep undesirable or unnecessary content out of the search indexes, webmasters can instruct spiders not to crawl particular files or directories through the standard robots.txt file in the root directory of the domain. A page can also be explicitly excluded from a search engine's database by using a robots-specific meta tag. When a search engine visits a site, the robots.txt file in the root directory is the first file crawled. The file is then parsed to tell the robot which pages are not to be crawled. Because a search engine crawler may keep a cached copy of this file, it may occasionally crawl pages that a webmaster does not want crawled.
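As a rough illustration of how a robots.txt file is parsed into crawl rules, the sketch below uses Python's standard `urllib.robotparser` module. The file contents, the `ExampleBot` user agent, the `example.com` URLs and the helper names are all assumptions made for the example, not rules taken from any real site or crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the rules permit the given agent to fetch the URL."""
    return parser.can_fetch(agent, url)


parser = build_parser(ROBOTS_TXT)
print(allowed(parser, "https://example.com/index.html"))
print(allowed(parser, "https://example.com/private/report.html"))
```

A well-behaved crawler would run a check of this kind before requesting each URL. If the crawler works from a cached copy of the file, the check reflects the rules as they stood when that copy was fetched, which is one way recently excluded pages can still be crawled.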
Pages typically excluded from crawling include login-specific pages such as shopping carts, and user-specific content such as results from internal searches. In March 2007, Google warned webmasters that they should prevent indexing of internal search results, because it regards such pages as search spam.
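One possible way to apply this on the page level is to emit the robots meta tag only for the kinds of pages named above. The following is a minimal sketch: the `/cart` and `/search` path prefixes, the `example.com` URLs and the function name `robots_meta_tag` are hypothetical, and a real site would substitute its own URL layout.

```python
from urllib.parse import urlsplit

# Hypothetical path prefixes for cart and internal search pages.
NOINDEX_PREFIXES = ("/cart", "/search")

NOINDEX_TAG = '<meta name="robots" content="noindex">'


def robots_meta_tag(url: str) -> str:
    """Return a noindex meta tag for excluded page types, else an empty string."""
    path = urlsplit(url).path or "/"
    for prefix in NOINDEX_PREFIXES:
        # Match the prefix itself or anything beneath it, but not
        # unrelated paths that merely share the first letters.
        if path == prefix or path.startswith(prefix + "/"):
            return NOINDEX_TAG
    return ""


print(robots_meta_tag("https://example.com/search?q=shoes"))
print(robots_meta_tag("https://example.com/products/42"))
```

The returned string would be placed in the page's `<head>`. Unlike a robots.txt rule, the tag is only seen if the crawler is allowed to fetch the page, so the two mechanisms are usually chosen with that difference in mind.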
