Thursday, October 11, 2007

Pitfalls for Search Engine Spiders

When putting together your site, you need to bear in mind a number of factors that will hinder or prevent search engine spiders from crawling and indexing your site. The following cause massive problems for spiders and should be avoided at all costs:

Deep Content: Any pages that cannot be reached within three clicks from the homepage.

Session Pages: Pages requiring a cookie to enable navigation may cause problems, because spiders do not accept cookies and so cannot maintain a session.

Frames: Search engines handle frames badly; spiders often index individual frames out of context, cut off from the site's navigation.

Pages Generated By a Database-Driven Site: URLs containing ?, % and = signs are like red flags to search engines. Spiders easily get lost in dynamic content, and much of it goes unindexed (a rewriting fix is sketched after this list).

Pages with More Than 150 Links: Search engines will not crawl large numbers of links on a single page; links beyond the limit may simply be ignored.

Pages Behind a rel="nofollow" Link: Google's spiders will not follow these hyperlinks (see the example after this list).

Pages That Require a Button to be Clicked: Pages that are only reachable via a submit button, drop-down menu or search box are not indexed, because spiders do not fill in forms or operate page controls.

Pages Requiring a Login: Spiders cannot log in, so anything behind a login form is invisible to them.

Pages that Redirect Before Showing Content: Serving spiders a redirect or different content from what human visitors see is known as 'cloaking' and can get a site banned outright.
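
On the dynamic-URL point, one common fix is URL rewriting, so that spiders see clean, static-looking addresses. Below is a minimal sketch assuming an Apache server with mod_rewrite enabled; the script name (product.php) and its id parameter are placeholders for illustration, not a prescription:

    # .htaccess sketch - assumes Apache with mod_rewrite enabled
    RewriteEngine On
    # Serve /products/42 from product.php?id=42, so the crawlable
    # URL contains none of the ?, % or = characters spiders dislike
    RewriteRule ^products/([0-9]+)$ product.php?id=$1 [L]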
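
On the nofollow point, the difference is a single attribute on the anchor tag; the URL here is a placeholder:

    <!-- A plain HTML link: spiders will follow it and index the target -->
    <a href="/articles/seo-basics.html">SEO basics</a>

    <!-- The same link with rel="nofollow": Google will not follow it -->
    <a rel="nofollow" href="/articles/seo-basics.html">SEO basics</a>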

The simplest way to get your pages spidered and indexed is to have plain HTML links to the pages you want crawled. As a general rule, everything should be available within three clicks of the homepage, and you should have an XML sitemap on your site (such as the sketch below) to ensure that all of your content can be found by spiders. Listing a page in your sitemap tells the search engines that you consider it worth indexing.
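
As a rough sketch, a minimal sitemap following the sitemaps.org protocol looks like this (the URLs and date are placeholders):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.example.com/</loc>
        <lastmod>2007-10-11</lastmod>
      </url>
      <url>
        <loc>http://www.example.com/about.html</loc>
      </url>
    </urlset>

Save it as sitemap.xml in your site's root, and submit it to the search engines; Google's Webmaster Tools, for example, accepts sitemap submissions directly.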
