May 11 2006

Search Engine Spider Experiment

Image: a visual representation of how Yahoo crawled an experimental website.

Here's a great experiment done at http://drunkmenworkhere.org/ about search engine spider behavior.

They created a website structured as a binary tree 30 levels deep. The first page links to only 2 pages, each of those 2 pages links to 2 more pages, and so on down to the 30th level. In total the site has about 2 billion pages.
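As a rough check on the "about 2 billion" figure, here is a minimal sketch of the arithmetic. It assumes the first page is level 0 and that there are 30 full levels of links below it. That counting convention is my assumption, not something stated in the experiment, and the function names are made up for illustration.

```python
def pages_at_level(level: int) -> int:
    """Pages on one level of a complete binary tree (the root is level 0)."""
    return 2 ** level


def pages_in_tree(levels_below_root: int) -> int:
    """Total pages: the root plus `levels_below_root` full levels beneath it."""
    # Summing 2**k for k = 0..n gives 2**(n + 1) - 1.
    return 2 ** (levels_below_root + 1) - 1


total = pages_in_tree(30)
print(f"{total:,}")  # 2,147,483,647
```

If the first page is counted as level 1 instead, the same formula gives 2**30 - 1, or about 1.07 billion, so the quoted total fits the level-0 convention better.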

The goal of the year-long experiment was to find out how the 3 major search engines (Yahoo, Google and MSN) crawl websites.

Some highlights of their findings:

The search engines ranked as follows on each criterion:

Number of unique pages crawled
1. Yahoo
2. Google
3. MSN

Number of pageviews
1. Yahoo
2. Google
3. MSN

Number of pages indexed
1. Yahoo
2. Google
3. MSN

Some other findings:

Yahoo
- Yahoo crawled 30,000 pages in the first month
- during the next 3 months it requested the same pages again; only after the 4th month did it find new pages
- Yahoo frequently revisited only the first 3 levels

Google

- Google seems to alternate between periods of discovering new pages and periods of refreshing pages it has already seen
- unlike Yahoo, Google frequently visited pages up to 12 levels deep
- the frequency of visits seems to be related to PageRank

MSN
- MSN crawled a smaller tree than Yahoo and Google
- msnbot virtually ceased to crawl Binary Search Tree 2 after five months

