What Is a Search Engine Spider?

A spider is a software program that travels the Web (hence the name “spider”), locating and indexing websites for search engines. All the major search engines, such as Google and Yahoo!, use spiders to build and update their indexes. These programs constantly browse the Web, traveling from one hyperlink to another.

For example, when a spider visits a website’s home page, there may be 30 links on the page. The spider will follow each of the links, adding all the pages it finds to the search engine’s index. Of course, the new pages that the spider finds may also have links, which the spider continues to follow. Some of these links may point to pages within the same website (internal links), while others may lead to different sites (external links). The external links will cause the spider to jump to new sites, indexing even more pages.
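The link-following behavior described above can be sketched as a breadth-first traversal. This is only a minimal illustration, not how any particular search engine is implemented: the SITE dictionary, the example URLs, and the crawl and is_internal functions are hypothetical stand-ins for real page fetches.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical link graph standing in for real HTTP fetches:
# each URL maps to the list of links found on that page.
SITE = {
    "http://a.example/": ["http://a.example/about", "http://b.example/"],
    "http://a.example/about": ["http://a.example/"],
    "http://b.example/": ["http://b.example/news"],
    "http://b.example/news": [],
}

def crawl(start, get_links):
    """Visit pages breadth-first and return the indexed URLs in visit order."""
    indexed = []
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        indexed.append(url)
        for link in get_links(url):
            if link not in seen:      # skip pages already queued or indexed
                seen.add(link)
                queue.append(link)
    return indexed

def is_internal(page_url, link_url):
    """A link is internal when it stays on the same host as the page."""
    return urlparse(page_url).netloc == urlparse(link_url).netloc

pages = crawl("http://a.example/", lambda url: SITE.get(url, []))
```

The seen set is what keeps the spider from looping forever when pages link back to each other, as the home page and the about page do here.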

Because of the interwoven nature of website links, spiders often return to websites that have already been indexed. This allows search engines to keep track of how many external pages link to each page. Usually, the more incoming links a page has, the higher it will be ranked in search engine results. Spiders not only find new pages and keep track of links, they also track changes to each page, helping search engine indexes stay up to date.
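As a rough sketch of the incoming-link idea, the following counts how many distinct pages link to each page and sorts by that count. Real ranking algorithms weigh many more signals; the LINKS data and the count_inbound function are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical crawl result: page -> pages it links to.
LINKS = {
    "a.example/": ["b.example/", "c.example/"],
    "b.example/": ["c.example/"],
    "c.example/": ["a.example/"],
    "d.example/": ["c.example/", "c.example/"],  # duplicate link on one page
}

def count_inbound(links):
    """Count how many distinct pages link to each target page."""
    inbound = Counter()
    for source, targets in links.items():
        for target in set(targets):   # count each linking page only once
            if target != source:      # ignore links from a page to itself
                inbound[target] += 1
    return inbound

inbound = count_inbound(LINKS)
# Most-linked pages first; ties are broken alphabetically.
ranked = sorted(inbound, key=lambda page: (-inbound[page], page))
```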

Search Engine Spiders

Googlebot

Googlebot is the name of Google’s search engine spider. Googlebot periodically revisits sites that have been submitted to Google so that its index stays up to date.

Googlebot obeys your robots.txt file as well as the robots meta tag. Google has also created a Googlebot-specific version of the robots meta tag, which controls indexing by Googlebot alone. Like most search engine spiders, Googlebot follows links in HREF attributes. It also follows SRC attributes.
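To illustrate what obeying a robots.txt file means in practice, here is a small sketch using Python’s standard urllib.robotparser module. The robots.txt content and the example.com URLs are made up for the example, and this shows how a well-behaved spider could check the rules, not Googlebot’s actual code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real spider would download it
# from the root of the site it is about to crawl.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot has its own rule group, so only that group applies to it.
googlebot_private = parser.can_fetch(
    "Googlebot", "http://www.example.com/private/page.html")
googlebot_public = parser.can_fetch(
    "Googlebot", "http://www.example.com/index.html")

# Any other spider falls back to the "*" group.
other_tmp = parser.can_fetch(
    "SomeOtherBot", "http://www.example.com/tmp/file.html")
```

Here googlebot_private and other_tmp come out as False, while googlebot_public is True.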

Yahoo! Slurp (Slurp)

Yahoo! Slurp (Slurp) is Yahoo!’s web crawler. It crawls the Web and feeds content into the Yahoo! search engine. Slurp is based on web search technology from Inktomi, which was acquired by Yahoo! in late 2002.

Slurp obeys your robots.txt file as well as the robots meta tag. Like most search engine spiders, Slurp follows links in HREF attributes. It does not follow SRC attributes.
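The difference between following HREF only and following both HREF and SRC can be sketched with Python’s standard html.parser module. The LinkExtractor class, the extract_links helper, and the sample markup are hypothetical; the sketch only shows which URLs each policy would pick up.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect URLs from href attributes, and optionally from src attributes."""

    def __init__(self, follow_src=False):
        super().__init__()
        self.follow_src = follow_src
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:          # attribute present but without a value
                continue
            if name == "href" or (name == "src" and self.follow_src):
                self.urls.append(value)

def extract_links(html, follow_src=False):
    extractor = LinkExtractor(follow_src=follow_src)
    extractor.feed(html)
    extractor.close()
    return extractor.urls

PAGE = '<a href="/about.html">About</a><img src="/logo.gif"><frame src="/menu.html">'

href_only = extract_links(PAGE)                        # HREF only, as described for Slurp
href_and_src = extract_links(PAGE, follow_src=True)    # HREF and SRC, as described for Googlebot
```

A site built with frames is where the difference matters most: a spider that ignores SRC never reaches /menu.html unless some page also links to it with an ordinary HREF.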

In general, Slurp will not index documents that are created dynamically. Yahoo! recommends creating static copies of dynamic pages so that Slurp will index them.

Slurp will reindex a site once every three to four weeks.

Scooter

Scooter is the spider for AltaVista. It scans websites around the clock, looking for content to add to the AltaVista index. Thousands of threads run simultaneously, every day, reaching into all corners of the World Wide Web.

This spider spends its time scanning web pages for hyperlinks and text to add to the index. Each page is broken down into its parts, and an algorithm determines how (and whether) the information is added to the enormous index.

Scooter visits pages that are submitted to AltaVista via the “Add URL” link. It also revisits pages already in the index to check for changes that need to be reflected there. A page that no longer exists is deleted only if it keeps returning errors over several visits, which means that dead links (404 errors) can linger in the index for a while before they are removed. Scooter also investigates the new links it finds, although not always.
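The delayed-deletion behavior can be sketched as a simple failure counter. The threshold of three visits is an assumption (the text says only “several visits”), and the record_visit function and example URL are hypothetical.

```python
# Assumed threshold: the text says only "several visits".
MAX_FAILED_VISITS = 3

def record_visit(index, failures, url, status):
    """Update the index after one visit to url with the given HTTP status."""
    if status == 200:
        failures[url] = 0             # a successful visit resets the count
        index.add(url)
    else:
        failures[url] = failures.get(url, 0) + 1
        if failures[url] >= MAX_FAILED_VISITS:
            index.discard(url)        # give up only after repeated errors

URL = "http://www.example.com/old.html"
index = {URL}
failures = {}
history = []                          # is the page still indexed after each visit?
for status in (404, 404, 404):
    record_visit(index, failures, URL, status)
    history.append(URL in index)
```

The page stays in the index through the first two failed visits and is dropped on the third, which is why a search result can point at a missing page for some time.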

Here are some other search engine spiders:

Search Engine Spider Names

Spider Name | Search Engine | Status
AbachoBOT | Abacho | -
Acoon | Acoon | -
AESOP_com_SpiderMan | Aesop | -
ah-ha.com crawler | Ah-ha | -
appie | Walhello | -
Arachnoidea | Euroseek | active
ArchitextSpider | Excite | inactive
Atomz | Atomz | -
DeepIndex | DeepIndex (www.en.deepindex.com) | -
ESISmartSpider | Ttravel Finder | -
EZResult | EZResults | -
FAST-WebCrawler | AlltheWeb | active
Fido | PlanetSearch | -
Fluffy the spider | SearchHippo | active
Googlebot | Google | active
Gigabot | Gigablast | active
Gulliver | Northernlight | inactive
Gulper | Yuntis | active
HenryTheMiragoRobot | Mirago | -
ia_archiver | Alexa | active
KIT-Fireball/2.0 | Fireball (German SE at www.fireball.de) | -
LNSpiderguy | Lexis-Nexis | -
Lycos_Spider_(T-Rex) | Lycos | inactive
MantraAgent | LookSmart | active
MSN | Microsoft Prototype Crawler (added 5.2003 by Dale Shad of www.118group.com) | active
NationalDirectory-SuperSpider | National Directory | -
Nazilla | Websmostlinked | -
Openbot | Openfind | -
Openfind piranha,Shark | Openfind | -
Scooter | AltaVista | active
Scrubby | Scrub The Web | active
Slurp.so/1.0, Slurp/2.0j, Slurp/2.0, Slurp/3.0 | Inktomi | active
Tarantula | AltaVista | inactive
Teoma_agent1 | Teoma | active
UK Searcher Spider | UKSearcher | -
WebCrawler | WebCrawler | -
Winona | WhatUSeek (added 3.2003 by Dale Shad of www.118group.com) | active
ZyBorg | Wisenut | active
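
Spiders normally announce their name in the User-Agent header of each request, so names like these can be used to recognize spider visits in a server log. The sketch below uses a handful of entries from the list above; the identify_spider function and the sample User-Agent strings are hypothetical, and simple substring matching is an assumption that will not suit every case.

```python
# A few entries from the list above: spider name -> search engine.
SPIDERS = {
    "Googlebot": "Google",
    "Slurp": "Inktomi",
    "Scooter": "AltaVista",
    "ia_archiver": "Alexa",
    "ZyBorg": "Wisenut",
}

def identify_spider(user_agent):
    """Return (spider, engine) if a known spider name occurs in the string, else None."""
    lowered = user_agent.lower()
    for name, engine in SPIDERS.items():
        if name.lower() in lowered:   # case-insensitive substring match
            return name, engine
    return None

# Hypothetical User-Agent strings, for illustration only.
hit = identify_spider("Mozilla/5.0 (compatible; Googlebot/2.1)")
miss = identify_spider("Mozilla/4.0 (compatible; MSIE 6.0)")
```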

Submit Your Site to Major Search Engines

When you submit your website to a search engine, it reads your site’s meta tags, examines how they relate to the page content, indexes your website, and assigns your site a rank according to its algorithm. Keep in mind that submitting your site does not mean you will start getting heavy traffic right away. It simply means that the search engine now knows about your site and its pages, and will place them in its search engine results pages (SERPs) according to their rank in its index.
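As a rough idea of what reading meta tags and comparing them with the content could look like, the sketch below collects a page’s meta tags and checks which meta keywords also appear in the page text. Each search engine uses its own undisclosed algorithm, so the MetaTagReader class, the helper functions, and the sample page are purely hypothetical.

```python
from html.parser import HTMLParser

class MetaTagReader(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content

def read_meta(html):
    reader = MetaTagReader()
    reader.feed(html)
    reader.close()
    return reader.meta

def keywords_found(meta, body_text):
    """Return the meta keywords that also occur in the visible text."""
    keywords = [k.strip().lower() for k in meta.get("keywords", "").split(",")]
    text = body_text.lower()
    return [k for k in keywords if k and k in text]

BODY = "Spiders and crawlers index the Web."
PAGE = (
    "<html><head>"
    '<meta name="keywords" content="spiders, crawlers, recipes">'
    '<meta name="description" content="How search engine spiders work">'
    "</head><body>" + BODY + "</body></html>"
)

meta = read_meta(PAGE)
found = keywords_found(meta, BODY)
```

In this example the keyword “recipes” is declared in the meta tags but never appears in the text, which is the kind of mismatch between meta tags and content that the paragraph above alludes to.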

Here are the site submission URLs for the major search engines, with a little information about how each one handles submissions.

Google

http://www.google.com/addurl/?continue=/addurl
Google asks you to submit your top-level page, and its submission instructions are easy to follow. Google normally updates its index about once a month.

Yahoo

http://submit.search.yahoo.com/
Yahoo! offers two options: free and paid. A free listing takes about 30 to 45 days, whereas a paid listing ensures that your site is listed quickly.

MSN

http://search.msn.com/docs/submit.aspx
MSN routinely picks up new websites that have good inbound links. If your site has good inbound links, MSN will list it even if you never submit it.

AOL

You cannot submit to AOL directly, but if your site is indexed by Google, AOL will most likely include it in its index too.