crawler texis.exe speed

Posted: Fri Sep 01, 2006 10:47 am
by edev
Hi,

I have an HP quad Xeon server (Intel Xeon 3.2 GHz processors) with a dedicated 400 GB hard drive partitioned for storing walk data. The OS is Windows 2003 Server, and total virtual memory is 2.35 GB (about 1.6 GB is available while a dowalk is running).

This server only runs Webinator Enterprise, so texis.exe is the only big process running. It looks like a good server on paper, but when I send the crawler out to crawl some 2,000 external websites (limited page size, max depth set to 2, no off-site links, and "stay under" enabled), it gets very slow. It has only indexed a little over 55,000 pages so far, and based on previous experience it should have indexed a lot more than that.

I checked the performance log on the server, and CPU usage is between 50% and 65%, with two texis.exe processes running, each taking 25%. Memory usage is about 360,000 K. There are probably two texis.exe processes because I set the number of servers to "2" and the number of threads to "2".

My question is: is this a reasonable speed for the crawler, and is it normal for texis.exe to take up so much CPU time? If I limit the number of servers and the number of threads to 1, would that help increase the crawler speed?

Any advice is appreciated.

Posted: Fri Sep 01, 2006 12:09 pm
by mark
55000 pages in what time frame?
What kind of pages? Mostly html or lots of pdfs or others like doc or flash?
How are you specifying the 2000 sites? In base urls or what?

BTW, I moved this to the webinator group. You had posted in the appliance group.

Posted: Fri Sep 01, 2006 12:36 pm
by John
If you are giving it 2,000 base URLs, you may want to set the "Follow Cross-site Links" option to "N". That will prevent it from needing to check each URL against those 2,000 sites.
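To illustrate why that check can add up, here is a rough Python sketch. It is not Webinator's actual code: the function names, the prefix-matching rule, and the example.com URLs are all assumptions made for illustration. It contrasts comparing a link against every base URL with comparing it only against the site of the page it was found on.

```python
from urllib.parse import urlsplit

# Hypothetical list of 2,000 base URLs (placeholder hosts, not real sites).
base_urls = [f"http://site{i}.example.com/" for i in range(2000)]

def under_any_base(url, bases):
    """Cross-site case: the link may belong to any listed site,
    so in the worst case it is compared against every base URL."""
    return any(url.startswith(base) for base in bases)

def same_site(url, source_url):
    """No cross-site case: the link only has to match the host of
    the page it was found on, a single comparison."""
    return urlsplit(url).netloc == urlsplit(source_url).netloc

link = "http://site1999.example.com/page.html"
print(under_any_base(link, base_urls))                    # scans the whole list
print(same_site(link, "http://site1999.example.com/"))    # one host comparison
```

With 2,000 bases, the first check does up to 2,000 comparisons for every link found, while the second does one.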

Posted: Fri Sep 01, 2006 3:20 pm
by edev
Thanks for the reply. The 55,000 pages were indexed over a 24-hour period, and the page extensions were set to "jsp, asp, html, cfm and .txt" only. The site URLs are listed in a text file and supplied through the URL file option. The process size is set to "small".

I will set the "Follow Cross-site Links" option to "N" and try it again over the weekend. Thank you!
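For reference, the figures above (55,000 pages in 24 hours) work out to well under one page per second. A minimal Python sketch of the arithmetic, using only the numbers quoted in this thread (the variable names are just for illustration):

```python
# Figures quoted in the thread: 55,000 pages indexed in a 24-hour walk.
pages_indexed = 55_000
elapsed_hours = 24

elapsed_seconds = elapsed_hours * 3600            # 86,400 seconds
pages_per_second = pages_indexed / elapsed_seconds
seconds_per_page = elapsed_seconds / pages_indexed
pages_per_hour = pages_indexed / elapsed_hours

print(f"{pages_per_second:.2f} pages per second")   # 0.64
print(f"{seconds_per_page:.2f} seconds per page")   # 1.57
print(f"{pages_per_hour:.0f} pages per hour")       # 2292
```

That is an overall average across all servers and threads, so it says nothing about how long any single fetch takes.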

Posted: Fri Sep 01, 2006 5:10 pm
by mark
Also turn servers up to 4 or 6 so slow servers don't bog it down. And make sure you have enough bandwidth between the crawling machine and the crawled sites.