crawler texis.exe speed

edev
Posts: 127
Joined: Wed Sep 14, 2005 5:10 pm

Post by edev »

Hi,

I have an HP Quad Xeon (Intel Xeon 3.2 GHz processors) with a dedicated 400 GB hard drive partitioned for storing walk data. The OS is Windows 2003 Server and total virtual memory is 2.35 GB (I currently have 1.6 GB available while running a dowalk).

This server only runs Webinator Enterprise, so texis.exe is the only big process running. It looks like a good server on paper, but when I send the crawler out to crawl some 2000 external websites, with a limited page size, max depth set to 2, off-site links disabled, and "stay under" enabled, the speed gets very slow. It has indexed only a little over 55,000 pages so far; based on previous experience, it would have indexed a lot more than that.

I checked the performance log on the server and the CPU usage is between 50% and 65%, with two texis.exe processes running, each taking 25%. Memory usage is about 360,000 K. There are probably two texis.exe processes running because I set the number of servers to "2" and the number of threads to "2".

My question is: is this a reasonable speed for the crawler, and is it normal for texis.exe to take up so much CPU time? If I limit the number of servers and the number of threads to 1, would it help increase the crawler speed?

Any advice is appreciated.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

55000 pages in what time frame?
What kind of pages? Mostly html or lots of pdfs or others like doc or flash?
How are you specifying the 2000 sites? In base urls or what?

BTW, I moved this to the webinator group. You had posted in the appliance group.
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

If you are giving it 2,000 base urls you may want to set the "Follow Cross-site Links" option to "N". That will prevent it needing to check each URL against those 2,000 sites.
John Turnbull
Thunderstone Software
edev
Posts: 127
Joined: Wed Sep 14, 2005 5:10 pm

Post by edev »

Thanks for the reply. The 55,000 pages were indexed over a 24-hour period, and the page extensions were set to "jsp, asp, html, cfm and .txt" only. The URLs are listed in a text file and loaded through the URL file option. The process size is set to "small".
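
For reference, those figures can be turned into an average rate. This is only back-of-the-envelope arithmetic on the numbers given in this thread (55,000 pages, 24 hours, roughly 2000 sites); the variable names are made up for illustration and nothing here is measured from the crawler itself.

```python
# Figures quoted in this thread; everything below is simple arithmetic.
pages_indexed = 55_000
elapsed_hours = 24
site_count = 2000  # "some 2000 external websites", so an approximation

pages_per_second = pages_indexed / (elapsed_hours * 3600)
pages_per_minute = pages_per_second * 60
pages_per_site = pages_indexed / site_count

print(f"{pages_per_second:.2f} pages/second")
print(f"{pages_per_minute:.1f} pages/minute")
print(f"{pages_per_site:.1f} pages/site on average")
```

That works out to well under one page per second averaged over the whole run, which gives a baseline to compare against after changing the settings.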

I will set the "Follow Cross-site Links" option to "N" and try it again over the weekend. Thank you!
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Also turn servers up to 4 or 6 so slow servers don't bog it down. And make sure you have enough bandwidth between the crawling machine and the crawled sites.
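
To illustrate why more servers help when some remote sites respond slowly, here is a toy simulation. It is not Webinator code and makes no claim about how texis.exe schedules fetches; it assumes each "server" is an independent worker that takes the next URL as soon as it is free. The function name `simulated_crawl_time` and the fetch times are hypothetical.

```python
import heapq

def simulated_crawl_time(fetch_seconds, workers):
    """Wall-clock time when each fetch goes to whichever worker frees up first."""
    free_at = [0.0] * workers          # time at which each worker becomes idle
    heapq.heapify(free_at)
    for seconds in fetch_seconds:
        start = heapq.heappop(free_at)  # earliest available worker
        heapq.heappush(free_at, start + seconds)
    return max(free_at)

# Hypothetical workload: nine quick pages, then one slow server, repeated.
fetches = ([0.5] * 9 + [10.0]) * 20

for workers in (2, 4, 6):
    total = simulated_crawl_time(fetches, workers)
    print(f"{workers} workers: {total:.1f} simulated seconds")
```

With only two workers, a single slow response ties up half of the crawl capacity; with four or six, the remaining workers keep fetching while one waits. This model ignores CPU, disk and bandwidth limits, so it only shows the direction of the effect.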