Walks with URL URL are slow

jens.ni
Posts: 3
Joined: Thu Oct 13, 2011 4:41 am

Post by jens.ni »

We have one Webinator-Server (Webinator 5.1.50-Windows).
There are about 20 profiles defined.
They all have a similar configuration.
For some weeks (or months; we have no history) we have had problems with two profiles: one runs extremely slowly, and the other appears to hang.


Slow profile
-------------------
Configuration of the profile
Base URL: http://www.intranet.company.com
Robots.txt: no
Meta: yes
Placeholder: yes
Extensions: .html .htm .txt .pdf .doc .dot .xls .docx .xlsx .ppt .pptx .swf .zip
Exclusions: 5 custom exclusions
Crawl Delay: 0
Threads: 2
Servers: 2
Verbosity: 4
Rewalk Type: New
URL URL: http://www.intranet.company.com/spiderpage
Required REX: http://www.intranet.company.com/


We discovered that if we do not specify a "URL URL", the walk runs very fast.

Without "URL URL" specified:
- Webinator walks all 5100 pages on www.intranet.company.com
- CPU is about 50-70%
- Duration is about 10 minutes

With "URL URL" specified:
- There are "only" 1800 URLs specified in the URL-File
- CPU is at 100% during the whole walk
- Duration is about 6 hours
- We find errors like these in vortex.log:
  "Timeout reading from www.intranet.company.com in the function htbuf_readnblk"
  "Cannot read from www.intranet.company.com: An existing connection was forcibly closed by the remote host in the function htbuf_readnblk"




Hanging profile
-------------------
Similar issues with this profile as with the first one:
Without a "URL URL" specified, the walk is fast (about 30 minutes).
But with a "URL URL" specified (6400 pages), the walk appears to hang. We stopped it after 60 hours.





We have other profiles with a "URL URL" specified; they are not that slow (e.g. 1000 pages, 1 hour).
We also tried recreating the profiles, but that had no effect.


Any suggestions?

Kind regards,
Jens
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Your Required REX is not necessary; the walker stays on the site by default. What's the purpose of your URL URL? With it you're providing 1800 starting points, not just individual pages, in addition to the starting point listed in Base URL. If you want to index specific single pages, use PAGE URL instead.
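
To illustrate the difference between a list of starting points and a list of single pages, here is a minimal Python sketch. It is not Webinator's implementation; the `walk` function, the `follow_links` flag, and the toy `site` link graph are all hypothetical and only model the general idea that a starting point is crawled onward while a single page is fetched and nothing more.

```python
from collections import deque

def walk(listed_urls, links, follow_links=True):
    """Return the set of pages fetched on a toy site.

    links: dict mapping a page to the pages it links to (hypothetical data).
    follow_links=True  models a list of starting points (each one is crawled onward).
    follow_links=False models a list of single pages (fetched, links not followed).
    """
    fetched = set()
    queue = deque(listed_urls)
    while queue:
        page = queue.popleft()
        if page in fetched:
            continue  # never fetch the same page twice
        fetched.add(page)
        if follow_links:
            queue.extend(links.get(page, []))
    return fetched

# Hypothetical link graph, for illustration only.
site = {
    "/a": ["/a1", "/a2"],
    "/b": ["/b1"],
    "/a1": [],
    "/a2": ["/a"],
    "/b1": [],
}
listed = ["/a", "/b"]

as_starting_points = walk(listed, site, follow_links=True)
as_single_pages = walk(listed, site, follow_links=False)
print(len(as_starting_points), len(as_single_pages))  # prints "5 2"
```

In this toy model, two listed URLs turn into five fetched pages when they are treated as starting points, so a long list can mean far more work than its length suggests.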

From your error message it seems the server is slow or overloaded or is blocking you for hitting it too hard.
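
To get a rough sense of how often the server times out or drops connections, the two messages quoted above could be tallied with a small script. This is only a sketch: it matches on substrings of the quoted messages, makes no assumption about the rest of the log line format, and the `vortex.log` path in the usage note is a placeholder for wherever your log actually lives.

```python
from collections import Counter

# Substrings taken from the two error messages quoted in this thread.
PATTERNS = {
    "timeout": "Timeout reading from",
    "connection_closed": "forcibly closed by the remote host",
}

def tally_errors(lines):
    """Count how many lines contain each known error substring."""
    counts = Counter()
    for line in lines:
        for name, needle in PATTERNS.items():
            if needle in line:
                counts[name] += 1
    return counts
```

Usage, assuming the log is a plain-text file: `with open("vortex.log", errors="replace") as f: print(tally_errors(f))`. A high count relative to the number of pages walked would support the slow or overloaded server explanation.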
jens.ni
Posts: 3
Joined: Thu Oct 13, 2011 4:41 am

Post by jens.ni »

Hi Mark

Thanks for the advice.
We'll check our settings and try again.
jens.ni
Posts: 3
Joined: Thu Oct 13, 2011 4:41 am

Post by jens.ni »

Hi Mark,
Thanks for the advice.
We're now using PAGE URL instead of URL URL, and everything works fine.
Kind Regards,
Jens