errors during walk

jgdoke
Posts: 167
Joined: Wed Jul 14, 2004 10:52 am

Post by jgdoke »

006 2005-01-07 00:14:07 /usr/local/morph3/texis/scripts/dowalk:3714: Timeout reading from www.ab.com:80 in the function htbuf_readnblk

This happens about twice a minute.

The server is very fast and should not be causing timeouts.
Crawl Delay: 0
Parallelism: Threads: 3, Servers: 1
Verbosity: 2
Rewalk Type: New
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Is the network between the appliance and the server www.ab.com consistently fast and reliable?
Does www.ab.com sputter if you access many pages per second? Try setting Threads to 2 or 1 instead of 3.
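
One rough way to check this outside the appliance is to fetch the same set of pages with 1, 2, and 3 worker threads and compare how many requests fail or time out. The sketch below is only an illustration, not part of the appliance or its walk scripts; `fetch_seconds`, `probe`, and the 10-second timeout are names and values made up for this example.

```python
import http.client
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_seconds(url, timeout=10.0):
    """Fetch one URL; return elapsed seconds, or None on a timeout or other error."""
    start = time.monotonic()
    try:
        with urlopen(url, timeout=timeout) as resp:
            resp.read()
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return None
    return time.monotonic() - start


def probe(urls, threads, fetch=fetch_seconds):
    """Fetch all urls using `threads` workers.

    Returns (failure_count, slowest_successful_seconds).
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(fetch, urls))
    ok = [r for r in results if r is not None]
    return len(results) - len(ok), max(ok, default=0.0)
```

Calling `probe(sample, n)` for `n` in 1, 2, 3, where `sample` is a list of a few dozen real page URLs from the site, gives a failure count and slowest response time per thread setting. If failures only appear at 3 threads, the server or something in front of it is probably struggling with parallel requests.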
jgdoke
Posts: 167
Joined: Wed Jul 14, 2004 10:52 am

Post by jgdoke »

Kai
Site Admin
Posts: 1272
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

The close of a connection (ezclosesock()) happens after the transaction is complete, so it has a fixed timeout of 3 seconds (regardless of profile timeout) because there's no data to wait for. The latest appliance update (out today) should alleviate timeout errors like this during close()/ezclosesock(), though they are generally caused by network slowness or intermittent dropouts.

For the timeout during read (msg 1), is there any firewall or proxy between the appliance and the crawled site?
jgdoke
Posts: 167
Joined: Wed Jul 14, 2004 10:52 am

Post by jgdoke »

The path is: Thunderstone appliance > stateful firewall > reverse proxy > web server. The reverse proxy and web server are on the same box, with a 100 Meg connection between them all.

Further questions:
If the box gets a timeout error, will it try the URL again later?
Also, during the last walk I saw this in the report:
The link: http://www.ab.com/plclogic/slc/
Had this error: Document not found: http://www.ab.com/plclogic/slc/jsmenu/overlib.js returned code 404 (Not found)
Referenced by: http://www.ab.com/logix/compactlogix/

It looks as though it found the page http://www.ab.com/plclogic/slc/ and found a broken link in it, but the page http://www.ab.com/plclogic/slc/ does not show up in List/Edit URLs.

Why is that?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »
