gw fails when server is HUPed

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm

Post by Thunderstone »

Webinators,

Every morning the web server logs are rotated and the servers are
HUPed. If gw has a request in progress at that moment, the request
fails with these messages:

1997/06/11 04:05:03 Retrieving http://www.columbia.edu/cu/humanrights/visiting.html
1997/06/11 04:05:03 connect: Connection refused
1997/06/11 04:05:03 Can't connect to www.columbia.edu:80: Connection refused
1997/06/11 04:05:04 Retrieving http://www.columbia.edu/cu/gs/bulletin9596/gs18.html
1997/06/11 04:05:04 connect: Connection refused
1997/06/11 04:05:04 Can't connect to www.columbia.edu:80: Connection refused

After failing twice, gw stops trying. We've been using Webinator for
months and this problem just started. I ran the job again, and it
failed again at the same time. My questions are these:

1) Has anyone else seen this problem?

2) Would it be fixed by upgrading our servers to Apache 1.2 (from 1.1)?
Perhaps the web server should finish any outstanding requests
before restarting.

3) Can we tell gw to try a few more times, instead of giving up after
two failed attempts?

Ben Beecher
AcIS R&D
Columbia University

Post by Thunderstone »

gw does not quit after some number of errors. It continues as long as
there is something in its todo list. Your server restart must have come
when there were only a couple of things left in the todo list.
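To illustrate why a walk can look like it "failed twice and stopped", here is a toy model of a todo-list crawl loop. It is a hypothetical sketch, not gw's actual code: `walk` and the `fetch` callback are made-up names, and `fetch` is assumed to return the links found on a page or raise `OSError` (for example `ConnectionRefusedError`) on failure. Errors are recorded and skipped, and the loop ends only when the todo list is empty.

```python
from collections import deque

def walk(start_urls, fetch):
    """Toy crawl loop (hypothetical, not gw's code).

    fetch(url) is assumed to return a list of discovered URLs,
    or raise OSError (e.g. ConnectionRefusedError) on failure.
    """
    todo = deque(start_urls)
    seen = set(start_urls)
    visited, errors = [], []
    while todo:  # the only stopping condition: nothing left to do
        url = todo.popleft()
        try:
            links = fetch(url)
        except OSError as exc:
            errors.append((url, str(exc)))  # log the failure...
            continue                        # ...and move on to the next item
        visited.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                todo.append(link)
    return visited, errors
```

If the outage happens to hit the last two items in the todo list, the two refused fetches are simply the last things the loop tries, so the log ends with two errors even though nothing "gave up".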

Either start gw after the web server restarts, or start it early
enough that it finishes before the restart.
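If the walk is launched from a script or cron job, one way to make sure it starts after the restart is to wait until the server accepts connections again. This is a minimal sketch, assuming a plain TCP connect to the web server's port is a good enough readiness check; `wait_for_port` is a hypothetical helper, not part of Webinator.

```python
import socket
import time

def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once host:port accepts a TCP connection, False if timeout expires.

    Hypothetical helper: a successful connect is treated as "server is up".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Connect and immediately close; we only care that it succeeds.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # ConnectionRefusedError, socket.timeout, DNS errors, ...
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Launch gw as usual once `wait_for_port("www.columbia.edu", 80)` returns True. This only guards the start of the walk; a HUP that arrives mid-walk will still cause refused connections, so the walk should also be scheduled to finish before the next log rotation.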

Post by Thunderstone »

To paraphrase: you would like gw to fetch pages from a server that
isn't running.

This is tricky. The problem isn't gw, but rather the method being
used for log switching. If gw is getting "Connection refused", so
will users trying to access the web site at that same moment.
This doesn't seem like good practice.

You could work around the gw part of the issue in two ways:

1: Don't walk at the same time the servers are being restarted.

2: Examine the error table (gw -s "select * from error") and
use the output to determine which pages you'll need to fetch again.
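Option 2 works from gw's error table. As an alternative that works from the walk log instead, a sketch like the one below pulls out the URLs whose fetch ended in "Connection refused". It assumes the log format shown in the first post (a "Retrieving URL" line followed by a "Can't connect to host:port: Connection refused" line); `refused_urls` is a hypothetical helper, and other log formats would need different patterns.

```python
import re

# Assumed log format, taken from the excerpt in the first post:
#   <date> <time> Retrieving <url>
#   <date> <time> Can't connect to <host:port>: Connection refused
RETRIEVING = re.compile(r"^\S+ \S+ Retrieving (\S+)")
REFUSED = re.compile(r"^\S+ \S+ Can't connect to \S+: Connection refused")

def refused_urls(lines):
    """Return, in log order, the URLs whose retrieval was refused."""
    failed = []
    current = None  # URL from the most recent "Retrieving" line
    for line in lines:
        m = RETRIEVING.match(line)
        if m:
            current = m.group(1)
        elif current is not None and REFUSED.match(line):
            failed.append(current)
            current = None  # count each retrieval at most once
    return failed
```

Passing an open log file works too, since iterating a file yields its lines: `with open("gw.log") as f: refused_urls(f)` (the filename here is only an example).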

Webinator will wait for a slow server (the wait time is also settable),
but if port 80 is closed, the fetch is aborted. I'm not sure there's
anything we can do about this.

I'd try option 1.

Post by Thunderstone »

Mark,

Thanks for your quick reply. Turns out I was barking up the wrong
tree. I was looking at the end of the log to find the number of pages
found by gw:

End (22751) Visited 2561 pages total

But the total is much higher, since there is a previous log entry in
the same file:

End (22751) Visited 42494 pages total - Will re-exec

That brings the total to 45055 pages (42494 + 2561), which is less
than the 69023 pages we found last month, but close enough to make me
think it's working properly.
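As worked out above, when the log holds more than one "End ... Visited N pages total" line (the earlier one marked "Will re-exec"), the walk's total is their sum. A small sketch that adds them up, assuming the line format quoted above; `total_visited` is a hypothetical helper.

```python
import re

# Matches lines such as "End (22751) Visited 2561 pages total";
# anything after "total" (e.g. " - Will re-exec") is ignored.
END_LINE = re.compile(r"End \(\d+\) Visited (\d+) pages total")

def total_visited(lines):
    """Sum the page counts from every End line in a gw log."""
    total = 0
    for line in lines:
        m = END_LINE.search(line)
        if m:
            total += int(m.group(1))
    return total

log = [
    "End (22751) Visited 42494 pages total - Will re-exec",
    "End (22751) Visited 2561 pages total",
]
print(total_visited(log))  # 45055
```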

Ben