On a -rewalk, using the -V

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm

Post by Thunderstone »



When using gw -rewalk I also use the -V option:

"during reload verify time, and refetch only if modified"

I'm just rewalking my own local web server, and according
to my web server logs, the Webinator spider is sending
GET requests whereas I thought it would send HEAD
requests to verify the time BEFORE it fetches/GETs
the page.

Is there any way to tell the walker to check with HEAD
first (looking at the Last-Modified date) instead of
pulling the entire document with GET?



Post by Thunderstone »





Since most of the overhead is in connecting to the
server, it is inefficient to do a HEAD and then a GET
for every page fetched. Doing so would roughly double the
time it takes to acquire the sites' content.




Post by Thunderstone »



Maxwell Holmes said:

No. The -V option works by adding an "If-Modified-Since" header
to the GET request. That way the page is only sent if it has
been modified, which saves having to do a HEAD and then a GET. Note
that -V only works with -e, not with -rewalk, which does a full rewalk.
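
For readers unfamiliar with conditional GETs, here is a rough sketch of
the general HTTP technique in Python. It is not Webinator's code: the
function names and the idea of keeping the last fetch time as a Unix
timestamp are assumptions made for illustration. The server answers
304 Not Modified with no body when the page is unchanged, so a single
request does the work of a HEAD plus a GET.

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def conditional_headers(last_fetched_epoch):
    """Build the header for a conditional GET (hypothetical helper).

    last_fetched_epoch is the Unix time of the previous fetch.
    """
    # HTTP dates are always written in GMT,
    # e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    return {"If-Modified-Since": formatdate(last_fetched_epoch, usegmt=True)}


def fetch_if_modified(url, last_fetched_epoch):
    """Return the page body if it changed, or None if the server says 304."""
    request = urllib.request.Request(
        url, headers=conditional_headers(last_fetched_epoch)
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError.
        if err.code == 304:
            return None
        raise
```

With this approach the web server log shows GET requests for every
page, as in the original question, but unchanged pages are logged
with status 304 and no document body is transferred.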

John Turnbull
-------------
Thunderstone Software