no-cache option

Posted: Fri Oct 19, 2001 11:49 am
by ryan4
Is there an option with gw to send a no-cache or similar header when indexing a site? Our site sits behind a reverse cache server, and I want to make sure webinator always gets the "latest" copy of the pages.
Thanks.

Posted: Fri Oct 19, 2001 12:57 pm
by mark
Not with gw. You could modify the scripted walker to do it, though, using <urlcp header ... ...>.

Posted: Tue Mar 16, 2004 4:01 am
by roskaa
Um, I am having a similar problem: Webinator crawls the pages and reports invalid links on some of them, but those links no longer exist.

How can this be done, and where?

I'm a rookie at this, so please help me out.

Posted: Tue Mar 16, 2004 10:43 am
by mark
Try
<urlcp header "cache-control" "no-cache">
<urlcp header "pragma" "no-cache">
right after
<urlcp clearheaders>
in fetchset.
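
For readers who want to see what those two headers amount to on the wire, here is a minimal sketch in Python rather than Vortex. It is not Webinator code; the function name build_no_cache_request is made up for illustration and the URL is only a placeholder. It builds a request carrying the same two headers without sending it.

```python
import urllib.request


def build_no_cache_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks caches along the way to revalidate.

    Hypothetical helper for illustration only; it mirrors the two
    <urlcp header ...> lines above and is not part of Webinator.
    """
    req = urllib.request.Request(url)
    # HTTP/1.1 request directive: do not answer from cache without revalidating.
    req.add_header("Cache-Control", "no-cache")
    # Older HTTP/1.0 equivalent, still understood by many caches.
    req.add_header("Pragma", "no-cache")
    return req


# Placeholder URL; nothing is fetched here, the request is only constructed.
req = build_no_cache_request("http://my.host.com/url/")
```

Whether a given reverse cache actually honors these request headers depends on how it is configured, so they are a request to the cache, not a guarantee.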

Posted: Thu Mar 25, 2004 5:14 am
by roskaa
Nope, that did not help.

There is something really weird in the walk status log.

I've tried refreshing the search, and yes, it creates a different db (1 or 2), so I guess that isn't the problem.

A few duplicate errors still come up for a page that has not existed for a week.

Like this:

The link :my.host.com/url/
Referenced by :my.host.com/url2/
Is a duplicate of:my.host.com/url3/jada/

And at url3 there is only a blank page!

Is this some sort of internal cache issue in Webinator?

Posted: Thu Mar 25, 2004 11:07 am
by mark
No. Webinator does not cache pages, at least not in the way you're thinking.

Apparently url 1 was blank as well when Webinator fetched it.

Note that a refresh walk doesn't erase the status. It appends to the status from previous new and refresh walks. Go to the end of the status to find where the latest one started and look at the messages below that. Or try a "new" walk rather than "refresh".

Posted: Fri Mar 26, 2004 2:40 am
by roskaa
Yes, it is a new walk; I never even tested refresh.

So if I put up many blank pages, to 'hide' something, each one gets reported as a duplicate of the other blanks?

Posted: Fri Mar 26, 2004 10:23 am
by mark
Pages with duplicate content will not be stored by default. If two pages are empty, only the first one encountered will be stored. No great loss, since there's nothing to find on the page anyhow.

There is an option under all walk settings to disable duplicate prevention if you really want the dups.

Posted: Tue Mar 30, 2004 5:33 am
by roskaa
Hi,

Yes, I am aware of that, but it does not solve the weird issue I am having.

There isn't _anything_ on that blank page. So how does the duplicate prevention work? Where does it get the match from?

An invalid or corrupt database? A bug?

I just need to know how this works so I can prevent it from happening, because I could really use that duplicate info in my work.

Posted: Tue Mar 30, 2004 7:30 am
by John
Blank pages are duplicates of each other because their content is the same: nothing. If you have pages that are linked and return the same content, including no content, they will be flagged as duplicates.
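
To make that concrete, here is a rough Python sketch of content-based duplicate detection. It is only an illustration of the idea, assuming duplicates are found by comparing a hash of each page's whitespace-normalized text; how Webinator actually compares pages is not stated in this thread. The names content_key and find_duplicates are made up for the example.

```python
import hashlib
from typing import Dict


def content_key(body: str) -> str:
    """Hash the page text after collapsing whitespace (an assumed normalization)."""
    normalized = " ".join(body.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(pages: Dict[str, str]) -> Dict[str, str]:
    """Map each duplicate URL to the first URL seen with the same content."""
    first_seen: Dict[str, str] = {}
    duplicates: Dict[str, str] = {}
    for url, body in pages.items():
        key = content_key(body)
        if key in first_seen:
            # Same content as an earlier page: record it, do not store it again.
            duplicates[url] = first_seen[key]
        else:
            first_seen[key] = url
    return duplicates


# Two blank pages and one page with text, in the order they are "fetched".
pages = {
    "my.host.com/url3/jada/": "",
    "my.host.com/url/": "   \n",
    "my.host.com/url2/": "Some real text",
}
dups = find_duplicates(pages)
```

Under this scheme an empty page always produces the same key, so every blank page after the first is reported as a duplicate of that first one, regardless of its URL.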