no-cache option

ryan4
Posts: 5
Joined: Wed Jan 03, 2001 9:28 am

Post by ryan4 »

Is there an option in gw to send a no-cache (or similar) header when indexing a site? Our site sits behind a reverse caching server, and I want to make sure Webinator always gets the latest copy of the pages.
Thanks.
mark
Site Admin
Posts: 5514
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Not with gw. You could, however, modify the scripted walker to send one using <urlcp header ... ...>.
roskaa
Posts: 7
Joined: Tue Jun 10, 2003 6:37 am

Post by roskaa »

I am having a similar problem. Webinator crawls the pages and reports invalid links on some of them, but those links no longer exist.

How can that be done, and where?

I'm a rookie at this, so please help me out.
mark
Site Admin
Posts: 5514
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Try
<urlcp header "cache-control" "no-cache">
<urlcp header "pragma" "no-cache">
right after
<urlcp clearheaders>
in fetchset.
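
For readers outside Webinator, here is a minimal Python sketch (not Vortex, and not part of Webinator) of what those two <urlcp header> lines amount to on the wire: the fetcher adds "Cache-Control: no-cache" and "Pragma: no-cache" to each request. The URL is a placeholder. Keep in mind that a reverse cache server may be configured to ignore these client headers, so whether they actually bypass the cache depends on the cache server's own settings.

```python
import urllib.request


def build_no_cache_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks caches along the way for a fresh copy."""
    req = urllib.request.Request(url)
    # Cache-Control is the HTTP/1.1 directive; Pragma covers older HTTP/1.0 caches.
    req.add_header("Cache-Control", "no-cache")
    req.add_header("Pragma", "no-cache")
    return req


if __name__ == "__main__":
    # Placeholder URL: substitute a page on your own site.
    req = build_no_cache_request("http://my.host.com/url/")
    print(req.header_items())
    # To actually fetch the page: urllib.request.urlopen(req).read()
```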
roskaa
Posts: 7
Joined: Tue Jun 10, 2003 6:37 am

Post by roskaa »

Nope, that did not help.

There is something really weird in the walk status log.

I've tried refreshing the search, and yes, it builds a different database (1 or 2), so I guess that isn't the problem.

There are still a few duplicate errors from a page that hasn't existed for a week.

For example:

The link :my.host.com/url/
Referenced by :my.host.com/url2/
Is a duplicate of:my.host.com/url3/jada/

And url3 is just a blank page!

Is this some sort of internal cache issue in Webinator?
mark
Site Admin
Posts: 5514
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

No. Webinator does not cache pages, at least not in the way you're thinking.

Apparently url 1 was also blank when Webinator fetched it.

Note that a refresh walk doesn't erase the status; it appends to the status left by previous new and refresh walks. Go to the end of the status, find where the latest walk started, and look at the messages below that point. Or try a "new" walk rather than a "refresh".
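
Because each walk's messages are appended to the same status, only the lines after the most recent walk start describe the walk you just ran. A small Python sketch of that filtering, assuming the status has been copied into a list of lines and that some recognizable line marks the start of each walk. The marker text used here is made up; Webinator's actual wording may differ.

```python
def latest_walk_messages(status_lines, start_marker):
    """Return the status lines logged after the most recent walk-start line.

    start_marker is a hypothetical placeholder: use whatever text actually
    marks the beginning of a walk in your own status output.
    """
    last_start = None
    for i, line in enumerate(status_lines):
        if start_marker in line:
            last_start = i  # keep overwriting so we end on the newest walk
    if last_start is None:
        # No marker found: assume the whole log belongs to a single walk.
        return list(status_lines)
    return list(status_lines[last_start + 1:])


if __name__ == "__main__":
    # Made-up status excerpt: an older walk followed by a newer one.
    log = [
        "--- walk started ---",
        "message from the older walk",
        "--- walk started ---",
        "message from the latest walk",
    ]
    for msg in latest_walk_messages(log, "walk started"):
        print(msg)
```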
roskaa
Posts: 7
Joined: Tue Jun 10, 2003 6:37 am

Post by roskaa »

Yes, it is a new walk; I never even tried refresh.

So if I put up several blank pages (to "hide" something), each one is reported as a duplicate of the other blanks?
mark
Site Admin
Posts: 5514
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

By default, pages with duplicate content are not stored. If two pages are empty, only the first one encountered is stored. That's no great loss, since there's nothing to find on the page anyhow.

There is an option under all walk settings to disable duplicate prevention if you really want the dups.
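
To illustrate the "first one encountered is stored" rule, here is a Python sketch of content-based duplicate prevention. This thread does not say how Webinator actually compares pages; the sketch simply assumes that "same content" means an identical hash of the page body (MD5 is used purely as an example). The prevent_duplicates flag is a hypothetical stand-in for the walk setting mentioned above. It also shows why two blank pages collide: an empty body always hashes to the same value.

```python
import hashlib


def store_pages(pages, prevent_duplicates=True):
    """Store (url, body) pairs in crawl order, skipping duplicate content.

    Returns (stored, duplicates): stored maps url -> body, and duplicates
    lists (url, original_url) for pages whose content matched an earlier page.
    prevent_duplicates is a hypothetical stand-in for the walk setting.
    """
    seen = {}  # content digest -> first URL seen with that content
    stored = {}
    duplicates = []
    for url, body in pages:
        digest = hashlib.md5(body.encode("utf-8")).hexdigest()
        if prevent_duplicates and digest in seen:
            duplicates.append((url, seen[digest]))
            continue
        seen.setdefault(digest, url)
        stored[url] = body
    return stored, duplicates


if __name__ == "__main__":
    crawl = [
        ("my.host.com/url3/jada/", ""),        # blank page, fetched first
        ("my.host.com/url2/", "<p>links</p>"),
        ("my.host.com/url/", ""),              # also blank, so flagged as a duplicate
    ]
    stored, dups = store_pages(crawl)
    print(sorted(stored))
    print(dups)
```

With duplicate prevention turned off, every page is stored and nothing is reported, which matches the trade-off described above.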
roskaa
Posts: 7
Joined: Tue Jun 10, 2003 6:37 am

Post by roskaa »

Hi,

Yes, I am aware of that, but it does not explain the weird issue I am having.

That blank page has nothing in it at all. So how does duplicate prevention work here? Where does it get that from?

An invalid or corrupt database? A bug?

I just need to know how this happens so I can prevent it, because I could really use that duplicate information in my work.
John
Site Admin
Posts: 2597
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

Blank pages are duplicates of each other, since their content is the same: nothing. If you have pages that are linked and return the same content (including no content at all), they will be flagged as duplicates.
John Turnbull
Thunderstone Software