Is there an option with gw to send a no-cache or similar header when indexing a site? Our site sits behind a reverse cache server, and I want to make sure webinator always gets the "latest" copy of the pages.
Thanks.
No. Webinator does not cache pages, at least not in the way you're thinking.
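If you want to see how the reverse cache itself responds to a no-cache request, you can test it by hand outside of Webinator. Below is a minimal Python sketch, not a gw option: the URL is a placeholder, the function name is made up for illustration, and it assumes your cache server honors the standard `Cache-Control` and `Pragma` request headers.

```python
import urllib.request

def build_no_cache_request(url):
    """Build a GET request asking intermediaries to revalidate with the origin.

    Hypothetical helper for manual testing; it has nothing to do with gw itself.
    """
    headers = {
        "Cache-Control": "no-cache",  # HTTP/1.1 request directive
        "Pragma": "no-cache",         # older HTTP/1.0 equivalent
    }
    return urllib.request.Request(url, headers=headers)

# Placeholder URL; substitute a page that sits behind your reverse cache.
req = build_no_cache_request("http://www.example.com/")

# To actually send it and inspect what comes back:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status, resp.headers.get("Age"))
```

Whether the response differs from a plain request depends entirely on how the cache server is configured.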
Apparently URL 1 was blank as well when Webinator fetched it.
Note that a refresh walk doesn't erase the status. It appends to the status from previous new and refresh walks. Go to the end of the status to find where the latest one started and look at the messages below that. Or try a "new" walk rather than "refresh".
Pages with duplicate content are not stored by default. If two pages are empty, only the first one encountered is stored. That's no great loss, since there's nothing to find on the page anyhow.
There is an option under all walk settings to disable duplicate prevention if you really want the dups.
Blank pages are duplicates of each other because their content is the same: nothing. If linked pages return the same content, including no content, they will be flagged as duplicates.
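To illustrate why blank pages collide, here is a rough Python sketch of content-based duplicate detection. It is not Webinator's actual implementation: the SHA-256 fingerprint, the whitespace normalization, and the function names are all assumptions made for the example.

```python
import hashlib

def content_fingerprint(body):
    """Hash the page text after collapsing whitespace (an assumed normalization)."""
    normalized = " ".join(body.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def store_unique(pages):
    """Keep the first page seen for each fingerprint; report the rest as duplicates."""
    first_seen = {}  # fingerprint -> URL of the first page with that content
    stored = []
    skipped = []     # (duplicate URL, URL it duplicates)
    for url, body in pages:
        fp = content_fingerprint(body)
        if fp in first_seen:
            skipped.append((url, first_seen[fp]))
        else:
            first_seen[fp] = url
            stored.append(url)
    return stored, skipped

# Hypothetical crawl results: /a and /c are both effectively empty.
pages = [("/a", ""), ("/b", "Hello"), ("/c", "   "), ("/d", "Hello")]
stored, skipped = store_unique(pages)
print(stored)   # ['/a', '/b']
print(skipped)  # [('/c', '/a'), ('/d', '/b')]
```

Two empty pages produce the same fingerprint, so only the first one encountered is kept, which matches the behavior described above.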