Is it possible to index pages listed in the Watch URL on a different schedule than the normal refresh schedule?
I have too many profiles to be able to do a complete walk every day, but would like to ensure that updated content gets picked up quickly.
There's not really any way to separate a group of URLs from the normal refresh rules. If the servers provide Last-Modified information, the refresh should home in on the changing data within a few cycles.
Well, I already have three appliances... If I use "On Change" as the schedule and define a Watch URL, will a complete walk be performed or will just the links in the Watch URL be followed?
If the Watch URL is a normal part of your walk, you don't have to also include it in the base URLs. If it's not, and you want it in your database, you need to add it.
I see. One last question: what criterion is used to determine whether the "Watch URL" has been updated? The content of the page, the last-modified META tag, or both?
It looks for Last-Modified and uses that if present. If it's not present, it compares the text of the page (not the full HTML, only the textual portions).
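To make that rule concrete, here is a rough Python sketch of the decision as described above. It is only an illustration, not the appliance's actual code: the names `visible_text` and `watch_url_changed` are made up, and treating "present" as "both the stored and the new fetch carry a Last-Modified value" is an assumption.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def visible_text(html):
    """Return only the textual portions of a page, whitespace-normalized."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def watch_url_changed(old_html, new_html,
                      old_last_modified=None, new_last_modified=None):
    """Hypothetical change test: prefer Last-Modified, else compare text."""
    # Assumption: the header is only trusted when both fetches supplied it.
    if old_last_modified is not None and new_last_modified is not None:
        return old_last_modified != new_last_modified
    # Fallback: markup-only edits do not count, only the text does.
    return visible_text(old_html) != visible_text(new_html)
```

With this sketch, changing only tags or attributes on a page without a Last-Modified value would not count as an update, while any change to the visible wording would.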
There's tons of whitespace at the start of every line. Try creating the file with no extraneous whitespace at the beginning or end of any line and no blank lines.
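If the file is generated by a script, one way to clean it up is a small filter like the following. This is a minimal sketch; the function name `clean_lines` is made up for illustration, and it assumes the file is plain text with one entry per line.

```python
def clean_lines(text):
    """Strip leading/trailing whitespace from each line and drop blank lines."""
    stripped = (line.strip() for line in text.splitlines())
    kept = [line for line in stripped if line]
    # One newline after every kept line; empty input yields an empty string.
    return "".join(line + "\n" for line in kept)
```

Read the original file, pass its contents through `clean_lines`, and write the result back out before handing the file to the appliance.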