Watch URL and walk schedule

dietric
Posts: 100
Joined: Fri May 20, 2005 10:57 am

Watch URL and walk schedule

Post by dietric »

I have set up Watch URLs and set up my profile to do a rewalk onchange.
The rewalk is triggered every 15 minutes or so, despite the fact that the content at the watchURL has not changed and I'm not setting a lastModified META tag. Any suggestions?


http://qaprodc.adv100.com/qaprodc/artic ... ts/new.jsp

thanks
-ds
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Watch URL and walk schedule

Post by mark »

That URL was returning an error page with a modified time of "now", so it appeared changed on every check. It started working while I was looking, so it will probably behave now.
dietric
Posts: 100
Joined: Fri May 20, 2005 10:57 am

Watch URL and walk schedule

Post by dietric »

I see why that would have happened. Is there any way for it NOT to trigger a rewalk if the watchURL returns a response code other than 200? The last thing you'd want if your server is down anyway is for a rewalk to be triggered.

Thanks
-ds
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Watch URL and walk schedule

Post by mark »

Actually the server was returning a document that was good as far as the client/crawler was concerned. That's why it had a modified time. But the text of the document said the backend was having problems or some such. The crawler can't really know that it was a failure in that case.
dietric
Posts: 100
Joined: Fri May 20, 2005 10:57 am

Watch URL and walk schedule

Post by dietric »

Do you mean it did return 200 as the status code? If that's the case I might be able to fix that. What is the crawler looking at to determine whether it's "good"?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Watch URL and walk schedule

Post by mark »

Not 100% sure, because I only saw the error page in the browser. By the time I looked at the HTTP codes it was working.

Currently the crawler doesn't care about status codes, just the modified date or changed text. A fix to look at the status is in the todo queue.
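
To illustrate why an error page stamped with "now" looks changed on every check, here is a rough Python sketch of that kind of comparison. It is not the crawler's actual code: the `Snapshot` type, the `take_snapshot` and `looks_changed` names, and the use of a SHA-256 digest for "changed text" are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """What we remember about the watch URL between checks (hypothetical)."""
    last_modified: Optional[str]  # modified time as reported by the server
    text_digest: str              # digest of the returned text


def take_snapshot(last_modified: Optional[str], body: bytes) -> Snapshot:
    # Assumption: "changed text" is detected by comparing a digest of the body.
    return Snapshot(last_modified, hashlib.sha256(body).hexdigest())


def looks_changed(previous: Optional[Snapshot], current: Snapshot) -> bool:
    """Return True if either the modified time or the text differs.

    Note that the status code is never consulted here, mirroring the
    behaviour described above.
    """
    if previous is None:
        # Assumption: the first check has nothing to compare against.
        return False
    return (previous.last_modified != current.last_modified
            or previous.text_digest != current.text_digest)
```

With a check like this, two fetches of the same error page that differ only in their modified time still count as a change, which would explain a rewalk every 15 minutes.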
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Watch URL and walk schedule

Post by mark »

There are some issues with how errors from the watch URL are handled, though. They will be fixed in the next script release.
dietric
Posts: 100
Joined: Fri May 20, 2005 10:57 am

Watch URL and walk schedule

Post by dietric »

That's great news. Do you know what error codes are going to be expected so a rewalk is NOT triggered?

Thanks
-ds
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Watch URL and walk schedule

Post by mark »

HTTP codes 100-299 are considered OK. Anything else will prevent a rewalk from being triggered and will be treated as if no attempt was made to fetch the URL. Non-HTTP conditions such as connection timeouts and DNS failures will also prevent triggering.
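
As a rough illustration of that rule (not the actual script code), the decision could be sketched in Python as below. The function names are made up for the example, and a status of `None` is an assumed stand-in for non-HTTP failures such as a connection timeout or DNS failure.

```python
from typing import Optional


def fetch_is_usable(status: Optional[int]) -> bool:
    """Return True only for HTTP codes 100-299.

    `status` is None when no HTTP response was received at all
    (assumed representation of timeouts, DNS failures and the like).
    """
    return status is not None and 100 <= status <= 299


def should_trigger_rewalk(status: Optional[int], content_changed: bool) -> bool:
    """Decide whether a watch URL check should trigger a rewalk.

    A failed fetch is treated as if no attempt had been made, so it
    never triggers a rewalk regardless of what the response looked like.
    """
    if not fetch_is_usable(status):
        return False
    return content_changed
```

For example, `should_trigger_rewalk(503, True)` is False, so an error page served while the backend is down no longer starts a rewalk, provided the server actually sends a non-2xx status with it.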
dietric
Posts: 100
Joined: Fri May 20, 2005 10:57 am

Watch URL and walk schedule

Post by dietric »

Has this been included in the recent update (6.2.11) by any chance?

Thanks
-ds