new vs. refresh

KMandalia
Posts: 301
Joined: Fri Jul 09, 2004 3:50 pm

Post by KMandalia »

Would the following settings make refreshed URLs behave almost identically to newly discovered URLs?

min refresh=1 min
max refresh=1 day
default refresh=1 hr

nextcheck=asap

I have a folder on our website that I want fetched first every time I start a refresh. Since the refresh time is "calculated", and since I want this folder crawled whenever something new gets added (the other things in the folder don't change at all), I want to override the refresh timings for everything within that folder.

Also, say I want to refresh everything in the database but don't want to go through the trouble of backing up the querylog (the only reason I don't want to do a new walk). Can I:

1) delete everything in todo
2) set nextcheck for everything to asap ('now')

Is there anything else that distinguishes new from refresh?
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

New creates a new database, starts from the base URL, and crawls. Refresh finishes any remaining todo pages, then refreshes any pages whose NextCheck is before the current time, in approximate NextCheck order. You could set NextCheck to '-1 week' to give certain pages priority on refresh. You could also just give the folder URL as a watch URL, which ensures it gets refreshed on every refresh and that any new pages get added (without necessarily refreshing the other files in the folder).
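
To make the refresh ordering concrete, here is a rough Python model of it. This is only an illustration, not the walker's actual code: the function name `refresh_order`, the in-memory `todo` list, and the `pages` mapping of URL to NextCheck are all assumptions made up for the sketch.

```python
from datetime import datetime, timedelta


def refresh_order(todo, pages, now):
    """Illustrative model: leftover todo URLs first, then every page whose
    NextCheck is already in the past, earliest NextCheck first."""
    due = [(next_check, url) for url, next_check in pages.items() if next_check < now]
    due.sort()  # oldest NextCheck sorts to the front
    return list(todo) + [url for _, url in due if url not in todo]


now = datetime(2024, 1, 8, 12, 0)
pages = {
    "/docs/a.html": now - timedelta(hours=1),   # due, but only just
    "/news/index.html": now - timedelta(weeks=1),  # NextCheck pushed to '-1 week'
    "/about.html": now + timedelta(days=1),     # not due yet, skipped
}
print(refresh_order(["/pending.html"], pages, now))
```

In this model, pushing a page's NextCheck a week into the past moves it ahead of everything that only just became due, which is the priority effect described above.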
John Turnbull
Thunderstone Software
KMandalia

Post by KMandalia »

So pretty much emptying out todo and setting nextcheck in the past (-1 week) makes a refresh act like a new walk?

Watch URL is fine for the example I gave, but "intelligent refresh" is not so intelligent for me, so I have to tweak the admin to add these extra options.

Thanks.
John

Post by John »

The order of pages fetched may be different, and a refresh of everything may be slower than a new walk, especially if the web server does not support If-Modified-Since: on a refresh, each page is compared against the previous version and the table is updated, rather than everything being blindly fetched.
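
As a rough sketch of why that matters, the per-page work on a refresh can be modeled like this in Python. The function `refresh_page`, the `stored` dict, and the `(status, body)` response tuple are hypothetical names for the illustration and say nothing about how the walker is really implemented.

```python
def refresh_page(stored, response):
    """Illustrative model of one refresh step.

    stored   -- dict holding the previously indexed 'body' of the page
    response -- (status, body) as returned by the web server
    """
    status, body = response
    if status == 304:
        # Server honored If-Modified-Since: nothing downloaded, nothing compared.
        return "not modified"
    if body == stored["body"]:
        # Full download, then a comparison that finds no change.
        return "fetched, unchanged"
    stored["body"] = body  # update the existing row rather than inserting a new one
    return "fetched, updated"


stored = {"body": "<p>old</p>"}
print(refresh_page(stored, (304, None)))
print(refresh_page(stored, (200, "<p>old</p>")))
print(refresh_page(stored, (200, "<p>new</p>")))
```

A server that never answers 304 forces the second or third path for every page, so each page costs a download plus a comparison, where a new walk would simply fetch and insert.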
John Turnbull
Thunderstone Software
KMandalia

Post by KMandalia »

Exactly the reply I was looking for.

Thanks.