Refresh or New

pete.smith
Posts: 73
Joined: Tue May 17, 2005 2:08 pm

Post by pete.smith »

Hello

I now have my entire intranet sliced up into profiles, and I am using metaprofiling to aggregate them. I would like them all to crawl the site automatically on a schedule to pick up new changes, updates, etc. I thought I wanted a nightly refresh walk, but I think I might have been wrong about that. What would your recommended strategy be for keeping current? If I do a constant refresh (every 15), it seems to slow down performance.

Pete
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

It depends on how many pages are in the profiles and how dynamic they are. A refresh walk is generally more efficient than a new walk unless more than half of the pages are always changing. How much work the refresh does is also controlled by the refresh time settings under all walk settings.
pete.smith
Posts: 73
Joined: Tue May 17, 2005 2:08 pm

Post by pete.smith »

Thanks Mark,

So my big profile is 300K pages, and people add new content all the time. I just need Thunderstone to find anything that someone adds further down in the tree. Maybe a nightly new walk is the answer? I can do the whole thing in 8 hours.
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

If there is a consistent place where new content gets added, or is linked from, then a refresh walk should work well. Otherwise, if content could be linked from a page that doesn't normally change, it could take longer to pick up the change, and a new walk would be better.
John Turnbull
Thunderstone Software
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

If not many of the existing pages are changing, and finding new pages once a day (nightly) is sufficient, then try a refresh walk with a max refresh time of 12 hours or 1 day and a schedule of daily at the desired hour.
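
To illustrate why a max refresh time matters, here is a rough sketch in Python of how a refresh walk might pick which pages to recheck. This is a simplified model for discussion only, not the product's actual algorithm: the function name `pages_due`, the example URLs, and the timestamps are all made up, and the real walker also takes the default and min refresh time settings into account.

```python
from datetime import datetime, timedelta

def pages_due(last_checked, now, max_refresh):
    """Return the URLs whose last check is at least max_refresh old.

    Assumption: a page is rechecked once its age reaches the max refresh
    time. The real scheduler is more involved than this.
    """
    return sorted(url for url, ts in last_checked.items()
                  if now - ts >= max_refresh)

# Hypothetical state at the start of a nightly run
now = datetime(2005, 6, 1, 1, 0)
last_checked = {
    "/index.html": now - timedelta(days=1),
    "/docs/a.html": now - timedelta(hours=3),
    "/docs/b.html": now - timedelta(hours=13),
}

due = pages_due(last_checked, now, timedelta(hours=12))
print(due)  # ['/docs/b.html', '/index.html']
```

With a 12 hour max, every page checked on the previous night's run is due again on the next nightly run, so links added to any of those pages get found within a day.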
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Note that all scheduled walks are refresh walks, regardless of the walk type setting in the profile. To do a new walk on a schedule, you'll have to turn off the profile schedule and use an external scheduler such as Unix cron or Windows Task Scheduler. See the manual under "using dowalk" for how to launch a walk.
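
As a rough sketch of the external-scheduler approach, a small Python wrapper could be what cron or Task Scheduler actually runs. The wrapper name `new_walk.py`, its path, and the crontab line in the comment are hypothetical. The walk command itself is deliberately not shown here: take it from the "using dowalk" section of the manual and pass it as arguments.

```python
import subprocess
import sys

def run_walk(command):
    """Run the given walk command (a list of arguments) and return its exit code."""
    result = subprocess.run(command)
    return result.returncode

# Hypothetical crontab entry for a 1 AM nightly run:
#   0 1 * * * python3 /path/to/new_walk.py <walk command from the manual>
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_walk(sys.argv[1:]))
```

Returning the exit code lets the scheduler's own logging or mail notification flag a walk that failed to launch.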
pete.smith
Posts: 73
Joined: Tue May 17, 2005 2:08 pm

Post by pete.smith »

Thanks Mark, this is the behavior I don't get:

I have walk type "refresh" (I get that it does not matter for scheduled walks) and a nightly schedule at 1 AM. I get "walk completed 11 minutes". There is no way it could do anything on that many pages in 11 minutes. If I hit "Go" it appears to do a "resume". What is the difference between "Go" and just letting the job refresh on schedule?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

The scheduled run is the same as hitting Go, except that the walk type is ignored and refresh is always used on a scheduled walk. The walk status page should give you a more detailed idea of what happened than the one-line walk summary.

On a refresh walk, only pages scheduled to be refreshed will be checked (see the default/min/max refresh time settings). For the pages that do get checked, if the server supports If-Modified-Since, the page will not be downloaded if it hasn't changed.
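
For anyone unfamiliar with If-Modified-Since, here is a minimal Python sketch of the general HTTP mechanism. It is not the walker's own code: the helper names `conditional_request` and `needs_download` and the example URL are made up for illustration. The client sends the time of its last fetch, and a server that supports the header answers 304 Not Modified with no body when the page is unchanged.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

def conditional_request(url, last_fetched):
    """Build a GET request that lets the server skip unchanged pages."""
    req = Request(url)
    # HTTP dates must be in GMT, so last_fetched has to be a UTC datetime
    req.add_header("If-Modified-Since",
                   format_datetime(last_fetched, usegmt=True))
    return req

def needs_download(status):
    """A 304 Not Modified status means the stored copy is still current."""
    return status != 304

# Hypothetical page last fetched during the previous night's walk
req = conditional_request("http://intranet.example/page.html",
                          datetime(2005, 6, 1, 1, 0, tzinfo=timezone.utc))
print(req.get_header("If-modified-since"))
```

This is one reason a refresh over a large profile can finish quickly: unchanged pages cost a small header exchange instead of a full download.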