I'm trying to find the easiest and most efficient way to configure Webinator so that it doesn't re-walk pages it has already indexed. The site I'm working on is a newspaper's online site, where nearly all the content never changes once published. Each week the items in the main directory are replaced, and the previous week's items are moved into a dated directory under /archive/. Thus, we have:
/
/welcome.html
/front1.shtml
/front2.shtml
...
/archive/
/archive/20010620/
    welcome.html
    front1.shtml
    ...
/archive/20010613/
and so forth. The archives are linked from the main page, so a depth limit won't do what I want: the new archive is at the same depth as the old ones.
Presently, I use -rewalk to reindex the entire site, which works fine but is inefficient. Each week, I would like to index only the top level plus just _one_ of the directories under /archive/ (the most recent).
I see several possible ways to do this, including using the -V option to download only modified pages, or using -x to exclude the archive directories that were indexed on earlier runs (the exclude list would need updating for each run).
What might be the most effective way to do this? My present config file has these options:
-d- -D9 -M -o -t5 -fshtml
The -D9 is historical, and probably not relevant any more.
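For the -x approach, the weekly update of the exclude list could probably be scripted. Below is a rough sketch of what I have in mind, assuming the archive directories are always named YYYYMMDD as in the listing above. The function name split_archives and the sample directory names are made up for illustration, and how the resulting paths are actually passed to -x is not shown, since I'm not sure of the exact syntax.

```python
import re


def split_archives(dir_names):
    """Split directory names into (newest, older).

    Only names made of exactly eight digits (YYYYMMDD) are considered;
    anything else is ignored. Eight-digit dates sort correctly as strings.
    """
    dated = sorted(n for n in dir_names if re.fullmatch(r"\d{8}", n))
    if not dated:
        return None, []
    return dated[-1], dated[:-1]


# Hypothetical listing of what lives under /archive/
newest, older = split_archives(["20010613", "20010620", "misc"])

# Paths that would go on the exclude list for this week's run
exclude_paths = [f"/archive/{d}/" for d in older]

print("index:", f"/archive/{newest}/")
print("exclude:", exclude_paths)
```

The catch is that the exclude list grows by one entry every week, which is part of why I'm asking whether -V or some other option would be cleaner.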