We've run into an interesting problem. We're using Thunderstone to index some Usenet groups through a dynamic HTML reader (a PHP script). The script has two modes: listing articles by page and displaying individual articles.
We've got 500,000+ articles, so a "new" walk of all pages every time would probably be impossible. We think it's safe to assume that "article" pages *never* change. So, when we do a refresh walk, we want to refresh all index pages and NEW articles, but completely ignore previously-indexed articles.
Essentially, I think we want to be able to set up different refresh rules based on URL pattern match. How can we tackle this?
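To make the behavior we're after concrete, here is a rough Python sketch of the per-URL decision. It is not Thunderstone configuration. The `mode=list` and `mode=article` URL shapes, the `should_fetch` name, and the `indexed_urls` set are all made up for illustration, and our real script's parameters may differ.

```python
import re

# Hypothetical URL shapes for the PHP reader's two modes (assumed, not the real ones).
INDEX_PAGE = re.compile(r"[?&]mode=list\b")
ARTICLE_PAGE = re.compile(r"[?&]mode=article\b")


def should_fetch(url, indexed_urls):
    """Return True if a refresh walk should fetch this URL."""
    if INDEX_PAGE.search(url):
        # Listing pages change as new articles arrive, so always refresh them.
        return True
    if ARTICLE_PAGE.search(url):
        # Article pages are assumed never to change: fetch only unseen ones.
        return url not in indexed_urls
    # Anything that matches neither pattern is refreshed to be safe.
    return True
```

In other words: index pages are always re-fetched, an article page is fetched only the first time it is seen, and everything else falls back to a normal refresh.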
Tnx- N