Selective refresh

nroot
Posts: 8
Joined: Tue Oct 24, 2006 3:16 pm

Post by nroot »

We've run into an interesting problem. We're using Thunderstone to index some Usenet groups through a dynamic HTML reader (a PHP script). The script has two modes: listing articles by page, and displaying individual articles.

We've got 500,000+ articles, so a full "new" walk of every page on each run is probably impractical. We think it's safe to assume that "article" pages *never* change. So, when we do a refresh walk, we want to refresh all index pages and fetch NEW articles, but completely ignore previously indexed articles.

Essentially, I think we want to be able to set up different refresh rules based on URL pattern match. How can we tackle this?

Tnx- N
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

The simplest method is probably to edit the calcnextcheck function so that it takes the URL as a parameter, and pass the URL in wherever the function is called. You could then look at the URL and set nextcheck far in the future if it matches, e.g.

<rex article $url>
<if $ret neq ''>
<return "2030-01-01">
<else>
<return "now">
</if>
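
For readers who don't know Vortex, the same rule can be sketched in Python. This is only an illustration of the logic, not the dowalk code: the function name calc_next_check, the pattern, and the example URLs are assumptions, and the real pattern depends on how the PHP script builds its article URLs.

```python
import re
from datetime import datetime

# Hypothetical pattern: assumes article URLs contain the word "article".
ARTICLE_PATTERN = re.compile(r"article")

# Far-future date, mirroring the "2030-01-01" used in the Vortex snippet.
NEVER_RECHECK = datetime(2030, 1, 1)


def calc_next_check(url, now=None):
    """Return the time at which the given URL should next be fetched."""
    if now is None:
        now = datetime.now()
    if ARTICLE_PATTERN.search(url):
        # Article pages are assumed never to change, so push them far out.
        return NEVER_RECHECK
    # Everything else (e.g. listing pages) is due again immediately.
    return now
```

Passing the current time in as a parameter keeps the function easy to check against fixed dates.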
John Turnbull
Thunderstone Software
nroot
Posts: 8
Joined: Tue Oct 24, 2006 3:16 pm

Post by nroot »

Thanks John- I created a script that (I think) does roughly what you said: it changes the NextCheck to the year 2030 for pages in the database whose URLs match a specific pattern. I set that script up to run every 15 minutes while the "new" walk was going on, updating the NextCheck date only for rows that hadn't already been hit. Seems to have worked.
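
A rough Python sketch of that kind of periodic update, using SQLite purely as a stand-in for the Texis database. The table name pages and the column name Url are hypothetical, and the actual script and schema may well differ. Only the NextCheck name comes from the description above.

```python
import sqlite3

# Far-future date assigned to rows that should not be refreshed again.
FAR_FUTURE = "2030-01-01 00:00:00"


def defer_matching_rows(conn, url_pattern):
    """Set NextCheck to FAR_FUTURE for rows whose Url matches a SQL LIKE
    pattern and that have not already been deferred.

    Returns the number of rows changed. Table and column names are
    assumptions made for this sketch.
    """
    cur = conn.execute(
        "UPDATE pages SET NextCheck = ? "
        "WHERE Url LIKE ? AND NextCheck <> ?",
        (FAR_FUTURE, url_pattern, FAR_FUTURE),
    )
    conn.commit()
    return cur.rowcount
```

Because the WHERE clause skips rows that are already deferred, running this repeatedly (for example every 15 minutes from a scheduler) only touches rows added since the last run.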

But that sounds a little different from editing the "calcnextcheck" function... where is that function? I couldn't find any info on it anywhere.

Thanks for the help- N
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

calcnextcheck is in the dowalk script.