Exclude by field question

nroot
Posts: 8
Joined: Tue Oct 24, 2006 3:16 pm

Post by nroot »

I've got a bunch of pages that all include a tag in this format:

<td class=date>2007/04/02 04:19</td>

and I'd like to use "Exclude by Field" to skip indexing pages whose date stamp is earlier than 2005/01/01. I've tried a ton of rex variants in the "Query" field, but none of them work. Can someone suggest the right rex syntax? Thanks!
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Maybe something like
<td class\=date>200[0-4]
and
<td class\=date>1
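As a rough illustration of the logic (a Python sketch, not Webinator code): the two expressions act as an "exclude if either one matches" rule. The first catches years 2000-2004 and the second catches any year starting with 1. In rex, `=` is a repetition operator, which is presumably why it is escaped as `\=` above. Python's `re` treats `=` as a literal, so it needs no escape. `EXCLUDE_PATTERNS` and `should_exclude` are hypothetical names used only for this example.

```python
import re

# Hypothetical equivalents of the two suggested expressions:
# years 2000-2004, and any year whose first digit is 1.
EXCLUDE_PATTERNS = [
    re.compile(r"<td class=date>200[0-4]"),
    re.compile(r"<td class=date>1"),
]

def should_exclude(html: str) -> bool:
    """Return True if any of the exclusion patterns matches the page text."""
    return any(p.search(html) for p in EXCLUDE_PATTERNS)

print(should_exclude("<td class=date>2004/12/31 23:59</td>"))  # True
print(should_exclude("<td class=date>2007/04/02 04:19</td>"))  # False
print(should_exclude("<td class=date>1999/06/15 10:00</td>"))  # True
```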
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

You might need a couple of expressions, e.g.

/>><td\x20class\=date>1
/>><td\x20class\=date>200[0-4]
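For comparison, here is a Python sketch that folds the two expressions into one alternation and writes the space as `\x20`, the hex escape for character 0x20. This matches exactly the same text as a literal space. The thread's last post suggests that the literal space was what broke the query. The leading `/` and the `>>` are Thunderstone query/rex syntax with no direct counterpart in Python's `re`, so they are left out. `PRE_2005` and the sample pages are hypothetical.

```python
import re

# \x20 is a space written as a hex escape; re treats it exactly like " ".
# One alternation covers both cases: years starting with 1, or 2000-2004.
PRE_2005 = re.compile(r"<td\x20class=date>(?:1|200[0-4])")

pages = {
    "old.html": "<tr><td class=date>2003/07/19 08:00</td></tr>",
    "new.html": "<tr><td class=date>2007/04/02 04:19</td></tr>",
}

for name, html in pages.items():
    action = "exclude" if PRE_2005.search(html) else "index"
    print(name, action)
# old.html exclude
# new.html index
```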

Although, since you have the scripts, you may find it easier to extract the date and compare it as a date, e.g. in checkexfield:

<rex ">><td\x20class\=date>\P=!</td>+" $htmlpage>
<$pagedate=(convert($ret, 'date' ))>
<if $pagedate lt '2005-01-01'>
<$exfield_index=N>
<$exfield_follow=N>
</if>
John Turnbull
Thunderstone Software
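The same "extract, then compare as a date" approach can be sketched in Python. This is a rough analogue of the checkexfield snippet above, not Webinator's actual code. `check_exfield` is a hypothetical helper that returns `(index, follow)` flags, where `False` corresponds to setting `$exfield_index` and `$exfield_follow` to N. The date format string assumes stamps always look like `2007/04/02 04:19`. Pages with a missing or unparseable stamp are assumed to be left alone.

```python
import re
from datetime import datetime

# Grab the text between <td class=date> and </td>.
DATE_CELL = re.compile(r"<td class=date>(.*?)</td>")
CUTOFF = datetime(2005, 1, 1)

def check_exfield(html: str) -> tuple[bool, bool]:
    """Return hypothetical (index, follow) flags for a page."""
    m = DATE_CELL.search(html)
    if m is None:
        return True, True  # assumption: no date stamp, so keep the page
    try:
        page_date = datetime.strptime(m.group(1).strip(), "%Y/%m/%d %H:%M")
    except ValueError:
        return True, True  # assumption: unparseable stamp, so keep the page
    if page_date < CUTOFF:
        return False, False  # older than the cutoff: don't index or follow
    return True, True

print(check_exfield("<td class=date>2004/11/30 12:00</td>"))  # (False, False)
print(check_exfield("<td class=date>2007/04/02 04:19</td>"))  # (True, True)
```

Comparing parsed dates avoids having to enumerate digit patterns for every excluded year range. It also makes changing the cutoff a one-line edit.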
nroot
Posts: 8
Joined: Tue Oct 24, 2006 3:16 pm

Post by nroot »

Aggghh... it was the spaces in my rex that were killing me. I popped the \x20 in instead and it works fine now.

Thanks for the quick help.