Exclude by field question

Posted: Fri Apr 13, 2007 1:04 pm
by nroot
I've got a bunch of pages that all include a tag in this format:

<td class=date>2007/04/02 04:19</td>

and I'd like to use "Exclude by Field" to skip indexing pages where this date stamp is older than 2005/01/01. I've tried a ton of REX expression variants in the "Query" field, but none of them work. Can someone suggest the right REX syntax to use? Thanks!

Posted: Fri Apr 13, 2007 1:55 pm
by mark
Maybe something like
<td class\=date>200[0-4]
and
<td class\=date>1
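
To see why two prefix patterns are enough, here is a rough sketch in Python. It uses Python's `re` module as a stand-in for REX (the syntax is not the same), and the name `is_old` is made up for illustration. It assumes every date stamp starts with a four-digit year, as in the sample tag.

```python
import re

# Python regex standing in for the two REX patterns above (not REX syntax):
# a year starting with "1" (1000-1999) or a year in 2000-2004.
OLD_DATE = re.compile(r"<td class=date>(?:1|200[0-4])")

def is_old(html: str) -> bool:
    """Return True if the page carries a date stamp from before 2005."""
    return OLD_DATE.search(html) is not None

print(is_old("<td class=date>2007/04/02 04:19</td>"))
print(is_old("<td class=date>2004/12/31 23:59</td>"))
print(is_old("<td class=date>1999/06/15 08:00</td>"))
```

The first sample is the 2007 tag from the question and should not match; the other two fall before the cutoff and should.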

Posted: Fri Apr 13, 2007 1:57 pm
by John
You might need a couple, e.g.

/>><td\x20class\=date>1
/>><td\x20class\=date>200[0-4]

Although, since you have the scripts, you may find it easier to extract the date and compare it as a date, e.g. in checkexfield:

<rex ">><td\x20class\=date>\P=!</td>+" $htmlpage>
<$pagedate=(convert($ret, 'date' ))>
<if $pagedate lt '2005-01-01'>
<$exfield_index=N>
<$exfield_follow=N>
</if>
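
For anyone who wants to check the logic outside the walker, the same idea (pull the date out of the tag, parse it, compare against the cutoff) can be sketched in Python. This is only an illustration, not Vortex: `page_date`, `should_index`, and `CUTOFF` are hypothetical names, the date layout is assumed to match the sample tag exactly, and treating pages with no parseable date as indexable is my assumption, not something the script above states.

```python
import re
from datetime import datetime
from typing import Optional

# Capture the text between the opening tag and </td> (Python regex, not REX).
DATE_CELL = re.compile(r"<td class=date>([^<]+)</td>")
CUTOFF = datetime(2005, 1, 1)  # pages older than this are excluded

def page_date(html: str) -> Optional[datetime]:
    """Return the page's date stamp, or None if it is missing or malformed."""
    match = DATE_CELL.search(html)
    if match is None:
        return None
    try:
        # Assumed layout, taken from the sample tag: 2007/04/02 04:19
        return datetime.strptime(match.group(1).strip(), "%Y/%m/%d %H:%M")
    except ValueError:
        return None

def should_index(html: str) -> bool:
    """Index the page unless it has a date stamp before the cutoff."""
    stamp = page_date(html)
    # Assumption: pages without a usable date stamp are still indexed.
    return stamp is None or stamp >= CUTOFF
```

Comparing parsed dates rather than text prefixes means the cutoff can be moved to any day without rewriting the patterns.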

Posted: Fri Apr 13, 2007 2:53 pm
by nroot
Aggghh... it was the spaces in my rex that were killing me. I popped the \x20 in instead and it works fine now.

Thanks for the quick help.