help with rex/sandr please... looking for an effective pattern

nduvnjak
Posts: 40
Joined: Wed Feb 06, 2008 3:45 pm

Post by nduvnjak »

Hi,
can anyone help me with this parsing riddle:

I want to remove all occurrences of a certain word from a text, but only where that word appears between certain opening and closing tags.

Specifically, the word I want to remove is "<row>", the opening tag is "<ul>", and the closing tag is "</ul>". The example text, which is a kind of altered HTML, is given at the end of this message.
I want to remove every "<row>" that is inside a <ul>...</ul>, whether or not those tags are nested, but I definitely want to keep every "<row>" that is outside any <ul>...</ul>.

To be precise, in the example I left the "<ROW>"s that need to be removed in uppercase and the ones that need to remain in lowercase. Imagine all of them are lowercase, though, so case can't be used as the condition.

Thank you.
Nenad



<row><br> </font><row><br>

<li type="square"><font face="Arial" size="1"><a href="/unmegafono/compa-para-arqurbpaisaje-b-noche-26778.html?LINEFMT=2"><b>COMPA PARA ARQ,URB,PAISAJE B NOCHE</b></a> - <b>MARI,EUGE</b> - 17/03/2010 10:59&nbsp;<script>Check("17/03/2010 13:59");</script><br>
<ul> </font><ROW><br>
<li type="square"><font face="Arial" size="1"><a href="/unmegafono/re-compa-para-arqurbpaisaje-b-noche-26800.html?LINEFMT=2"><b>Re: COMPA PARA ARQ,URB,PAISAJE B NOCHE</b></a> - <b>lau</b> - 17/03/2010 12:37&nbsp;<script>Check("17/03/2010 15:37");</script><br> </font><ROW><br>
<li type="square"><font face="Arial" size="1"><a href="/unmegafono/re-compa-para-arqurbpaisaje-b-noche-26799.html?LINEFMT=2"><b>Re: COMPA PARA ARQ,URB,PAISAJE B NOCHE</b></a> - <b>lau</b> - 17/03/2010 12:33&nbsp;<script>Check("17/03/2010 15:33");</script><br>
<ul> </font><ROW><br>
<li type="square"><font face="Arial" size="1"><a href="/unmegafono/re-compa-para-arqurbpaisaje-b-noche-26806.html?LINEFMT=2"><b>Re: COMPA PARA ARQ,URB,PAISAJE B NOCHE</b></a> - <b>MARI Y E</b> - 17/03/2010 13:35&nbsp;<script>Check("17/03/2010 16:35");</script><br>
<ul> </font><ROW><br>
<li type="square"><font face="Arial" size="1"><a href="/unmegafono/re-compa-para-arqurbpaisaje-b-noche-26808.html?LINEFMT=2"><b>Re: COMPA PARA ARQ,URB,PAISAJE B NOCHE</b></a> - <b>MARI Y E</b> - 17/03/2010 13:37&nbsp;<script>Check("17/03/2010 16:37");</script><br> </font><ROW><br>
<li type="square"><font face="Arial" size="1"><a href="/unmegafono/re-compa-para-arqurbpaisaje-b-noche-26807.html?LINEFMT=2"><b>Re: COMPA PARA ARQ,URB,PAISAJE B NOCHE (s/t)</b></a> - <b>MARI Y E</b> - 17/03/2010 13:36&nbsp;<script>Check("17/03/2010 16:36");</script><br>
</ul>
</ul>
</ul> </font><row><br>
<li type="square"><font face="Arial" size="1"><a href="/unmegafono/construcciones-i-a-se-adhiere-al-26805.html?LINEFMT=2"><b>construcciones I A se adhiere al paeo???? (s/t)</b></a> - <b>flo</b> - 17/03/2010 13:33&nbsp;<script>Check("17/03/2010 16:33");</script><br> </font><row><br>
<li type="square"><font face="Arial" size="1"><a href="/unmegafono/paro-26804.html?LINEFMT=2"><b>paro?????</b></a> - <b>juli</b> - 17/03/2010 13:31&nbsp;<script>Check("17/03/2010 16:31");</script><br> </font><row><br>
<li type="square"><font face="Arial" size="1"><a href="/unmegafono/gestion-ambiental-del-paisajenecesito-26803.html?LINEFMT=2"><b>GESTION AMBIENTAL DEL PAISAJE...NECESITO INFORMACION</b></a> - <b>juan</b> - 17/03/2010 13:18&nbsp;<script>Check("17/03/2010 16:18");</script><br> </font><row><br>
<li type="square"><font face="Arial" size="1"><a href="/unmegafono/paro-26802.html?LINEFMT=2"><b>paro</b></a> - <b>jesi</b> - 17/03/2010 12:56&nbsp;<script>Check("17/03/2010 15:56");</script><br> </font><row><br>
<li type="square"><font face="Arial" size="1"><a href="/unmegafono/cambio-de-horario-dr-topografia-26796.html?LINEFMT=2"><b>cambio de horario dr topografia</b></a> - <b>pablo</b> - 17/03/2010 12:18&nbsp;<script>Check("17/03/2010 15:18");</script><br>
<ul> </font><ROW><br>
<li type="square"><font face="Arial" size="1"><a href="/unmegafono/re-cambio-de-horario-dr-topografia-26801.html?LINEFMT=2"><b>Re: cambio de horario dr topografia</b></a> - <b>luis</b> - 17/03/2010 12:51&nbsp;<script>Check("17/03/2010 15:51");</script><br>
</ul> </font><row><br>
<li type="square"><font face="Arial" size="1"><a href="/unmegafono/cambio-de-turno-26793.html?LINEFMT=2"><b>CAMBIO DE TURNO</b></a> - <b>Paola Sembaj</b> - 17/03/2010 12:11&nbsp;<script>Check("17/03/2010 15:11");</script><br>
<ul> </font><ROW><br>
<li type="square"><font face="Arial" size="1"><a href="/unmegafono/re-cambio-de-turno-construcciones-26798.html?LINEFMT=2"><b>Re: CAMBIO DE TURNO CONSTRUCCIONES 2007</b></a> - <b>Paola Sembaj</b> - 17/03/2010 12:26&nbsp;<script>Check("17/03/2010 15:26");</script><br>
</ul> </font><row><br>
John
Site Admin
Posts: 2623
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

Probably the best solution is to use sandcall, e.g.


<a name=startul hit>
<fmt %s $hit>
<$inul=y>
</a>
<a name=endul hit>
<fmt %s $hit>
<$inul=>
</a>
<a name=sawrow hit>
<if $inul neq 'y'>
<fmt %s $hit>
</if>
</a>
<a name=removerows>
<$search="<ul>" "</ul>" "<row>">
<$callfunc=startul endul sawrow>
<fmtcp SANDCALL $search $callfunc>
<capture><sb>$doc</sb></capture>
</a>
John Turnbull
Thunderstone Software
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Given that there can be nesting, you probably want a count rather than a flag for inul. Change these lines:
<$inul=y> --> <$inul=($inul+1)>
<$inul=> --> <$inul=($inul-1)>
<if $inul neq 'y'> --> <if $inul eq 0>
Also add <$inul=0> before the fmtcp so the count starts at zero.

More logic could be added to avoid underflow in the case of extra </ul>s.
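
For readers who want to see the counting idea end to end, here is a minimal sketch in Python (not Vortex, and not the code from this thread). The function name remove_rows_inside_ul and the case-insensitive matching are assumptions made for illustration. It keeps a nesting depth, clamps it at zero so an extra </ul> cannot cause underflow, and drops a <row> only while the depth is above zero.

```python
import re

# The three tokens of interest. Case-insensitive matching is an assumption.
TOKEN = re.compile(r"<ul>|</ul>|<row>", re.IGNORECASE)


def remove_rows_inside_ul(text):
    """Remove every <row> that sits inside any <ul>...</ul>, nested or not."""
    depth = 0  # number of <ul> tags currently open

    def on_match(match):
        nonlocal depth
        token = match.group(0)
        lowered = token.lower()
        if lowered == "<ul>":
            depth += 1
            return token
        if lowered == "</ul>":
            # Clamp at zero so a stray </ul> does not push the count negative.
            depth = max(depth - 1, 0)
            return token
        # Token is <row>: keep it only when no <ul> is open.
        return token if depth == 0 else ""

    return TOKEN.sub(on_match, text)
```

Calling remove_rows_inside_ul on the example text from the first post would leave the lowercase "<row>"s in place and strip the uppercase ones, under the assumptions above.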
nduvnjak
Posts: 40
Joined: Wed Feb 06, 2008 3:45 pm

Post by nduvnjak »

This is perfect, thank you!