Parser Troubles.

gerry.odea
Posts: 98
Joined: Fri Sep 19, 2008 9:33 am

Post by gerry.odea »

I'm trying to build a parser for this:

<a href="/search?hl=en&q=sports+cars&revid=238886396&ei=7usSSvz_BZiu8QSH7aSQBA&sa=X&oi=revisions_inline&resnum=0&ct=broad-revision&cd=1"><b>sports</b> cars</a>

but I don't want to base it on

>><a=!href\=+href\=="?[^" >]+[^>]*>=[^<\x0a]+</a>=\x0a=

because there will be other <a href></a> links that get pulled in. I only want to pull in the <a href>'s that have revisions_inline in the URL string.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

The most reliable approach might be to do another pass over the list returned by your first expression, keeping only the entries that contain revisions_inline:

<rex ".*>>revisions_inline=.*" $ret>
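For readers who don't use Vortex, the same two-pass idea can be sketched in Python. This is only an analogy, not Texis code: the sample markup is shortened from the first post, the "/about" link is made up for contrast, and the names `anchors` and `related` are hypothetical.

```python
import re

# Sample markup adapted from the thread: one "related search" link
# plus one unrelated link (the /about link is invented for contrast).
html = (
    '<a href="/search?hl=en&q=sports+cars&oi=revisions_inline&cd=1">'
    '<b>sports</b> cars</a>\n'
    '<a href="/about">About</a>\n'
)

# Pass 1: grab every anchor, roughly like the broad expression in the question.
anchors = re.findall(r'<a\s[^>]*>.*?</a>', html, flags=re.S)

# Pass 2: keep only the anchors whose text contains the marker string.
related = [a for a in anchors if "revisions_inline" in a]

for a in related:
    print(a)
```

Keeping the broad pattern simple and filtering afterwards avoids packing the URL condition into one hard-to-read expression.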
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Or you could use <fetch> to parse for you, and use <urlinfo links> instead of your expression to get the list of urls on the page. Then rex for revisions_inline in that list.
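As a rough illustration of that approach outside Vortex, here is a Python sketch that lets an HTML parser collect the links and then filters the list. It stands in for <fetch> and <urlinfo links> and does not reproduce them; the `LinkCollector` class, the sample page, and the "/about" link are assumptions made for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Stand-in for a fetched page (shortened from the thread's sample).
page = (
    '<a href="/search?hl=en&q=sports+cars&oi=revisions_inline&cd=1">'
    '<b>sports</b> cars</a>'
    '<a href="/about">About</a>'
)

collector = LinkCollector()
collector.feed(page)

# Filter the parsed link list for the marker, as suggested above.
related_links = [u for u in collector.links if "revisions_inline" in u]
print(related_links)
```

Because the parser hands back only the URLs, the filter never has to deal with tag syntax at all.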
gerry.odea
Posts: 98
Joined: Fri Sep 19, 2008 9:33 am

Post by gerry.odea »

I'm doing this instead, but it won't bring in the title. Can you tell me why?

<a name = GETRELATED>
<$searchurl = "http://www.domain.com/search?q=xyzzy">
<$imports='
recdelim >><table class\="ts std"
firstmatch
field Title varchar(40) />><a>\P=!</a>+
field Title2 varchar(40) />><b>\P=!</b>+
'>
</a>



<table class="ts std" id=brs style="padding:0 0 1em"><caption class="med nobr" style="padding-bottom:6px;text-align:left">Searches related to: <b>dogs</b></caption><tr><td style="padding:0 0 7px;padding-right:34px;vertical-align:top"><a href="/search?hl=en&q=dog+breeds&revid=416594276&ei=eQQTSumHLeewtgfqvfWSBA&sa=X&oi=revisions_inline&resnum=0&ct=broad-revision&cd=1"><b>dog breeds</b></a><td style="padding:0 0 7px;padding-right:34px;vertical-align:top"><a href="/search?hl=en&q=pictures+of+dogs&revid=416594276&ei=eQQTSumHLeewtgfqvfWSBA&sa=X&oi=revisions_inline&resnum=0&ct=broad-revision&cd=2"><b>pictures of</b> dogs</a><td style="padding:0 0 7px;padding-right:34px;vertical-align:top"><a href="/search?hl=en&q=dogs+types&revid=416594276&ei=eQQTSumHLeewtgfqvfWSBA&sa=X&oi=revisions_inline&resnum=0&ct=broad-revision&cd=3">dogs <b>types</b></a><td style="padding:0 0 7px;padding-right:34px;vertical-align:top"><a href="/search?hl=en&q=information+about+dogs&revid=416594276&ei=eQQTSumHLeewtgfqvfWSBA&sa=X&oi=revisions_inline&resnum=0&ct=broad-revision&cd=4"><b>information about</b> dogs</a><tr><td style="padding:0 0 7px;padding-right:34px;vertical-align:top"><a href="/search?hl=en&q=dogs+health&revid=416594276&ei=eQQTSumHLeewtgfqvfWSBA&sa=X&oi=revisions_inline&resnum=0&ct=broad-revision&cd=5">dogs <b>health</b></a><td style="padding:0 0 7px;padding-right:34px;vertical-align:top"><a href="/search?hl=en&q=dog+games&revid=416594276&ei=eQQTSumHLeewtgfqvfWSBA&sa=X&oi=revisions_inline&resnum=0&ct=broad-revision&cd=6"><b>dog games</b></a><td style="padding:0 0 7px;padding-right:34px;vertical-align:top"><a href="/search?hl=en&q=dog+names&revid=416594276&ei=eQQTSumHLeewtgfqvfWSBA&sa=X&oi=revisions_inline&resnum=0&ct=broad-revision&cd=7"><b>dog names</b></a><td style="padding:0 0 7px;padding-right:34px;vertical-align:top"><a href="/search?hl=en&q=adopt+a+dog&revid=416594276&ei=eQQTSumHLeewtgfqvfWSBA&sa=X&oi=revisions_inline&resnum=0&ct=broad-revision&cd=8"><b>adopt a dog</b></a></table>
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

field Title varchar(40) />><a>\P=!</a>+

There is no literal "<a>" in the text; every anchor in your sample carries attributes, as in <a href="...">. Perhaps you meant:
field Title varchar(40) />><a=[^>]*>\P=!</a>+
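The difference is easy to see with an ordinary regex engine. The Python sketch below is an analogy rather than REX syntax: it runs a bare `<a>` pattern and an attribute-tolerant pattern over two cells shortened from the sample table, then strips the inner <b> tags. The variable names are hypothetical.

```python
import re

# Two cells shortened from the sample table in the thread.
row = (
    '<td><a href="/search?q=dog+breeds&oi=revisions_inline&cd=1">'
    '<b>dog breeds</b></a>'
    '<td><a href="/search?q=pictures+of+dogs&oi=revisions_inline&cd=2">'
    '<b>pictures of</b> dogs</a>'
)

# A bare "<a>" never occurs, because every anchor has an href attribute.
bare = re.findall(r'<a>(.*?)</a>', row)

# Allowing arbitrary attributes before ">" matches both anchors.
with_attrs = re.findall(r'<a[^>]*>(.*?)</a>', row)

# Strip the inner <b>...</b> markup to leave plain titles.
titles = [re.sub(r'<[^>]+>', '', t) for t in with_attrs]

print(bare)    # []
print(titles)  # ['dog breeds', 'pictures of dogs']
```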
gerry.odea
Posts: 98
Joined: Fri Sep 19, 2008 9:33 am

Post by gerry.odea »

Yes, that helped a bit. Now I'm stuck on this not matching:

recdelim >><table class\="ts std"

for

<table class="ts std" id=brs style="padding:0 0 1em">
gerry.odea
Posts: 98
Joined: Fri Sep 19, 2008 9:33 am

Post by gerry.odea »

Do I need to add something between "ts" and "std"?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Can there be multiple <table class="ts std" sections in the data? And would you want the first item from each of those? If not then maybe you don't want a recdelim at all.
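To make the question concrete, here is a small Python sketch of the two readings. It is an analogy and does not claim to reproduce how recdelim or firstmatch behave: it assumes one record per table with only the first title taken from each, and compares that with collecting every title. The sample data is shortened from the thread and the helper names are hypothetical.

```python
import re

# One table with two links, shortened from the sample in the thread.
data = (
    '<table class="ts std" id=brs>'
    '<tr><td><a href="/search?q=dog+breeds"><b>dog breeds</b></a>'
    '<td><a href="/search?q=dog+names"><b>dog names</b></a>'
    '</table>'
)

ANCHOR = re.compile(r'<a[^>]*>(.*?)</a>')


def strip_tags(text):
    """Remove any markup left inside the anchor text."""
    return re.sub(r'<[^>]+>', '', text)


# Reading 1: treat each table as one record and take its first title only.
records = [r for r in data.split('<table class="ts std"') if r.strip()]
first_per_record = []
for record in records:
    match = ANCHOR.search(record)
    if match:
        first_per_record.append(strip_tags(match.group(1)))

# Reading 2: no record splitting, collect every title in the data.
all_titles = [strip_tags(t) for t in ANCHOR.findall(data)]

print(first_per_record)  # ['dog breeds']
print(all_titles)        # ['dog breeds', 'dog names']
```

If the goal is every related search in a single table, the second reading is the one to aim for.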