gerry.odea
Posts: 98 Joined: Fri Sep 19, 2008 9:33 am
Post
by gerry.odea » Tue May 19, 2009 2:00 pm
I'm trying to build a parser for this:
<a href="/search?hl=en&q=sports+cars&revid=238886396&ei=7usSSvz_BZiu8QSH7aSQBA&sa=X&oi=revisions_inline&resnum=0&ct=broad-revision&cd=1"><b>sports</b> cars</a>
but I don't want to base it on
>><a=!href\=+href\=="?[^" >]+[^>]*>=[^<\x0a]+</a>=\x0a=
because there will be other <a href></a> links that get pulled in. I only want to pull in the <a href>'s that have revisions_inline in the URL string.
mark
Site Admin
Posts: 5519 Joined: Tue Apr 25, 2000 6:56 pm
Post
by mark » Tue May 19, 2009 2:47 pm
The most reliable approach might be to do a second pass over the list returned by your first expression, keeping only the entries that contain revisions_inline:
<rex ".*>>revisions_inline=.*" $ret>
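For readers who don't use Vortex, the same two-pass idea can be sketched in Python. This is only an illustration, not the REX code above: the pattern `ANCHOR_RE` and the function name `related_links` are made up for this example, and the filter checks the whole matched anchor for the substring, much like running rex over each returned item.

```python
import re

# Hypothetical first-pass pattern: any <a ...>...</a> element.
# This is a rough Python stand-in, not a translation of the REX expression.
ANCHOR_RE = re.compile(r'<a\s[^>]*>.*?</a>', re.IGNORECASE | re.DOTALL)

def related_links(html):
    """Return only the anchors that mention revisions_inline."""
    anchors = ANCHOR_RE.findall(html)                         # first pass: every anchor
    return [a for a in anchors if "revisions_inline" in a]    # second pass: filter

sample = (
    '<a href="/search?q=sports+cars&oi=revisions_inline&cd=1">'
    '<b>sports</b> cars</a> '
    '<a href="/about">About</a>'
)
matches = related_links(sample)
print(len(matches))
```

Keeping the first expression loose and filtering afterwards avoids having to encode the URL condition inside one large pattern.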
by mark » Tue May 19, 2009 2:49 pm
Or you could use <fetch> to parse for you, and use <urlinfo links> instead of your expression to get the list of urls on the page. Then rex for revisions_inline in that list.
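As a rough analogue of "let the fetcher parse the page, then filter the link list", here is a Python sketch using the standard library's `html.parser`. It is not Vortex and says nothing about how `<fetch>` or `<urlinfo links>` work internally; the class `LinkCollector` and the function `revision_links` are hypothetical names for this example.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def revision_links(html):
    """Parse the page, then keep only URLs containing revisions_inline."""
    collector = LinkCollector()
    collector.feed(html)
    return [url for url in collector.links if "revisions_inline" in url]

sample = (
    '<a href="/search?q=dog+breeds&oi=revisions_inline&cd=1"><b>dog breeds</b></a>'
    '<a href="/intl/en/about.html">About</a>'
)
urls = revision_links(sample)
print(len(urls))
```

Here the filter looks only at the URL, so an anchor whose visible text happens to contain revisions_inline would not be picked up by mistake.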
by gerry.odea » Tue May 19, 2009 3:13 pm
I'm doing this instead. But it won't bring in the title. Can you tell me why?
<a name = GETRELATED>
<$searchurl = "http://www.domain.com/search?q=xyzzy">
<$imports='
recdelim >><table class\="ts std"
firstmatch
field Title varchar(40) />><a>\P=!</a>+
field Title2 varchar(40) />><b>\P=!</b>+
'>
</a>
<table class="ts std" id=brs style="padding:0 0 1em"><caption class="med nobr" style="padding-bottom:6px;text-align:left">Searches related to: <b>dogs</b></caption><tr><td style="padding:0 0 7px;padding-right:34px;vertical-align:top"><a href="/search?hl=en&q=dog+breeds&revid=416594276&ei=eQQTSumHLeewtgfqvfWSBA&sa=X&oi=revisions_inline&resnum=0&ct=broad-revision&cd=1"><b>dog breeds</b></a><td style="padding:0 0 7px;padding-right:34px;vertical-align:top"><a href="/search?hl=en&q=pictures+of+dogs&revid=416594276&ei=eQQTSumHLeewtgfqvfWSBA&sa=X&oi=revisions_inline&resnum=0&ct=broad-revision&cd=2"><b>pictures of</b> dogs</a><td style="padding:0 0 7px;padding-right:34px;vertical-align:top"><a href="/search?hl=en&q=dogs+types&revid=416594276&ei=eQQTSumHLeewtgfqvfWSBA&sa=X&oi=revisions_inline&resnum=0&ct=broad-revision&cd=3">dogs <b>types</b></a><td style="padding:0 0 7px;padding-right:34px;vertical-align:top"><a href="/search?hl=en&q=information+about+dogs&revid=416594276&ei=eQQTSumHLeewtgfqvfWSBA&sa=X&oi=revisions_inline&resnum=0&ct=broad-revision&cd=4"><b>information about</b> dogs</a><tr><td style="padding:0 0 7px;padding-right:34px;vertical-align:top"><a href="/search?hl=en&q=dogs+health&revid=416594276&ei=eQQTSumHLeewtgfqvfWSBA&sa=X&oi=revisions_inline&resnum=0&ct=broad-revision&cd=5">dogs <b>health</b></a><td style="padding:0 0 7px;padding-right:34px;vertical-align:top"><a href="/search?hl=en&q=dog+games&revid=416594276&ei=eQQTSumHLeewtgfqvfWSBA&sa=X&oi=revisions_inline&resnum=0&ct=broad-revision&cd=6"><b>dog games</b></a><td style="padding:0 0 7px;padding-right:34px;vertical-align:top"><a href="/search?hl=en&q=dog+names&revid=416594276&ei=eQQTSumHLeewtgfqvfWSBA&sa=X&oi=revisions_inline&resnum=0&ct=broad-revision&cd=7"><b>dog names</b></a><td style="padding:0 0 7px;padding-right:34px;vertical-align:top"><a href="/search?hl=en&q=adopt+a+dog&revid=416594276&ei=eQQTSumHLeewtgfqvfWSBA&sa=X&oi=revisions_inline&resnum=0&ct=broad-revision&cd=8"><b>adopt a dog</b></a></table>
by mark » Tue May 19, 2009 3:21 pm
field Title varchar(40) />><a>\P=!</a>+
There is no "<a>" in the text. Perhaps you meant
field Title varchar(40) />><a=[^>]*>\P=!</a>+
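The difference between the two patterns can be illustrated outside REX. The Python sketch below is an analogy only (it does not reproduce REX's `\P` handling): a literal `<a>` finds nothing in the sample data because every anchor carries an href attribute, while a pattern that allows attributes before the closing `>` does match. The helper name `anchor_title` is made up for this example.

```python
import re

sample = (
    '<a href="/search?q=dog+breeds&oi=revisions_inline&cd=1">'
    '<b>dog breeds</b></a>'
)

# A literal "<a>" never occurs: the tag always has attributes.
literal_hit = re.search(r'<a>', sample)

# Allow anything up to the closing ">" of the opening tag,
# then capture the text up to "</a>".
TITLE_RE = re.compile(r'<a\b[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)

def anchor_title(html):
    """Return the first anchor's text with inner tags such as <b> removed."""
    m = TITLE_RE.search(html)
    if m is None:
        return None
    return re.sub(r'<[^>]+>', '', m.group(1)).strip()

title = anchor_title(sample)
print(literal_hit, title)
```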
by gerry.odea » Tue May 19, 2009 3:23 pm
Yes, that helped a bit. Now I'm stuck on this not matching:
recdelim >><table class\="ts std"
for
<table class="ts std" id=brs style="padding:0 0 1em">
by gerry.odea » Tue May 19, 2009 3:24 pm
Do I need to add something between "ts std"?
by mark » Tue May 19, 2009 4:39 pm
Can there be multiple <table class="ts std" sections in the data? And would you want the first item from each of those? If not, then maybe you don't want a recdelim at all.