Using REX

Faiz
Posts: 109
Joined: Wed Jan 10, 2001 1:29 pm

Post by Faiz »

I am doing the same thing as shown in the previous postings in this topic. I fetched the raw document of a URL and am trying to extract the text between "DisplayFirstColInfo('" and "')", but I am getting only the last occurrence of the text.
On displaying the raw document, I could see that all the "<" and ">" are replaced by "&lt;" and "&gt;". Is that the cause?
This is what I did:
<fetch $u1>
<urlinfo rawdoc>
<$rawdoc=$ret>
<capture>
<rex ">>DisplayFirstColInfo('\P=!')+" $rawdoc>
<$firstcol=$ret>
<loop $firstcol>
$firstcol<fmt "\n">
</loop>
</capture>
<$text=$ret>
Here $text returns only one value.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

The escaping of < and > on display is normal and required for HTML documents. The actual data is not affected.

Does the page in question actually have more than one occurrence of the pattern (check closely)?

What has changed since last April when you said all was well?

Post by Faiz »

The TEXIS version has changed since last April. I was using TEXIS 3 then and now I am using TEXIS 4.0. I don't know why, but adding ROW to REX did the trick. I should have done it yesterday.
<fetch $u1>
<urlinfo rawdoc>
<$rawdoc=$ret>
<capture>
<rex ROW ">>DisplayFirstColInfo('\P=!')+" $rawdoc>
$ret
</rex>
</capture>
<$text=$ret>
This code gives me all the values.
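For anyone comparing results outside Vortex, here is a rough Python sketch of the same idea: collect every piece of text that sits between "DisplayFirstColInfo('" and "')" and join the pieces with newlines. It is not Vortex and not REX syntax. The function name `extract_first_col_info` and the sample page content are made up for illustration, and the regular expression is only my reading of what the REX expression above is meant to match.

```python
import re
from typing import List

# Assumed equivalent of the REX expression: capture whatever lies between
# "DisplayFirstColInfo('" and the next "')". Non-greedy so each call is
# matched separately instead of spanning from the first to the last.
PATTERN = re.compile(r"DisplayFirstColInfo\('(.*?)'\)", re.DOTALL)


def extract_first_col_info(rawdoc: str) -> List[str]:
    """Return every captured value, in document order (hypothetical helper)."""
    return PATTERN.findall(rawdoc)


# Made-up sample standing in for the raw document of a fetched page.
sample = (
    "<script>DisplayFirstColInfo('Meeting notes');</script>\n"
    "<script>DisplayFirstColInfo('Project plan');</script>\n"
)

values = extract_first_col_info(sample)
text = "\n".join(values)  # one value per line, like the $text variable above
print(text)
```

If a sketch like this finds several values on the same page where the Vortex code returns one, the page really does contain multiple occurrences and the difference lies in how the hits are looped over.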

Post by mark »

It doesn't quite make sense that that would make a difference. But anyhow, another way to get the same effect is

<rex ">>DisplayFirstColInfo('\P=!')+" $rawdoc>
<sum "%s
" $ret>
<$text=$ret>

Out of curiosity, what are you doing with $text once you have it all concatenated together like that?

Post by Faiz »

Once $text is populated I insert the data into a table so that it is searchable. That code is for LOTUS QUICKPLACE URLs, because they are not like an ordinary HTML page. All the text that is displayed on the web page is written through JavaScript, and the TEXIS crawler ignores JavaScript. On viewing the source of the document I figured out that the information contained between "DisplayFirstColInfo('" and "')" is important and needs to be searchable. So, this is how I am doing it. If you can tell me an alternative way to do it, that would be great.
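To make the overall flow concrete, here is a minimal sketch in Python rather than Vortex: pull the values out of the raw page source, store them as one text field per URL, and search that field. Everything in it is an assumption for illustration only: the `pages` table, the `index_page` and `search` helpers, the example URL, and the use of SQLite with a plain LIKE query in place of a Texis table and its text search.

```python
import re
import sqlite3
from typing import List

# Assumed pattern: text between "DisplayFirstColInfo('" and the next "')".
PATTERN = re.compile(r"DisplayFirstColInfo\('(.*?)'\)", re.DOTALL)


def index_page(conn: sqlite3.Connection, url: str, rawdoc: str) -> int:
    """Store the extracted values for one URL; return how many were found."""
    values = PATTERN.findall(rawdoc)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body TEXT)"
    )
    # One row per page, with the values concatenated one per line.
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, body) VALUES (?, ?)",
        (url, "\n".join(values)),
    )
    conn.commit()
    return len(values)


def search(conn: sqlite3.Connection, term: str) -> List[str]:
    """Return URLs whose stored text contains the term (simple substring match)."""
    rows = conn.execute(
        "SELECT url FROM pages WHERE body LIKE ?", ("%" + term + "%",)
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
raw = "DisplayFirstColInfo('Budget review'); DisplayFirstColInfo('Team roster');"
count = index_page(conn, "http://example.com/qp", raw)
hits = search(conn, "roster")
print(count, hits)
```

Keeping the concatenated text in a single field per URL mirrors what $text holds above, so a search hit maps straight back to the page it came from.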

Regards,

Post by mark »

That sounds like a spiffy solution to me.