Using REX

Post by **Faiz** » Tue Jan 22, 2002 6:15 pm

I was doing the same thing as shown in the previous postings in this topic. I took the raw document of a URL and tried to find the text between "DisplayFirstColInfo('" and "')", but I am getting only the last occurrence of the text.
On displaying the raw document, I could see that all the "<" and ">" characters are replaced by "&lt;" and "&gt;". Is it because of that?
This is what I did:
<fetch $u1>
<urlinfo rawdoc>
<$rawdoc=$ret>
<capture>
<rex ">>DisplayFirstColInfo('\P=!')+" $rawdoc>
<$firstcol=$ret>
<loop $firstcol>
$firstcol<fmt "\n">
</loop>
</capture>
<$text=$ret>
Here $text returns only one value.

Post by **mark** » Tue Jan 22, 2002 9:55 pm

The escaping of < and > on display is normal and required for HTML documents. The actual data is not affected.
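
To illustrate the point outside Vortex, here is a minimal Python sketch (not Texis code). It assumes a made-up page fragment held in `raw`. Escaping yields a separate display copy and leaves the original string unchanged.

```python
import html

# Hypothetical fragment of a fetched page; the real content will differ.
raw = "<script>DisplayFirstColInfo('Team Calendar')</script>"

# Escape only for display. quote=False leaves the single quotes untouched.
shown = html.escape(raw, quote=False)

print(shown)  # escaped copy, safe to show inside an HTML page
print(raw)    # original data, unchanged
```

Any pattern matching should be run against the unescaped data (`raw` here), not the display copy.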

Does the page in question actually have more than one occurrence of the pattern (check closely)?

What has changed since last April when you said all was well?

Post by **Faiz** » Wed Jan 23, 2002 9:56 am

The TEXIS version has changed since last April. I was using TEXIS 3 then and now I am using TEXIS 4.0. I don't know why, but adding ROW to REX did the trick. I should have done it yesterday.
<fetch $u1>
<urlinfo rawdoc>
<$rawdoc=$ret>
<capture>
<rex ROW ">>DisplayFirstColInfo('\P=!')+" $rawdoc>
$ret
</rex>
</capture>
<$text=$ret>
This code gives me all the values.

Post by **mark** » Thu Jan 24, 2002 12:24 pm

It doesn't quite make sense that adding ROW would make a difference. But anyhow, another way to get the same effect is:

<rex ">>DisplayFirstColInfo('\P=!')+" $rawdoc>
<sum "%s
" $ret>
<$text=$ret>

Out of curiosity, what are you doing with $text once you have it all concatenated together like that?

Post by **Faiz** » Thu Jan 24, 2002 12:42 pm

Once $text is populated I insert the data in a table so that it is searchable. That code is for LOTUS QUICKPLACE URLs, because they are not like ordinary HTML pages. All the text that is displayed on the web page is written through javascript, and the TEXIS crawler ignores javascript. On viewing the source of the document I figured out that the information contained between "DisplayFirstColInfo('" and "')" is important and needs to be searched. So, this is how I am doing it. If you can tell me an alternative way to do it, that would be great.

Regards,

Post by **mark** » Thu Jan 24, 2002 12:49 pm

That sounds like a spiffy solution to me.
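
For readers following along without Texis, the approach in this thread (collect every string between "DisplayFirstColInfo('" and "')" and join them with newlines) can be sketched in Python. This is only a rough analogue of the REX expression above, not a translation of it. The function name `extract_first_col_info` and the sample page content are assumptions made for illustration.

```python
import re

# Non-greedy match of everything between DisplayFirstColInfo(' and ').
# Loosely mirrors the intent of the REX expression in the thread.
FIRST_COL = re.compile(r"DisplayFirstColInfo\('(.*?)'\)")


def extract_first_col_info(raw_doc):
    """Return every captured value, in document order (hypothetical helper)."""
    return FIRST_COL.findall(raw_doc)


# Made-up stand-in for the raw document of a QuickPlace page.
sample = (
    "<script>DisplayFirstColInfo('Welcome')</script>\n"
    "<script>DisplayFirstColInfo('Team Calendar')</script>\n"
)

values = extract_first_col_info(sample)
text = "\n".join(values)  # one value per line, like $text in the thread
print(text)
```

The joined `text` is what would then be inserted into a searchable table. A value that itself contains "')" would be cut short by this pattern, so it is worth checking against real pages.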