Hi,
I want to grab everything that falls between First(' and '). That is, if the document contains First('abcd') and First('xyz'), I want to grab the text abcd and xyz. Is this possible with rex? The document can contain many occurrences such as First('124'). All I need is the text between (' and ').
Thanks.
I have to index some pages that are dynamically generated from a Lotus Notes database (called QuickPlace). When I view the source, I see that the text displayed on the page is actually written by JavaScript. The crawler is unable to grab the text because it is written to the browser by JavaScript, but the calls follow a certain pattern. So what I am trying to do is grab the text from the strings passed to the JavaScript functions. An example of what the source looks like:
<script>DisplayFirstColInfo('Six Sigma');ge ( '' );</script></A></td><td class=h-foldercompact-text valign=top>amy sombut</td> <td class=h-foldercompact-text valign=top>02/14/2001</td> </tr><script>gh('Several slides providing an overview of Six Sigma; process maps of how to incorporate Six Sigma tools into DMAIC; sample project.\n ',3,1,0,0,0, 1)</script><script>gb(0, 3)</script><tr class=h-folderItem-bg valign=top><script>WriteResponseBar(0,0)</script><td class=h-folderitem-text valign=top><br></td><td class=h-folderitem-text valign=top><script>document.write (GenerateQPObjURLAnchorTag("h_D7BE94FAB94220FB852569F3004C607D", "F42B6E7B6B285ECE852569F3004CC722", "", ""));</script><script>DisplayFirstColInfo('The Methodology Outlined');ge ( '' );</script></A></td><td class=h-foldercompact-text valign=top>amy sombut</td> <td class=h-foldercompact-text valign=top>02/14/2001</td> </tr><script>gh('\n\nThis is still a work-in-process, however, it should serve as a guide when working with IT professionals and scoping out projects.\n\nhttp://webd01.corporate.ge.com/projectplan/dev_meth.html',3,1,0,0,0, 1)</script>
In the above example, I want the text between DisplayFirstColInfo(' and '). The first text I want is 'Six Sigma', and the next one after that is 'The Methodology Outlined' (which is dynamically generated). Is it possible to get these using rex or something similar? How can I specify the start delimiter and the end delimiter? Or will I have to write an awk script to extract them?
I hope I have made it clear this time.
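
To make the start delimiter / end delimiter idea concrete, here is a minimal sketch in Python rather than rex, since rex syntax is not shown in this post. The helper name `between` and the shortened `sample` string are assumptions made for illustration; only the DisplayFirstColInfo(' and ') delimiters come from the source above. It uses a non-greedy match, so each capture stops at the first ') that follows the start delimiter.

```python
import re


def between(text, start, end):
    """Return every shortest run of text found between `start` and `end`.

    `between` is a hypothetical helper name, not part of rex or any library.
    """
    # re.escape makes the delimiters literal, so ( and ' need no hand-escaping.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    # DOTALL lets a captured string span line breaks in the page source.
    return re.findall(pattern, text, flags=re.DOTALL)


# Shortened stand-in for the page source quoted above (an assumption).
sample = (
    "<script>DisplayFirstColInfo('Six Sigma');ge ( '' );</script>"
    "<script>DisplayFirstColInfo('The Methodology Outlined');ge ( '' );</script>"
)

titles = between(sample, "DisplayFirstColInfo('", "')")
print(titles)  # ['Six Sigma', 'The Methodology Outlined']
```

One limitation of this sketch: a title that itself contains ') would be cut short at that point, so it only works as long as the generated titles never include that sequence.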