Strstr sometimes won't find HTML tags

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm

Post by Thunderstone »



It seems like strstr refuses to acknowledge HTML tags when called
within <MM> tags. The following script works exactly as I would
expect: given a document divided into sections (in reverse
order), with each section ended by the section number embedded
in gobbledygook, it returns a list of the section numbers that
contain hit markup. Following that is a code fragment that attempts
to do the same thing on markup generated with MM; there, the
strstr on "<A NAME=hit" always returns -1 (although if I replace
"<A NAME=hit" with the matching text from the section, it works
as I expect).

I suppose this may be a feature, on the assumption that if
you're searching the marked-up body you usually want to ignore
the markup. If so, can someone offer an alternative way to
accomplish what I want, which is to get a list of the sections
that had hits under the query?

<SCRIPT LANGUAGE=vortex>
<A NAME=main>
<HTML><HEAD><TITLE>Test script</TITLE></HEAD>
<BODY>
<$body = "This section contains the word
<A NAME=hit1 HREF=#hit2>music</A> with markup.
~!@#5#@!~
This section also has the word <A NAME=hit2 HREF=#hit3>music</A>
marked up.
~!@#4#@!~
The magic word is not in this section.
~!@#3#@!~
Here's the word <A NAME=hit3 HREF=#hit4>music</A> again, and
here's the word <A NAME=hit4 HREF=#hit5>music</A> once more.
~!@#2#@!~
A final message with the word <A NAME=hit7 HREF=#hit8>music</A>.
~!@#1#@!~">
<split "#@!~" $body>
<$msgList = $ret>
<strstr "<A NAME=hit" $msgList>
<$hitList = $ret>
<strstr "~!@#" $msgList>
<$idList = $ret>
<loop $msgList $idList $hitList>
<IF $hitList neq -1>
<substr $msgList $idList -1>
<substr $ret 4 -1>
<$results = ($ret + "," + $results)>
</IF>
</loop>
<!--- Prints "1,2,4,5," --->
$results
</BODY></HTML>
</A>
</SCRIPT>

<fmtcp query "%mhH" $mmfmt></mm>
<mm>
<!--- Note that the marked-up body doesn't have the "~" at the
end of this string, or the "#" at the end of the other
marker string below - another issue of concern... --->
<split "#@!" $Body>
<$msgList = $ret>
<!--- The printing of $msgList below shows it does contain
the string "<A NAME=hit" in the first section (among
others), but $hitList is full of -1's --->
<strstr "<A NAME=hit" $msgList>
<$hitList = $ret>
$msgList<BR>$hitList<BR>
</mm>
<strstr "~!@" $msgList>
<$idList = $ret>
<loop $msgList $idList $hitList>
<IF $hitList neq -1>
<substr $msgList $idList -1>
<substr $ret 3 -1>
<$results = ($ret + "," + $results)>
</IF>
</loop>
$results<BR>





Post by Thunderstone »




<strstr> can't find those HTML markup tags because they aren't in the
variables. You're misunderstanding how <mm> works: it marks up _only_
the printed output of variables. It does not affect the values of the
variables themselves, i.e. the values passed to functions. That's why
you see the markup when you print the variable, but not when you pass
it to <strstr>.

You need to <CAPTURE> the output of <mm> and search that, e.g.
something like:

<CAPTURE><mm>$Body</mm></CAPTURE>
<$markup = $ret>
<split "#@\!~" $markup>
...

Also note that <split> expects a REX expression, not a simple string,
as its delimiter. In REX, `!' has a special meaning, so you need to
escape it (as in "#@\!~" above); this is the cause of one of your other
problems. See <split> at
http://www.thunderstone.com/vortexman/node98.html and REX syntax at
http://www.thunderstone.com/vortexman/node92.html.
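
For readers who find the value-versus-printed-output distinction easier to
see outside Vortex, here is a rough Python analogy. It is not Vortex code.
The mark_up() function is a hypothetical stand-in for what <mm> does to
printed output, sections_with_hits() is an illustrative helper, and the
sample body is shortened from the one in the question. The sketch only
shows why searching the raw value finds no anchors while searching the
captured markup does, and why the delimiter should be escaped before it is
used as a pattern.

```python
import re

DELIM = "#@!~"  # section terminator from the question's test data


def mark_up(text, word):
    """Hypothetical stand-in for <mm>: returns a marked-up COPY of text.

    The original string is left untouched, just as <mm> leaves the
    variable's value untouched and only changes what is printed.
    """
    count = 0

    def anchor(match):
        nonlocal count
        count += 1
        return f"<A NAME=hit{count} HREF=#hit{count + 1}>{match.group(0)}</A>"

    return re.sub(re.escape(word), anchor, text)


def sections_with_hits(text):
    """Return the IDs of sections that contain a hit anchor, in document order."""
    ids = []
    # re.escape() makes every delimiter character literal; this is the same
    # concern as escaping `!' when passing a delimiter to <split>.
    for section in re.split(re.escape(DELIM), text):
        start = section.find("~!@#")
        if start != -1 and "<A NAME=hit" in section:
            ids.append(section[start + 4:])
    return ids


body = (
    "This section contains the word music with markup.\n~!@#5#@!~"
    "The magic word is not in this section.\n~!@#4#@!~"
    "A final message with the word music.\n~!@#3#@!~"
)

raw_ids = sections_with_hits(body)                       # searches the raw value
marked_ids = sections_with_hits(mark_up(body, "music"))  # searches captured markup
print(raw_ids)
print(marked_ids)
```

Searching body directly finds no sections, because the anchors exist only in
the marked-up copy; searching the copy finds the sections that had hits.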

-Kai

