hit markup, mminfo and proximities

tboyer
Posts: 68
Joined: Mon Aug 28, 2006 4:43 pm

Post by tboyer »

Hello friends,

How can I mark up search hits in a way that respects proximity queries?

<$text= "dog horse. dog cat. dog rabbit">
<$query= "dog cat w/sent">

<fmt "%mhH" $query $text> -- This marks every dog and cat reference, regardless of proximity.
<$result = (mminfo($query, $text, 0, 0, 1))> seems to respect proximity, but I then have to use the offset numbers to write my own markup function.

Do you have any suggestions about the best way to mark up respecting proximity?
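
For illustration only, here is a minimal sketch of the "build my own markup from offsets" idea, written in Python rather than Vortex. The `mark_up` helper and the `(offset, length)` span list are hypothetical; the spans are typed in by hand and are not the actual return format of mminfo.

```python
def mark_up(text, spans, open_tag="<b>", close_tag="</b>"):
    """Wrap each (offset, length) span of text in open_tag/close_tag."""
    out = []
    pos = 0
    for offset, length in sorted(spans):
        if offset < pos:
            continue  # skip a span that overlaps one already emitted
        out.append(text[pos:offset])
        out.append(open_tag + text[offset:offset + length] + close_tag)
        pos = offset + length
    out.append(text[pos:])  # trailing text after the last hit
    return "".join(out)


text = "dog horse. dog cat. dog rabbit"
# Hand-written spans for the sentence that contains both terms
# (assumed for this example, not produced by a real search).
spans = [(11, 3), (15, 3)]
marked = mark_up(text, spans)
print(marked)
```

Only the second "dog" and the "cat" in the same sentence are wrapped. The sketch does no HTML escaping, so text containing markup would need that handled separately.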

Also, are there any convenient ways to get mminfo and fmt to ignore tags when marking up HTML? At the moment I'm splitting them out, turning them into whitespace characters so they'll be ignored, then combining them back in after markup. I think I can pull that off but it would be great if there's an easier way.
John
Site Admin
Posts: 2597
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

Add an 'e' flag to the format string to use the exact query instead of highlighting all occurrences:

<fmt "%mehH" $query $text>
John Turnbull
Thunderstone Software
Kai
Site Admin
Posts: 1271
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

Normally <fmt> hit markup will try to avoid existing HTML tags in the text, so that its own HTML markup tags do not break the HTML. The `n' subflag turns this off (markup as-is).

However, when the `H' (HTML-escape) code is the main code used (as in this case), the `n' subflag is also implicitly used (turn off tag-avoidance), because any original HTML tags will be HTML-escaped anyway by `H', so there will be no tag conflict. Use `s' instead of `H' to both preserve (no HTML-escape) original tags and have markup tags try to avoid original tags.

Post by Kai »

Note also that Texis version 6 has <fmtcp queryfixupmode findsets>, which is on by default. It respects delimiters and does a better job of highlighting all terms within the delimiters, whereas the `e' flag may merely find enough terms to satisfy the query requirements (e.g. perhaps not all occurrences of a term).

Post by tboyer »

Awesome, thanks John. Also, any suggestions about marking up tagged documents? Best I've come up with is splitting on tags, replacing with some unusual combination of whitespace (\v\f\t\v), marking up and then putting tags back in. This is so proximity and phrase functions work in tagged material.
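
The split, mask, mark up, restore workaround described above can be sketched roughly as follows, in Python rather than Vortex. Everything here is an assumption for illustration: `hide_tags`, `restore_tags` and `highlight` are hypothetical helpers, the tag regex is deliberately naive (it does not handle comments or a `>` inside an attribute value), and `highlight` is only a stand-in for the real hit markup. It also assumes the placeholder sequence never occurs in the original text.

```python
import re

TAG_RE = re.compile(r"<[^>]*>")   # naive tag matcher, an assumption
PLACEHOLDER = "\v\f\t\v"          # the whitespace run mentioned in the post


def hide_tags(html):
    """Replace every tag with PLACEHOLDER; return the masked text and the tags."""
    tags = TAG_RE.findall(html)
    return TAG_RE.sub(PLACEHOLDER, html), tags


def restore_tags(marked, tags):
    """Put the saved tags back where the placeholders are, in order."""
    parts = marked.split(PLACEHOLDER)
    out = [parts[0]]
    for tag, part in zip(tags, parts[1:]):
        out.append(tag)
        out.append(part)
    return "".join(out)


def highlight(text, terms):
    """Stand-in for the real markup step: bold every whole-word term."""
    alternation = "|".join(re.escape(t) for t in terms)
    pattern = re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)
    return pattern.sub(r"<b>\1</b>", text)


html = "dog <i>and</i> cat"
masked, tags = hide_tags(html)
result = restore_tags(highlight(masked, ["dog", "cat"]), tags)
print(result)
```

Masking before the markup step keeps the original tags out of the matcher's way, and restoring by position means the tags themselves do not need to be unique.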

Post by tboyer »

Sorry, I see Kai has added suggestions about tags. Thanks very much!
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Are you trying to keep the tags for rendering, so that a <b> in the document will cause bolding in the browser?
If so, use <fmt "%mhs" ...> as mentioned by Kai.

Or do you want to escape them, so that a <b> in the document will show as <b> in the browser (&lt;b&gt; in the source)?
If so, use <fmt "%mhH" ...>.
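
To make the escape-versus-preserve distinction concrete, here is a small Python illustration using the standard library's `html.escape`. It is only an analogy for what HTML-escaping does to a tag, not a description of how <fmt> is implemented; the sample string is made up.

```python
import html

doc = "a <b>bold</b> dog"

# Preserved: the string is sent as-is, so the browser renders "bold" in bold.
preserved = doc

# Escaped: the tag characters become entities, so the browser shows
# the literal text <b>bold</b> instead of bolding anything.
escaped = html.escape(doc)
print(escaped)
```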

Post by tboyer »

If the query is dog cat w/10 and I want to be sure the following is highlighted, what do you recommend?

dog <a href ='http://www.cats.com'> cat. </a>

Is there a way I can make it ignore that tag in the character count for proximity?

Post by tboyer »

Yes Mark -- I'd like to preserve the tagging in the documents as much as possible but still hit-highlight them.

Post by tboyer »

If I use <fmt "%mes" $query $txt> the tags are unescaped, but I still have that problem with proximity length -- i.e. the tag counts toward the proximity's character count.

Is there any rarely used character that I could sandr in for the tag that would be ignored? That would only make the distance between dog and cat one character longer. I'm thinking maybe form feed or vertical tab, but maybe there's something better.
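
One alternative to a placeholder character, sketched here in Python rather than Vortex, is to remove the tags entirely before measuring distance and keep a map from each remaining character back to its offset in the original HTML. The helper names are hypothetical, the tag regex is naive, and `gap_between` is a simplified stand-in for a character-distance check, not the actual semantics of w/10.

```python
import re

TAG_RE = re.compile(r"<[^>]*>")  # naive tag matcher, an assumption


def strip_tags_with_map(html):
    """Return (plain, index_map); index_map[i] is the offset in html of plain[i]."""
    plain = []
    index_map = []
    pos = 0
    for m in TAG_RE.finditer(html):
        for i in range(pos, m.start()):
            plain.append(html[i])
            index_map.append(i)
        pos = m.end()  # skip over the tag itself
    for i in range(pos, len(html)):
        plain.append(html[i])
        index_map.append(i)
    return "".join(plain), index_map


def gap_between(plain, first, second):
    """Characters between the end of `first` and the next `second`, or None."""
    i = plain.find(first)
    if i == -1:
        return None
    j = plain.find(second, i + len(first))
    if j == -1:
        return None
    return j - (i + len(first))


html = "dog <a href ='http://www.cats.com'> cat. </a>"
plain, index_map = strip_tags_with_map(html)
gap = gap_between(plain, "dog", "cat")

# Map the plain-text hit back to the original HTML for markup.
cat_plain = plain.find("cat")
cat_html = index_map[cat_plain]
print(gap, html[cat_html:cat_html + 3])
```

With the tag removed, the distance is measured over visible text only, and `index_map` lets the markup be inserted at the right place in the untouched HTML. Searching the stripped text also avoids matching the "cat" inside the URL.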