How can I mark up search hits in a way that respects proximity queries?
<$text= "dog horse. dog cat. dog rabbit">
<$query= "dog cat w/sent">
<fmt "%mhH" $query $text> -- I get all dog and cat references regardless of proximity.
<$result = (mminfo($query, $text, 0, 0, 1))> seems to respect proximity, but I have to use the offset numbers to write my own markup function.
Do you have any suggestions about the best way to mark up respecting proximity?
Also, are there any convenient ways to get mminfo and fmt to ignore tags when marking up HTML? At the moment I'm splitting them out, turning them into whitespace characters so they'll be ignored, then combining them back in after markup. I think I can pull that off but it would be great if there's an easier way.
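For illustration only, here is a minimal sketch of the "write my own markup function from offsets" idea. It is in Python rather than Vortex, and it assumes the hit offsets and lengths have already been parsed out of the mminfo output into (offset, length) pairs. The `wrap_hits` name and the sample hit list are hypothetical.

```python
from html import escape

def wrap_hits(text, hits, open_tag="<b>", close_tag="</b>"):
    """Wrap each (offset, length) span of text in open_tag/close_tag.

    The text itself is HTML-escaped; spans are assumed not to overlap.
    """
    out = []
    pos = 0
    for offset, length in sorted(hits):
        # Copy the unmarked text before this hit, then the wrapped hit.
        out.append(escape(text[pos:offset]))
        out.append(open_tag + escape(text[offset:offset + length]) + close_tag)
        pos = offset + length
    # Copy whatever follows the last hit.
    out.append(escape(text[pos:]))
    return "".join(out)

text = "dog horse. dog cat. dog rabbit"
# Hypothetical hit list: only the "dog" and "cat" in the second sentence.
hits = [(11, 3), (15, 3)]
print(wrap_hits(text, hits))
```

Only the spans in the hit list are wrapped, so whatever proximity logic produced the offsets is respected automatically.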
Normally <fmt> hit markup will try to avoid existing HTML tags in the text, so that its own HTML markup tags do not break the HTML. The `n' subflag turns this off (markup as-is).
However, when the `H' (HTML-escape) code is the main code used (as in this case), the `n' subflag is also implicitly used (turn off tag-avoidance), because any original HTML tags will be HTML-escaped anyway by `H', so there will be no tag conflict. Use `s' instead of `H' to both preserve (no HTML-escape) original tags and have markup tags try to avoid original tags.
Note also that Texis version 6 has <fmtcp queryfixupmode findsets>, which is on by default: it respects delimiters and does a better job of highlighting all terms within the delimiters (the `e' flag may find only enough terms to satisfy the query requirements, so it may not highlight every occurrence of a term).
Awesome, thanks John. Also, any suggestions about marking up tagged documents? Best I've come up with is splitting on tags, replacing with some unusual combination of whitespace (\v\f\t\v), marking up and then putting tags back in. This is so proximity and phrase functions work in tagged material.
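The split / placeholder / restore round trip described above could look roughly like this. This is a Python sketch, not Vortex. It assumes the placeholder sequence never occurs in the original text and that the markup step leaves placeholders untouched. `mask_tags` and `restore_tags` are hypothetical names, and the `replace` call is only a stand-in for the real markup step.

```python
import re

# Simplistic tag pattern: does not handle comments or ">" inside attributes.
TAG_RE = re.compile(r"<[^>]+>")
PLACEHOLDER = "\v\f\t\v"

def mask_tags(html):
    """Replace every tag with PLACEHOLDER; return (masked_text, tags_in_order)."""
    tags = TAG_RE.findall(html)
    masked = TAG_RE.sub(PLACEHOLDER, html)
    return masked, tags

def restore_tags(marked, tags):
    """Put the saved tags back, one per placeholder, in document order."""
    remaining = iter(tags)
    return re.sub(re.escape(PLACEHOLDER), lambda m: next(remaining), marked)

html = "<p>dog <i>cat</i></p>"
masked, tags = mask_tags(html)
# Stand-in for the real hit markup step, which runs on the tag-free text.
marked = masked.replace("cat", "<b>cat</b>")
print(restore_tags(marked, tags))
```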
Are you trying to keep the tags for rendering so that a <b> in the document will cause bolding in the browser?
If so use <fmt "%mhs" ...> as mentioned by kai.
Or do you want to escape them so <b> in the document will show as <b> in the browser (&lt;b&gt; in the source)?
If so use <fmt "%mhH" ...>
If I use <fmt "%mes" $query $txt> the tags are unescaped, but I still have that problem with proximity length -- i.e. the tag counts toward the proximity's character count.
Is there any rarely used character that I could sandr in for the tag that would be ignored? That would only make the distance between dog and cat one character longer. I'm thinking maybe form feed or vertical tab, but maybe there's something better.
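One way to sketch the single-character idea (Python, not Vortex) is to collapse each tag to one placeholder character and keep a map back to the original offsets, so hits found in the collapsed text can be marked up in the original. Form feed as the placeholder is just the guess from the post above. Whether the search engine ignores it is not something this sketch can confirm. `collapse_tags` and `to_original_span` are hypothetical names.

```python
import re

# Simplistic tag pattern: does not handle comments or ">" inside attributes.
TAG_RE = re.compile(r"<[^>]+>")

def collapse_tags(html, placeholder="\f"):
    """Collapse each tag to one placeholder character.

    Returns (collapsed, index_map), where index_map[i] is the offset in
    html of the character (or tag start) that produced collapsed[i].
    """
    collapsed = []
    index_map = []
    pos = 0
    for match in TAG_RE.finditer(html):
        # Copy the text before the tag, one character at a time.
        for i in range(pos, match.start()):
            collapsed.append(html[i])
            index_map.append(i)
        # The whole tag becomes a single placeholder character.
        collapsed.append(placeholder)
        index_map.append(match.start())
        pos = match.end()
    # Copy the text after the last tag.
    for i in range(pos, len(html)):
        collapsed.append(html[i])
        index_map.append(i)
    return "".join(collapsed), index_map

def to_original_span(index_map, offset, length):
    """Map a hit (offset, length) in the collapsed text to (start, end) in html."""
    start = index_map[offset]
    end = index_map[offset + length - 1] + 1
    return start, end

html = "dog <b>and</b> cat"
collapsed, index_map = collapse_tags(html)
start, end = to_original_span(index_map, collapsed.find("cat"), 3)
print(html[start:end])
```

Each tag then adds exactly one character to the distance between terms, regardless of how long the tag is.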