I upgraded to Texis version 6 and noticed an issue with query mark-up on our search results page.
Searching for the term "Tadic" finds results containing "Tadić", but the query is not highlighted in the search results (title and abstract), although it is highlighted in the document body. On the other hand, if I search for "Tadić" it works fine.
I am using "%mbH" on the search result and "%mhs" on doc view.
Both `%mbH' and `%mbs' should highlight `Tadić' given the query `Tadic', and default textsearchmode (i.e. ignorediacritics set). Have you verified that the source text passed to <fmt>/<mm> is identical in both instances, i.e. that it is `Tadić' (with the actual UTF-8 character U+0107) and not `Tadi&#263;' (with the HTML entity that will not match)? (Normally the crawls will have already converted the entities to UTF-8 for this reason.)
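To illustrate why the entity form matters, here is a minimal Python sketch of diacritic-insensitive matching. It is not how Texis implements textsearchmode or highlighting; the helper names (`fold_diacritics`, `matches_ignoring_diacritics`) are hypothetical, and the folding is assumed to be a plain Unicode NFD decomposition with combining marks removed.

```python
import html
import unicodedata


def fold_diacritics(text: str) -> str:
    """Decompose to NFD and drop combining marks (illustration only)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches_ignoring_diacritics(query: str, source: str) -> bool:
    """Case-insensitive substring test after folding both sides."""
    return fold_diacritics(query).casefold() in fold_diacritics(source).casefold()


utf8_text = "Tadi\u0107"     # contains the actual character U+0107
entity_text = "Tadi&#263;"   # same name, written with an HTML numeric entity

# The real character folds to a plain "c", so the query matches.
print(matches_ignoring_diacritics("Tadic", utf8_text))

# The entity is just the ASCII characters "&#263;", so nothing folds to "c".
print(matches_ignoring_diacritics("Tadic", entity_text))

# Decoding the entity first restores the match.
print(matches_ignoring_diacritics("Tadic", html.unescape(entity_text)))
```

The point of the sketch is only that text still holding the entity cannot match, whatever the diacritic settings are, until the entity is decoded to the real character.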
Yes, the text is Unicode in both places. (In fact, the message board escaped the Unicode character.)
It's really bizarre, as I've not modified the textsearchmode, and if you do the search with the Unicode character it actually highlights the terms without the accents.
Try crawling and searching with the stock v6 scripts; those should highlight properly and you can then adapt that highlight/markup code back to your scripts.
I've isolated the problem to one line in the search script. I am using minwordlen=4 for suffix proc; when I remove the line the problem disappears, but I lose suffix proc completely.
I also noticed that if I change it to 3 it still works.