Query markup in search results
Posted: Fri Mar 15, 2013 10:55 am
by josmani
We upgraded to Texis version 6 and noticed an issue with query markup on our search results page.
Searching for the term "Tadic" finds results containing "Tadić", but the query is not highlighted in the search results (title and abstract), although it is highlighted in the document body. If I search for "Tadić" instead, it works fine.
I am using "%mbH" on the search results and "%mhs" on the doc view.
Would appreciate any help!
Posted: Fri Mar 15, 2013 11:56 am
by Kai
Both `%mbH' and `%mbs' should highlight `Tadić' given the query `Tadic', and default textsearchmode (i.e. ignorediacritics set). Have you verified that the source text passed to <fmt>/<mm> is identical in both instances, i.e. that it is `Tadić' (with the actual UTF-8 character U+0107) and not `Tadi&#263;' (with the HTML entity that will not match)? (Normally the crawls will have already converted the entities to UTF-8 for this reason.)
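To illustrate why the entity form would not match, here is a minimal Python sketch (not Texis code). The helpers `fold_diacritics` and `matches_ignoring_diacritics` are hypothetical names made up for this example, and the folding shown (NFD decomposition, then dropping combining marks) is only an assumed stand-in for what ignorediacritics does, not a description of how Texis implements it.

```python
import html
import unicodedata


def fold_diacritics(text: str) -> str:
    """Decompose to NFD and drop combining marks (illustrative folding only)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches_ignoring_diacritics(query: str, source: str) -> bool:
    """Case-insensitive substring test after folding diacritics on both sides."""
    return fold_diacritics(query).casefold() in fold_diacritics(source).casefold()


utf8_text = "Tadi\u0107"    # the actual character U+0107
entity_text = "Tadi&#263;"  # numeric character reference for U+0107

# The real character folds to plain "c", so the unaccented query matches.
print(matches_ignoring_diacritics("Tadic", utf8_text))
# The entity is just ASCII text ("&#263;"), so there is nothing to fold.
print(matches_ignoring_diacritics("Tadic", entity_text))
# Decoding the entity first restores the match.
print(matches_ignoring_diacritics("Tadic", html.unescape(entity_text)))
```

If the title and abstract fields still hold entities while the document body holds UTF-8, a difference like this would show up as highlighting in one place and not the other.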
Posted: Fri Mar 15, 2013 12:10 pm
by josmani
Yes, the text is Unicode in both places. (In fact, the message board escaped the Unicode character.)
It's really bizarre, as I have not modified textsearchmode, and if you search with the Unicode character it actually highlights the terms without the accents.
Posted: Fri Mar 15, 2013 12:59 pm
by Kai
Which exact version of Texis is this (texis -version)? Have you made any other changes to the stock search scripts?
Posted: Fri Mar 15, 2013 1:11 pm
by josmani
Commercial Version 6.00.1282528740 20100823 (i686-intel-winnt-64-32)
In fact, we have been building on the search script for the last ten years, and we recently decided to migrate to Unicode/v6.
I wanted to make sure I am not missing anything (a flag/function).
Posted: Fri Mar 15, 2013 1:55 pm
by Kai
Try crawling and searching with the stock v6 scripts; those should highlight properly and you can then adapt that highlight/markup code back to your scripts.
Posted: Sat Mar 16, 2013 2:20 pm
by josmani
I've isolated the problem to one line in the search script. I am using minwordlen=4 for suffix processing; when I remove that line the problem disappears, but I lose suffix processing completely.
I also noticed that if I change it to 3, the highlighting works.