I am replacing the Texis thesaurus file with a smaller, more specific file and using backref to compile the new file for use in the site search. While this works perfectly with our English-language website, I have been trying to recreate the same nationalities for our Cyrillic (Unicode) version without any success.
The documentation mentions an ASCII equivalence file. Is there a way I can do this in Unicode?
Case/diacritic insensitivity for Unicode (UTF-8) words in equivalences, whether given explicitly via parenthetical syntax in the query or via a thesaurus, is not yet supported in Texis; this is planned for a future release (no target date yet). Only single-word sets are currently UTF-8 case/diacritic-insensitive. UTF-8 in equivs will probably be mangled.
For a single-set search, you might be able to emulate the equiv by manually translating to a zero-intersect query, e.g. translate `(car,auto,vehicle)' to `car auto vehicle @0', where any of those words could be UTF-8. But of course you'd have to then look up and map the equiv yourself, and any other multi-word sets in the same query would have to be ASCII parenthetical or equiv sets (with an explicit `+' prefix to require them, to prevent grouping with `@0').
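As a rough illustration of that manual translation, here is a minimal Python sketch of what the application side might do before handing the query to Texis. Everything in it is an assumption for illustration: the EQUIVS dictionary stands in for whatever thesaurus you maintain yourself, the function names are made up, and it only handles single words (no phrases containing spaces) and a query consisting of exactly one set.

```python
import re

# Hypothetical application-side thesaurus: you maintain this mapping
# yourself, since Texis does not do the UTF-8 equiv lookup for you.
EQUIVS = {
    "автомобиль": ["автомобиль", "машина", "авто"],
}

# Matches one parenthetical set such as (car,auto,vehicle).
_PAREN_SET = re.compile(r"\(([^()]*)\)")


def paren_set_to_zero_intersect(query):
    """Rewrite a query that is exactly one parenthetical set,
    e.g. '(car,auto,vehicle)', as 'car auto vehicle @0'."""
    match = _PAREN_SET.fullmatch(query.strip())
    if match is None:
        raise ValueError("expected exactly one parenthetical set")
    # Split on commas and drop empty items; words are assumed to be
    # single terms without embedded spaces.
    words = [w.strip() for w in match.group(1).split(",") if w.strip()]
    return " ".join(words) + " @0"


def expand_word(word, equivs=EQUIVS):
    """Look a word up in the application-side thesaurus and build the
    zero-intersect query; unknown words are returned unchanged."""
    words = equivs.get(word, [word])
    if len(words) == 1:
        return words[0]
    return " ".join(words) + " @0"


if __name__ == "__main__":
    print(paren_set_to_zero_intersect("(car,auto,vehicle)"))
    print(expand_word("автомобиль"))
```

The resulting string would then be passed to the search as the whole query. If the query needs further multi-word sets, those still have to be ASCII and carry the explicit `+' prefix as described above; this sketch does not attempt that.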