non-english characters in texis

Posted: Fri May 20, 2005 5:49 pm
by jkj2001
We have a question regarding European character sets in Texis. We aren't referring to Unicode characters here, but to single-byte (extended ASCII) values.

Right now we index one of our fields using this statement:

tsql -q "set keepnoise='on';set delexp=0;set addexp='\punct{1,5}';set addexp='\alnum{1,99}';set addexp='>>\alpha{1,50},=\alpha{1,50}';create metamorph inverted index myindex on mytable(FIELD01);"

When we load a record containing foreign characters (e.g. Confédération) into the table and reindex as above, we're unable to search for that term and retrieve hits. However, if French text is pasted into our web browser interface and saved to the table, the text is searchable afterward.

Our search term in both instances reads "select * from mytable where FIELD01 like 'Confédération'"

Would you have any theories as to why the text is searchable in the latter instance but not the former? I realize we're a bit thin on the details here so feel free to ask for more info. We're using version 4.04.1067366033. Thanks!

Posted: Fri May 20, 2005 10:36 pm
by John
It is probably the difference between indexed and unindexed search.

When creating the index, you should include the high-bit characters in the index expressions. One simple method is to use [\alnum\x80-\xff] instead of \alnum in the addexp settings.
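To illustrate why the character class matters, here is a minimal sketch in Python. It is only an analogy: it uses Python's `re` module on bytes rather than Texis's own expression matcher, and it assumes the data is stored as Latin-1, where é is the single byte 0xE9. The two patterns are stand-ins for \alnum{1,99} and [\alnum\x80-\xff]{1,99}.

```python
import re

# Assumption: the field holds single-byte Latin-1 text, so each é is byte 0xE9.
word = "Conf\u00e9d\u00e9ration".encode("latin-1")

# Stand-in for \alnum{1,99}: only ASCII letters and digits count as word bytes.
ascii_only = re.compile(rb"[A-Za-z0-9]{1,99}")

# Stand-in for [\alnum\x80-\xff]{1,99}: high-bit bytes count as word bytes too.
with_high_bit = re.compile(rb"[A-Za-z0-9\x80-\xff]{1,99}")

# Each match corresponds to one term that an index expression would pick up.
ascii_terms = ascii_only.findall(word)
high_bit_terms = with_high_bit.findall(word)

print(ascii_terms)     # [b'Conf', b'd', b'ration']
print(high_bit_terms)  # [b'Conf\xe9d\xe9ration']
```

With the ASCII-only class the word breaks into three fragments at each é, so a lookup for the whole word finds nothing. Applied to the statement in the question, the suggestion would presumably mean writing set addexp='[\alnum\x80-\xff]{1,99}' in place of set addexp='\alnum{1,99}'. Whether the \alpha expression needs the same treatment is an assumption the reply does not confirm.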