non-english characters in texis

jkj2001
Posts: 142
Joined: Fri Mar 29, 2002 1:39 pm

Post by jkj2001 »

We have a question regarding European character sets in Texis. We aren't referring to Unicode characters here, but to single-byte (extended ASCII, high-bit) values.

Right now we index one of our fields using this statement:

tsql -q "set keepnoise='on';set delexp=0;set addexp='\punct{1,5}';set addexp='\alnum{1,99}';set addexp='>>\alpha{1,50},=\alpha{1,50}';create metamorph inverted index myindex on mytable(FIELD01);"

When we load a record containing foreign characters (e.g., Confédération) into the table and reindex as above, we're unable to search for that term and retrieve hits. However, if French text is pasted into our web browser interface and saved to the table, that text is searchable afterward.

Our search query in both cases is "select * from mytable where FIELD01 like 'Confédération'"

Would you have any theories as to why the text is searchable in the latter instance but not the former? I realize we're a bit thin on the details here so feel free to ask for more info. We're using version 4.04.1067366033. Thanks!
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

It is probably the difference between indexed and unindexed search.

When creating the index, you should include the high-bit characters in the index expressions. One simple method is to use [\alnum\x80-\xff] in place of \alnum in the addexps, e.g. set addexp='[\alnum\x80-\xff]{1,99}'.
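
To illustrate why the high-bit range matters, here is a rough sketch in Python. It is only an analogy: it uses Python's re module on bytes, not Texis REX, and it assumes the data is stored in a single-byte encoding such as Latin-1 and that \alnum behaves like the ASCII class [A-Za-z0-9]. Neither assumption is confirmed in this thread.

```python
import re

word = "Confédération"
# Assumption: single-byte Latin-1 storage, so each é is the byte 0xE9.
raw = word.encode("latin-1")

# ASCII-only word expression (stand-in for \alnum{1,99}).
ascii_only = re.compile(rb"[A-Za-z0-9]{1,99}")
# Same expression with the high-bit range added
# (stand-in for [\alnum\x80-\xff]{1,99}).
with_high_bit = re.compile(rb"[A-Za-z0-9\x80-\xff]{1,99}")

# The ASCII-only class breaks the word at every accented byte.
ascii_tokens = ascii_only.findall(raw)
# With the high-bit range the word survives as a single token.
high_bit_tokens = with_high_bit.findall(raw)

print(ascii_tokens)
print(high_bit_tokens)
```

If the index expressions behave like the first pattern, the index would hold fragments rather than the whole word, which would be consistent with the indexed search finding nothing.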
John Turnbull
Thunderstone Software