What Metamorph index needs to be created for multibyte characters?

varsha.himatsinghani
Posts: 2
Joined: Tue Feb 08, 2011 6:59 am

Post by varsha.himatsinghani »

How do I execute a query with multibyte characters? For example:
select * from person_profile where person_name='multibyte characters'

I have set the following expressions while indexing.
What else needs to be set for multibyte characters?

set delexp=0;
set addexp='[\alnum\x80-\xff]{1,70}';
set addexp='[\alnum\x80-\xff.]{1,70}>>[.''\-_]=[\alnum\x80-\xff.]{1,70}';
set addexp='\digit=[\digit\.,+]{1,30}';
set addexp='>>[\xc0-\xfd]=[\x80-\xbf]+';
set addexp='>>\alnum=[\alnum\x5c\x5f\x2d\x26\x23\x40\x2e\x25\x2c]{1,30}';
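To see roughly what the `>>[\xc0-\xfd]=[\x80-\xbf]+` expression picks up, here is a Python approximation. Python's `re` syntax is not Metamorph's REX syntax, so this is only an illustration: it matches one UTF-8 lead byte followed by continuation bytes in the UTF-8 bytes of the text. The function name `multibyte_tokens` is made up for this sketch.

```python
import re

# Rough Python equivalent of the Metamorph index expression
# '>>[\xc0-\xfd]=[\x80-\xbf]+': a UTF-8 lead byte followed by one or
# more continuation bytes. Illustration only; REX syntax differs.
UTF8_CHAR = re.compile(rb"[\xc0-\xfd][\x80-\xbf]+")


def multibyte_tokens(text: str) -> list[str]:
    """Return the multibyte UTF-8 sequences the expression would match."""
    data = text.encode("utf-8")
    return [m.decode("utf-8") for m in UTF8_CHAR.findall(data)]


print(multibyte_tokens("Tokyo 東京都"))
```

In this approximation each Japanese character is its own lead-plus-continuation sequence, so adjacent characters come out as separate index words rather than as one word.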
John
Site Admin
Posts: 2597
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

The index expressions look correct for multibyte strings; however, your SQL uses "=" instead of LIKE. Metamorph indexes are used for LIKE queries, and standard indexes are used for "=" comparisons. Also make sure the character set used for storage matches the character set of the query.
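A quick way to see why the storage and query character sets have to match: the same Japanese text encodes to different bytes in different character sets, so index terms built from one encoding will not match a query sent in another. This Python sketch uses Shift_JIS only as an example of a non-UTF-8 Japanese encoding; `encodings_match` is a hypothetical helper.

```python
def encodings_match(text: str, stored_charset: str, query_charset: str) -> bool:
    """True if the text encodes to identical bytes in both character sets."""
    return text.encode(stored_charset) == text.encode(query_charset)


# Japanese text: UTF-8 and Shift_JIS bytes differ, so a byte-level match fails.
print(encodings_match("東京", "utf-8", "shift_jis"))
# Plain ASCII letters are encoded the same way in both.
print(encodings_match("Tokyo", "utf-8", "shift_jis"))
```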
John Turnbull
Thunderstone Software
varsha.himatsinghani
Posts: 2
Joined: Tue Feb 08, 2011 6:59 am

Post by varsha.himatsinghani »

I used the "=" sign by mistake.

The correct example would be:
select * from person_profile where person_name likep likepquery

where "likepquery" contains Japanese characters.

Which character set should be used for these multibyte characters?
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

You probably want UTF-8 for storage. Then you just have to ensure that the query coming from the browser is also UTF-8. There's no guarantee that it will be, but if you're searching for Japanese, chances are it will be UTF-8. Here's one way to check the query:

<rex row "\?" $query></rex><$x=$loop> <!-- count initial `?' -->
<strfmt "%!hV" $query> <!-- attempt UTF-8 decode -->
<rex row "\?" $ret></rex> <!-- count `?' again -->
<if $loop gt $x> <!-- more means not UTF-8 -->
<strfmt "%V" $query><$query=$ret> <!-- convert to UTF-8 -->
</if>
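The same idea outside Vortex, as a minimal Python sketch. Rather than counting `?` characters, it tries a strict UTF-8 decode and, if that fails, assumes the query is ISO-8859-1 and converts it to UTF-8. The Latin-1 fallback is an assumption made for this sketch, and `to_utf8` is a hypothetical name.

```python
def to_utf8(raw: bytes) -> bytes:
    """Return the query bytes as UTF-8.

    Valid UTF-8 is returned unchanged; anything else is assumed to be
    ISO-8859-1 (an assumption) and re-encoded as UTF-8.
    """
    try:
        raw.decode("utf-8")  # strict decode raises on invalid UTF-8
        return raw
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail.
        return raw.decode("iso-8859-1").encode("utf-8")


print(to_utf8("東京".encode("utf-8")))  # already UTF-8: unchanged
print(to_utf8(b"caf\xe9"))              # Latin-1 input: converted
```

A strict decode is used here because it rejects malformed sequences directly instead of inferring them from replacement characters.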
John
Site Admin
Posts: 2597
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

The easiest approach is to store the text in UTF-8. You already have the index expression to index UTF-8 characters. One issue may be the lack of word boundaries in the text, since Japanese is not normally written with spaces between words. You can test that by doing a search with a space between each Japanese character. If that works, look at the Webinator search scripts, which have functions to transform a query into a phrase query of the individual characters.
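A rough Python sketch of that kind of query transformation. It is not the Webinator code: `per_character_phrase` is a hypothetical helper that keeps ASCII words together, splits everything else into single characters, and wraps the result in double quotes, assuming Metamorph treats a quoted string as a phrase.

```python
import re

# ASCII letter/digit runs stay together; every other non-space
# character (e.g. each Japanese character) becomes its own token.
TOKEN = re.compile(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]")


def per_character_phrase(query: str) -> str:
    """Space-separate the characters and quote the result as a phrase."""
    tokens = TOKEN.findall(query)
    return '"' + " ".join(tokens) + '"'


print(per_character_phrase("東京都"))
print(per_character_phrase("東京 tower"))
```

The unquoted version of the output is also the spaced-out test query suggested above.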
John Turnbull
Thunderstone Software