Thunderstone Support Forums

Posted: **Tue Oct 11, 2005 1:06 pm**

Hi,

It seems like French accented characters are giving me more trouble than anything else. I got the equivalence file to work (created a customized "eqvsusr" file), but when I include French words with accented characters in the list, backref.exe runs fine but the search does not pick up the equivalence.

For example, if I have in my eqvsusr.lst

montréal,montreal
grâce,grace,honor

the results will not include any of the equivalent words. I typed in the accented é,â in wordpad by using "alt+131" and "alt+133".

I also modified the $noiselist included in the search script, but it does not pick up anything with accents. For example, I want the search script to treat "à" as a noise word. I tried
<$noiselist="à" "à" "\xC3\xA0" "\xE0">
<apicp noise $noiselist>

and it does not work.

Is there a special format for accented characters to be included in thesaurus and noiselist? Thanks.

Posted: **Tue Oct 11, 2005 2:05 pm**

Make sure that the accented characters are in the wordc and langc settings, otherwise the search terms will not be seen as "language" words to be processed against the thesaurus and noise list.

Posted: **Tue Oct 11, 2005 2:25 pm**

I'm confused - what are the wordc and langc settings and where can I find them? Thanks.

Posted: **Tue Oct 11, 2005 2:54 pm**

http://thunderstone.master.com/texis/ma ... ml?q=wordc

Also make sure your equivs include the character sets being used.

Posted: **Tue Oct 11, 2005 3:33 pm**

Thanks guys. I'm not sure if accented characters would be included in "\alpha", but I put the SQL statements in the <search> function:

<A NAME=search>
<local savenext saveindexcount>

<sql "set wordc='[\alnum]'"></sql>
<sql "set langc='[\alnum]'"></sql>
...

and I also included them in the <init> function. Neither worked. I'm not sure whether the accented characters count as alpha characters, or whether I placed the SQL statements in the wrong place. I'm using UTF-8 as the search character set. Thanks again for any help you could provide...

Posted: **Tue Oct 11, 2005 3:54 pm**

They should be in the fpar function like the other set statements. You've made them more restrictive, not less. Put space and - in langc as in the default case. Also try adding all the high-bit characters to them both.

<sql "set wordc='[\alnum\x80-\xff'']'"></sql>
<sql "set langc='[\alnum\x80-\xff'' \-]'"></sql>

Make sure you include the utf-8 version of montréal in your equiv.
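
As a rough illustration of what "the utf-8 version" means here, the following Python sketch (not Texis code) compares how the same word is stored in UTF-8 and in a single-byte Windows code page. The helper name `describe` is made up for this example, and cp1252 is only an assumption about what a Windows editor might save by default. It also shows that the extra bytes UTF-8 uses for é fall inside the \x80-\xff range added to wordc and langc above.

```python
# Illustration only: compare how "montréal" is stored in different encodings.
# cp1252 is an assumed example of a single-byte encoding a Windows editor may use.

def describe(word: str, encoding: str) -> str:
    """Return the encoded bytes of word as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in word.encode(encoding))

word = "montr\u00e9al"  # "montréal", written with an escape to avoid editor issues

utf8_hex = describe(word, "utf-8")
cp1252_hex = describe(word, "cp1252")

print("utf-8 :", utf8_hex)
print("cp1252:", cp1252_hex)

# Collect the non-ASCII bytes of the UTF-8 form; each is in the range 0x80-0xff.
high_bytes = [b for b in word.encode("utf-8") if b >= 0x80]
print("high bytes:", [hex(b) for b in high_bytes])
```

If the equivalence file holds the single-byte form while queries arrive as UTF-8, the two byte sequences differ and would not be expected to match.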

Posted: **Thu Oct 13, 2005 1:13 pm**

Thanks Mark!

I changed the settings to fpar, but I'm not sure what you mean by including the utf-8 version in my equiv. In my eqvsusr.lst, I have for example:

montréal,Montreal,Montréal,montreal

The accents are entered in WordPad using the alt+130 key. By "including the utf-8 version", do you mean I have to translate the accented words to utf-8 code, such as:

montr\xC3\xA9al,montreal

or every single letter in utf-8 code?

Posted: **Thu Oct 13, 2005 1:16 pm**

Another question: after including the sql statements in my fpar function, I set my noiselist to exclude the noise word "à", so I have

<$noiselist = "a" "à" "\xE0" "À" "\xC3\xA0" ...>

to include every possible format of the letter à, but when I do a search it still does not filter out the à. What am I doing wrong?

Posted: **Thu Oct 13, 2005 3:49 pm**

You can't use \x notation. You have to use the actual chars.
<$noiselist = "a" "à"> works for me.
You need to enter the words for noise and equivs in whatever form they are coming from the browser. You could use
{query=<fmt "%U" $query>}
to get an idea what's coming in. Or save them to a file with
<write append /tmp/queries><fmt "%s\n" $query></write>
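
The same idea of making the incoming bytes visible can be sketched outside Texis. This Python snippet is only an illustration, and `show_bytes` is a hypothetical helper name: it percent-encodes raw bytes so that the encoding a client actually sent can be read off directly.

```python
from urllib.parse import quote

def show_bytes(raw: bytes) -> str:
    """Percent-encode raw bytes so every non-ASCII byte becomes visible."""
    return quote(raw)

# The letter à as a UTF-8 client would send it, and as a Latin-1 client would.
as_utf8 = show_bytes("\u00e0".encode("utf-8"))
as_latin1 = show_bytes("\u00e0".encode("latin-1"))

print("utf-8  :", as_utf8)
print("latin-1:", as_latin1)
```

Seeing two percent-escapes for à suggests UTF-8, while a single one suggests a single-byte character set.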

Posted: **Fri Oct 14, 2005 3:45 pm**

Thanks Mark! I figured out the problem - our Java encoder was encoding the incoming utf-8 twice, so instead of putting "à" in the noiselist I have to put "Ã", then it works. Thank you for all your help!
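
For anyone hitting the same symptom, here is a small Python sketch of how double encoding can turn "à" into something starting with "Ã". It assumes the UTF-8 bytes were misread as Latin-1 before being encoded again, which is a common cause but not confirmed for the Java encoder in this thread; `double_encode` is a hypothetical name.

```python
def double_encode(text: str) -> str:
    """Simulate UTF-8 bytes being misread as Latin-1 characters."""
    return text.encode("utf-8").decode("latin-1")

garbled = double_encode("\u00e0")  # start from "à"

# The result is "Ã" followed by a non-breaking space (U+00A0), which is
# usually invisible when displayed.
print(repr(garbled))

# Encoding the garbled text as UTF-8 again gives the double-encoded bytes.
double_hex = garbled.encode("utf-8").hex()
print(double_hex)

# Reversing the misreading recovers the original character.
restored = garbled.encode("latin-1").decode("utf-8")
print(restored == "\u00e0")
```

Under that assumption, the noise word has to be entered in the garbled form for as long as the encoder keeps double encoding; fixing the encoder would let the plain "à" work instead.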

Accented Noise words and synonyms
