Umlaut

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm

Post by Thunderstone »



In Sweden we use the letters å, ä and ö (Å, Ä, Ö). In HTML they are
written &aring;, &auml; and &ouml;.

I.e. the word "på" is written "p&aring;" and "för" is written "f&ouml;r".

The problem is that Webinator doesn't find these words, not even if one
skips the proper HTML entities and writes the letters literally, as they
are seen. And these letters are very common here.


Post by Thunderstone »




Make sure you're using an index expression (-k option) that includes
these letters. The default expression, \alnum{2,30}, excludes 8-bit
characters since they are usually non-text.

Unindex your database, and re-index it with a -k option such as
-k'[\alnum\xA0-\xFF]{2,30}':

gw -unindex
gw -k'[\alnum\xA0-\xFF]{2,30}' -index
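
To see why the default expression drops these words, here is a rough Python sketch of the effect. It is not Webinator code: the two patterns are only approximate stand-ins for \alnum{2,30} and [\alnum\xA0-\xFF]{2,30}, and it assumes the pages are Latin-1 (ISO-8859-1) encoded, so that å, ä and ö are the single bytes 0xE5, 0xE4 and 0xF6. The sample sentence and the helper name index_terms are made up for illustration.

```python
import re

# Assumption: the page is Latin-1 encoded, so each Swedish letter is one 8-bit byte.
text = "Ett rum på hotellet för två".encode("latin-1")

# Approximate stand-in for the default \alnum{2,30}: ASCII letters and digits only.
default_expr = re.compile(rb"[A-Za-z0-9]{2,30}")

# Approximate stand-in for [\alnum\xA0-\xFF]{2,30}: also accepts bytes 0xA0-0xFF.
extended_expr = re.compile(rb"[A-Za-z0-9\xA0-\xFF]{2,30}")

def index_terms(expr, data):
    """Return the terms a given index expression would pick out of the data."""
    return [m.decode("latin-1") for m in expr.findall(data)]

default_terms = index_terms(default_expr, text)
extended_terms = index_terms(extended_expr, text)

print(default_terms)   # "på" and "för" are missing, "två" is cut down to "tv"
print(extended_terms)  # all six words survive intact
```

With the ASCII-only pattern, an 8-bit letter ends the term, and the fragments "p", "f" and "r" are shorter than the two-character minimum, so "på" and "för" never reach the index. That is consistent with the searches failing until the database is re-indexed with the wider expression.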


-Kai


Post by Thunderstone »



Thank you for your quick and accurate answer. Now my searches work
much better.


/Stefan Andersson
Chalmers University of Technology
