Webinator and Baltic alphabet

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm

Webinator and Baltic alphabet

Post by Thunderstone »




Hi.

I've got a problem with Webinator.
My documents include characters that are not part of the English
alphabet, such as 'ä' (&auml; in HTML).
Because of that, Webinator can't find words containing them in the documents.

What can I do about it? Or is there a document that describes this problem?
(I could not find any...)

Thanks in advance,
Janek Hiis
Computer specialist at "Süsteemiarenduse Partnerid" in Estonia.





Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm

Webinator and Baltic alphabet

Post by Thunderstone »



Make sure you're using an index expression (-k option) that includes
those letters. The default expression, \alnum{2,30} , excludes 8-bit
characters since they are usually non-text.

Use a customized -k option such as

-k"[\alnum\x80-\xFF]{2,30}"

before performing a walk. If you've already performed a walk, unindex
your database and then re-index it with a -k option, such as

gw -unindex
gw -k"[\alnum\x80-\xFF]{2,30}" -index
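
To see why the wider expression matters, here is a rough Python sketch. It is only an analogy: Python's re module is not Webinator's expression matcher, [A-Za-z0-9] is an assumed stand-in for \alnum, and the sample text is made up. It treats the text as Latin-1 bytes, where each accented letter is a single 8-bit byte.

```python
import re

# Hypothetical sample text, encoded as Latin-1 so that each accented
# letter is one 8-bit byte (for example, 'ä' is 0xE4).
text = "Süsteemiarenduse päev".encode("latin-1")

# Assumed stand-in for the default expression \alnum{2,30}:
# ASCII letters and digits only.
default_expr = re.compile(rb"[A-Za-z0-9]{2,30}")

# Assumed stand-in for [\alnum\x80-\xFF]{2,30}:
# the same class plus every byte from 0x80 to 0xFF.
wide_expr = re.compile(rb"[A-Za-z0-9\x80-\xFF]{2,30}")


def words(expr, data):
    """Return the terms the expression picks out, decoded for display."""
    return [m.decode("latin-1") for m in expr.findall(data)]


default_words = words(default_expr, text)
wide_words = words(wide_expr, text)

print(default_words)  # ['steemiarenduse', 'ev']
print(wide_words)     # ['Süsteemiarenduse', 'päev']
```

With the narrow class, each accented letter splits the word, and fragments shorter than two characters are dropped, so a search for the whole word finds nothing. With the wider class, the words stay intact.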


