Q: 8-bit characters

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm

Post by Thunderstone »



Hello!

My question is:
I live in Poland, and in my language, Polish, we use some specific
characters whose codes are larger than 128.
How can I make Webinator treat these characters as letters?
Right now they are treated as separators (like SPACE).
Is there any way to do this?

Marcin
--
___ __ Marcin Nowak_ __ * mailto:chomik@piast.t19.ds.pwr.wroc.pl *
/ _)/ )_ ___ _ __(_) / ) * http://www.t19.ds.pwr.wroc.pl/~chomik/ *
( (_/'_ ) . /' V / )/' ~) * IRC,ICQ-Chomik ** tel. 71 734475 w 184 *
\ _(_/ (_/\__(_/2_/(_/(_/\_) **** Wroclaw University of Technology ****


Post by Thunderstone »




The full context view should show everything as expected.

The abstract view will only show alphanumerics as defined by the
system's C library. Some systems don't pass their language setting
through as expected. The next version of Webinator, to be released soon,
will have more support for these systems.
You could also change the search script where it uses <abstract> to
generate your own style of abstract.

For searching, make sure you're using an index expression (-k option) that
includes these letters. The default expression, \alnum{2,30}, excludes 8-bit
characters since they are usually non-text.
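
To see why the default expression splits Polish words, here is a rough
illustration in Python. It is not Webinator code: the two patterns are only
ASCII stand-ins for \alnum{2,30} and [\alnum\xA0-\xFF]{2,30}, written with
Python's re module, and the sample text is assumed to be encoded in
ISO-8859-2 (Latin-2). The names index_terms, DEFAULT_EXPR, EXTENDED_EXPR and
SAMPLE are made up for this sketch.

```python
import re
from typing import List, Pattern

# Assumption: \alnum behaves like ASCII letters and digits here.
# These are Python byte patterns, not Webinator's own expression syntax.
DEFAULT_EXPR = re.compile(rb"[A-Za-z0-9]{2,30}")
EXTENDED_EXPR = re.compile(rb"[A-Za-z0-9\xA0-\xFF]{2,30}")

# "zażółć gęślą jaźń" encoded as ISO-8859-2 bytes.
SAMPLE = b"za\xbf\xf3\xb3\xe6 g\xea\xb6l\xb1 ja\xbc\xf1"


def index_terms(data: bytes, expr: Pattern[bytes]) -> List[bytes]:
    """Return every run of bytes that the expression accepts as a term."""
    return expr.findall(data)


# With the ASCII-only pattern, 8-bit bytes act as separators, so only
# fragments of at least two ASCII characters survive.
print([t.decode("iso-8859-2") for t in index_terms(SAMPLE, DEFAULT_EXPR)])

# With \xA0-\xFF added to the character class, whole words are kept.
print([t.decode("iso-8859-2") for t in index_terms(SAMPLE, EXTENDED_EXPR)])
```

The first line printed shows only the fragments "za" and "ja", while the
second shows the three complete words. Note that this range suits
ISO-8859-2; if your pages use a different 8-bit encoding, check that all of
its letters really fall between \xA0 and \xFF before relying on it.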

Unindex your database, and re-index it with a -k option such as
-k"[\alnum\xA0-\xFF]{2,30}":

gw -unindex
gw -k"[\alnum\xA0-\xFF]{2,30}" -index


