National Characters

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm

Post by Thunderstone »



Has anybody managed to include support for national characters?
By support I mean that the indexer will retrieve and index these
characters, and that they will be searchable from the search form. The
standard, out-of-the-box Webinator will not recognize these characters
at all.


These creatures appear in several forms:

Plain 8-bit ISO-Latin (no example)
As HTML entities: &aring;
As decimal character references: &#229;

All of these produce a lowercase a with a little ring above (å).
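
To make the equivalence of the three forms concrete, here is a minimal Python sketch (not part of Webinator) that decodes each one to the same letter. The helper name to_text is hypothetical, and the raw byte is assumed to be ISO-8859-1 encoded.

```python
import html

# The three encodings of the same letter listed above.
raw_byte = bytes([0xE5])      # plain 8-bit byte, assumed ISO-8859-1
named_entity = "&aring;"      # HTML named entity
decimal_entity = "&#229;"     # decimal character reference


def to_text(value):
    """Decode an ISO-8859-1 byte string or an HTML-escaped string to text."""
    if isinstance(value, bytes):
        value = value.decode("iso-8859-1")
    return html.unescape(value)


forms = [to_text(raw_byte), to_text(named_entity), to_text(decimal_entity)]
print(forms)
```

An indexer that should treat the three forms alike needs to normalize them in roughly this way before the index expression is applied.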
Thanx in advance,
RS

(o o)
+------------oOO--(_)--OOo-------------+
| Richard Soderberg, MD, PADI DM, |
| Systems analyst, Diving physician |
| http://www.mic.ki.se/rs/rs.html |
| ----- |
| The Karolinska institute library |
| +46 8 728 80 00 |
| PO BOX 200, 171 77 Stockholm, Sweden |
| http://www.kibic.ki.se |
+--------------------------------------+


Post by Thunderstone »




A search of the archive at:
http://www.thunderstone.com/texis/webinator/listproc/
for:
National Characters
reveals:
Make sure you're using an index expression (-k option) that includes
these letters. The default expression, \alnum{2,30}, excludes 8-bit
characters since they are usually non-text.

Unindex your database, and re-index it with a -k option such as
-k'[\alnum\xA0-\xFF]{2,30}':

gw -unindex
gw -k'[\alnum\xA0-\xFF]{2,30}' -index
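
To illustrate why the wider expression matters, here is a rough Python approximation of the two index expressions. It is only a sketch: Python's re module is not the expression syntax gw uses, and \alnum is assumed here to mean ASCII letters and digits, in line with the remark that the default excludes 8-bit characters. The sample text is made up.

```python
import re

# Rough Python stand-ins for the two index expressions.
# Assumption: \alnum is approximated as ASCII letters and digits only.
DEFAULT_EXPR = re.compile(r"[0-9A-Za-z]{2,30}")
EXTENDED_EXPR = re.compile(r"[0-9A-Za-z\xA0-\xFF]{2,30}")

# Hypothetical sample text containing national characters.
sample = "Göteborg och Malmö på våren"

# The default expression splits words at every 8-bit character and
# drops fragments shorter than two characters.
default_terms = DEFAULT_EXPR.findall(sample)

# The extended expression keeps the words whole.
extended_terms = EXTENDED_EXPR.findall(sample)

print(default_terms)
print(extended_terms)
```

With the default-style pattern the words are broken into fragments such as "teborg" and "Malm", so a search for the full word cannot match; the extended pattern keeps each word intact.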