Umlaut
Posted: Tue Nov 12, 2002 10:20 am
by ssc.blum
Hello!
I want Webinator to index words that contain German umlauts such as ä, ö or ü. Does anyone know how to do this?
Thanks.
Posted: Tue Nov 12, 2002 10:46 am
by John
Yes, there are a couple of ways to do it, depending on which version of Webinator you have and on your operating system. If you set the locale to German, Webinator will automatically index words containing umlauts and will also search them case-insensitively. In Vortex scripts this can be done with something like:
<SQL "set locale='de'"></SQL>
When that is not an option, you can change the index expression so that it also indexes high-bit characters (bytes 0x80-0xFF). An appropriate expression is:
[\alnum\x80-\xff]{2,30}
which can be set with gw -k or, in Vortex, with <SQL "set addexp">.
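John's point about high-bit characters can be illustrated with a small sketch. It uses Python's re module purely as a stand-in for the Texis REX engine (an assumption: REX syntax and semantics differ), approximates \alnum as ASCII letters and digits, and assumes the page text is Latin-1 encoded. The helper name `tokens` is hypothetical. An ASCII-only class splits each umlaut word at the umlaut, while adding \x80-\xff keeps the words whole:

```python
import re

# Stand-ins for index expressions (assumption: Python re approximates Texis REX;
# \alnum is approximated here as ASCII letters and digits).
ASCII_ONLY = re.compile(rb"[A-Za-z0-9]{2,30}")
HIGH_BIT = re.compile(rb"[A-Za-z0-9\x80-\xff]{2,30}")

def tokens(pattern, text, encoding="latin-1"):
    """Hypothetical helper: return the words an expression picks out of encoded text."""
    data = text.encode(encoding)
    return [m.group().decode(encoding) for m in pattern.finditer(data)]

sample = "Änderungen müssen gewährleistet sein"
print(tokens(ASCII_ONLY, sample))  # ['nderungen', 'ssen', 'gew', 'hrleistet', 'sein']
print(tokens(HIGH_BIT, sample))    # ['Änderungen', 'müssen', 'gewährleistet', 'sein']
```

With the ASCII-only class, a search for "müssen" has nothing to match, because only the fragment "ssen" was ever extracted.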
Posted: Thu Nov 14, 2002 8:00 am
by ssc.blum
John: Thank you for your fast response.
Yes, that was the problem. I had to set the locale to German and change the index expression. Unfortunately, Webinator is indexing only some of the words with umlauts; the others aren't indexed. I don't have any problems with words that contain no umlauts.
Posted: Thu Nov 14, 2002 9:29 am
by mark
Please change your handle back to something that isn't designed to look like someone else's (e.g. another user's handle with a trailing underscore), as you've done.
Posted: Thu Nov 14, 2002 9:33 am
by mark
Did you unindex first, or rewalk clean with the new expression?
Can you provide an example page where it does not index the umlauts?
Posted: Thu Nov 14, 2002 10:18 am
by ssc.blum
OK, I'm back to my old handle.

The script that is used deletes the whole database before creating a new one.
The weird thing is that, for example, in one block of text:
"es muss gewährleistet sein, dass ... , d.h. Änderungen müssen sowohl im ..." (roughly: "it must be ensured that ..., i.e. changes must ... both in the ...")
"gewährleistet" is found, but "Änderungen" and "müssen" are not.
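One way to check whether the three words differ in some byte-level way is to list the non-ASCII bytes each one contains. The sketch below uses a hypothetical helper, `high_bytes`, and assumes the page is either Latin-1 or UTF-8 encoded. In both encodings every umlaut byte lies in 0x80-0xFF, so an expression covering that range should accept "Änderungen" and "müssen" just as it accepts "gewährleistet":

```python
# Hypothetical helper: list the non-ASCII bytes (>= 0x80) a word contains
# in a given encoding, formatted as hex strings.
def high_bytes(word, encoding):
    return [f"0x{b:02X}" for b in word.encode(encoding) if b >= 0x80]

for word in ("gewährleistet", "Änderungen", "müssen"):
    for enc in ("latin-1", "utf-8"):
        print(f"{word:14} {enc:8} {high_bytes(word, enc)}")
```

In Latin-1 the umlauts are single bytes (0xE4, 0xC4, 0xFC). In UTF-8 each becomes a two-byte sequence starting with 0xC3, and both bytes are still in the high-bit range.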
Posted: Thu Nov 14, 2002 10:50 am
by mark
Did you set the locale when searching as well as when creating the index?
I'm able to index and find each of those terms by changing just the index expression, just the locale, or both. Are you getting any error messages in the source of the results page?
Posted: Thu Nov 14, 2002 12:14 pm
by ssc.blum
Yes, I have set the locale in the script and there are no error messages at all.
Posted: Thu Nov 14, 2002 2:25 pm
by mark
What are your gw and Texis versions?
gw -version
texis -version
What's your gw command line?
Posted: Fri Nov 15, 2002 4:49 am
by ssc.blum
Webinator WWW Site Indexer Version 2.56 (Commercial)
Copyright(c) 1995,1996,1997,1998,1999,2000 Thunderstone EPI Inc.
Release: 20010117
Texis Web Script (Vortex) Copyright (c) 1996-2001 Thunderstone - EPI, Inc.
Commercial Webinator Version 3.01.979767712 of Jan 17, 2001 (powerpc-ibm-aix4.3.2.0)
gw -k[\alnum\x00-\xFF]{2,30} -Usuchmasc.dc -Psuchmasc.dc -dDEUTSCH
But I don't know what the "-U" and "-P" options are for, because I didn't write this command.
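The expression on this command line uses the range \x00-\xFF rather than the \x80-\xff range suggested earlier in the thread. The sketch below again uses Python's re only as an approximation of Texis REX (an assumption; REX may treat this class differently) and approximates \alnum as ASCII letters and digits. Under those semantics, a class spanning every byte also matches spaces and punctuation. The expression then no longer stops at word boundaries and instead cuts the text into runs of up to 30 bytes. The helper `tokens` is hypothetical:

```python
import re

# Python re as a rough stand-in for Texis REX (assumption).
SUGGESTED = re.compile(rb"[A-Za-z0-9\x80-\xff]{2,30}")  # high-bit range only
ALL_BYTES = re.compile(rb"[A-Za-z0-9\x00-\xff]{2,30}")  # every byte, incl. spaces

def tokens(pattern, text, encoding="latin-1"):
    """Hypothetical helper: return the 'words' an expression extracts."""
    data = text.encode(encoding)
    return [m.group().decode(encoding) for m in pattern.finditer(data)]

text = "d.h. Änderungen müssen sowohl im"
print(tokens(SUGGESTED, text))  # ['Änderungen', 'müssen', 'sowohl', 'im']
print(tokens(ALL_BYTES, text))  # ['d.h. Änderungen müssen sowohl ', 'im']
```

Whether Webinator behaves the same way would need to be confirmed against the Texis documentation for index expressions.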