Umlaut

ssc.blum
Posts: 8
Joined: Tue Nov 12, 2002 10:02 am

Umlaut

Post by ssc.blum »

Hello!

I want Webinator to index words that contain German umlauts such as ä, ö or ü. Does anyone know how to do this?

Thanks.
John
Site Admin
Posts: 2597
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Umlaut

Post by John »

Yes, there are a couple of ways you can do it, depending on which version of Webinator you have, and your operating system. If you set your locale to the German locale it will automatically index words with umlauts, as well as search them case-insensitively. In Vortex scripts it can be accomplished with something similar to:

<SQL "set locale='de'"></SQL>

When that is not an option you can change the index expression to also index the high-bit characters. An appropriate expression is:

[\alnum\x80-\xff]{2,30}

which can be set via gw -k or <SQL "set addexp">
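
To see what widening the expression changes, here is a minimal sketch in Python rather than Texis. It assumes that `\alnum` means ASCII letters and digits and that the pages are Latin-1 encoded, so each umlaut is a single byte in the range 0x80-0xff. The names `INDEX_EXPR`, `ASCII_ONLY` and `index_words` are made up for this illustration and are not part of Webinator.

```python
import re

# Python bytes-regex approximation of the Texis expression
# [\alnum\x80-\xff]{2,30}. Assumption: \alnum covers ASCII letters and digits.
INDEX_EXPR = re.compile(rb"[A-Za-z0-9\x80-\xff]{2,30}")

# The same expression without the high-bit range, for comparison.
ASCII_ONLY = re.compile(rb"[A-Za-z0-9]{2,30}")


def index_words(text, pattern):
    """Return the tokens that pattern picks out of Latin-1 encoded text."""
    data = text.encode("latin-1")
    return [match.decode("latin-1") for match in pattern.findall(data)]


sample = "Grüße aus Köln"
print(index_words(sample, ASCII_ONLY))  # umlaut words are split at the high-bit bytes
print(index_words(sample, INDEX_EXPR))  # umlaut words stay whole
```

With the ASCII-only pattern the umlaut bytes act as word breaks, so only fragments of "Grüße" and "Köln" would be indexed. With the high-bit range added, each word is kept as one token.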
John Turnbull
Thunderstone Software
ssc.blum
Posts: 8
Joined: Tue Nov 12, 2002 10:02 am

Umlaut

Post by ssc.blum »

John: Thank you for your fast response.
Yes, that was the problem. I had to set the locale to the German locale and change the index expression. Unfortunately, Webinator is indexing only some of the words with umlauts; the others aren't indexed. I don't have any problems with words that contain no umlauts.
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Umlaut

Post by mark »

Please change your handle back to one that isn't designed to look like someone else's (e.g. another user's handle with a trailing underscore), as your current one is.
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Umlaut

Post by mark »

Did you unindex first, or rewalk clean, after switching to the new expression?
Can you provide an example page where it does not index the umlauts?
ssc.blum
Posts: 8
Joined: Tue Nov 12, 2002 10:02 am

Umlaut

Post by ssc.blum »

Ok, back to my old handle. :)
The script that is used deletes the whole database before creating a new one.
The weird thing is that, for example, within a single text block:

"es muss gewährleistet sein, dass ... , d.h. Änderungen müssen sowohl im ..."

"gewährleistet" is found, but "Änderungen" and "müssen" are not.
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Umlaut

Post by mark »

Did you set the locale when searching as well as when creating the index?

I'm able to index and find each of those terms by changing just the index expression, just the locale, or both. Are you getting any error messages in the source of the results page?
ssc.blum
Posts: 8
Joined: Tue Nov 12, 2002 10:02 am

Umlaut

Post by ssc.blum »

Yes, I have set the locale in the script and there are no error messages at all.
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Umlaut

Post by mark »

What're your gw and texis versions?
gw -version
texis -version

What's your gw command line?
ssc.blum
Posts: 8
Joined: Tue Nov 12, 2002 10:02 am

Umlaut

Post by ssc.blum »

Webinator WWW Site Indexer Version 2.56 (Commercial)
Copyright(c) 1995,1996,1997,1998,1999,2000 Thunderstone EPI Inc.
Release: 20010117

Texis Web Script (Vortex) Copyright (c) 1996-2001 Thunderstone - EPI, Inc.
Commercial Webinator Version 3.01.979767712 of Jan 17, 2001 (powerpc-ibm-aix4.3.2.0)

gw -k[\alnum\x00-\xFF]{2,30} -Usuchmasc.dc -Psuchmasc.dc -dDEUTSCH

But I don't know what the "-U" and "-P" options are for, because I didn't write this command.
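
One detail in that command line may be worth checking: the range given to -k is `\x00-\xFF`, whereas the expression suggested earlier in the thread uses `\x80-\xff`. The sketch below compares the two in Python, not in Texis. It assumes `\alnum` means ASCII letters and digits, Latin-1 input, and greedy leftmost matching. The names `SUGGESTED`, `AS_RUN` and `tokens` are invented for the illustration, and the sample is a shortened piece of the sentence quoted above. Whether Webinator behaves the same way, and whether this explains the missing words, is not established in this thread.

```python
import re

# Python bytes-regex approximations of the two expressions.
# Assumption: \alnum covers ASCII letters and digits.
SUGGESTED = re.compile(rb"[A-Za-z0-9\x80-\xff]{2,30}")  # range from the earlier reply
AS_RUN = re.compile(rb"[A-Za-z0-9\x00-\xff]{2,30}")     # range from the gw -k option


def tokens(text, pattern):
    """Return the tokens that pattern picks out of Latin-1 encoded text."""
    data = text.encode("latin-1")
    return [match.decode("latin-1") for match in pattern.findall(data)]


sample = "d.h. Änderungen müssen sowohl im"

for name, pattern in (("suggested", SUGGESTED), ("as run", AS_RUN)):
    print(name, tokens(sample, pattern))
```

In this approximation the `\x00-\xFF` range matches every byte, including spaces and punctuation, so the text is cut into chunks of up to 30 bytes instead of into words. The `\x80-\xff` range keeps "Änderungen" and "müssen" as separate tokens.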