RE(2): RE(2): Cyrillic HTML pages

Posted: Wed Sep 18, 1996 11:46 am
by Thunderstone


Hi Bart and All,

Actually, I can't think of any other reason you'd want to pre-process
the HTML. My site is just starting to put other languages up. The Webinator
seems to work okay, but it substitutes "E" for É. I think this
is appropriate for people with English keyboards, but the people searching
our site may have foreign keyboards and actually be able to type an E
with an acute accent.

I was just playing with the idea of converting &Eacute; to character 201
decimal (ISO 8859-1). If this is a good idea, maybe you could have a toggle switch
built into gw? I was planning on indexing the foreign languages into
a separate database, so I could flip on the "Convert to ASCII 192-255"
switch and not even need an external plugin.
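
To make the idea concrete, here is a rough sketch in Python of what such a conversion step might do. It is not how gw or Webinator works; the function name entities_to_latin1 and the regular expression are made up for illustration, and the entity table comes from Python's standard html.entities module.

```python
import re
from html.entities import name2codepoint

# Matches named entities such as &Eacute; (the trailing semicolon is
# optional here, since older pages often leave it off).
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);?")


def entities_to_latin1(text: str) -> str:
    """Replace named entities whose code point is 192-255 with the character itself.

    Hypothetical helper: entities outside that range (for example &amp;)
    are left exactly as they were.
    """
    def replace(match: "re.Match[str]") -> str:
        codepoint = name2codepoint.get(match.group(1))
        if codepoint is not None and 192 <= codepoint <= 255:
            return chr(codepoint)
        return match.group(0)

    return ENTITY_RE.sub(replace, text)
```

Encoding the result as ISO 8859-1 then gives the single byte 201 for É, which is the value a searcher with a foreign keyboard would be typing.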

I'm not really knowledgeable about how foreign language searching should
work. Does this sound like a workable approach?

-Kevin McCarthy
AMD
kevin.mccarthy@amd.com

---- webinator(a)thunderstone.com's Message ----

Egads! This is getting complex.

The thought behind the Plugins was/is:


                                  + yes--> HTML Parser ---\
Fetched URL -> Webinator -> HTML? |                        ->Database
                                  + no---> Mime Plugin ---/

There isn't currently the structure for preprocessing the HTML
with a Plugin. I suppose it could be done though.
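
As a rough sketch only, the flow above with an optional preprocessing hook bolted on might look like this in Python. None of these names come from Webinator: parse_html, mime_plugin, process, and the preprocess argument are all hypothetical stand-ins.

```python
from typing import Callable, Optional


def parse_html(body: str) -> str:
    """Hypothetical stand-in for the built-in HTML parser."""
    return "html:" + body


def mime_plugin(body: str) -> str:
    """Hypothetical stand-in for an external MIME plugin."""
    return "plugin:" + body


def process(content_type: str, body: str,
            preprocess: Optional[Callable[[str], str]] = None) -> str:
    """Route a fetched document and return the text to store in the database."""
    # Ignore parameters such as "; charset=iso-8859-1" when checking the type.
    if content_type.split(";")[0].strip().lower() == "text/html":
        if preprocess is not None:
            # The assumed new step: rewrite the HTML before it is parsed.
            body = preprocess(body)
        return parse_html(body)
    return mime_plugin(body)
```

A language conversion would then be passed in as the preprocess function, and non-HTML documents would skip it entirely.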

Are there other uses for this beyond language translation?

Thanks,
Bart