How to convert characters

Posted: Thu Oct 25, 2001 11:43 am
by chand012
I want to convert characters with diacritical marks to their closest keyboard equivalents, so that searches without the diacritics will find the information. However, what I thought would work isn't working. For example, if I try to convert é to e in string $str, I use

<sandr "\x82" "e" $str>

but $ret is the same as $str. My ASCII chart lists 82 as the hex value for é. Is there something wrong with my method, or is there a mismatch between my ASCII chart (http://www.cdrummond.qc.ca/cegep/inform ... /ascii.htm) and Texis's?

Thanks,
David

Posted: Thu Oct 25, 2001 12:24 pm
by chand012
I think I've found the problem. I was using the PC-DOS extended ASCII set, but our system is Solaris. /usr/pub/iso seems to have what I need, so I'll try that.

Posted: Thu Oct 25, 2001 12:56 pm
by chand012
The hex values in /usr/pub/iso appear to work.
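To show why "\x82" didn't match, here is a small Python illustration of the mismatch (outside Texis, not Vortex). In the PC-DOS code page (cp437) é is byte 0x82, but in ISO 8859-1 it is 0xE9. The sketch assumes the text in $str is stored as ISO 8859-1 bytes, which fits with the /usr/pub/iso values working. The helper name hex_escape is hypothetical.

```python
# Illustration in Python, not Texis/Vortex: the same character "é" has a
# different 8-bit code in the PC-DOS code page (cp437) than in ISO 8859-1.

def hex_escape(ch: str, codec: str) -> str:
    """Return a \\xNN-style escape for a single character in an 8-bit codec."""
    return "\\x%02X" % ch.encode(codec)[0]

for codec in ("cp437", "latin-1"):
    print(codec, hex_escape("é", codec))

# Text stored as ISO 8859-1 bytes (an assumption about the Solaris data).
data = "café".encode("latin-1")      # b'caf\xe9'
print(data.replace(b"\x82", b"e"))    # cp437 code: no match, so unchanged
print(data.replace(b"\xe9", b"e"))    # ISO 8859-1 code: b'cafe'
```

By the same reasoning, the <sandr> call would presumably need the ISO 8859-1 value (E9) rather than the PC-DOS one (82).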

Posted: Thu Oct 25, 2001 3:49 pm
by Kai
You can also use <fetch> with 8-bit HTML turned off:

<urlcp 8bithtml off>
<fetch "http://localhost/foo.html" $str>
<urltext>

When <fetch> is given a second argument, it uses $str as the raw HTML instead of actually fetching http://localhost/foo.html (the URL must still end in .html, though). <urltext> then returns the "formatted" text, with 8-bit characters translated to their 7-bit equivalents because of the <urlcp> setting. (The text will also be word-wrapped if it's longer than 80 characters, and any HTML sequences and tags will be decoded.)
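For comparison outside Texis, here is a minimal Python sketch of the same kind of 8-bit-to-7-bit folding. It uses Unicode decomposition (NFKD) and drops the combining accent marks. This is a swapped-in technique, not how <urltext> works internally. The helper names are hypothetical, and the raw input is assumed to be ISO 8859-1.

```python
import unicodedata

def fold_to_ascii(text: str) -> str:
    """Strip diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Characters with no decomposition (e.g. "ß", "ø") are still non-ASCII
    # here and get dropped; an explicit mapping table would be needed for them.
    return stripped.encode("ascii", "ignore").decode("ascii")

def fold_latin1_bytes(raw: bytes) -> str:
    """Decode ISO 8859-1 bytes (assumed encoding) and fold them to ASCII."""
    return fold_to_ascii(raw.decode("latin-1"))

print(fold_to_ascii("Crème brûlée"))
print(fold_latin1_bytes(b"caf\xe9"))
```

Decomposition handles accented letters generically without a hand-built table, but it silently drops letters that have no decomposition. A per-character table, like the <sandr> approach in the question, gives full control over those cases.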