strfmt %hV

vince16
Posts: 1
Joined: Tue Mar 20, 2007 12:55 pm

Post by vince16 »

I'm trying to UTF-8 encode text that may contain a mix of literal symbols and HTML entities.

code:
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<capture>
£ - &#163; - pound (160 - 191)<br />
Ñ - &#209; - N tilde (192 - 223)<br />
ñ - &#241; - n tilde (224 - 255)<br />
€ - &#8364; - <b>euro</b> (8200+)<br />
</capture>
<$text=$ret>
<strfmt "%hV" $text>
<send $ret>

output:
£ - £ - pound (&#160; - 191)
Ã&#145; - Ñ - N tilde (&#192; - 223)
ñ - ñ - n tilde (&#224; - 255)
â&#130;¬ - € - euro (&#8200;+)

The HTML entities are encoded properly, but the literal symbols are not:
symbols in the range &#160; to &#191; have extra characters preceding them;
symbols from &#192; up are encoded to unrecognizable characters.

What is the correct way to UTF-8 encode text that mixes literal symbols and HTML entities?
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

I believe that is the correct UTF-8 encoding, unless the symbols are getting double-encoded. Are you sure the input wasn't already UTF-8 encoded? Setting the encoding of this page to UTF-8 shows the output correctly for two lines; however, the &#130; implies the input was already UTF-8.
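
To illustrate the double-encoding hypothesis, here is a minimal Python sketch (not Vortex). It assumes the encoder reads each input byte as one Latin-1 character, which is not confirmed for `strfmt`. The helper names `double_encode`, `show`, and `repair` are hypothetical and exist only for this example.

```python
def double_encode(text: str) -> str:
    """Simulate UTF-8 encoding applied to text that is already UTF-8.

    Assumption: the encoder misreads each UTF-8 byte as a separate
    Latin-1 character before encoding it again.
    """
    utf8_bytes = text.encode("utf-8")    # what the input file really holds
    return utf8_bytes.decode("latin-1")  # each byte misread as one character


def show(text: str) -> str:
    """Render control characters (below 32 or 127..159) as numeric entities."""
    return "".join(
        ch if 32 <= ord(ch) < 127 or ord(ch) >= 160 else f"&#{ord(ch)};"
        for ch in text
    )


def repair(mangled: str) -> str:
    """Undo one round of double encoding under the same Latin-1 assumption."""
    return mangled.encode("latin-1").decode("utf-8")


n_tilde = show(double_encode("\u00d1"))  # N tilde, UTF-8 bytes C3 91
euro = show(double_encode("\u20ac"))     # euro sign, UTF-8 bytes E2 82 AC
pound = double_encode("\u00a3")          # pound sign, UTF-8 bytes C2 A3
```

Under this assumption the N tilde and the euro sign come out as the same sequences shown in the posted output, including the &#145; and &#130; entities, and the pound sign gains a preceding extra character. If that matches what is happening, the input should be decoded from UTF-8 once before it is passed to the encoder, rather than encoded a second time.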
John Turnbull
Thunderstone Software