stderr converting ISO-1252 to UTF-8

Posted: Thu Oct 06, 2005 7:22 pm
by cindy_walker
We're getting these error messages on the walk status page:

118 /usr/local/morph3/texis/scripts/webinator/dowalk(doprimer) 266: Charset converter "/usr/local/morph3/etc/iconv" -f iso-1252 -t UTF-8 -c stderr converting iso-1252 to UTF-8: iconv: conversion from iso-1252 unsupported in the function httransbuf

018 /usr/local/morph3/texis/scripts/webinator/dowalk(doprimer) 266: Cannot convert iso-1252 to UTF-8 via charset converter "/usr/local/morph3/etc/iconv" -f iso-1252 -t UTF-8 -c: returned exit code 1 in the function httransbuf

and similar slightly shorter messages on the search results page. It's not clear to me whether Webinator indexes the page anyway when this happens.

How can I correct this? Many of our pages have a content type meta tag specifying charset=iso-1252 for who knows what reason.

We're using Enterprise Webinator 5.1.10-Unix-w/plugin for Solaris.

Cindy

Posted: Fri Oct 07, 2005 11:07 am
by John
As far as we are aware, there is no ISO-1252 character set. Windows code page 1252 is recognized as CP1252 or WINDOWS-1252, which appears to be what some of your pages actually use.

You could write a shell-script wrapper around iconv that translates the charset name iso-1252 to windows-1252 before calling iconv.
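One quick way to check which charset names a given iconv accepts is to convert a short ASCII string from each candidate name. This is a minimal sketch that assumes an iconv command on PATH (set ICONV=/usr/local/morph3/etc/iconv to test the copy Webinator uses); the helper name charset_supported is hypothetical. On GNU libiconv or glibc you would expect iso-1252 to be reported as unsupported and windows-1252 and CP1252 as supported, but the result depends on the iconv build.

```shell
#!/bin/sh
# Assumption: an iconv command on PATH; override with ICONV=/path/to/iconv
# (for example /usr/local/morph3/etc/iconv on the Webinator host).
ICONV=${ICONV:-iconv}

# Hypothetical helper: returns 0 if iconv accepts $1 as a source charset,
# nonzero otherwise. All iconv output and errors are discarded.
charset_supported() {
    printf 'test' | "$ICONV" -f "$1" -t UTF-8 >/dev/null 2>&1
}

# Report each candidate name.
for cs in iso-1252 windows-1252 CP1252; do
    if charset_supported "$cs"; then
        echo "$cs: supported"
    else
        echo "$cs: unsupported"
    fi
done
```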

Posted: Fri Oct 07, 2005 11:45 am
by cindy_walker
The template we were given to use for our web pages unfortunately contains this meta tag:

<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=iso-1252">

Clearly it's meaningless. Is there a way to configure the dowalk script to assume the default charset when it encounters a page with this meta tag? We have hundreds of content providers and more than 100,000 pages, so at best we would be only partially successful in removing the tag from our pages.

I don't think we actually need to do any special conversion. For us it's an erroneous tag.
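For the partial cleanup mentioned above, a first step might be simply listing which files carry the tag. This is a minimal sketch, not part of Webinator: the function name find_bogus_charset is hypothetical, the document root is passed as an argument, and only *.htm and *.html files are searched (an assumption; adjust the -name tests to your site). It sticks to POSIX find and grep options because Solaris grep has no -r flag.

```shell
#!/bin/sh
# Hypothetical helper: print every .htm/.html file under directory $1 whose
# content contains "charset=iso-1252" (case-insensitive). Spacing variants
# such as "charset = iso-1252" are not matched.
find_bogus_charset() {
    find "$1" -type f \( -name '*.htm' -o -name '*.html' \) \
        -exec grep -il 'charset=iso-1252' {} +
}

# Run only when a directory is given, e.g.: sh find_bogus.sh /path/to/docroot
if [ $# -ge 1 ]; then
    find_bogus_charset "$1"
fi
```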

Posted: Sat Oct 08, 2005 9:33 am
by John
The conversion would apply to the charset name passed as an argument to iconv, not to the page content. A future version of Webinator will make it easier to specify aliases, but for now you can make the following edits:

In texis.cnf, uncomment the Charset Converter line and change it to:

Charset Converter = "%INSTALLDIR%(#)PATH_SEP(#)etc(#)PATH_SEP(#)preiconv" %CHARSETFROM% %CHARSETTO%

On Unix, the preiconv script might look like this:

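#!/bin/sh
# preiconv: treat the nonexistent charset name iso-1252 as windows-1252 and
# pass every other name to iconv unchanged. Called as: preiconv FROM TO
# Note: string comparison needs "="; "-eq" compares integers only.

if [ "x$1" = "xiso-1252" ] ; then
    exec /usr/local/morph3/etc/iconv -f windows-1252 -t "$2" -c
else
    exec /usr/local/morph3/etc/iconv -f "$1" -t "$2" -c
fi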
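If some pages spell the name in uppercase (ISO-1252, as in the thread title), a slightly more tolerant variant could match it in any letter case. This is a sketch under assumptions: the function name map_charset and the ICONV override variable are hypothetical, and Webinator is assumed to call the script with exactly the two arguments from the Charset Converter line above.

```shell
#!/bin/sh
# preiconv (hypothetical tolerant variant): map iso-1252 in any letter case
# to windows-1252, then hand off to the real iconv.
# Assumed invocation: preiconv FROM-CHARSET TO-CHARSET

# Path to the real iconv; can be overridden via the ICONV environment variable.
ICONV=${ICONV:-/usr/local/morph3/etc/iconv}

# Print the charset name iconv should actually receive.
map_charset() {
    case "$1" in
        [Ii][Ss][Oo]-1252) printf '%s\n' windows-1252 ;;
        *)                 printf '%s\n' "$1" ;;
    esac
}

# With fewer than two arguments the script only defines map_charset,
# which makes the mapping easy to test on its own.
if [ $# -ge 2 ]; then
    # exec replaces the shell, so iconv's exit status goes back to Webinator
    exec "$ICONV" -f "$(map_charset "$1")" -t "$2" -c
fi
```

To try it, save it as /usr/local/morph3/etc/preiconv, make it executable with `chmod +x`, and run something like `printf 'caf\351\n' | /usr/local/morph3/etc/preiconv ISO-1252 UTF-8`. Byte 0xE9 (octal 351) is é in windows-1252, so the output should be café.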