Error while fetching documents with "shift_jis" charsets

Maulik
Posts: 4
Joined: Wed Feb 18, 2009 8:45 am

Post by Maulik »

Hi,

I am fetching Japanese documents in the "shift_jis" charset. The following errors appear:

- Cannot convert charset shift_jis to UTF-8 via converter /usr/local/morph3/etc/iconv-wrapper: No such file or directory in the function httransbuf

- Cannot completely convert charset UTF-8 to UTF-8: Invalid character sequence at source offset 109419 in the function htutf8_to_utf8

I am working on Solaris 10 with Commercial Version 5.01.1231219356 20090106 (sparc-sun-solaris2.8-64-64) of Texis.

Am I missing a file or library? Please provide your suggestions.

Regards,
Maulik
Kai
Site Admin
Posts: 1270
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

`etc/iconv-wrapper' is not the standard Webinator charset converter. Did you perhaps wrap the standard etc/iconv executable with a wrapper script called etc/iconv-wrapper at some point, then delete it?

Check that the etc/iconv-wrapper exists and works properly. Or edit texis.cnf and comment out your [Texis] Charset Converter setting (it was probably edited) to fall back to the standard etc/iconv (you will then need to restart vhttpd if you are using it). Make sure that the etc/iconv executable exists in your install dir.

The second error (`Cannot completely convert charset UTF-8 to UTF-8') probably indicates corrupt UTF-8 data, or data incorrectly labeled as UTF-8 when it is not. Can you provide the URL where that error occurred?
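To check whether a fetched page really is valid UTF-8, something like the following may help. It is only a minimal sketch in Python, not anything Texis itself runs, and the function name `first_invalid_utf8_offset` is hypothetical. It reports the byte offset of the first invalid sequence, which is roughly the kind of position the `htutf8_to_utf8` message gives as the "source offset".

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None.

    Hypothetical helper for illustration only; it is not part of Texis.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded.
        return exc.start
    return None
```

Run against the raw bytes of a saved page, a non-None result suggests the page is either corrupt or labeled UTF-8 while actually being in another charset such as shift_jis.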
Maulik
Posts: 4
Joined: Wed Feb 18, 2009 8:45 am

Post by Maulik »

I have not written any wrapper for iconv, and no setting in the code points to "etc/iconv-wrapper". The standard "iconv" file exists in morph3/etc.

I have not edited the default charset setting in texis.cnf. Currently, texis.cnf has the following charset settings (the setting itself is commented out):

; Charset Converter is an external program and arguments that translates
; its stdin in one charset to stdout in another. The variables
; %CHARSETFROM% and %CHARSETTO% will be replaced with charsets when run;
; embedded-space args may be double-quoted:
;Charset Converter = "%INSTALLDIR%/etc/iconv" -f %CHARSETFROM% -t %CHARSETTO% -c
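
For illustration, here is how that template could expand into a command line. This is a minimal Python sketch under stated assumptions: it is not how Texis actually parses the setting, the function name `expand_converter` is hypothetical, and the install dir /usr/local/morph3 is taken from the error message above.

```python
import shlex


def expand_converter(template, install_dir, charset_from, charset_to):
    """Expand a Charset Converter template into an argument list.

    Hypothetical helper for illustration; Texis's real parsing may differ.
    """
    substitutions = {
        "%INSTALLDIR%": install_dir,
        "%CHARSETFROM%": charset_from,
        "%CHARSETTO%": charset_to,
    }
    # Split first so double-quoted args stay whole, then substitute,
    # so an install dir containing spaces still ends up as one argument.
    args = []
    for arg in shlex.split(template):
        for name, value in substitutions.items():
            arg = arg.replace(name, value)
        args.append(arg)
    return args
```

Called with the commented-out default above, "/usr/local/morph3", "shift_jis" and "UTF-8", this yields the standard etc/iconv path followed by `-f shift_jis -t UTF-8 -c`, with no mention of iconv-wrapper.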

Is there any other way to direct Texis to use the default iconv? If "iconv-wrapper" cannot be avoided, please describe what that wrapper should contain.
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

The default setting of just "iconv" is clearly getting overridden somewhere. Check the code again looking for just "iconv-wrapper". Don't forget to check any modules that you may be using.
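
One way to run that search across the whole source tree, modules included, is a small script along these lines. It is a rough Python sketch; the function name `find_references` and the choice to scan every file under the given directory are assumptions, so narrow it to your script directories as needed.

```python
import os


def find_references(root, needles=("iconv-wrapper",)):
    """Return (path, line number, needle) for every match under root.

    Hypothetical helper for illustration; matching is case-insensitive.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps binary or mis-encoded files from aborting the scan.
                with open(path, "r", encoding="utf-8", errors="replace") as handle:
                    for lineno, line in enumerate(handle, start=1):
                        lowered = line.lower()
                        for needle in needles:
                            if needle.lower() in lowered:
                                hits.append((path, lineno, needle))
            except OSError:
                # Unreadable files are skipped rather than treated as fatal.
                continue
    return hits
```

Each hit gives the file and line to inspect, so an override buried in a shared module shows up alongside the main scripts.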
John
Site Admin
Posts: 2595
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

You may want to open a ticket through our Tech Support page. It looks as if in the past we have explained the usage of <urlcp charsetconverter> to other members of your organization.
John Turnbull
Thunderstone Software
Maulik
Posts: 4
Joined: Wed Feb 18, 2009 8:45 am

Post by Maulik »

I found that "iconv-wrapper" was being called through <urlcp charsetconverter> in a module, so I removed it. The application is now working as expected.

My company has been using your product for many years and many people have worked on it, so it is quite possible that someone asked a similar question before. Thanks a lot!