SQL and ODBC : Could not convert SQL_WCHAR to varchar to UTF-8

michel.weber
Posts: 256
Joined: Sat Oct 08, 2005 12:40 pm

Post by michel.weber »

We have a thesaurus stored in a SQL Server 2008 database.

I wanted to extract it and convert it to a user equiv file for use with TEXIS/Appliance.

For some records I get the following error:
<!-- 000 /test/APGenThesaurus:28: Could not convert result column `NONDESCRIPTOR' SQL_WCHAR data to varchar UTF-8: The parameter is incorrect; using varbyte in the function vsvtx_get -->
Line 28 in the script is the SQL statement:
<table cellspacing="0">
<tr>
<td>ID</td>
<td>Descriptor</td>
<td>Non-Descriptor(s)</td>
</tr>
<local dsc="">
<sql PROVIDER="ODBC" ROW CONNECTSTR="Driver={SQL Server Native Client 10.0};Server=PDE790\SQL2008;Database=MYdb;Uid=MYuid;Pwd=MYpwd"
"SELECT TOP 100 [DESCID] ,[DESCRIPTOR] ,[NONDESCRIPTOR] FROM [AP_TO_DBREFERENCES].[dbo].[Search_Thesaurus]">
<if $dsc ne $DESCRIPTOR>
<if $dsc ne "">
</tr>
</if>
<$dsc=$DESCRIPTOR>
<tr>
<td><fmt %s $DESCID></td>
<td><fmt %s $dsc ></td>
</if>
<if $NONDESCRIPTOR ne "">
<td><fmt %s $NONDESCRIPTOR></td>
</if>
</sql>
</tr>
</table>

The information from the field is returned correctly, but I'm unable to see what causes the problem for these particular records.

Here is an excerpt of the results printed inside the SQL loop:
<tr>
<td>7</td>
<td>commerce des armes</td>
<td>trafic d'armes</td>
<!-- 000 /test/APGenThesaurus:28: Could not convert result column `NONDESCRIPTOR' SQL_WCHAR data to varchar UTF-8: The parameter is incorrect; using varbyte in the function vsvtx_get -->
<td>vente d'armes</td>
</tr>
<tr>
<td>8</td>
<td>commerce Est-Ouest</td>

<!-- 000 /test/APGenThesaurus:28: Could not convert result column `NONDESCRIPTOR' SQL_WCHAR data to varchar UTF-8: The parameter is incorrect; using varbyte in the function vsvtx_get -->
</tr>
<tr>
<td>9</td>
<td>commerce extérieur</td>
<td>organisation du commerce extérieur</td>
</tr>
<tr>

Both DESCRIPTOR and NONDESCRIPTOR are nvarchar(255). I have tried casting them to varchar(255); the errors then disappear, but the result is no longer UTF-8.
I also thought for a moment it might be related to NULL values, but it's not.
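
One possible reason the cast result stops being UTF-8, sketched in Python rather than Vortex: a varchar column holds bytes in the code page of its collation, not UTF-8. The sketch below assumes a Latin1 collation (Windows-1252), which nothing in this thread confirms, and `encodings_of` is a hypothetical helper written only for this illustration.

```python
def encodings_of(text):
    """Return the bytes the same text occupies under three encodings."""
    return {
        # nvarchar data is UCS-2/UTF-16 little-endian in SQL Server
        "utf-16-le": text.encode("utf-16-le"),
        # ASSUMPTION: varchar under a Latin1 collation uses Windows-1252
        "cp1252": text.encode("cp1252"),
        # what the script is expected to emit
        "utf-8": text.encode("utf-8"),
    }


sample = "commerce extérieur"
forms = encodings_of(sample)
for name, raw in forms.items():
    print(name, len(raw), raw)
```

Under that assumption, the accented character is a single byte (0xE9) in the varchar form but two bytes (0xC3 0xA9) in UTF-8, so the cast output would need a separate code page conversion before it could be treated as UTF-8.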

Any suggestions?
John
Site Admin
Posts: 2623
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

You are seeing the data come through as UTF-8? It sounds as if the translation is already happening somewhere, even though the ODBC is reporting it as WCHAR instead.
John Turnbull
Thunderstone Software
Kai
Site Admin
Posts: 1272
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

It appears there may be an issue with zero-length (empty) WCHAR fields coming from ODBC, which may be the cause of your second example error (we still need to verify this).

Out of curiosity, do any of the rows that produce the `Could not convert' errors have hi-bit characters in those fields?
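
For reference, "hi-bit" here means any byte with its top bit set (value 0x80 or above), which in UTF-8 is every byte of a non-ASCII character. A minimal Python sketch of that check follows; `has_high_bit` is a hypothetical helper, and the sample strings are taken from the output posted in this thread.

```python
def has_high_bit(raw):
    """True if any byte in raw has its top bit set (value >= 0x80)."""
    return any(b & 0x80 for b in raw)


# Plain ASCII text: no byte reaches 0x80.
print(has_high_bit("trafic d'armes".encode("utf-8")))      # False
# The accented e is encoded as 0xC3 0xA9, both above 0x7F.
print(has_high_bit("abattage de bétail".encode("utf-8")))  # True
```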
michel.weber
Posts: 256
Joined: Sat Oct 08, 2005 12:40 pm

Post by michel.weber »

Yes, the output is UTF-8.

That is what I expected with nvarchar, but maybe I'm wrong.

SQL Server uses UCS-2 encoding.
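
To make the expected conversion concrete, here is a minimal Python sketch of turning a UCS-2 (UTF-16LE compatible) byte buffer into UTF-8. It only illustrates the encoding step and says nothing about how Texis or the ODBC driver implements it; `wchar_to_utf8` is a hypothetical name.

```python
def wchar_to_utf8(raw):
    """Convert a UTF-16LE (UCS-2 compatible) byte buffer to UTF-8 bytes."""
    return raw.decode("utf-16-le").encode("utf-8")


# A value from the thread's sample output, as it would arrive in wide form.
wide = "étourdissement d'animal".encode("utf-16-le")
print(wchar_to_utf8(wide))

# A zero-length buffer is valid input in this sketch and yields empty bytes.
print(wchar_to_utf8(b""))
```

In this sketch a zero-length buffer converts cleanly, so if empty fields do trigger the error, the cause would more likely lie in how the length is passed to the converter than in the data itself. That is a guess, not a diagnosis.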


It seems to generate an error for about a third of the records, but I can't see any pattern: it happens with NULLs, with English words, and with French words...
But it never complains about the DESCRIPTOR field.
michel.weber
Posts: 256
Joined: Sat Oct 08, 2005 12:40 pm

Post by michel.weber »

Sorry, our messages crossed.

It happens a lot with NULLs, but not all the time. See record 21 below.

I'm not sure what a 'hi-bit' character is, but it definitely also happens with plain English words.
For example, see records 13 and 22 in the sample output:

<!-- 000 /test/APGenThesaurus:38: Could not convert result column `NONDESCRIPTOR' SQL_WCHAR data to varchar UTF-8: The parameter is incorrect; using varbyte in the function vsvtx_get -->
<!-- Record 0ID ::594::Descriptor ::AAMS countries::nonDescriptor :::: -->
<!-- Record 1ID ::6905::Descriptor ::abandon scolaire::nonDescriptor ::abandon de la scolarité:: -->
<!-- Record 2ID ::6905::Descriptor ::abandon scolaire::nonDescriptor ::abandon des études:: -->
<!-- 000 /test/APGenThesaurus:38: Could not convert result column `NONDESCRIPTOR' SQL_WCHAR data to varchar UTF-8: The parameter is incorrect; using varbyte in the function vsvtx_get -->
<!-- Record 3ID ::6905::Descriptor ::abandon scolaire::nonDescriptor ::abandon en cours d'études:: -->
<!-- 000 /test/APGenThesaurus:38: Could not convert result column `NONDESCRIPTOR' SQL_WCHAR data to varchar UTF-8: The parameter is incorrect; using varbyte in the function vsvtx_get -->
<!-- Record 4ID ::759::Descriptor ::abandoned child::nonDescriptor :::: -->
<!-- 000 /test/APGenThesaurus:38: Could not convert result column `NONDESCRIPTOR' SQL_WCHAR data to varchar UTF-8: The parameter is incorrect; using varbyte in the function vsvtx_get -->
<!-- Record 5ID ::4444::Descriptor ::abandoned land::nonDescriptor :::: -->
<!-- Record 6ID ::920::Descriptor ::abats::nonDescriptor :::: -->
<!-- Record 7ID ::1857::Descriptor ::abattage d'animaux::nonDescriptor ::abattage de bétail:: -->
<!-- 000 /test/APGenThesaurus:38: Could not convert result column `NONDESCRIPTOR' SQL_WCHAR data to varchar UTF-8: The parameter is incorrect; using varbyte in the function vsvtx_get -->
<!-- Record 8ID ::1857::Descriptor ::abattage d'animaux::nonDescriptor ::étourdissement d'animal:: -->
<!-- 000 /test/APGenThesaurus:38: Could not convert result column `NONDESCRIPTOR' SQL_WCHAR data to varchar UTF-8: The parameter is incorrect; using varbyte in the function vsvtx_get -->
<!-- Record 9ID ::3509::Descriptor ::ABM Agreement::nonDescriptor :::: -->
<!-- Record 10ID ::4333::Descriptor ::abolition of customs duties::nonDescriptor :::: -->
<!-- Record 11ID ::4504::Descriptor ::abortion::nonDescriptor ::legal abortion:: -->
<!-- Record 12ID ::4504::Descriptor ::abortion::nonDescriptor ::termination of pregnancy:: -->
<!-- 000 /test/APGenThesaurus:38: Could not convert result column `NONDESCRIPTOR' SQL_WCHAR data to varchar UTF-8: The parameter is incorrect; using varbyte in the function vsvtx_get -->
<!-- Record 13ID ::4504::Descriptor ::abortion::nonDescriptor ::voluntary termination of pregnancy:: -->
<!-- 000 /test/APGenThesaurus:38: Could not convert result column `NONDESCRIPTOR' SQL_WCHAR data to varchar UTF-8: The parameter is incorrect; using varbyte in the function vsvtx_get -->
<!-- Record 14ID ::6621::Descriptor ::abrogation::nonDescriptor :::: -->

<!-- 000 /test/APGenThesaurus:38: Could not convert result column `NONDESCRIPTOR' SQL_WCHAR data to varchar UTF-8: The parameter is incorrect; using varbyte in the function vsvtx_get -->
<!-- Record 19ID ::1746::Descriptor ::absolute majority::nonDescriptor :::: -->
<!-- 000 /test/APGenThesaurus:38: Could not convert result column `NONDESCRIPTOR' SQL_WCHAR data to varchar UTF-8: The parameter is incorrect; using varbyte in the function vsvtx_get -->
<!-- Record 20ID ::5984::Descriptor ::abstentionism::nonDescriptor :::: -->
<!-- Record 21ID ::5984::Descriptor ::abstentionnisme::nonDescriptor :::: -->
<!-- 000 /test/APGenThesaurus:38: Could not convert result column `NONDESCRIPTOR' SQL_WCHAR data to varchar UTF-8: The parameter is incorrect; using varbyte in the function vsvtx_get -->
<!-- Record 22ID ::4523::Descriptor ::Abu Dhabi::nonDescriptor ::Abu Zaby:: -->
<!-- 000 /test/APGenThesaurus:38: Could not convert result column `NONDESCRIPTOR' SQL_WCHAR data to varchar UTF-8: The parameter is incorrect; using varbyte in the function vsvtx_get -->
<!-- Record 23ID ::2::Descriptor ::abus de confiance::nonDescriptor :::: -->