Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / General / Other SQL Server Topics / March 2008

Tip: Looking for answers? Try searching our database.

Encoding problem in SQLXML - bulkload

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
worlman385@yahoo.com - 30 Mar 2008 09:44 GMT
I got an XML Data in UTF-8 encoding like this

Dah$)A(&li

but when I run the schema to insert data into SQL Server 2005 Express
edition, the data will become like this

Dah$)A!'0„7li

How can I solve the encoding problem?

XML data:
=======================================

<crew program='SH008774030000'>
<member>
<role>Director</role>
<givenname>Dah$)A(&li</givenname>
<surname>Hall</surname>
</member>
<member>
<role>Writer</role>
<givenname>Dah$)A(&li</givenname>
<surname>Hall</surname>
</member>
</crew>

XML schema:
=======================================

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified"
  xmlns:sql="urn:schemas-microsoft-com:mapping-schema">

  <xsd:annotation>
     <xsd:appinfo>
        <sql:relationship name="OrderOD" parent="productionID"
parent-key="program"
           child="productionCrew" child-key="program"/>

        <sql:relationship name="ODProduct" parent="productionCrew"
           parent-key="role givenname surname" child="crew"
child-key="role givenname surname"/>
     </xsd:appinfo>
  </xsd:annotation>


  <xsd:element name="crew" sql:relation="productionID">
     <xsd:complexType>
        <xsd:sequence>
           <xsd:element name="member" sql:relation="crew"
sql:relationship="OrderOD ODProduct">
              <xsd:complexType>
                 <xsd:sequence>
                    <xsd:element name="role" type="xsd:string"/>
                    <xsd:element name="givenname" type="xsd:string"/>
                    <xsd:element name="surname" type="xsd:string"/>
                 </xsd:sequence>
              </xsd:complexType>
           </xsd:element>
        </xsd:sequence>
        <xsd:attribute name="program" type="xsd:string"/>
     </xsd:complexType>
  </xsd:element>

</xsd:schema>

Database table:
=======================================

CREATE TABLE ProgramListings.dbo.productionCrew
(
program VARCHAR(20),
role VARCHAR(20),
givenname VARCHAR(20),
surname VARCHAR(20),
PRIMARY KEY(program, role, surname)
)
             
CREATE TABLE ProgramListings.dbo.crew
(
role VARCHAR(20),
givenname VARCHAR(20),
surname VARCHAR(20),
PRIMARY KEY(role, surname)

)
Dan Guzman - 30 Mar 2008 15:37 GMT
>I got an XML Data in UTF-8 encoding like this
>
[quoted text clipped - 6 lines]
>
> How can I solve the encoding problem?

Your XML is not well-formed.  Entity references need to be specified in
place of illegal XML characters (e.g. "&amp;" instead of "&").  Note that
this is not specific to SQLXML but part of the basic XML standards.  See
http://www.w3.org/TR/REC-xml/.

A CDATA section is commonly used in XML in order to eliminate the need to
escape illegal characters.  In a CDATA section, only the end tag ("]]>").
For example:

<crew program='SH008774030000'>
<member>
<role>Director</role>
<givenname><![CDATA[Dah$)A(&li]]></givenname>
<surname>Hall</surname>
</member>
<member>
<role>Writer</role>
<givenname><![CDATA[Dah$)A(&li]]></givenname>
<surname>Hall</surname>
</member>
</crew>

Signature

Hope this helps.

Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/

>I got an XML Data in UTF-8 encoding like this
>
[quoted text clipped - 82 lines]
>
> )
Dan Guzman - 30 Mar 2008 15:56 GMT
> In a CDATA section, only the end tag ("]]>").

This sentence should have been:

In a CDATA section, only the end tag ("]]>") is recognized as markup.

Signature

Hope this helps.

Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/

> >I got an XML Data in UTF-8 encoding like this
>>
[quoted text clipped - 115 lines]
>>
>> )
worlman385@yahoo.com - 30 Mar 2008 22:48 GMT
Thanks your help Dan!

But no! the XML is well formed:
http://www.oniva.com/upload/1356/x1.jpg

but after using the COM object of SQLXML to load XML file into
Database, the data will look like this:
http://www.oniva.com/upload/1356/x1.jpg

notice the givenname data is changed.

I think the input is UTF-8 data but the SQLXML interface convert UTF-8
to ASCII so the data is messed up when loaded from XML to database.

Since some data in XML is non-ASCII

>Your XML is not well-formed.  Entity references need to be specified in
>place of illegal XML characters (e.g. "&amp;" instead of "&").  Note that
[quoted text clipped - 17 lines]
></member>
></crew>
worlman385@yahoo.com - 30 Mar 2008 23:00 GMT
Sorry, the link of second one should be
http://www.oniva.com/upload/1356/x2.jpg

>but after using the COM object of SQLXML to load XML file into
>Database, the data will look like this:
>http://www.oniva.com/upload/1356/x1.jpg
worlman385@yahoo.com - 30 Mar 2008 23:36 GMT
Just solve the problem

use NVARCHAR instead of VARCHAR

NVARCHAR - support unicdoe

==============

CREATE TABLE ProgramListings.dbo.productionCrew
(
program NVARCHAR(20),
role NVARCHAR(20),
givenname NVARCHAR(20),
surname NVARCHAR(20),
PRIMARY KEY(program, role, surname)
)
Dan Guzman - 30 Mar 2008 23:11 GMT
> I think the input is UTF-8 data but the SQLXML interface convert UTF-8
> to ASCII so the data is messed up when loaded from XML to database.

Does your XML include a processing instruction to specify UTF-8 encoding?
For example:

<?xml version="1.0" encoding="UTF-8" ?>

Signature

Hope this helps.

Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/

> Thanks your help Dan!
>
[quoted text clipped - 33 lines]
>></member>
>></crew>
worlman385@yahoo.com - 30 Mar 2008 23:25 GMT
Yes, I put the string like
<?xml version="1.0" encoding="UTF-8" ?>
also tried other encoding string but no luck.

I tried to put the parameter in the connection string:
sendStringParametersAsUnicode=true

but doesn't help

=======================
hr = pIXMLBulkLoad->put_ConnectionString(L"provider=SQLOLEDB;data
source=(local)\\SQLEXPRESS;database=ProgramListings;integrated
security=SSPI;sendStringParametersAsUnicode=true");
    }

>> I think the input is UTF-8 data but the SQLXML interface convert UTF-8
>> to ASCII so the data is messed up when loaded from XML to database.
[quoted text clipped - 3 lines]
>
><?xml version="1.0" encoding="UTF-8" ?>
worlman385@yahoo.com - 30 Mar 2008 23:36 GMT
Just solve the problem

use NVARCHAR instead of VARCHAR

NVARCHAR - support unicdoe

==============

CREATE TABLE ProgramListings.dbo.productionCrew
(
program NVARCHAR(20),
role NVARCHAR(20),
givenname NVARCHAR(20),
surname NVARCHAR(20),
PRIMARY KEY(program, role, surname)
)
Dan Guzman - 31 Mar 2008 12:42 GMT
> Just solve the problem
>
> use NVARCHAR instead of VARCHAR

I'm glad you were able to figure this out.  I had assumed that your database
default collation was appropriate for the characters being stored.

Signature

Hope this helps.

Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/

> Just solve the problem
>
[quoted text clipped - 12 lines]
> PRIMARY KEY(program, role, surname)
> )
Erland Sommarskog - 30 Mar 2008 23:13 GMT
> Thanks your help Dan!
>
[quoted text clipped - 9 lines]
> I think the input is UTF-8 data but the SQLXML interface convert UTF-8
> to ASCII so the data is messed up when loaded from XML to database.

Your original post was encoded in iso-2022-cn, which may explain why I
and Dan so very funny characters.

Since your target columns are varchar, I need to ask: what is the collation
of these columns? If that collation does not include é in its ANSI set,
you cannot get the name right. Then again, then you should get "e". What
you got appears to be UTF-8 interpreted as ANSI.

I don't have any experience of XML bulkload, so I don't know what is
going on. You could try to add

  <?xml  version="1.0" encoding="utf-8" ?>

first in the file, even though this should not be needed since UTF-8
is the default for XML.


Signature

Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

worlman385@yahoo.com - 30 Mar 2008 23:36 GMT
Just solve the problem

use NVARCHAR instead of VARCHAR

NVARCHAR - support unicdoe

==============

CREATE TABLE ProgramListings.dbo.productionCrew
(
program NVARCHAR(20),
role NVARCHAR(20),
givenname NVARCHAR(20),
surname NVARCHAR(20),
PRIMARY KEY(program, role, surname)
)

>> I think the input is UTF-8 data but the SQLXML interface convert UTF-8
>> to ASCII so the data is messed up when loaded from XML to database.
[quoted text clipped - 14 lines]
>first in the file, even though this should not be needed since UTF-8
>is the default for XML.
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.