Sql Bulk在终结器中插入带双引号的XML格式文件

时间:2011-12-16 05:55:02

标签: sql-server xml bulkinsert bcp

我正在尝试从csv文档中将一些数据插入到表中,该文档的所有字段都以“”

分隔。

 APPLICANTID,NAME,CONTACT,PHONENO,MOBILENO,FAXNO,EMAIL,ADDR1,ADDR2,ADDR3,STATE,POSTCODE
 "3","Snoop Dogg","Snoop Dogg","411","","","","411 High Street","USA 
 ","","USA", "1111" "4","LL Cool J","LL Cool J","","","","","5 King
 Street","","","USA","1111"

我正在使用xml格式文件来尝试克服“”分隔符,因为我认为我必须在导入后再次更新数据以删除初始值“如果没有。”

我的格式文件如下所示:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="1" xsi:type="NCharTerm" TERMINATOR='",' MAX_LENGTH="12"/>
  <FIELD ID="2" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="3" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="4" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="5" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="6" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="7" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="8" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="9" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="10" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="11" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="12" xsi:type="CharTerm" TERMINATOR="\r\n" COLLATION="Latin1_General_CI_AS"/>
 </RECORD>
 <ROW>
  <COLUMN SOURCE="1" NAME="APPLICANTID" xsi:type="SQLINT"/>
  <COLUMN SOURCE="2" NAME="NAME" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="3" NAME="CONTACT" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="4" NAME="PHONENO" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="5" NAME="MOBILENO" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="6" NAME="FAXNO" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="7" NAME="EMAIL" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="8" NAME="ADDR1" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="9" NAME="ADDR2" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="10" NAME="ADDR3" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="11" NAME="STATE" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="12" NAME="POSTCODE" xsi:type="SQLCHAR"/>
 </ROW>
</BCPFORMAT>

我正在使用以下内容运行导入:

BULK INSERT [PracticalDB].dbo.applicant 
FROM 'C:\temp.csv'
WITH (KEEPIDENTITY, FORMATFILE='C:\temp.xml', FIRSTROW = 2)

我收到错误:

  

Msg 4864,Level 16,State 1,Line 1批量加载数据转换错误   (为指定的代码页键入不匹配或无效字符)   第2栏第1栏(APPLICANTID)。

表示所有行。

我尝试了各种不同的终结器组合,包括使用:

TERMINATOR="&quot;,"
TERMINATOR="\","
TERMINATOR='","
TERMINATOR='\","

并且它们似乎都不起作用。

是否有正确的方法来逃避“以便正确解析它,假设这是我的问题。

3 个答案:

答案 0 :(得分:19)

好的,我明白了!

当您定义xml属性(即TERMINATOR ='')时,您可以使用'而不是',然后您可以使用“在其中而不用担心。”

此外,我需要吃第一个“带字段,以便其他列可以正确解析。这最终得到格式文件

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="1" xsi:type="CharTerm" TERMINATOR='"' />
  <FIELD ID="2" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="3" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="4" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="5" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="6" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="7" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="8" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="9" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="10" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="11" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="12" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="13" xsi:type="CharTerm" TERMINATOR='"\r\n' />
 </RECORD>
 <ROW>
  <COLUMN SOURCE="2" NAME="APPLICANTID" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="3" NAME="NAME" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="4" NAME="CONTACT" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="5" NAME="PHONENO" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="6" NAME="MOBILENO" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="7" NAME="FAXNO" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="8" NAME="EMAIL" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="9" NAME="ADDR1" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="10" NAME="ADDR2" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="11" NAME="ADDR3" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="12" NAME="STATE" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="13" NAME="POSTCODE" xsi:type="SQLCHAR"/>
 </ROW>
</BCPFORMAT>

第一个字段只是丢弃一个,删除第一个“和其他字段全部分开”,“最后分隔”(换行符)

答案 1 :(得分:2)

提示:如果只有部分字段是doubleqouted,那么使用批量插入的openrowset版本,这样做,您可以操作来自输入文件的字段内容 在插入目标表之前。

在操作中,您可以对字段内容执行任何操作,例如:删除双引号。这里没有提到对性能的影响,我没有相关的措施。

答案 2 :(得分:1)

提示:如果您的CSV文件格式不一致,例如在同一列上有一些值是双重的,有些不是这个博客会帮助您轻松完成(这是继续Estevez的提示使用openrowset只是最后一步) http://ariely.info/Blog/tabid/83/EntryId/122/Using-Bulk-Insert-to-import-inconsistent-data-format-using-pure-T-SQL.aspx