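The CSV file is comma-delimited and contains embedded delimiters and embedded quotes. Some fields have opening and closing quotes; others do not.
The first record is processed perfectly, but the second is not. As you can see, the field appears to start with a quote, but that quote is actually embedded in the data, and field5 has no trailing quote. The import leaves fields 5 and 6 blank and pushes the field5 data (the text beginning ""U"" channel in the second record) into field7, which in turn violates that field's maximum length.
Is there an attribute setting in FileHelpers that I can use to handle records containing a field like that one, so that every field of the record is imported correctly? The CSV file is received from an external source, so I have no control over the feed.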
AT2M-2471-3,,"1178",AccuTemp,48"" Solid Cutting Board (must be ordered with AT2A-2630-3 or AT2A-2630-22),ea,"10.00","0.00000","207.00","93.41","0.00","0.00","0.00","0.00",ATCUT,"","1",Each,"Cutting Board, Equipment Mounted",Accutemp,"","False",,85,"",,"0","baab3369-BCAD-453e-9867-921e4af1203c","",Accutemp,,"","e0fb1dfb-c00d-DD11-a23a-00304834a8c9","bcd6e7a0-be0d-DD11-a23a-00304834a8c9"
AT2M-2877-1,,"1178",AccuTemp,""U"" channel for connecting two 29"" A Depth griddles,ea,"4.00","0.00000","104.00","46.93","0.00","0.00","0.00","0.00",AT2M,"","1",Each,,Accutemp,"","False",,85,"",,"0","f7d56cb1-b2ab-40c7-b7e5-55ee1b4d1023","",Accutemp,,"","e3fb1dfb-c00d-DD11-a23a-00304834a8c9","bcd6e7a0-be0d-DD11-a23a-00304834a8c9"
Here is the SQL table structure, which has no indexes:
CREATE TABLE [dbo].[rawdata](
[Model Number] [varchar](50) NULL,
[User Stock Model Number] [varchar](50) NULL,
[Vendor Number] [varchar](50) NULL,
[Vendor Name] [varchar](50) NULL,
[Specification] [varchar](max) NULL,
[Vendor Pack] [varchar](50) NULL,
[Selling Unit] [varchar](50) NULL,
[Weight] [varchar](50) NULL,
[Cube] [varchar](50) NULL,
[List Price] [varchar](50) NULL,
[Net Price] [varchar](50) NULL,
[Height] [varchar](50) NULL,
[Width] [varchar](50) NULL,
[Depth] [varchar](50) NULL,
[Deal Net] [varchar](50) NULL,
[Picture Name] [varchar](150) NULL,
[Blank Column] [varchar](50) NULL,
[Vendor to Stock] [varchar](50) NULL,
[Priced By] [varchar](50) NULL,
[Category] [varchar](75) NULL,
[Vendor Nickname] [varchar](50) NULL,
[User Vendor Name] [varchar](50) NULL,
[Configurable?] [varchar](50) NULL,
[Category Values] [varchar](max) NULL,
[Freight Class] [varchar](50) NULL,
[Vendor FOB] [varchar](50) NULL,
[Ship from Zip] [varchar](50) NULL,
[Model Apply] [varchar](50) NULL,
[Picture Link] [varchar](50) NULL,
[Category Code] [varchar](50) NULL,
[Vendor Short Name] [varchar](50) NULL,
[Cutsheet Name] [varchar](150) NULL,
[Cutsheet Link] [varchar](50) NULL,
[Product ID] [varchar](50) NULL,
[Vendor ID] [varchar](50) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
I created a class for the table with the following attributes:
[DelimitedRecord(",")]
[IgnoreFirst(1)]
class rawdata
{
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Model_Number;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string User_Stock_Model_Number;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Vendor_Number;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Vendor_Name;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Specification;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Vendor_Pack;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Selling_Unit;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Weight;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Cube;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string List_Price;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Net_Price;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Height;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Width;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Depth;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Deal_Net;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Picture_Name;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Blank_Column;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Vendor_to_Stock;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Priced_By;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Category;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Vendor_Nickname;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string User_Vendor_Name;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Configurable;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Category_Values;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Freight_Class;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Vendor_FOB;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Ship_from_Zip;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Model_Apply;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Picture_Link;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Category_Code;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Vendor_Short_Name;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Cutsheet_Name;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Cutsheet_Link;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Product_ID;
[FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
public string Vendor_ID;
}
Here is the C# code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
using System.Data;
using System.Data.SqlClient;
using System.Windows.Forms;
using FileHelpers;
namespace XYZ
{
class Class1
{
static void Main(string[] args)
{
SqlConnection conn1 = new SqlConnection();
DataTable temp_rawdata_table = new DataTable();
conn1.ConnectionString = "Data Source=ABC;Initial Catalog=XYZ;Integrated Security=True";
System.Diagnostics.Stopwatch elapsed = new System.Diagnostics.Stopwatch();
elapsed.Start(); Int64 rows = 0;
// ================ Begin BulkCopy ========================
using (SqlBulkCopy bulkcopy = new SqlBulkCopy(conn1.ConnectionString,
System.Data.SqlClient.SqlBulkCopyOptions.TableLock)
{
DestinationTableName = "rawdata",
BulkCopyTimeout = 0,
BatchSize = 100000
})
{
temp_rawdata_table = new XYZDataSet.rawdataDataTable();
// using the ASYNC engine allows for processing record by record
FileHelperAsyncEngine engine = new FileHelperAsyncEngine(typeof(rawdata));
engine.BeginReadFile("C:\\rawdata.csv");
int batchsize = 0;
Console.WriteLine("Copying data to table.");
// The Async engines are IEnumerable
foreach (rawdata aqtext in engine)
{
//create a new update row for aq360productsraw table
DataRow rawdata_update_row = temp_rawdata_table.NewRow();
rawdata_update_row["Model Number"] = aqtext.Model_Number.Trim();
rawdata_update_row["User Stock Model Number"] = aqtext.User_Stock_Model_Number.Trim();
rawdata_update_row["Vendor Number"] = aqtext.Vendor_Number.Trim();
rawdata_update_row["Vendor Name"] = aqtext.Vendor_Name.Trim();
rawdata_update_row["Specification"] = aqtext.Specification.Trim();
rawdata_update_row["Vendor Pack"] = aqtext.Vendor_Pack.Trim();
rawdata_update_row["Selling Unit"] = aqtext.Selling_Unit.Trim();
rawdata_update_row["Weight"] = aqtext.Weight.Trim();
rawdata_update_row["Cube"] = aqtext.Cube.Trim();
rawdata_update_row["List Price"] = aqtext.List_Price.Trim();
rawdata_update_row["Net Price"] = aqtext.Net_Price.Trim();
rawdata_update_row["Height"] = aqtext.Height.Trim();
rawdata_update_row["Width"] = aqtext.Width.Trim();
rawdata_update_row["Depth"] = aqtext.Depth.Trim();
rawdata_update_row["Deal Net"] = aqtext.Deal_Net.Trim();
rawdata_update_row["Picture Name"] = aqtext.Picture_Name.Trim();
rawdata_update_row["Blank Column"] = aqtext.Blank_Column.Trim();
rawdata_update_row["Vendor to Stock"] = aqtext.Vendor_to_Stock.Trim();
rawdata_update_row["Priced By"] = aqtext.Priced_By.Trim();
rawdata_update_row["Category"] = aqtext.Category.Trim();
rawdata_update_row["Vendor Nickname"] = aqtext.Vendor_Nickname.Trim();
rawdata_update_row["User Vendor Name"] = aqtext.User_Vendor_Name.Trim();
rawdata_update_row["Configurable?"] = aqtext.Configurable.Trim();
rawdata_update_row["Category Values"] = aqtext.Category_Values.Trim();
rawdata_update_row["Freight Class"] = aqtext.Freight_Class.Trim();
rawdata_update_row["Vendor FOB"] = aqtext.Vendor_FOB.Trim();
rawdata_update_row["Ship from Zip"] = aqtext.Ship_from_Zip.Trim();
rawdata_update_row["Model Apply"] = aqtext.Model_Apply.Trim();
rawdata_update_row["Picture Link"] = aqtext.Picture_Link.Trim();
rawdata_update_row["Category Code"] = aqtext.Category_Code.Trim();
rawdata_update_row["Vendor Short Name"] = aqtext.Vendor_Short_Name.Trim();
rawdata_update_row["Cutsheet Name"] = aqtext.Cutsheet_Name.Trim();
rawdata_update_row["Cutsheet Link"] = aqtext.Cutsheet_Link.Trim();
rawdata_update_row["Product ID"] = aqtext.Product_ID.Trim();
rawdata_update_row["Vendor ID"] = aqtext.Vendor_ID.Trim();
temp_rawdata_table.Rows.Add(rawdata_update_row);
batchsize += 1;
if (batchsize == 100000)
{
bulkcopy.WriteToServer(temp_rawdata_table);
temp_rawdata_table.Rows.Clear();
batchsize = 0;
Console.WriteLine("Flushing 100,000 rows");
}
rows += 1;
Console.WriteLine(rows.ToString() + " " + aqtext.Model_Number.Trim() + Environment.NewLine);
}
bulkcopy.WriteToServer(temp_rawdata_table);
temp_rawdata_table.Rows.Clear();
engine.Close();
}
elapsed.Stop();
Console.WriteLine((rows + " records imported in " + elapsed.Elapsed.TotalSeconds + " seconds."));
}
}
}
Answer 0 (score: 1)
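The problem, as @MarcosMeli also mentioned, is that this is an invalid CSV file, and not just in that one field. Even the rows you believe are valid are not really working. It seems that whoever created this CSV file got it backwards as to which fields should be text-qualified (i.e. "quoted") and which do not need to be: the numeric fields are text-qualified and the text fields are unqualified.
Row 1 works because text qualification looks at the first and last characters of the field. In row 1, the escaped quote (i.e. the doubled double-quote) is not the first character, so I suspect it is simply being taken literally as two double-quote characters. In row 2, however, the field begins with quoted text, so the first character is a quote, which they then escaped by doubling it. This is very fragile, and even if you get FileHelpers to work with it now, there is little reason to be confident that it will keep working, especially if a non-text-qualified text field ever contains a comma. In that case the fields will again shift unexpectedly. I know you said that the CSV file comes from an external source and that you have no control over it, but you really need to try to get it fixed, because it is simply wrong. It is a bug in the system that generates the file, and it needs to be fixed there.
For the time being, you can set all of the text fields to be non-text-qualified. However, you will probably then need to add a step that replaces every doubled double-quote with a single double-quote.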
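The replacement step mentioned above could look roughly like the following. This is only a minimal sketch: the class and method names (`QuoteCleanup`, `CollapseDoubledQuotes`) are made up for illustration, and it assumes the field has already been read as an unqualified string.

```csharp
using System;

static class QuoteCleanup
{
    // Hypothetical helper: collapses each doubled double-quote ("")
    // in an unqualified text field into a single double-quote (").
    public static string CollapseDoubledQuotes(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return value; // nothing to clean up
        }
        return value.Replace("\"\"", "\"");
    }
}
```

It would be applied to each text field after reading, for example `QuoteCleanup.CollapseDoubledQuotes(aqtext.Specification)` before the value is copied into the `DataRow`.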
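Data format issues aside, and without taking anything away from FileHelpers, which looks like an interesting and useful library, I would say that you do not need FileHelpers to read a text file line by line (minimal memory footprint) and batch it into SQL Server. In fact, you can do all of this without the step of loading into a staging table ([rawdata]), and instead send the rows directly into a stored procedure (as VARCHAR / NVARCHAR fields). How? By using a Table-Valued Parameter, and by populating it with the IEnumerable<SqlDataRecord> method rather than the DataTable method. I have detailed this technique in several other answers.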
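As a rough illustration of the streaming approach, the sketch below reads the file line by line and yields one `SqlDataRecord` per row into a table-valued parameter. Everything named here is an assumption made for the example: the stored procedure `dbo.ImportRawData`, the table type `dbo.RawDataRow`, the parameter `@Rows`, the two-column layout (only Model Number and Specification, to keep it short), and the `splitLine` delegate that the caller supplies to break a line into fields.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using Microsoft.SqlServer.Server;

static class RawDataStreamer
{
    // Hypothetical two-column shape; a real table type would list all 35 columns.
    private static readonly SqlMetaData[] Columns =
    {
        new SqlMetaData("ModelNumber", SqlDbType.VarChar, 50),
        new SqlMetaData("Specification", SqlDbType.VarChar, SqlMetaData.Max)
    };

    private static string At(IList<string> fields, int index)
    {
        return index < fields.Count ? fields[index].Trim() : string.Empty;
    }

    // Lazily yields one record per data line; nothing is buffered in a DataTable.
    public static IEnumerable<SqlDataRecord> ReadRows(
        TextReader reader, Func<string, IList<string>> splitLine)
    {
        reader.ReadLine(); // skip the header row, mirroring [IgnoreFirst(1)]
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            IList<string> fields = splitLine(line);
            var record = new SqlDataRecord(Columns);
            record.SetString(0, At(fields, 0)); // Model Number
            record.SetString(1, At(fields, 4)); // Specification
            yield return record;
        }
    }

    // Streams the file into the (hypothetical) stored procedure.
    public static void Send(
        string connectionString, string csvPath, Func<string, IList<string>> splitLine)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("dbo.ImportRawData", connection))
        using (var reader = new StreamReader(csvPath))
        {
            command.CommandType = CommandType.StoredProcedure;
            SqlParameter rows = command.Parameters.Add("@Rows", SqlDbType.Structured);
            rows.TypeName = "dbo.RawDataRow"; // assumed user-defined table type
            rows.Value = ReadRows(reader, splitLine);
            connection.Open();
            command.ExecuteNonQuery(); // rows are pulled from the file as they are sent
        }
    }
}
```

One caveat worth knowing: if the file contains no data rows, an empty enumeration cannot be passed as a table-valued parameter, so that case needs a separate check.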
Answer 1 (score: 0)
The problem is that the CSV is invalid: quotes can only be escaped when the field itself is quoted.
The value:
,""U"" channel for connecting two 29"" A Depth griddles,
To be parsed correctly, it would have to be:
,"""U"" channel for connecting two 29"" A Depth griddles",
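To make the difference between the two forms concrete, here is a small single-line splitter that follows the usual CSV convention: a field counts as quoted only if it starts with a quote, and inside a quoted field a doubled quote stands for one literal quote. It is an illustrative sketch only (`CsvLine.Split` is a made-up name) and is not how FileHelpers is implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class CsvLine
{
    // Splits one line on commas, honoring quoted fields and "" escapes.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        bool atFieldStart = true;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote inside a quoted field
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote
                }
                else
                {
                    current.Append(c);
                }
            }
            else if (c == '"' && atFieldStart)
            {
                inQuotes = true;         // opening quote
                atFieldStart = false;
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Clear();
                atFieldStart = true;
            }
            else
            {
                current.Append(c);       // stray quotes in unquoted fields stay literal
                atFieldStart = false;
            }
        }
        fields.Add(current.ToString());
        return fields;
    }
}
```

With this sketch, the corrected value above comes back as `"U" channel for connecting two 29" A Depth griddles`, while the original unquoted value loses its leading quote and keeps the doubled ones. The row 1 style value (`48"" Solid Cutting Board ...`) is treated as unquoted and passes through with its doubled quotes intact.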
How about removing FieldQuoted from the Specification field?
public string Specification;