Importing a large CSV file into a SQL Server 2014 table using FileHelpers

Time: 2015-07-17 16:40:46

Tags: c# sql-server filehelpers

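The CSV file is comma-delimited and contains embedded delimiters and quotes. Some fields have opening and closing quotes, and some do not.

The first record is processed perfectly, but the second is not. As you can see, field5 of the second record appears to start with a quote, but that quote is actually embedded data, and field5 has no trailing quote. The import puts blanks in fields 5 and 6 and pushes the field5 data into field7, which in turn violates that field's maximum length.

Is there an attribute setting in FileHelpers that I can use to handle records like the second one below, so that every field is imported correctly? The CSV file is received from an external source, so I have no control over the feed.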

AT2M-2471-3,,"1178",AccuTemp,48"" Solid Cutting Board (must be ordered with AT2A-2630-3 or AT2A-2630-22),ea,"10.00","0.00000","207.00","93.41","0.00","0.00","0.00","0.00",ATCUT,"","1",each,"Cutting Board, Equipment Mounted",Accutemp,"","False",,85,"",,"0","baab3369-BCAD-453e-9867-921e4af1203c","",Accutemp,,"","e0fb1dfb-c00d-DD11-a23a-00304834a8c9","bcd6e7a0-be0d-DD11-a23a-00304834a8c9"

AT2M-2877-1,,"1178",AccuTemp,""U"" channel for connecting two 29"" A Depth griddles,ea,"4.00","0.00000","104.00","46.93","0.00","0.00","0.00","0.00",AT2M,"","1",each,,Accutemp,"","False",,85,"",,"0","f7d56cb1-b2ab-40c7-b7e5-55ee1b4d1023","",Accutemp,,"","e3fb1dfb-c00d-DD11-a23a-00304834a8c9","bcd6e7a0-be0d-DD11-a23a-00304834a8c9"

Here is the SQL table structure, which has no indexes:

    CREATE TABLE [dbo].[rawdata](
        [Model Number] [varchar](50) NULL,
        [User Stock Model Number] [varchar](50) NULL,
        [Vendor Number] [varchar](50) NULL,
        [Vendor Name] [varchar](50) NULL,
        [Specification] [varchar](max) NULL,
        [Vendor Pack] [varchar](50) NULL,
        [Selling Unit] [varchar](50) NULL,
        [Weight] [varchar](50) NULL,
        [Cube] [varchar](50) NULL,
        [List Price] [varchar](50) NULL,
        [Net Price] [varchar](50) NULL,
        [Height] [varchar](50) NULL,
        [Width] [varchar](50) NULL,
        [Depth] [varchar](50) NULL,
        [Deal Net] [varchar](50) NULL,
        [Picture Name] [varchar](150) NULL,
        [Blank Column] [varchar](50) NULL,
        [Vendor to Stock] [varchar](50) NULL,
        [Priced By] [varchar](50) NULL,
        [Category] [varchar](75) NULL,
        [Vendor Nickname] [varchar](50) NULL,
        [User Vendor Name] [varchar](50) NULL,
        [Configurable?] [varchar](50) NULL,
        [Category Values] [varchar](max) NULL,
        [Freight Class] [varchar](50) NULL,
        [Vendor FOB] [varchar](50) NULL,
        [Ship from Zip] [varchar](50) NULL,
        [Model Apply] [varchar](50) NULL,
        [Picture Link] [varchar](50) NULL,
        [Category Code] [varchar](50) NULL,
        [Vendor Short Name] [varchar](50) NULL,
        [Cutsheet Name] [varchar](150) NULL,
        [Cutsheet Link] [varchar](50) NULL,
        [Product ID] [varchar](50) NULL,
        [Vendor ID] [varchar](50) NULL
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

I created a class for the table with the following attributes:

    [DelimitedRecord(",")]
    [IgnoreFirst(1)]
    class rawdata
    {
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Model_Number;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string User_Stock_Model_Number;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Vendor_Number;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Vendor_Name;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Specification;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Vendor_Pack;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Selling_Unit;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Weight;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Cube;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string List_Price;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Net_Price;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Height;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Width;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Depth;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Deal_Net;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Picture_Name;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Blank_Column;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Vendor_to_Stock;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Priced_By;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Category;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Vendor_Nickname;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string User_Vendor_Name;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Configurable;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Category_Values;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Freight_Class;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Vendor_FOB;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Ship_from_Zip;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Model_Apply;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Picture_Link;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Category_Code;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Vendor_Short_Name;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Cutsheet_Name;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Cutsheet_Link;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Product_ID;
        [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Quotes optional on read and write
        public string Vendor_ID;
    }

Here is the C# code:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Text.RegularExpressions;
    using System.Threading.Tasks;
    using System.Data;
    using System.Data.SqlClient;
    using System.Windows.Forms;
    using FileHelpers;

    namespace XYZ
    {
        class Class1
        {
            static void Main(string[] args)
            {
                SqlConnection conn1 = new SqlConnection();
                DataTable temp_rawdata_table = new DataTable();

                conn1.ConnectionString = "Data Source=ABC;Initial Catalog=XYZ;Integrated Security=True";

                System.Diagnostics.Stopwatch elapsed = new System.Diagnostics.Stopwatch();
                elapsed.Start(); Int64 rows = 0;

                // ================ Begin BulkCopy ========================
                using (SqlBulkCopy bulkcopy = new SqlBulkCopy(conn1.ConnectionString,
                    System.Data.SqlClient.SqlBulkCopyOptions.TableLock)
                    {
                        DestinationTableName = "rawdata",
                        BulkCopyTimeout = 0,
                        BatchSize = 100000
                    })
                {
                    temp_rawdata_table = new XYZDataSet.rawdataDataTable();

                    // using the ASYNC engine allows for processing record by record
                    FileHelperAsyncEngine engine = new FileHelperAsyncEngine(typeof(rawdata));
                    engine.BeginReadFile("C:\\rawdata.csv");

                    int batchsize = 0;

                    Console.WriteLine("Copying data to table.");
                    // The Async engines are IEnumerable
                    foreach (rawdata aqtext in engine)
                    {
                        // Create a new row for the rawdata table
                        DataRow rawdata_update_row = temp_rawdata_table.NewRow();

                        rawdata_update_row["Model Number"] = aqtext.Model_Number.Trim();
                        rawdata_update_row["User Stock Model Number"] = aqtext.User_Stock_Model_Number.Trim();
                        rawdata_update_row["Vendor Number"] = aqtext.Vendor_Number.Trim();
                        rawdata_update_row["Vendor Name"] = aqtext.Vendor_Name.Trim();
                        rawdata_update_row["Specification"] = aqtext.Specification.Trim();
                        rawdata_update_row["Vendor Pack"] = aqtext.Vendor_Pack.Trim();
                        rawdata_update_row["Selling Unit"] = aqtext.Selling_Unit.Trim();
                        rawdata_update_row["Weight"] = aqtext.Weight.Trim();
                        rawdata_update_row["Cube"] = aqtext.Cube.Trim();
                        rawdata_update_row["List Price"] = aqtext.List_Price.Trim();
                        rawdata_update_row["Net Price"] = aqtext.Net_Price.Trim();
                        rawdata_update_row["Height"] = aqtext.Height.Trim();
                        rawdata_update_row["Width"] = aqtext.Width.Trim();
                        rawdata_update_row["Depth"] = aqtext.Depth.Trim();
                        rawdata_update_row["Deal Net"] = aqtext.Deal_Net.Trim();
                        rawdata_update_row["Picture Name"] = aqtext.Picture_Name.Trim();
                        rawdata_update_row["Blank Column"] = aqtext.Blank_Column.Trim();
                        rawdata_update_row["Vendor to Stock"] = aqtext.Vendor_to_Stock.Trim();
                        rawdata_update_row["Priced By"] = aqtext.Priced_By.Trim();
                        rawdata_update_row["Category"] = aqtext.Category.Trim();
                        rawdata_update_row["Vendor Nickname"] = aqtext.Vendor_Nickname.Trim();
                        rawdata_update_row["User Vendor Name"] = aqtext.User_Vendor_Name.Trim();
                        rawdata_update_row["Configurable?"] = aqtext.Configurable.Trim();
                        rawdata_update_row["Category Values"] = aqtext.Category_Values.Trim();
                        rawdata_update_row["Freight Class"] = aqtext.Freight_Class.Trim();
                        rawdata_update_row["Vendor FOB"] = aqtext.Vendor_FOB.Trim();
                        rawdata_update_row["Ship from Zip"] = aqtext.Ship_from_Zip.Trim();
                        rawdata_update_row["Model Apply"] = aqtext.Model_Apply.Trim();
                        rawdata_update_row["Picture Link"] = aqtext.Picture_Link.Trim();
                        rawdata_update_row["Category Code"] = aqtext.Category_Code.Trim();
                        rawdata_update_row["Vendor Short Name"] = aqtext.Vendor_Short_Name.Trim();
                        rawdata_update_row["Cutsheet Name"] = aqtext.Cutsheet_Name.Trim();
                        rawdata_update_row["Cutsheet Link"] = aqtext.Cutsheet_Link.Trim();
                        rawdata_update_row["Product ID"] = aqtext.Product_ID.Trim();
                        rawdata_update_row["Vendor ID"] = aqtext.Vendor_ID.Trim();


                        temp_rawdata_table.Rows.Add(rawdata_update_row);

                        batchsize += 1;
                        if (batchsize == 100000)
                        {
                            bulkcopy.WriteToServer(temp_rawdata_table);
                            temp_rawdata_table.Rows.Clear();
                            batchsize = 0;
                            Console.WriteLine("Flushing 100,000 rows");
                        }

                        rows += 1;

                        Console.WriteLine(rows.ToString() + "    " + aqtext.Model_Number.Trim() + Environment.NewLine);
                    }


                    bulkcopy.WriteToServer(temp_rawdata_table);
                    temp_rawdata_table.Rows.Clear();

                    engine.Close();
                }
                elapsed.Stop();
                Console.WriteLine((rows + " records imported in " +  elapsed.Elapsed.TotalSeconds + " seconds."));
            }
        }
    }

2 Answers:

Answer 0 (score: 1)

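The problem, as @MarcosMeli also mentioned, is that this is an invalid CSV file, and not just in that one field. Even the row that you think is valid is not really working correctly. It seems that whoever created this CSV file got it backwards with regard to which fields should be text-qualified (i.e. "quoted") and which do not need to be: the numeric fields are text-qualified and the text fields are not.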

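The reason row 1 works is that text qualification looks at the first and last characters of the field. In row 1, the escaped quote (i.e. the doubled double-quote) is not the first character, so I suspect it comes through as a literal doubled double-quote. In row 2, however, the field's text begins with a quoted word, so the first character is a quote, which they then escaped by doubling it. This is very poorly done, and even if you get FileHelpers to work with it right now, that does not inspire much confidence that it will keep working correctly, especially if a comma appears in one of the text fields that is not text-qualified. In that case the fields will shift unexpectedly again. I know you said the CSV file comes from an external source and you cannot control it, but you really need to try to get it fixed, because it is simply wrong. This is a bug in the system that generates it, and it needs to be fixed.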
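The first-character behaviour described above can be illustrated with a deliberately simplified model. This is not FileHelpers' actual implementation; `OptionalQuoteModel` and `TreatedAsQuoted` are hypothetical names, and the check only mimics the idea that an optionally quoted field is treated as quoted when its raw text starts with the quote character.

```csharp
public static class OptionalQuoteModel
{
    // Simplified assumption: a parser with optional quoting treats a field as
    // quoted only when the raw text starts with the quote character.
    public static bool TreatedAsQuoted(string rawField)
    {
        return !string.IsNullOrEmpty(rawField) && rawField[0] == '"';
    }
}
```

Under this model, the start of row 1's fifth field, `48"" Solid Cutting Board`, is not treated as quoted, so its doubled quotes pass through as plain characters. Row 2's fifth field, `""U"" channel for connecting two 29"" A Depth griddles`, starts with a quote, so a parser would go looking for a closing quote that does not exist where it expects one.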

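For now, you could set all of the text fields to be non-text-qualified. However, you would then probably need to add a step that replaces every doubled double-quote with a single double-quote.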
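The clean-up step mentioned above could look roughly like the following. It is only a sketch: `QuoteCleanup` and `CollapseDoubledQuotes` are hypothetical names, and it assumes the value has already been read as an unquoted field, so every `""` in it is an escaped quote rather than an empty quoted string.

```csharp
public static class QuoteCleanup
{
    // Collapses each doubled double-quote ("") into a single double-quote (").
    // Assumes the input is the raw text of a field that was read without
    // text qualification, so "" can only mean an escaped quote.
    public static string CollapseDoubledQuotes(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return value;
        }
        return value.Replace("\"\"", "\"");
    }
}
```

In the import loop from the question, this would be applied to each text field before it is copied into the row, for example to `aqtext.Specification`.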

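Aside from the data format issue, and without taking anything away from FileHelpers, which looks like an interesting and useful library, I would say that you do not need FileHelpers in order to read a text file line by line (minimal memory footprint) and batch it into SQL Server. In fact, you can do all of that and also:

  • Skip the step of having a separate staging table (i.e. [rawdata]) and instead send the rows directly into a stored procedure that performs the sync
  • Perform basic data type validation in the application layer and send strongly typed rows of data (instead of passing in all VARCHAR / NVARCHAR fields).

How? By using a Table-Valued Parameter, with the IEnumerable<SqlDataRecord> approach (rather than the DataTable approach). I have described this technique in detail in several other answers here.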
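A minimal sketch of the streaming approach described above follows. The stored procedure `dbo.ImportProducts`, the table type `dbo.ProductImportType`, the parameter name `@Rows`, and the two-column layout are all hypothetical and would have to exist in the database with a matching definition. The line splitting is deliberately naive and does not handle quoted fields or embedded commas; it only shows where validation and typing would happen.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using Microsoft.SqlServer.Server;

public static class TvpImport
{
    // Lazily yields one SqlDataRecord per valid input line, so the whole
    // file is never held in memory.
    public static IEnumerable<SqlDataRecord> StreamRows(TextReader reader)
    {
        // Hypothetical layout; it must match the user-defined table type.
        var record = new SqlDataRecord(
            new SqlMetaData("ModelNumber", SqlDbType.VarChar, 50),
            new SqlMetaData("VendorNumber", SqlDbType.Int));

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Naive split for illustration only (no quote handling).
            string[] parts = line.Split(',');
            int vendorNumber;
            if (parts.Length < 2 || !int.TryParse(parts[1].Trim(), out vendorNumber))
            {
                continue; // basic app-layer validation: skip rows that fail
            }

            record.SetString(0, parts[0].Trim());
            record.SetInt32(1, vendorNumber);
            yield return record; // the same instance is reused for each row
        }
    }

    // Streams the file into a stored procedure through a table-valued parameter.
    public static void Import(string connectionString, string path)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.ImportProducts", conn)) // hypothetical proc
        using (var reader = new StreamReader(path))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            SqlParameter tvp = cmd.Parameters.Add("@Rows", SqlDbType.Structured);
            tvp.TypeName = "dbo.ProductImportType"; // hypothetical table type
            tvp.Value = StreamRows(reader);

            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}
```

Because `StreamRows` is an iterator, rows are read from the file only as SQL Server consumes them. An input that yields no rows at all would need to be handled separately before executing the command.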

Answer 1 (score: 0)

The problem is that the CSV is invalid: quotes can only be escaped when the field itself is quoted.

The value:

,""U"" channel for connecting two 29"" A Depth griddles,

must be written like this to parse correctly:

,"""U"" channel for connecting two 29"" A Depth griddles",
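For reference, the conventional way for a producer to emit such a field is to wrap it in quotes and double every embedded quote. The sketch below shows that rule; `CsvQuoting` and `QuoteField` are hypothetical names, and since the feed here is external this only illustrates what the generating system should be doing.

```csharp
public static class CsvQuoting
{
    // Wraps a value in double quotes and doubles every embedded double quote,
    // which is the usual CSV escaping convention.
    public static string QuoteField(string value)
    {
        string text = value ?? string.Empty;
        return "\"" + text.Replace("\"", "\"\"") + "\"";
    }
}
```

Applied to the text `"U" channel for connecting two 29" A Depth griddles`, this produces the corrected form of the value shown above.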

How about removing FieldQuoted from the Specification field?

  public string Specification;