Bulk-inserting a CSV file with inconsistent quoting into SQL Server

Asked: 2009-04-23 15:47:41

Tags: sql-server csv bulkinsert

Is it possible to BULK INSERT (SQL Server) a CSV file in which the fields are only occasionally surrounded by quotes? Specifically, quotes only surround those fields that contain a ','.

In other words, my data looks like this (the first row contains headers):

id, company, rep, employees
729216,INGRAM MICRO INC.,"Stuart, Becky",523
729235,"GREAT PLAINS ENERGY, INC.","Nelson, Beena",114
721177,GEORGE WESTON BAKERIES INC,"Hogan, Meg",253

Because the quotes aren't consistent, I can't use '","' as a delimiter, and I don't know how to create a format file that accounts for this.

I tried using ',' as the delimiter and loading into a temporary table where every column is a varchar, then using some kludgy processing to strip out the quotes, but that doesn't work either, because the fields that contain a ',' get split across multiple columns.
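For illustration (a quick Python sketch, just to demonstrate the failure mode; the actual load is T-SQL): splitting on ',' breaks the quoted fields apart, while a quote-aware CSV parser keeps them intact.

```python
import csv
import io

line = '729235,"GREAT PLAINS ENERGY, INC.","Nelson, Beena",114'

# Naive split on ',' breaks the quoted fields apart: 6 pieces instead of 4
naive = line.split(',')

# A quote-aware parser keeps the commas inside quoted fields
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive))   # 6
print(parsed)       # ['729235', 'GREAT PLAINS ENERGY, INC.', 'Nelson, Beena', '114']
```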

Unfortunately, I have no way to manipulate the CSV file beforehand.

Is this a lost cause?

Many thanks in advance for any advice.

By the way, I did see this post, SQL bulk import from csv, but in that case EVERY field was consistently wrapped in quotes. So, in that case, he could use ',' as the delimiter and strip the quotes afterwards.

17 Answers:

Answer 0 (score: 19)

From MSDN:

it is not possible to do a bulk insert for this file

To be usable as a data file for bulk import, a CSV file must comply with the following restrictions:

  • Data fields never contain the field terminator.
  • Either none or all of the values in a data field are enclosed in quotation marks ("").

http://msdn.microsoft.com/en-us/library/ms188609.aspx

Some simple text processing should be all that's needed to get the file ready for import. Alternatively, your users could be required to either format the file according to these guidelines or use something other than a comma as a delimiter (e.g. |).
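That "simple text processing" can be as small as re-reading the file with a quote-aware parser and re-writing it with a delimiter that never appears in the data. A minimal Python sketch, for illustration only (the file paths and the helper name are placeholders, not part of the original answer):

```python
import csv

def redelimit(in_path, out_path, new_delim="|"):
    """Re-write an inconsistently quoted CSV using a new field delimiter."""
    rows = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        # QUOTE_MINIMAL only quotes fields that actually contain the new delimiter
        writer = csv.writer(dst, delimiter=new_delim, quoting=csv.QUOTE_MINIMAL)
        for row in csv.reader(src):  # csv.reader understands the optional quotes
            writer.writerow(row)
            rows += 1
    return rows
```

The rewritten file can then be loaded with BULK INSERT using FIELDTERMINATOR = '|'.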

Answer 1 (score: 18)

You are going to need to preprocess the file, period.

If you really really need to do this, here is the code. I wrote this because I absolutely had no choice. It is utility code and I'm not proud of it, but it works. The approach is not to get SQL to understand quoted fields, but instead to manipulate the file to use an entirely different delimiter.

EDIT: Here is the code in a github repo. It's been improved, and now comes with unit tests! https://github.com/chrisclark/Redelim-it

This function takes an input file and replaces all field-delimiting commas (NOT commas inside quoted text fields, just the actual delimiting ones) with a new delimiter. You can then tell SQL Server to use the new field delimiter instead of a comma. In the version of the function posted here, the placeholder is <*TMP*> (I feel confident this will not appear in the original csv - if it does, brace for explosions).

So after running this function you can import into SQL by doing something like:

BULK INSERT MyTable
FROM 'C:\FileCreatedFromThisFunction.csv'
WITH
(
FIELDTERMINATOR = '<*TMP*>',
ROWTERMINATOR = '\n'
)

And without further ado, the terrible, awful function that I apologize in advance for inflicting on you (edit - I've posted a working program that does this, rather than just the function, on my blog here):

Private Function CsvToOtherDelimiter(ByVal InputFile As String, ByVal OutputFile As String) As Integer

        Dim PH1 As String = "<*TMP*>"

        Dim objReader As StreamReader = Nothing
        Dim count As Integer = 0 'This will also serve as a primary key'
        Dim sb As New System.Text.StringBuilder

        Try
            objReader = New StreamReader(File.OpenRead(InputFile), System.Text.Encoding.Default)
        Catch ex As Exception
            UpdateStatus(ex.Message)
        End Try

        If objReader Is Nothing Then
            UpdateStatus("Invalid file: " & InputFile)
            Return -1 'Exit Function alone would return the default value 0
        End If

        'grab the first line
        Dim line = objReader.ReadLine()
        'and advance to the next line b/c the first line is column headings
        Dim hasHeaders As Boolean = True 'set to False if the file has no header row
        If hasHeaders Then
            line = Trim(objReader.ReadLine)
        End If

        While Not String.IsNullOrEmpty(line) 'loop through each line

            count += 1

            'Replace commas with our custom-made delimiter
            line = line.Replace(",", PH1)

            'Find a quoted part of the line, which could legitimately contain commas.
            'In that case we will need to identify the quoted section and swap commas back in for our custom placeholder.
            Dim starti = line.IndexOf(PH1 & """", 0)
            If line.IndexOf("""", 0) = 0 Then starti = 0

            While starti > -1 'loop through quoted fields

                Dim FieldTerminatorFound As Boolean = False

                'Find end quote token (originally a ",)
                Dim endi As Integer = line.IndexOf("""" & PH1, starti)

                If endi < 0 Then
                    FieldTerminatorFound = True
                    endi = line.Length - 1
                End If

                While Not FieldTerminatorFound

                    'Find any more quotes that are part of that sequence, if any
                    Dim backChar As String = """" 'thats one quote
                    Dim quoteCount = 0
                    While backChar = """"
                        quoteCount += 1
                        backChar = line.Chars(endi - quoteCount)
                    End While

                    If quoteCount Mod 2 = 1 Then 'odd number of quotes. real field terminator
                        FieldTerminatorFound = True
                    Else 'keep looking
                        endi = line.IndexOf("""" & PH1, endi + 1)
                    End If
                End While

                'Grab the quoted field from the line, now that we have the start and ending indices
                Dim source = line.Substring(starti + PH1.Length, endi - starti - PH1.Length + 1)

                'And swap the commas back in
                line = line.Replace(source, source.Replace(PH1, ","))

                'Find the next quoted field
                'If endi >= line.Length - 1 Then endi = line.Length 'During the swap, the length of line shrinks so an endi value at the end of the line will fail
                starti = line.IndexOf(PH1 & """", starti + PH1.Length)

            End While

            'Keep the processed line (without this the output file would be empty)
            sb.AppendLine(line)

            line = objReader.ReadLine

        End While

        objReader.Close()

        SaveTextToFile(sb.ToString, OutputFile)

        Return count

    End Function

Answer 2 (score: 8)

I found Chris's answer very helpful, but I wanted to run it from within SQL Server using T-SQL (and not via CLR), so I converted his code to T-SQL code. But then I took it one step further by wrapping everything up in a stored procedure that does the following:

  1. Use bulk insert to initially import the CSV file
  2. Clean up the lines using Chris's code
  3. Return the results in a table format
  4. For my needs, I further cleaned the lines by removing the quotes around values and converting two double quotes to one double quote (I believe that is the correct way to handle it).
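The step-4 cleanup rule (strip one pair of surrounding quotes, collapse an escaped "" to ") can be sketched in a few lines. This is an illustrative Python sketch, not part of the stored procedure itself:

```python
def clean_value(value: str) -> str:
    """Strip one pair of surrounding quotes and unescape doubled quotes."""
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        value = value[1:-1]
    return value.replace('""', '"')

print(clean_value('"Nelson, Beena"'))   # Nelson, Beena
print(clean_value('"He said ""hi"""'))  # He said "hi"
print(clean_value('523'))               # 523
```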

    CREATE PROCEDURE SSP_CSVToTable
    
    -- Add the parameters for the stored procedure here
    @InputFile nvarchar(4000)
    , @FirstLine int
    
    AS
    
    BEGIN
    
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;
    
    --convert the CSV file to a table
    --clean up the lines so that commas are handled correctly
    
    DECLARE @sql nvarchar(4000)
    DECLARE @PH1 nvarchar(50)
    DECLARE @LINECOUNT int -- This will also serve as a primary key
    DECLARE @CURLINE int
    DECLARE @Line nvarchar(4000)
    DECLARE @starti int
    DECLARE @endi int
    DECLARE @FieldTerminatorFound bit
    DECLARE @backChar nvarchar(4000)
    DECLARE @quoteCount int
    DECLARE @source nvarchar(4000)
    DECLARE @COLCOUNT int
    DECLARE @CURCOL int
    DECLARE @ColVal nvarchar(4000)
    
    -- new delimiter
    SET @PH1 = '†'
    
    -- create single column table to hold each line of file
    CREATE TABLE [#CSVLine]([line] nvarchar(4000))
    
    -- bulk insert into temp table
    -- cannot use variable path with bulk insert
    -- so we must run using dynamic sql
    SET @Sql = 'BULK INSERT #CSVLine
    FROM ''' + @InputFile + '''
    WITH
    (
    FIRSTROW=' + CAST(@FirstLine as varchar) + ',
    FIELDTERMINATOR = ''\n'',
    ROWTERMINATOR = ''\n''
    )'
    
    -- run dynamic statement to populate temp table
    EXEC(@sql)
    
    -- get number of lines in table
    SET @LINECOUNT = @@ROWCOUNT
    
    -- add identity column to table so that we can loop through it
    ALTER TABLE [#CSVLine] ADD [RowId] [int] IDENTITY(1,1) NOT NULL
    
    IF @LINECOUNT > 0
    BEGIN
        -- cycle through each line, cleaning each line
        SET @CURLINE = 1
        WHILE @CURLINE <= @LINECOUNT
        BEGIN
            -- get current line
            SELECT @line = line
              FROM #CSVLine
             WHERE [RowId] = @CURLINE
    
            -- Replace commas with our custom-made delimiter
            SET @Line = REPLACE(@Line, ',', @PH1)
    
            -- Find a quoted part of the line, which could legitimately contain commas.
            -- In that case we will need to identify the quoted section and swap commas back in for our custom placeholder.
            SET @starti = CHARINDEX(@PH1 + '"' ,@Line, 0)
            If CHARINDEX('"', @Line, 0) = 0 SET @starti = 0
    
            -- loop through quoted fields
            WHILE @starti > 0 
            BEGIN
                SET @FieldTerminatorFound = 0
    
                -- Find end quote token (originally  a ",)
                SET @endi = CHARINDEX('"' + @PH1, @Line, @starti)  -- sLine.IndexOf("""" & PH1, starti)
    
                IF @endi < 1
                BEGIN
                    SET @FieldTerminatorFound = 1
                    If @endi < 1 SET @endi = LEN(@Line) - 1
                END
    
                WHILE @FieldTerminatorFound = 0
                BEGIN
                    -- Find any more quotes that are part of that sequence, if any
                    SET @backChar = '"' -- thats one quote
                    SET @quoteCount = 0
    
                    WHILE @backChar = '"'
                    BEGIN
                        SET @quoteCount = @quoteCount + 1
                        SET @backChar = SUBSTRING(@Line, @endi-@quoteCount, 1) -- sLine.Chars(endi - quoteCount)
                    END
    
                    IF (@quoteCount % 2) = 1
                    BEGIN
                        -- odd number of quotes. real field terminator
                        SET @FieldTerminatorFound = 1
                    END
                    ELSE 
                    BEGIN
                        -- keep looking
                        SET @endi = CHARINDEX('"' + @PH1, @Line, @endi + 1) -- sLine.IndexOf("""" & PH1, endi + 1)
                    END
    
                END
    
                -- Grab the quoted field from the line, now that we have the start and ending indices
                SET @source = SUBSTRING(@Line, @starti + LEN(@PH1), @endi - @starti - LEN(@PH1) + 1) 
                -- sLine.Substring(starti + PH1.Length, endi - starti - PH1.Length + 1)
    
                -- And swap the commas back in
                SET @Line = REPLACE(@Line, @source, REPLACE(@source, @PH1, ','))
                --sLine.Replace(source, source.Replace(PH1, ","))
    
                -- Find the next quoted field
                -- If endi >= line.Length - 1 Then endi = line.Length 'During the swap, the length of line shrinks so an endi value at the end of the line will fail
                SET @starti = CHARINDEX(@PH1 + '"', @Line, @starti + LEN(@PH1))
                --sLine.IndexOf(PH1 & """", starti + PH1.Length)
    
            END
    
            -- get table based on current line
            IF OBJECT_ID('tempdb..#Line') IS NOT NULL
                DROP TABLE #Line
    
            -- converts a delimited list into a table
            SELECT *
            INTO #Line
            FROM dbo.iter_charlist_to_table(@Line,@PH1)
    
            -- get number of columns in line
            SET @COLCOUNT = @@ROWCOUNT
    
            -- dynamically create CSV temp table to hold CSV columns and lines
            -- only need to create once
            IF OBJECT_ID('tempdb..#CSV') IS NULL
            BEGIN
                -- create initial structure of CSV table
                CREATE TABLE [#CSV]([Col1] nvarchar(100))
    
                -- dynamically add a column for each column found in the first line
                SET @CURCOL = 1
                WHILE @CURCOL <= @COLCOUNT
                BEGIN
                    -- first column already exists, don't need to add
                    IF @CURCOL > 1 
                    BEGIN
                        -- add field
                        SET @sql = 'ALTER TABLE [#CSV] ADD [Col' + Cast(@CURCOL as varchar) + '] nvarchar(100)'
    
                        --print @sql
    
                        -- this adds the fields to the temp table
                        EXEC(@sql)
                    END
    
                    -- go to next column
                    SET @CURCOL = @CURCOL + 1
                END
            END
    
            -- build dynamic sql to insert current line into CSV table
            SET @sql = 'INSERT INTO [#CSV] VALUES('
    
            -- loop through line table, dynamically adding each column value
            SET @CURCOL = 1
            WHILE @CURCOL <= @COLCOUNT
            BEGIN
                -- get current column
                Select @ColVal = str 
                  From #Line 
                 Where listpos = @CURCOL
    
                IF LEN(@ColVal) > 0
                BEGIN
                    -- remove quotes from beginning if exist
                    IF LEFT(@ColVal,1) = '"'
                        SET @ColVal = RIGHT(@ColVal, LEN(@ColVal) - 1)
    
                    -- remove quotes from end if exist
                    IF RIGHT(@ColVal,1) = '"'
                        SET @ColVal = LEFT(@ColVal, LEN(@ColVal) - 1)
                END
    
                -- write column value
                -- make value sql safe by replacing single quotes with two single quotes
                -- also, replace two double quotes with a single double quote
                SET @sql = @sql + '''' + REPLACE(REPLACE(@ColVal, '''',''''''), '""', '"') + ''''
    
                -- add comma separater except for the last record
                IF @CURCOL <> @COLCOUNT
                    SET @sql = @sql + ','
    
                -- go to next column
                SET @CURCOL = @CURCOL + 1
            END
    
            -- close sql statement
            SET @sql = @sql + ')'
    
            --print @sql
    
            -- run sql to add line to table
            EXEC(@sql)
    
            -- move to next line
            SET @CURLINE = @CURLINE + 1
    
        END
    
    END
    
    -- return CSV table
    SELECT * FROM [#CSV]
    
    END
    
    GO
    

    The stored procedure makes use of this helper function that parses a string into a table (thanks to Erland Sommarskog!):

    CREATE FUNCTION [dbo].[iter_charlist_to_table]
                    (@list      ntext,
                     @delimiter nchar(1) = N',')
         RETURNS @tbl TABLE (listpos int IDENTITY(1, 1) NOT NULL,
                             str     varchar(4000),
                             nstr    nvarchar(2000)) AS
    
    BEGIN
      DECLARE @pos      int,
              @textpos  int,
              @chunklen smallint,
              @tmpstr   nvarchar(4000),
              @leftover nvarchar(4000),
              @tmpval   nvarchar(4000)
    
      SET @textpos = 1
      SET @leftover = ''
      WHILE @textpos <= datalength(@list) / 2
      BEGIN
         SET @chunklen = 4000 - datalength(@leftover) / 2
         SET @tmpstr = @leftover + substring(@list, @textpos, @chunklen)
         SET @textpos = @textpos + @chunklen
    
         SET @pos = charindex(@delimiter, @tmpstr)
    
         WHILE @pos > 0
         BEGIN
            SET @tmpval = ltrim(rtrim(left(@tmpstr, @pos - 1)))
            INSERT @tbl (str, nstr) VALUES(@tmpval, @tmpval)
            SET @tmpstr = substring(@tmpstr, @pos + 1, len(@tmpstr))
            SET @pos = charindex(@delimiter, @tmpstr)
         END
    
         SET @leftover = @tmpstr
      END
    
      INSERT @tbl(str, nstr) VALUES (ltrim(rtrim(@leftover)), ltrim(rtrim(@leftover)))
    
    RETURN
    
    END
    

    Here is how I call it from T-SQL. In this case, I'm inserting the results into a temp table, so I create the temp table first:

        -- create temp table for file import
    CREATE TABLE #temp
    (
        CustomerCode nvarchar(100) NULL,
        Name nvarchar(100) NULL,
        [Address] nvarchar(100) NULL,
        City nvarchar(100) NULL,
        [State] nvarchar(100) NULL,
        Zip nvarchar(100) NULL,
        OrderNumber nvarchar(100) NULL,
        TimeWindow nvarchar(100) NULL,
        OrderType nvarchar(100) NULL,
        Duration nvarchar(100) NULL,
        [Weight] nvarchar(100) NULL,
        Volume nvarchar(100) NULL
    )
    
    -- convert the CSV file into a table
    INSERT #temp
    EXEC [dbo].[SSP_CSVToTable]
         @InputFile = @FileLocation
        ,@FirstLine = @FirstImportRow
    

    I haven't done a lot of performance testing, but it works well for what I need - importing CSV files with fewer than 1000 rows. However, it might choke on really large files.

    Hopefully someone else also finds it useful.

    Cheers!

Answer 3 (score: 5)

I also created a function to convert a CSV into a format usable by bulk insert. I used Chris Clark's answer as the starting point for the following C# function.

I ended up using a regular expression to find the fields. I then recreated the file line by line, writing it to a new file as I went, so that the whole file never has to be loaded into memory.

private void CsvToOtherDelimiter(string CSVFile, System.Data.Linq.Mapping.MetaTable tbl)
{
    char PH1 = '|';
    StringBuilder ln;

    //Confirm file exists. Else, throw exception
    if (File.Exists(CSVFile))
    {
        using (TextReader tr = new StreamReader(CSVFile))
        {
            //Use a temp file to store our conversion
            using (TextWriter tw = new StreamWriter(CSVFile + ".tmp"))
            {
                string line = tr.ReadLine();
                //If we have already converted, no need to reconvert.
                //NOTE: We make the assumption here that the input header file 
                //      doesn't have a PH1 value unless it's already been converted.
                if (line.IndexOf(PH1) >= 0)
                {
                    tw.Close();
                    tr.Close();
                    File.Delete(CSVFile + ".tmp");
                    return;
                }
                //Loop through input file
                while (!string.IsNullOrEmpty(line))
                {
                    ln = new StringBuilder();

                    //1. Use Regex expression to find comma separated values 
                    //using quotes as optional text qualifiers 
                    //(what MS EXCEL does when you import a csv file)
                    //2. Remove text qualifier quotes from data
                    //3. Replace any values of PH1 found in column data 
                    //with an equivalent character
                    //Regex:  \A[^,]*(?=,)|(?:[^",]*"[^"]*"[^",]*)+|[^",]*"[^"]*\Z|(?<=,)[^,]*(?=,)|(?<=,)[^,]*\Z|\A[^,]*\Z
                    List<string> fieldList = Regex.Matches(line, @"\A[^,]*(?=,)|(?:[^"",]*""[^""]*""[^"",]*)+|[^"",]*""[^""]*\Z|(?<=,)[^,]*(?=,)|(?<=,)[^,]*\Z|\A[^,]*\Z")
                            .Cast<Match>()
                            .Select(m => RemoveCSVQuotes(m.Value).Replace(PH1, '¦'))
                            .ToList<string>();

                    //Add the list of fields to ln, separated by PH1
                    fieldList.ToList().ForEach(m => ln.Append(m + PH1));

                    //Write to file. Don't include trailing PH1 value.
                    tw.WriteLine(ln.ToString().Substring(0, ln.ToString().LastIndexOf(PH1)));

                    line = tr.ReadLine();
                }


                tw.Close();
            }
            tr.Close();

            //Optional:  replace input file with output file
            File.Delete(CSVFile);
            File.Move(CSVFile + ".tmp", CSVFile);
        }
    }
    else
    {
        throw new ArgumentException(string.Format("Source file {0} not found", CSVFile));
    }
}
//The output file no longer needs quotes as a text qualifier, so remove them
private string RemoveCSVQuotes(string value)
{
    //if is empty string, then remove double quotes
    if (value == @"""""") value = "";
    //remove any double quotes, then any quotes on ends
    value = value.Replace(@"""""", @"""");
    if (value.Length >= 2)
        if (value.Substring(0, 1) == @"""")
            value = value.Substring(1, value.Length - 2);
    return value;
}

Answer 4 (score: 3)

This problem is usually caused by users exporting an Excel file to CSV.

There are two ways around this problem:

  1. Export from Excel using a macro, as per Microsoft's suggestion
  2. Or the really easy way:
    • Open the CSV in Excel.
    • Save it as an Excel file (.xls or .xlsx).
    • Import that file into SQL Server as an Excel file.
    • Chuckle to yourself because you didn't have to write any code like the solutions above.... muhahahaha

    Here's some SQL if you really want to script it (after saving the CSV as an Excel file):

    select * 
    into SQLServerTable FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0', 
        'Excel 8.0;Database=D:\testing.xls;HDR=YES', 
        'SELECT * FROM [Sheet1$]')
    

Answer 5 (score: 2)

An alternate method - assuming you don't have a load of fields or don't expect quotes to appear in the data itself - would be to use the REPLACE function.

UPDATE dbo.tablename 
        SET dbo.tablename.target_field = REPLACE(t.importedValue, '"', '')
FROM #tempTable t
WHERE dbo.tablename.target_id = t.importedID;

I have used it. I can't make any claims regarding performance. It's just a quick and dirty way to get around the problem.

Answer 6 (score: 2)

This may be more complicated or involved than what you're willing to use, but ...

If you can implement the logic for parsing the lines into fields in VB or C#, you can do it using a CLR table-valued function (TVF).

A CLR TVF can be a good-performing way to read data from an external source when you want some C# or VB code to separate the data into columns and/or adjust the values.

You have to be willing to add a CLR assembly to your database (and one that allows external or unsafe operations so it can open files). This may get a bit complicated or involved, but might be worth it for the flexibility you gain.

I had some large files that needed to be regularly loaded into tables as fast as possible, but certain code translations needed to be performed on some columns, and special handling was needed to load values that would otherwise have caused datatype errors with a plain bulk insert.

In short, a CLR TVF lets you run C# or VB code against each line of the file with bulk insert-like performance (although you may need to worry about logging). The example in the SQL Server documentation lets you create a TVF that reads from the event log, which you can use as a starting point.

Note that the code in a CLR TVF can only access the database in an init stage before the first row is processed (e.g. no lookups for each row - you would use a normal TVF on top of this to do such things). Based on your question, you don't appear to need this.

Also note that each CLR TVF must have its output columns explicitly specified, so you can't write a generic one that is reusable for each different csv file you might have.

You could write one CLR TVF that reads whole lines from the file, returning a one-column result set, then use normal TVFs to parse each type of file. This requires the per-line parsing code to be written in T-SQL, but avoids having to write many CLR TVFs.

Answer 7 (score: 1)

I found a few problems with fields that contain a comma inside them, like 'Mike','456 2nd St, Apt 5'.

A solution to this problem is at http://crazzycoding.blogspot.com/2010/11/import-csv-file-into-sql-server-using.html

Thanks, - Ashish

Answer 8 (score: 1)

Chris, thanks a bunch for this! You saved my biscuits! I could not believe the bulk loader wouldn't handle this case when Excel does such a nice job. Don't these guys see each other in the halls??? Anyway... I needed a ConsoleApplication version, so here is what I hacked together. It's down and dirty, but it works like a champ! I hardcoded the delimiter and commented out the headers, as they were not needed for my app.

I wish I could also paste a nice big beer in here for you.

Geeze, I have no idea why the End Module and Public Class are outside the code block... srry!

    Module Module1

    Sub Main()

        Dim arrArgs() As String = Command.Split(",")
        Dim i As Integer
        Dim obj As New ReDelimIt()

        Console.Write(vbNewLine & vbNewLine)

        If arrArgs(0) <> Nothing Then
            For i = LBound(arrArgs) To UBound(arrArgs)
                Console.Write("Parameter " & i & " is " & arrArgs(i) & vbNewLine)
            Next


            obj.ProcessFile(arrArgs(0), arrArgs(1))

        Else
            Console.Write("Usage Test1 <inputfile>,<outputfile>")
        End If

        Console.Write(vbNewLine & vbNewLine)
    End Sub

 End Module

 Public Class ReDelimIt

    Public Function ProcessFile(ByVal InputFile As String, ByVal OutputFile As String) As Integer

        Dim ph1 As String = "|"

        Dim objReader As System.IO.StreamReader = Nothing
        Dim count As Integer = 0 'This will also serve as a primary key
        Dim sb As New System.Text.StringBuilder

        Try
            objReader = New System.IO.StreamReader(System.IO.File.OpenRead(InputFile), System.Text.Encoding.Default)
        Catch ex As Exception
            MsgBox(ex.Message)
        End Try

        If objReader Is Nothing Then
            MsgBox("Invalid file: " & InputFile)
            Return -1 'Exit Function alone would return the default value 0
        End If

        'grab the first line
        Dim line = objReader.ReadLine()
        'and advance to the next line b/c the first line is column headings
        'Removed Check Headers can put in if needed.
        'If chkHeaders.Checked Then
        'line = objReader.ReadLine
        'End If

        While Not String.IsNullOrEmpty(line) 'loop through each line

            count += 1

            'Replace commas with our custom-made delimiter
            line = line.Replace(",", ph1)

            'Find a quoted part of the line, which could legitimately contain commas.
            'In that case we will need to identify the quoted section and swap commas back in for our custom placeholder.
            Dim starti = line.IndexOf(ph1 & """", 0)

            While starti > -1 'loop through quoted fields

                'Find end quote token (originally  a ",)
                Dim endi = line.IndexOf("""" & ph1, starti)

                'The end quote token could be a false positive because there could occur a ", sequence.
                'It would be double-quoted ("",) so check for that here
                Dim check1 = line.IndexOf("""""" & ph1, starti)

                'A """, sequence can occur if a quoted field ends in a quote.
                'In this case, the above check matches, but we actually SHOULD process this as an end quote token
                Dim check2 = line.IndexOf("""""""" & ph1, starti)

                'If we are in the check1 ("",) situation, keep searching for an end quote token
                'The +1 and +2 accounts for the extra length of the checked sequences
                While (endi = check1 + 1 AndAlso endi <> check2 + 2) 'loop through "false" tokens in the quoted fields
                    endi = line.IndexOf("""" & ph1, endi + 1)
                    check1 = line.IndexOf("""""" & ph1, check1 + 1)
                    check2 = line.IndexOf("""""""" & ph1, check2 + 1)
                End While

                'We have searched for an end token (",) but can't find one, so that means the line ends in a "
                If endi < 0 Then endi = line.Length - 1

                'Grab the quoted field from the line, now that we have the start and ending indices
                Dim source = line.Substring(starti + ph1.Length, endi - starti - ph1.Length + 1)

                'And swap the commas back in
                line = line.Replace(source, source.Replace(ph1, ","))

                'Find the next quoted field
                If endi >= line.Length - 1 Then endi = line.Length 'During the swap, the length of line shrinks so an endi value at the end of the line will fail
                starti = line.IndexOf(ph1 & """", starti + ph1.Length)

            End While

            'Add our primary key to the line
            ' Removed for now
            'If chkAddKey.Checked Then
            'line = String.Concat(count.ToString, ph1, line)
            ' End If

            sb.AppendLine(line)

            line = objReader.ReadLine

        End While

        objReader.Close()

        SaveTextToFile(sb.ToString, OutputFile)

        Return count

    End Function

    Public Function SaveTextToFile(ByVal strData As String, ByVal FullPath As String) As Boolean
        Dim bAns As Boolean = False
        Dim objReader As System.IO.StreamWriter
        Try
            objReader = New System.IO.StreamWriter(FullPath, False, System.Text.Encoding.Default)
            objReader.Write(strData)
            objReader.Close()
            bAns = True
        Catch Ex As Exception
            Throw Ex
        End Try
        Return bAns
    End Function

End Class

Answer 9 (score: 1)

You should be able to specify not only the field delimiter, which should be [,], but also the text qualifier, which in this case would be ["]. Using [] to enclose them avoids confusion with the ".

Answer 10 (score: 1)

Preprocessing is needed.

The PowerShell function Import-Csv supports this type of file. Export-Csv then encapsulates each value in quotes.

A single file:

Import-Csv import.csv | Export-Csv -NoTypeInformation export.csv

To merge many files with paths like C:\year\input_date.csv:

$inputPath = 'C:\????\input_????????.csv'
$outputPath = 'C:\merged.csv'
Get-ChildItem $inputPath |
  Select -ExpandProperty FullName |
  Import-CSV |
  Export-CSV -NoTypeInformation -Path $outputPath

PowerShell can typically be run with SQL Server Agent using a PowerShell proxy account.

In case the delimiters are not handled properly, explicitly specify another delimiter:

 Export-CSV -NoTypeInformation -Delimiter ';' -Path $outputPath

Answer 11 (score: 0)

This code works for me:

 public bool CSVFileRead(string fullPathWithFileName, string fileNameModified, string tableName)
    {
        SqlConnection con = new SqlConnection(ConfigurationSettings.AppSettings["dbConnectionString"]);
        string filepath = fullPathWithFileName;
        StreamReader sr = new StreamReader(filepath);
        string line = sr.ReadLine();
        string[] value = line.Split(',');
        DataTable dt = new DataTable();
        DataRow row;
        foreach (string dc in value)
        {
            dt.Columns.Add(new DataColumn(dc));
        }
        while (!sr.EndOfStream)
        {
            //string[] stud = sr.ReadLine().Split(',');
            //for (int i = 0; i < stud.Length; i++)
            //{
            //    stud[i] = stud[i].Replace("\"", "");
            //}
            //value = stud;
            value = sr.ReadLine().Split(',');
            if (value.Length == dt.Columns.Count)
            {
                row = dt.NewRow();
                row.ItemArray = value;
                dt.Rows.Add(row);
            }
        }
        SqlBulkCopy bc = new SqlBulkCopy(con.ConnectionString, SqlBulkCopyOptions.TableLock);
        bc.DestinationTableName = tableName;
        bc.BatchSize = dt.Rows.Count;
        con.Open();
        bc.WriteToServer(dt);
        bc.Close();
        con.Close();

        return true;
    }

Answer 12 (score: 0)

I put the following together to solve my case. I needed to preprocess very large files and sort out their inconsistent quoting. Just paste it into a blank C# application, set the consts to your requirements, and away you go. This worked on very large CSVs of over 10 GB.

namespace CsvFixer
{
    using System.IO;
    using System.Text;

    public class Program
    {
        private const string delimiter = ",";
        private const string quote = "\"";
        private const string inputFile = "C:\\temp\\input.csv";
        private const string fixedFile = "C:\\temp\\fixed.csv";

        /// <summary>
        /// This application fixes inconsistently quoted csv (or delimited) files with support for very large file sizes.
        /// For example :  1223,5235234,8674,"Houston","London, UK",3425,Other text,stuff 
        /// Must become :  "1223","5235234","8674","Houston","London, UK","3425","Other text","stuff" 
        /// </summary>
        /// <param name="args"></param>
        static void Main(string[] args)
        {
            // Use streaming to allow for large files. 
            using (StreamWriter outfile = new StreamWriter(fixedFile))
            {
                using (FileStream fs = File.Open(inputFile, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
                using (BufferedStream bs = new BufferedStream(fs))
                using (StreamReader sr = new StreamReader(bs))
                {
                    string currentLine;

                    // Read each input line in and write each fixed line out
                    while ((currentLine = sr.ReadLine()) != null)
                    {
                        outfile.WriteLine(FixLine(currentLine, delimiter, quote));
                    }
                }
            }
        }

        /// <summary>
        /// Fully quote a partially quoted line 
        /// </summary>
        /// <param name="line">The partially quoted line</param>
        /// <returns>The fully quoted line</returns>
        private static string FixLine(string line, string delimiter, string quote)
        {
            StringBuilder fixedLine = new StringBuilder();

            // Split all on the delimiter, accepting that some quoted fields
            // that contain the delimiter will be split into many pieces.
            string[] fieldParts = line.Split(delimiter.ToCharArray());

            // Loop through the fields (or parts of fields)
            for (int i = 0; i < fieldParts.Length; i++)
            {
                string currentFieldPart = fieldParts[i];

                // If the current field part starts and ends with a quote it is a field, so write it to the result
                if (currentFieldPart.StartsWith(quote) && currentFieldPart.EndsWith(quote))
                {
                    fixedLine.Append(string.Format("{0}{1}", currentFieldPart, delimiter));
                }
                // else if it starts with a quote but doesn't end with one, it is part of a longer field.
                else if (currentFieldPart.StartsWith(quote))
                {
                    // Add the start of the field
                    fixedLine.Append(string.Format("{0}{1}", currentFieldPart, delimiter));

                    // Append any additional field parts (we will only hit the end of the field when 
                    // the last field part finishes with a quote. 
                    while (!fieldParts[++i].EndsWith(quote))
                    {
                        fixedLine.Append(string.Format("{0}{1}", fieldParts[i], delimiter));
                    }

                    // Append the last field part - i.e. the part containing the closing quote
                    fixedLine.Append(string.Format("{0}{1}", fieldParts[i], delimiter));
                }
                else
                {
                    // The field has no quotes, add the field part with quotes as bookends
                    fixedLine.Append(string.Format("{0}{1}{0}{2}", quote, currentFieldPart, delimiter));
                }
            }

            // Return the fixed string 
            return fixedLine.ToString();
        }
    }
}

Answer 13 (score: 0)

Speaking from practice... In SQL Server 2017 you can provide a double-quote "text qualifier", and it does not "supersede" your delimiter. I bulk insert several files that look just like the OP's example. My files are ".csv" and they have inconsistent text qualifiers that only appear when a value contains a comma. I have no idea in which version of SQL Server this feature/functionality started working, but I know it works in SQL Server 2017 Standard. Pretty easy.

Answer 14 (score: 0)

You don't need to preprocess the file outside of SQL.

What worked for me was changing

ROWTERMINATOR = '\n'

to

ROWTERMINATOR = '0x0a'

Answer 15 (score: 0)

SQL Server 2017 added a new option for the BULK INSERT command: WITH ( FORMAT='CSV').

An example from a Microsoft GitHub page:

BULK INSERT Product
FROM 'product.csv'
WITH (  DATA_SOURCE = 'MyAzureBlobStorage',
        FORMAT='CSV', CODEPAGE = 65001, --UTF-8 encoding
        FIRSTROW=2,
        ROWTERMINATOR = '0x0a',
        TABLOCK); 

Detailed documentation for the option can be found here: https://docs.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-2017#input-file-format-options

I have successfully used this option with CSV data containing optional quotes, just like the example the OP gave.

Answer 16 (score: -1)

Create a VB.NET program that converts to a new delimiter using the 4.5 Framework TextFieldParser. It automatically handles text-qualified fields.

The code above, modified to use the built-in TextFieldParser:

Module Module1

Sub Main()

    Dim arrArgs() As String = Command.Split(",")
    Dim i As Integer
    Dim obj As New ReDelimIt()
    Dim InputFile As String = ""
    Dim OutPutFile As String = ""
    Dim NewDelimiter As String = ""

    Console.Write(vbNewLine & vbNewLine)

    If Not IsNothing(arrArgs(0)) Then
        For i = LBound(arrArgs) To UBound(arrArgs)
            Console.Write("Parameter " & i & " is " & arrArgs(i) & vbNewLine)
        Next
        InputFile = arrArgs(0)
        If Not IsNothing(arrArgs(1)) Then
            If Not String.IsNullOrEmpty(arrArgs(1)) Then
                OutPutFile = arrArgs(1)
            Else
                OutPutFile = InputFile.Replace("csv", "pipe")
            End If
        Else
            OutPutFile = InputFile.Replace("csv", "pipe")
        End If
        If Not IsNothing(arrArgs(2)) Then
            If Not String.IsNullOrEmpty(arrArgs(2)) Then
                NewDelimiter = arrArgs(2)
            Else
                NewDelimiter = "|"
            End If
        Else
            NewDelimiter = "|"
        End If
        obj.ConvertCSVFile(InputFile,OutPutFile,NewDelimiter)

    Else
        Console.Write("Usage ChangeFileDelimiter <inputfile>,<outputfile>,<NewDelimiter>")
    End If
    obj = Nothing
    Console.Write(vbNewLine & vbNewLine)
    'Console.ReadLine()

End Sub

End Module

Public Class ReDelimIt

Public Function ConvertCSVFile(ByVal InputFile As String, ByVal OutputFile As String, Optional ByVal NewDelimiter As String = "|") As Integer
    Using MyReader As New Microsoft.VisualBasic.FileIO.TextFieldParser(InputFile)
        MyReader.TextFieldType = FileIO.FieldType.Delimited
        MyReader.SetDelimiters(",")
        Dim sb As New System.Text.StringBuilder
        Dim strLine As String = ""
        Dim currentRow As String()
        While Not MyReader.EndOfData
            Try
                currentRow = MyReader.ReadFields()
                Dim currentField As String
                strLine = ""
                For Each currentField In currentRow
                    'MsgBox(currentField)
                    If strLine = "" Then
                        strLine = strLine & currentField
                    Else
                        strLine = strLine & NewDelimiter & currentField
                    End If
                Next
                sb.AppendLine(strLine)
            Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
                'MsgBox("Line " & ex.Message & "is not valid and will be skipped.")
                Console.WriteLine("Line " & ex.Message & "is not valid and will be skipped.")
            End Try
        End While
        SaveTextToFile(sb.ToString, OutputFile)
    End Using

    Return Err.Number

End Function

Public Function SaveTextToFile(ByVal strData As String, ByVal FullPath As String) As Boolean
    Dim bAns As Boolean = False
    Dim objReader As System.IO.StreamWriter
    Try
        If FileIO.FileSystem.FileExists(FullPath) Then
            Kill(FullPath)
        End If
        objReader = New System.IO.StreamWriter(FullPath, False, System.Text.Encoding.Default)
        objReader.Write(strData)
        objReader.Close()
        bAns = True
    Catch Ex As Exception
        Throw Ex
    End Try
    Return bAns
End Function

End Class