Parse CSV, ignoring commas inside string literals in VBA?

Asked: 2011-07-21 18:20:25

Tags: excel vba csv ms-access split

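I have a VBA application that runs every day. It checks a folder where CSVs are downloaded automatically and adds their contents to a database. While parsing them, I realized that some values contain commas as part of their name. These values are enclosed in string literals.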

So I am trying to figure out how to parse this CSV while ignoring the commas contained in string literals. For example...

1,2,3,"This should,be one part",5,6,7

should return

1
2
3
"This should,be one part"
5
6
7

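I have been using VBA's split() function because I don't want to reinvent the wheel, but if I have to, I guess I'll do something else.

Any suggestions would be appreciated.

11 answers: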

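Answer 0 (score: 13)

The first way to solve this problem is to look at the structure of a line from the CSV file (int, int, "String literal, which will have at most one comma", etc.). A naive solution would be the following (assuming the line does not contain any semicolons):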

Function splitLine1(line As String) As String()

   Dim temp() As String
   'Splits the line in three. The string delimited by " will be at temp(1)
   temp = Split(line, Chr(34)) 'chr(34) = "

   'Replaces the commas in the numeric fields by semicolons
   temp(0) = Replace(temp(0), ",", ";")
   temp(2) = Replace(temp(2), ",", ";")

   'Joins the temp array with quotes and then splits the result using the semicolons
   splitLine1 = Split(Join(temp, Chr(34)), ";")

End Function

This function only solves this particular problem. Another approach is to use the regular expression object from VBScript.

Function splitLine2(line As String) As String()

    Dim regex As Object
    Set regex = CreateObject("vbscript.regexp")
    regex.IgnoreCase = True
    regex.Global = True

    'This pattern matches only commas outside quotes
    'Pattern = ",(?=([^"]*"[^"]*")*(?![^"]*"))"
    regex.Pattern = ",(?=([^" & Chr(34) & "]*" & Chr(34) & "[^" & Chr(34) & "]*" & Chr(34) & ")*(?![^" & Chr(34) & "]*" & Chr(34) & "))"

    'regex.Replace swaps the commas outside quotes for semicolons, and then the
    'Split function splits the result on the semicolons
    splitLine2 = Split(regex.Replace(line, ";"), ";")

End Function

This approach looks more cryptic, but it does not depend on the structure of the line.

You can read more about regular expression patterns in VBScript here.

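Answer 1 (score: 11)

@Gimp said...

  The current answers do not contain enough detail.

  I'm running into the same problem. Looking for more detail in an answer.

To elaborate on @MRAB's answer: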

Function ParseCSV(FileName)
    Dim Regex       'As VBScript_RegExp_55.RegExp
    Dim MatchColl   'As VBScript_RegExp_55.MatchCollection
    Dim Match       'As VBScript_RegExp_55.Match
    Dim FS          'As Scripting.FileSystemObject
    Dim Txt         'As Scripting.TextStream
    Dim CSVLine
    ReDim ToInsert(0)

    Set FS = CreateObject("Scripting.FileSystemObject")
    Set Txt = FS.OpenTextFile(FileName, 1, False, -2)
    Set Regex = CreateObject("VBScript.RegExp")

    Regex.Pattern = """[^""]*""|[^,]*"    '<- MRAB's answer
    Regex.Global = True

    Do While Not Txt.AtEndOfStream
        ReDim ToInsert(0)
        CSVLine = Txt.ReadLine
        For Each Match In Regex.Execute(CSVLine)
            If Match.Length > 0 Then
                ReDim Preserve ToInsert(UBound(ToInsert) + 1)
                ToInsert(UBound(ToInsert) - 1) = Match.Value
            End If
        Next
        InsertArrayIntoDatabase ToInsert
    Loop
    Txt.Close
End Function

You will need to customize the InsertArrayIntoDatabase Sub for your own table. Mine has several text fields named f00, f01, etc...

Sub InsertArrayIntoDatabase(a())
    Dim rs As DAO.Recordset
    Dim i, n
    Set rs = CurrentDb().TableDefs("tbl").OpenRecordset()
    rs.AddNew
    For i = LBound(a) To UBound(a)
        n = "f" & Format(i, "00") 'fields in table are f00, f01, f02, etc..
        rs.Fields(n) = a(i)
    Next
    rs.Update
End Sub

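Note that instead of calling CurrentDb() inside InsertArrayIntoDatabase(), you should really use a global variable that is set to the value of CurrentDb() before ParseCSV() runs, because calling CurrentDb() in a loop is very slow, especially on very large files.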
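A minimal sketch of that suggestion, assuming an Access module with a DAO reference as in the code above. The names gDb and ImportCsvFile are hypothetical and not part of the answer; InsertArrayIntoDatabase is the answer's Sub, changed only to read from the cached variable and to close the recordset.

```vba
' Sketch only: gDb and ImportCsvFile are hypothetical names.
' Module-level cache for the database object.
Private gDb As DAO.Database

Sub ImportCsvFile(ByVal FileName As String)
    Set gDb = CurrentDb()      ' resolve the database once, before the loop starts
    ParseCSV FileName          ' ParseCSV calls InsertArrayIntoDatabase for every line
    Set gDb = Nothing
End Sub

Sub InsertArrayIntoDatabase(a())
    Dim rs As DAO.Recordset
    Dim i As Long
    ' Uses the cached gDb instead of calling CurrentDb() on every row
    Set rs = gDb.TableDefs("tbl").OpenRecordset()
    rs.AddNew
    For i = LBound(a) To UBound(a)
        rs.Fields("f" & Format(i, "00")) = a(i)   ' fields are f00, f01, f02, ...
    Next
    rs.Update
    rs.Close
End Sub
```

Opening the recordset once per file rather than once per row would probably help further, but that changes the structure of the answer's code more than this sketch intends to.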

Answer 2 (score: 10)

A simple regex for parsing a CSV line (assuming there are no quotes inside quoted fields) is:

"[^"]*"|[^,]*

Each match returns one field.
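As a rough illustration of how the matches could be collected in VBA, here is a sketch that uses the late-bound VBScript.RegExp object. The function name FieldsFromLine is hypothetical. Because the second alternative can match an empty string between fields, the sketch skips zero-length matches, which also means genuinely empty fields are dropped.

```vba
' Sketch: collect the fields of one CSV line using the pattern above.
' FieldsFromLine is a hypothetical name, not part of the answer.
Function FieldsFromLine(ByVal csvLine As String) As Collection
    Dim re As Object
    Dim m As Object
    Dim result As New Collection

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.Pattern = """[^""]*""|[^,]*"   ' same pattern, with quotes doubled for VBA

    For Each m In re.Execute(csvLine)
        ' Skip zero-length matches; note that this also drops empty fields
        If m.Length > 0 Then result.Add m.Value
    Next m

    Set FieldsFromLine = result
End Function
```

For the line from the question, this should yield seven items, with the quoted literal (quotes included) as the fourth.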

Answer 3 (score: 3)

If you are working with MS Access tables, you can simply import the text from disk. For example:

''If you have a reference to the Windows Script Host Object Model
Dim fs As New FileSystemObject
Dim ts As TextStream

''For late binding
''Dim fs As Object
''Dim ts As Object
''Set fs=CreateObject("Scripting.FileSystemObject")

Set ts = fs.CreateTextFile("z:\docs\import.csv", True)

sData = "1,2,3,""This should,be one part"",5,6,7"

ts.Write sData
ts.Close

''Just for testing, your table will already exist
''sSQL = "Create table Imports (f1 int, f2 int, f3 int, f4 text, " _
''     & "f5 int, f6 int, f7 int)"
''CurrentDb.Execute sSQL

''The fields will be called F1,F2 ... Fn in the text file
sSQL = "INSERT INTO Imports SELECT * FROM " _
     & "[text;fmt=delimited;hdr=no;database=z:\docs\].[import.csv]"
CurrentDb.Execute sSQL

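Answer 4 (score: 2)

I know this is an old post, but I thought this might help others. It was plagiarized/revised from http://n3wt0n.com/blog/comma-separated-values-and-quoted-commas-in-vbscript/, but it works really well and is set up as a function that you can pass your input line to.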

Function SplitCSVLineToArray(Line, RemoveQuotes) 'Pass it a line and whether or not to remove the quotes
    ReplacementString = "#!#!#"  'Random String that we should never see in our file
    LineLength = Len(Line)
    InQuotes = False
    NewLine = ""
    For x = 1 to LineLength 
        CurrentCharacter = Mid(Line,x,1)
        If CurrentCharacter = Chr(34) then  
            If InQuotes then
                InQuotes = False
            Else
                InQuotes = True
            End If
        End If
        If InQuotes Then 
            CurrentCharacter = Replace(CurrentCharacter, ",", ReplacementString)
        End If
        NewLine = NewLine & CurrentCharacter
    Next    
    LineArray = split(NewLine,",")
    For x = 0 to UBound(LineArray)
        LineArray(x) = Replace(LineArray(x), ReplacementString, ",")
        If RemoveQuotes = True then 
            LineArray(x) = Replace(LineArray(x), Chr(34), "")
        End If
    Next 
    SplitCSVLineToArray = LineArray
End Function

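Answer 5 (score: 1)

I realise this is an old post, but I just bumped into it while looking for a solution to the same problem the OP had, so the thread is still relevant.

To import data from a CSV, I add a query to the worksheet: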

wksTarget.Querytables.add(Connection:=strConn, Destination:=wksTarget.Range("A1"))

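and then set the appropriate QueryTable parameters (e.g. Name, FieldNames, RefreshOnOpen, etc.).

The query table can handle a variety of delimiters through TextFileCommaDelimiter, TextFileSemicolonDelimiter and others. There are also a number of other parameters (TextFilePlatform, TextFileTrailingMinusNumbers, TextFileColumnTypes, TextFileDecimalSeparator, TextFileStartRow, TextFileThousandsSeparator) that can handle other idiosyncrasies of the source file.

Relevant to the OP, QueryTables also has a parameter designed to handle commas inside double quotes: TextFileTextQualifier = xlTextQualifierDoubleQuote

I find QueryTables much simpler than writing code to import the file, split/parse strings, or use regular expressions.

Putting it all together, a sample snippet looks like this: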

    strConn = "TEXT;" & "C:\Desktop\SourceFile.CSV"
    varDataTypes = Array(5, 1, 1, 1, 1, 1, 5, 5)
    With wksTarget.QueryTables.Add(Connection:=strConn, _
         Destination:=wksTarget.Range("A1"))
        .Name = "ImportCSV"
        .FieldNames = True
        .RefreshOnFileOpen = False
        .SaveData = True
        .TextFilePlatform = xlMSDOS
        .TextFileStartRow = 1
        .TextFileParseType = xlDelimited
        .TextFileCommaDelimiter = True
        .TextFileTextQualifier = xlTextQualifierDoubleQuote
        .TextFileColumnDataTypes = varDataTypes
        .Refresh BackgroundQuery:=False
    End With

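I prefer to delete the QueryTable once the data has been imported (wksTarget.QueryTables("ImportCSV").Delete), but I suppose it could be created just once and then simply refreshed if the source and destination of the data don't change.

Answer 6 (score: 1)

I came up with another variant of the solution for parsing CSV files that contain "quoted" text strings with possible delimiters, such as a comma inside the double quotes. This method does not require regular expressions or any other add-ons. This code also handles multiple commas between the quotes. Here is a subroutine for testing: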

Sub SubstituteBetweenQuotesSub()
'In-string character replacement function by Maryan Hutsul      1/29/2019
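'Each variable needs its own type; "Dim a, b As Integer" leaves a as a Variant
Dim LineItems As String
Dim quote As Long, quoteTwo As Long
Dim oddEven As Integer
Dim i As Long, counter As Long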
Dim byteArray() As Byte

'LineItems are lines of text read from CSV file, or any other text string
LineItems = ",,,2019NoApocalypse.ditamap,jesus.christ@sky.com,Approver,""JC, ,Son"",Reviewer,god.allmighty@sky.com,""God, All-Mighty,"",2019-01-29T08:47:29.290-05:00"

quote = 1
oddEven = 0

Do Until quote = 0
quote = InStr(quote, LineItems, Chr(34))
quoteTwo = InStr(quote + 1, LineItems, Chr(34))

oddEven = oddEven + 1
    If oddEven Mod 2 = 1 And quote <> 0 Then

        counter = 0
        For i = quote To quoteTwo
            byteArray = StrConv(LineItems, vbFromUnicode)
            If i <> 0 Then
                If byteArray(i - 1) = 44 Then   '44 represents comma, can also do Chr(44)
                counter = counter + 1
                End If
            End If
        Next i

        LineItems = Left(LineItems, quote - 1) & Replace(LineItems, ",", ";", quote, counter)
        quote = quote + 1
    ElseIf quote <> 0 Then
        quote = quote + 1
    End If
Loop

End Sub

Here is a function to which you can pass lines from a .csv, .txt, or any other text file:

Function SubstituteBetweenQuotes(LineItems)
'In-string character replacement function by Maryan Hutsul                                          1/29/2019
'LineItems are lines of text read from CSV file, or any other text string
'Each variable needs its own type; "Dim a, b As Integer" leaves a as a Variant
Dim quote As Long, quoteTwo As Long
Dim oddEven As Integer
Dim i As Long, counter As Long
Dim byteArray() As Byte


quote = 1
oddEven = 0

Do Until quote = 0
quote = InStr(quote, LineItems, Chr(34))
quoteTwo = InStr(quote + 1, LineItems, Chr(34))

oddEven = oddEven + 1
    If oddEven Mod 2 = 1 And quote <> 0 Then

        counter = 0
        For i = quote To quoteTwo
            byteArray = StrConv(LineItems, vbFromUnicode)
            If i <> 0 Then
                If byteArray(i - 1) = 44 Then   '44 represents "," comma, can also do Chr(44)
                counter = counter + 1
                End If
            End If
        Next i

        LineItems = Left(LineItems, quote - 1) & Replace(LineItems, ",", ";", quote, counter)
        quote = quote + 1
    ElseIf quote <> 0 Then
        quote = quote + 1
    End If
Loop

SubstituteBetweenQuotes = LineItems

End Function

Below is code that uses the function to read a CSV file:

Dim fullFilePath As String
Dim i As Integer

'fullFilePath - full link to your input CSV file
Open fullFilePath For Input As #1
row_number = 0
column_number = 0
'EOF - End Of File  (1) - file #1
Do Until EOF(1)
    Line Input #1, LineFromFile
            LineItems = Split(SubstituteBetweenQuotes(LineFromFile), ",")
    For i = LBound(LineItems) To UBound(LineItems)
    ActiveCell.Offset(row_number, i).Value = LineItems(i)
    Next i
    row_number = row_number + 1
Loop
Close #1

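All delimiters and replacement characters can be modified to suit your needs. I hope this is helpful, as I went through quite a journey solving some problems with CSV imports.

Answer 7 (score: 0)

Taking your comments into account, you could take the easy way out here:

  • Split on " -> this gives you 3 or more entries (possibly more because of double quotes inside the string literal)
  • Split the first part on ,
  • Keep parts 2 to n-1 together (that is your string literal)
  • Split the last part on ,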
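The steps above could be sketched roughly as follows. This assumes at most one quoted literal per line, which is what the steps imply; the names SplitAroundLiteral and AddFields are hypothetical and do not come from the answer.

```vba
' Sketch of the split-on-quote steps; assumes at most one quoted literal per line.
' SplitAroundLiteral and AddFields are hypothetical names.
Function SplitAroundLiteral(ByVal csvLine As String) As Collection
    Dim parts() As String
    Dim result As New Collection
    Dim literal As String
    Dim i As Long, n As Long

    parts = Split(csvLine, Chr(34))   ' Chr(34) is the double quote
    n = UBound(parts)

    If n < 2 Then
        ' No complete quoted literal: a plain comma split is enough
        AddFields csvLine, result
    Else
        ' First part: drop the comma that precedes the opening quote, then split
        If Len(parts(0)) > 0 Then AddFields Left$(parts(0), Len(parts(0)) - 1), result

        ' Parts 1 to n-1 are the literal; rejoin them with the quotes they were split on
        literal = parts(1)
        For i = 2 To n - 1
            literal = literal & Chr(34) & parts(i)
        Next i
        result.Add Chr(34) & literal & Chr(34)

        ' Last part: drop the comma that follows the closing quote, then split
        If Len(parts(n)) > 0 Then AddFields Mid$(parts(n), 2), result
    End If

    Set SplitAroundLiteral = result
End Function

Private Sub AddFields(ByVal chunk As String, ByVal target As Collection)
    Dim piece As Variant
    If Len(chunk) = 0 Then
        target.Add ""                 ' Split("") returns an empty array, so add the blank field by hand
    Else
        For Each piece In Split(chunk, ",")
            target.Add piece
        Next piece
    End If
End Sub
```

For the line from the question, this should give seven items, with "This should,be one part" (quotes included) as the fourth. A line with two separate quoted fields would be merged into one literal, so this only suits input shaped like the question's example.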

Answer 8 (score: 0)

We recently faced a similar CSV parsing challenge in Excel and implemented a solution adapted from https://api.scryfall.com/cards/search?order=cmc&q=c%3Ared+pow%3D3:

Function SplitCSV(csvText As String, delimiter As String) As String()

    ' Create a regular expression to parse the CSV values
    Dim RegEx As New RegExp

    ' Create a pattern that matches each column in the CSV, with submatches for each of the groups in the regex
    ' Match Groups:  Delimiter            Quoted fields                  Standard fields
    RegEx.Pattern = "(" + delimiter + "|^)(?:\""([^\""]*(?:\""\""[^\""]*)*)\""|([^\""\""" + delimiter + """]*))"
    RegEx.Global = True
    RegEx.IgnoreCase = True

    ' Create an array to hold all pattern matches (i.e. columns)
    Dim Matches As MatchCollection
    Set Matches = RegEx.Execute(csvText)

    ' Create an array to hold output data
    Dim Output() As String

    ' Create int to track array location when iterating
    Dim i As Integer
    i = 0

    ' Manually add blank if first column is blank, since VBA regex misses this
    If csvText Like ",*" Then
        ReDim Preserve Output(i)
        Output(i) = ""
        i = i + 1
    End If

    ' Iterate over all pattern matches and get values into output array
    Dim Match As Match
    Dim MatchedValue As String
    For Each Match In Matches

        ' Check to see which kind of value we captured (quoted or unquoted)
        If (Len(Match.SubMatches(1)) > 0) Then
            ' We found a quoted value. When we capture this value, unescape any double quotes
            MatchedValue = Replace(Match.SubMatches(1), """""", """")
        Else
            ' We found a non-quoted value
            MatchedValue = Match.SubMatches(2)
        End If

        ' Now that we have our value string, let's add it to the data array
        ReDim Preserve Output(i)
        Output(i) = MatchedValue
        i = i + 1

    Next Match

    ' Return the parsed data
    SplitCSV = Output

End Function

Answer 9 (score: 0)

Try this! Make sure "Microsoft VBScript Regular Expressions 5.5" is ticked under Tools > References.


Function Splitter(line As String, n As Integer)
Dim s() As String
Dim regex As Object
    Set regex = CreateObject("vbscript.regexp")
    regex.IgnoreCase = True
    regex.Global = True
    regex.Pattern = ",(?=([^\""]*\""[^\""]*\"")*[^\""]*$)"
    s = split(regex.Replace(line, "|/||\|"), "|/||\|")
    Splitter = s(n - 1)
End Function

Answer 10 (score: 0)

If every field of the source CSV is enclosed in double quotes, Split(strLine, """,""") may work well.
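A small sketch of that idea, assuming every field really is wrapped in double quotes and no field contains the three-character sequence "," itself. The name SplitAllQuoted is hypothetical. Splitting on "," leaves one stray quote at the start of the first field and one at the end of the last, so the sketch trims those.

```vba
' Sketch: split a line in which every field is quoted, e.g. "a","b,c","d".
' SplitAllQuoted is a hypothetical name, not part of the answer.
Function SplitAllQuoted(ByVal strLine As String) As String()
    Const Q As String = """"          ' a single double-quote character
    Dim fields() As String
    Dim lastIdx As Long

    fields = Split(strLine, Q & "," & Q)
    lastIdx = UBound(fields)

    If lastIdx >= 0 Then
        ' Trim the opening quote of the first field
        If Left$(fields(0), 1) = Q Then fields(0) = Mid$(fields(0), 2)
        ' Trim the closing quote of the last field
        If Right$(fields(lastIdx), 1) = Q Then
            fields(lastIdx) = Left$(fields(lastIdx), Len(fields(lastIdx)) - 1)
        End If
    End If

    SplitAllQuoted = fields
End Function
```

This would not help with the question's sample line, where only the text field is quoted; it only applies when the export quotes every column.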