RegEx - Parsing CSV text

Time: 2012-05-09 13:04:31

Tags: regex vb.net parsing csv

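There are plenty of posts here saying that, rather than rolling my own CSV parser, I should use the VB.NET TextFieldParser.

I tried it, but (please tell me if I'm wrong) it parses based on a single delimiter.

So if I have an address field "Flat 1, StackOverflow House, London", I get three fields. Unfortunately that's not what I want. I need everything in a given cell to stay as a single item in the array.

So I started writing my own RegEx, as follows: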

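// Requires: using System.Text.RegularExpressions; (the assertions are NUnit-style)
var testString = @"""Test 1st string""" + "," + @"""Flat 1, StackOverflow House, London, England, The Earth""" + "," + "123456";

// Quoted strings, allowing backslash-escaped characters inside the quotes
var matches = Regex.Matches(testString, @"""([^""\\])*?(?:\\.[^""\\]*)*?""");
// A run of digits at the very end of the input
var numbers = Regex.Matches(testString, @"\d+$");
Assert.That(matches.Count, Is.EqualTo(3)); // fails: only the two quoted strings match
Assert.That(numbers.Count, Is.EqualTo(1));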

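The first assertion fails because the string "123456" is not returned. The expression only returns "Test 1st string" and "Flat 1, StackOverflow House, London, England, The Earth".

What I want is for the regex to return everything that is quoted\escaped, plus the numbers.

I don't control the data, but string values will always be quoted\escaped and numbers will not.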
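One way to read that requirement is a single pattern with two alternatives: a quoted string (with backslash escapes) or a bare run of digits. Below is a minimal sketch in Python rather than .NET, purely for illustration. The names `TOKEN` and `test_string` are made up for the example, and it assumes quotes are always balanced.

```python
import re

# Assumed sample, mirroring the test string from the question.
test_string = (
    '"Test 1st string",'
    '"Flat 1, StackOverflow House, London, England, The Earth",'
    '123456'
)

# Hypothetical pattern: quoted alternative first, so digits inside
# quotes are consumed as part of the quoted string.
TOKEN = re.compile(r'''
    "(?:[^"\\]|\\.)*"   # a quoted string; a backslash escapes the next character
  |                     # or
    \d+                 # a bare run of digits
''', re.VERBOSE)

tokens = TOKEN.findall(test_string)
for token in tokens:
    print(token)
```

The quoted matches still carry their surrounding quotes, and anything that is neither quoted nor purely numeric is silently skipped, so this is only suitable for data shaped like the sample.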

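I'd really appreciate some help, as I've been going round in circles trying third-party libraries without much success.

Needless to say, string.split doesn't work for the address case, and http://www.filehelpers.com/ doesn't seem to account for such examples.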
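To make the failure concrete, here is a small sketch contrasting a plain split with a quote-aware reader. It uses Python's standard `csv` module as a stand-in for a .NET parser, and the sample line is an assumption modelled on the address example above.

```python
import csv

# Assumed sample line: two quoted fields and one bare number.
line = '"Test 1st string","Flat 1, StackOverflow House, London",123456'

# A plain split cuts on every comma, including those inside the quotes.
naive = line.split(",")

# csv.reader accepts any iterable of lines; the default dialect treats
# double quotes as the quote character.
parsed = next(csv.reader([line]))

print(len(naive))
print(parsed)
```

The plain split breaks the address apart, while the quote-aware reader returns it as a single field with the quotes removed.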

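2 Answers:

Answer 0: (score: 2)

Just to give you an idea of what you're up against: here's a regex that should work quite well. But you definitely need to test it, since CSV has many corner cases and I'm sure to have missed some. I'm assuming a comma as the separator and " as the quote character, which is escaped by doubling it: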

(?:           # Match either
 (?>[^",\n]*) #  0 or more characters except comma, quote or newline
|             # or
 "            #  an opening quote
 (?:          #  followed by either
  (?>[^"]*)   #   0 or more non-quote characters
 |            #  or
  ""          #   an escaped quote ("")
 )*           #  any number of times
 "            #  followed by a closing quote
)             # End of alternation
(?=,|$)       # Assert that the next character is a comma (or end of line)

In VB.NET:

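' Requires Imports System.Collections.Specialized and System.Text.RegularExpressions.
' SubjectString holds the CSV text to parse.
Dim ResultList As StringCollection = New StringCollection()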
Dim RegexObj As New Regex(
    "(?:            # Match either" & chr(10) & _
    " (?>[^"",\n]*) #  0 or more characters except comma, quote or newline" & chr(10) & _
    "|              # or" & chr(10) & _
    " ""            #  an opening quote" & chr(10) & _
    " (?:           #  followed by either" & chr(10) & _
    "  (?>[^""]*)   #   0 or more non-quote characters" & chr(10) & _
    " |             #  or" & chr(10) & _
    "  """"         #   an escaped quote ("""")" & chr(10) & _
    " )*            #  any number of times" & chr(10) & _
    " ""            #  followed by a closing quote" & chr(10) & _
    ")              # End of alternation" & chr(10) & _
    "(?=,|$)        # Assert that the next character is a comma (or end of line)", 
    RegexOptions.Multiline Or RegexOptions.IgnorePatternWhitespace)
Dim MatchResult As Match = RegexObj.Match(SubjectString)
While MatchResult.Success
    ResultList.Add(MatchResult.Value)
    MatchResult = MatchResult.NextMatch()
End While
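For quick experiments outside .NET, the same idea can be sketched in Python. This is a variant, not the exact regex above: it anchors each field on the start of the line or the preceding comma instead of using a lookahead, drops the atomic groups, and handles one line at a time. `FIELD` and `split_line` are names made up for the example.

```python
import re

# Variant of the pattern above (an assumption, not the answerer's exact regex):
# each field is anchored on the start of the line or a preceding comma.
FIELD = re.compile(r'''
    (?:^|,)              # start of the line, or the comma before the field
    (                    # capture the field itself
      "(?:[^"]|"")*"     #   quoted field; "" is an escaped quote
    |                    #  or
      [^",\n]*           #   unquoted field: no comma, quote or newline
    )
''', re.VERBOSE)


def split_line(line):
    """Return the fields of one CSV line, unquoting quoted fields."""
    fields = []
    for match in FIELD.finditer(line):
        value = match.group(1)
        if value.startswith('"'):
            # Drop the surrounding quotes and collapse doubled quotes.
            value = value[1:-1].replace('""', '"')
        fields.append(value)
    return fields


line = '"Test 1st string","Flat 1, StackOverflow House, London",123456'
print(split_line(line))
```

Malformed input (for example, text directly after a closing quote) is dropped silently rather than reported, so, as with the original, this needs testing against real data.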

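Answer 1: (score: 0)

One hacky way I've used to get around this quickly is to first Split on the quotes, then, at every other index, strip out the quotes (or replace them with something). Then Split the string again on the commas.

Just found this: Javascript code to parse CSV data - I appreciate that it's JavaScript and not vb.net. However, you should be able to follow it.

Also: How can I parse a CSV string with Javascript, which contains comma in data?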
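A rough sketch of that trick, in Python for brevity. It reads the middle step as protecting the commas that sit inside quotes, which is an interpretation because the answer does not spell it out. `PLACEHOLDER` and `hacky_split` are made-up names, and the placeholder character is assumed never to occur in the data.

```python
PLACEHOLDER = "\x00"  # assumed never to appear in the input


def hacky_split(line):
    """Split one CSV line by temporarily hiding commas inside quotes."""
    parts = line.split('"')
    # After splitting on quotes, odd indexes hold the text that was quoted.
    for i in range(1, len(parts), 2):
        parts[i] = parts[i].replace(",", PLACEHOLDER)
    # Rejoin without the quotes, split on the remaining (real) separators,
    # then restore the hidden commas.
    rejoined = "".join(parts)
    return [field.replace(PLACEHOLDER, ",") for field in rejoined.split(",")]


line = '"Test 1st string","Flat 1, StackOverflow House, London",123456'
print(hacky_split(line))
```

This loses escaped quotes inside a field and assumes quotes are balanced, which is part of what makes it a hack.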