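There are lots of posts here saying that, rather than rolling my own CSV parser, I should use the VB.NET TextFieldParser.
I tried it, but please tell me if I'm wrong: it seems to parse purely on a delimiter.
So if I have an address field "Flat 1, StackOverflow House, London", I get three fields. Unfortunately, that is not what I want. I need everything in a given cell to stay as a single item in the array.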
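For comparison, TextFieldParser (namespace Microsoft.VisualBasic.FileIO, usable from C# too) does not have to split blindly on the delimiter: with HasFieldsEnclosedInQuotes set to true, which is its default, a comma inside a double-quoted field stays part of that field and the enclosing quotes are removed. A minimal C# sketch, assuming the data is already in a string; the class and method names are illustrative only. On .NET Framework the project needs a reference to the Microsoft.VisualBasic assembly.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class TextFieldParserDemo
{
    // Illustrative helper: parses every record in `csv` into an array of fields.
    public static List<string[]> ParseAll(string csv)
    {
        var records = new List<string[]>();
        using (var parser = new TextFieldParser(new StringReader(csv)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Keep a "..." value as one field even if it contains the delimiter.
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                records.Add(parser.ReadFields());
            }
        }
        return records;
    }
}
```

Run against the test string below, this should give a single record of three fields, with the whole address as the second one.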
So I started writing my own regex, as follows:
var testString = @"""Test 1st string""" + "," + @"""Flat 1, StackOverflow House, London, England, The Earth""" + "," + "123456";
// Quoted values (backslash-escaped quotes allowed)
var matches = Regex.Matches(testString, @"""([^""\\])*?(?:\\.[^""\\]*)*?""");
var numbers = Regex.Matches(testString, @"\d+$"); // only numbers
Assert.That(matches.Count, Is.EqualTo(3)); // fails: only the two quoted values match
Assert.That(numbers.Count, Is.EqualTo(1));
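The first assertion fails because the string "123456" is not returned. The expression only returns "Test 1st string" and "Flat 1, StackOverflow House, London, England, The Earth".
What I want is for the regex to return everything that is quoted/escaped, plus the numbers.
I don't control the data, but the string values will all be quoted/escaped and the numbers will not.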
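Given that constraint (strings always quoted, numbers never quoted), one way to get both kinds of value out of a single pattern is an alternation: a quoted run, or else a run of digits. Because the regex engine scans left to right, digits inside a quoted value such as "Flat 1" are consumed by the quoted branch before the digit branch can see them. A minimal C# sketch; the quoted branch keeps the backslash-escape convention from the pattern above, and the class name is illustrative:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedOrNumber
{
    // Branch 1: "..." where \x escapes any character (so \" does not end the value).
    // Branch 2: a run of digits, for the unquoted numeric fields.
    private static readonly Regex FieldRegex =
        new Regex(@"""(?:[^""\\]|\\.)*""|\d+");

    public static string[] Extract(string line) =>
        FieldRegex.Matches(line).Cast<Match>().Select(m => m.Value).ToArray();
}
```

The quoted matches still include their surrounding quotes, and any unquoted, non-numeric text would be skipped (or partly matched, if it contains digits), so this relies on the data really following the rule above.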
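I'd really appreciate some help, as I'm going round in circles trying third-party libraries without much success.
Needless to say, string.Split doesn't work for the address case, and http://www.filehelpers.com/ doesn't seem to cater for these examples.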
Answer 0 (score: 2)
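Just so you know what you're up against: here is a regex that should work fairly well. But you will definitely need to test it, because CSV has lots of corner cases and I'm sure I've missed some (I'm also assuming a comma as the delimiter and " as the quote character, escaped by doubling it):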
(?: # Match either
(?>[^",\n]*) # 0 or more characters except comma, quote or newline
| # or
" # an opening quote
(?: # followed by either
(?>[^"]*) # 0 or more non-quote characters
| # or
"" # an escaped quote ("")
)* # any number of times
" # followed by a closing quote
) # End of alternation
(?=,|$) # Assert that the next character is a comma (or end of line)
In VB.NET:
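' Assumes Imports System.Collections.Specialized and Imports System.Text.RegularExpressions,
' and that SubjectString holds the CSV text to split.
Dim ResultList As StringCollection = New StringCollection()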
Dim RegexObj As New Regex(
"(?: # Match either" & chr(10) & _
" (?>[^"",\n]*) # 0 or more characters except comma, quote or newline" & chr(10) & _
"| # or" & chr(10) & _
" "" # an opening quote" & chr(10) & _
" (?: # followed by either" & chr(10) & _
" (?>[^""]*) # 0 or more non-quote characters" & chr(10) & _
" | # or" & chr(10) & _
" """" # an escaped quote ("""")" & chr(10) & _
" )* # any number of times" & chr(10) & _
" "" # followed by a closing quote" & chr(10) & _
") # End of alternation" & chr(10) & _
"(?=,|$) # Assert that the next character is a comma (or end of line)",
RegexOptions.Multiline Or RegexOptions.IgnorePatternWhitespace)
Dim MatchResult As Match = RegexObj.Match(SubjectString)
While MatchResult.Success
ResultList.Add(MatchResult.Value)
MatchResult = MatchResult.NextMatch()
End While
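One thing worth testing with this pattern: because its first branch can match an empty string, the regex can also produce an empty match immediately after a field (at the following comma, or at the end of the line), so ResultList may end up with extra empty entries. A minimal C# sketch of one possible workaround, which is not part of the answer above: a leading (?<=^|,) lookbehind so that a field may only start at the beginning of a line or right after a comma, followed by removing the enclosing quotes and collapsing doubled quotes. The class and method names are illustrative:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CsvFieldSplitter
{
    // The answer's pattern without its comments, plus a (?<=^|,) lookbehind
    // (an addition): a field may only start at a line start or after a comma.
    private static readonly Regex FieldRegex = new Regex(
        @"(?<=^|,)(?:(?>[^"",\n]*)|""(?:(?>[^""]*)|"""")*"")(?=,|$)",
        RegexOptions.Multiline);

    public static List<string> SplitLine(string line)
    {
        var fields = new List<string>();
        foreach (Match m in FieldRegex.Matches(line))
        {
            string value = m.Value;
            // Quoted field: drop the enclosing quotes and turn "" back into ".
            if (value.Length >= 2 && value[0] == '"' && value[value.Length - 1] == '"')
            {
                value = value.Substring(1, value.Length - 2).Replace("\"\"", "\"");
            }
            fields.Add(value);
        }
        return fields;
    }
}
```

Genuinely empty fields (as in a,,b) still come back as empty strings, because the lookbehind only blocks matches that do not start a new field.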
Answer 1 (score: 0)
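A hacky way I've used to get round this quickly is to first Split on the quote character, then strip out the commas in every other piece (or replace them with something else), and then Split the string again on the commas.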
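A minimal C# sketch of one reading of that approach, assuming a placeholder character (here U+0001, an arbitrary choice) that never appears in the data: after splitting on the quote character, the quoted sections sit at the odd indexes, so their commas are swapped for the placeholder; the pieces are then joined back together, split on the remaining commas, and the placeholder is turned back into a comma. The names are illustrative. Note that it throws away the quote characters themselves, so doubled quotes inside a value are lost.

```csharp
using System.Linq;

public static class HackyCsvSplit
{
    // Assumed placeholder: any character that never occurs in the data.
    private const char Placeholder = '\u0001';

    public static string[] Split(string line)
    {
        // Splitting on '"' leaves every quoted section at an odd index.
        string[] pieces = line.Split('"');
        for (int i = 1; i < pieces.Length; i += 2)
        {
            pieces[i] = pieces[i].Replace(',', Placeholder);
        }

        // Rejoin without the quotes, split on the real delimiters,
        // then restore the protected commas inside each field.
        return string.Concat(pieces)
            .Split(',')
            .Select(field => field.Replace(Placeholder, ','))
            .ToArray();
    }
}
```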
Just found this: Javascript code to parse CSV data. I appreciate it's JavaScript rather than VB.NET, but you should be able to follow it.
Also: How can I parse a CSV string with Javascript, which contains comma in data?