Regular expression for CSV with commas and no quotes

Time: 2012-07-13 11:30:49

Tags: regex csv comma quote

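I'm trying to parse a really awkward CSV file that was generated without any quotes around columns that contain commas.
The only hint I have is that a comma with whitespace before or after it belongs to the field rather than separating two fields.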

Jake,HomePC,Microsoft VS2010, Microsoft Office 2010

should be parsed as

Jake
HomePC
Microsoft VS2010, Microsoft Office 2010

Can someone suggest how to keep "\s," and ",\s" inside the column body?

3 answers:

Answer 0: (score: 2)

If your language supports lookbehind assertions, split on

(?<!\s),(?!\s)

In C#:

string[] splitArray = Regex.Split(subjectString, 
    @"(?<!\s) # Assert that the previous character isn't whitespace
    ,         # Match a comma
    (?!\s)    # Assert that the following character isn't whitespace", 
    RegexOptions.IgnorePatternWhitespace);
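
If lookbehind is not available, the same rule can be applied without a regex. The following is only a minimal sketch in Python, not part of the answer above: it splits on every comma and then glues back any piece whose separating comma touched whitespace. The function name `split_fields` is made up for illustration.

```python
def split_fields(line):
    """Split on commas, but keep commas that have whitespace on either side."""
    fields = []
    for piece in line.split(","):
        # The comma in front of `piece` is part of a field if the text before it
        # ends with whitespace or the text after it starts with whitespace.
        if fields and (piece[:1].isspace() or fields[-1][-1:].isspace()):
            fields[-1] += "," + piece
        else:
            fields.append(piece)
    return fields


print(split_fields("Jake,HomePC,Microsoft VS2010, Microsoft Office 2010"))
# ['Jake', 'HomePC', 'Microsoft VS2010, Microsoft Office 2010']
```

On the sample line from the question this yields the three expected fields; empty fields such as `a,,b` stay separate because an empty piece does not count as whitespace.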

Answer 1: (score: 0)

Split on r"(?<!\s),(?!\s)" (a lookbehind for the character before the comma, a lookahead for the one after it).

In Python you can do this:

import re
re.split(r"(?<!\s),(?!\s)", s)  # s is your string
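
One detail worth checking here: a negative lookahead placed in front of the comma inspects the comma itself, so it can never reject a match, and only a lookbehind actually tests the preceding character. The sketch below compares the two patterns. The second input string is a hypothetical variant of the question's sample with the space moved in front of the comma, and the constant names are made up for illustration.

```python
import re

# Lookahead on both sides: the first (?!\s+) looks at the comma, so it never fails.
LOOKAHEAD_ONLY = re.compile(r"(?!\s+),(?!\s+)")
# Lookbehind checks the character before the comma, lookahead the one after it.
LOOKAROUND = re.compile(r"(?<!\s),(?!\s)")

sample = "Jake,HomePC,Microsoft VS2010, Microsoft Office 2010"
# Hypothetical variant: whitespace before the comma instead of after it.
space_before = "Jake,HomePC,Microsoft VS2010 ,Microsoft Office 2010"

print(LOOKAROUND.split(sample))
# ['Jake', 'HomePC', 'Microsoft VS2010, Microsoft Office 2010']
print(LOOKAHEAD_ONLY.split(space_before))
# ['Jake', 'HomePC', 'Microsoft VS2010 ', 'Microsoft Office 2010']
print(LOOKAROUND.split(space_before))
# ['Jake', 'HomePC', 'Microsoft VS2010 ,Microsoft Office 2010']
```

Both patterns agree on the original sample, where the whitespace follows the comma; they differ only when the whitespace comes first.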

Answer 2: (score: 0)

Try this. It gave me the result you described.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitCsv {
    public static void main(String[] args) {
        String testt = "Jake,HomePC,Microsoft VS2010, Microsoft Office 2010,Microsoft VS2010, Microsoft Office 2010";
        // A separator is a comma sitting directly between two alphanumeric characters.
        // Lookarounds keep the neighbours out of the match, so one-character fields
        // such as "a,b,c" are split correctly as well.
        Pattern varPattern = Pattern.compile("(?<=[a-z0-9]),(?=[a-z0-9])", Pattern.CASE_INSENSITIVE);
        Matcher varMatcher = varPattern.matcher(testt);
        List<String> list = new ArrayList<String>();
        int startIndex = 0;
        while (varMatcher.find()) {
            // Everything between the previous separator and this one is a field.
            list.add(testt.substring(startIndex, varMatcher.start()));
            startIndex = varMatcher.end();
        }
        // The rest of the line (or the whole line if no separator matched) is the last field.
        list.add(testt.substring(startIndex));
        for (String s : list) {
            System.out.println(s);
        }
    }
}

Note that the code is in Java.