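I am trying to parse a really awkward CSV file that was generated without any quotes around columns that contain commas.
The only hint I have is that commas inside a field have whitespace before or after them.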
Jake,HomePC,Microsoft VS2010, Microsoft Office 2010
should be parsed as
Jake
HomePC
Microsoft VS2010, Microsoft Office 2010
Can someone suggest how to keep commas that have whitespace ("\s") before or after them inside the column body?
Answer 0: (score: 2)
If your language supports lookbehind assertions, split on
(?<!\s),(?!\s)
In C#:
string[] splitArray = Regex.Split(subjectString,
@"(?<!\s) # Assert that the previous character isn't whitespace
, # Match a comma
(?!\s) # Assert that the following character isn't whitespace",
RegexOptions.IgnorePatternWhitespace);
Answer 1: (score: 0)
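Split on r"(?<!\s),(?!\s)". The check on the character before the comma has to be a lookbehind: a lookahead placed in front of the comma would only inspect the comma itself and would always succeed.
In Python you can do it like this:
import re
re.split(r"(?<!\s),(?!\s)", s)  # s is your string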
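To check the split against the sample line from the question, here is a minimal, self-contained sketch. The helper name `split_fields` and the second test string are my own illustrations, not part of the question; the pattern is the lookbehind/lookahead form `(?<!\s),(?!\s)`.

```python
import re

# A comma with no whitespace directly before it and none directly after it.
DELIMITER = re.compile(r"(?<!\s),(?!\s)")


def split_fields(line):
    """Split a line on commas that are not adjacent to whitespace."""
    return DELIMITER.split(line)


sample = "Jake,HomePC,Microsoft VS2010, Microsoft Office 2010"
for field in split_fields(sample):
    print(field)

# Hypothetical input: a comma preceded by whitespace stays inside its field.
print(split_fields("Jake,HomePC,VS2010 ,Office"))
```

The first loop prints the three fields the question asks for. The second call is there because the sample line only has whitespace after a comma, so it does not exercise the lookbehind on its own.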
Answer 2: (score: 0)
Try this. It gave me the desired result you described.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitCsv {
    public static void main(String[] args) {
        String testt = "Jake,HomePC,Microsoft VS2010, Microsoft Office 2010,Microsoft VS2010, Microsoft Office 2010";
        // Lookarounds keep the match to the comma itself, so two delimiters
        // separated by a single character (e.g. "a,b,c") are both found.
        Pattern varPattern = Pattern.compile("(?<=[a-z0-9]),(?=[a-z0-9])", Pattern.CASE_INSENSITIVE);
        Matcher varMatcher = varPattern.matcher(testt);
        List<String> list = new ArrayList<String>();
        int startIndex = 0;
        while (varMatcher.find()) {
            // The field runs from the end of the previous delimiter to this comma.
            list.add(testt.substring(startIndex, varMatcher.start()));
            startIndex = varMatcher.end();
        }
        // Add the last field (or the whole line when no delimiter was found).
        list.add(testt.substring(startIndex));
        for (String s : list) {
            System.out.println(s);
        }
    }
}
Note that the code is written in Java.
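One thing to watch with a matcher-based split like this: a pattern that consumes the characters on both sides of the comma, such as `[a-z0-9],[a-z0-9]`, cannot match two delimiters that are separated by a single character, because that character is already used up by the first match. The sketch below (in Python for brevity; the input `"a,b,c"` and the variable names are my own) compares it with a lookaround form that matches only the comma.

```python
import re

line = "a,b,c"

# Consumes one character on each side of the comma.
consuming = re.compile(r"[a-z0-9],[a-z0-9]", re.IGNORECASE)
# Matches only the comma; the neighbours are checked but not consumed.
lookaround = re.compile(r"(?<=[a-z0-9]),(?=[a-z0-9])", re.IGNORECASE)

# Positions of the commas each pattern treats as delimiters.
consuming_hits = [m.start() + 1 for m in consuming.finditer(line)]
lookaround_hits = [m.start() for m in lookaround.finditer(line)]

print(consuming_hits)   # [1]
print(lookaround_hits)  # [1, 3]
```

For the sample data in the question every field is longer than one character, so both forms find the same delimiters there.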