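I want to split a string in Java using a regular expression. I want to split on every comma, except when the comma is inside single quotes or parentheses. For example: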
Hello, 'my,',friend,(how ,are, you),(,)
should give:
Hello
my,
friend
how, are, you
,
I tried:
(?i),(?=([^\'|\(]*\'|\([^\'|\(]*\'|\()*[^\'|\)]*$)
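But I can't get it to work (I tested it with http://java-regex-tester.appspot.com/).
Any ideas?
Answer 0 (score: 6)
A regular expression can't split input that contains nested parentheses, because matching arbitrarily deep nesting requires counting. It is easier to split manually.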
import java.util.ArrayList;
import java.util.List;

public static List<String> split(String orig) {
    List<String> splitted = new ArrayList<String>();
    int nestingLevel = 0; // how many '(' are currently open
    StringBuilder result = new StringBuilder();
    for (char c : orig.toCharArray()) {
        if (c == ',' && nestingLevel == 0) {
            // Top-level comma: finish the current element.
            splitted.add(result.toString());
            result.setLength(0); // clean buffer
        } else {
            if (c == '(')
                nestingLevel++;
            if (c == ')')
                nestingLevel--;
            result.append(c);
        }
    }
    // Thanks PoeHah for pointing it out. This adds the last element to it.
    splitted.add(result.toString());
    return splitted;
}
Hope this helps.
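The method above tracks only parentheses, so a comma inside single quotes would still be treated as a separator. A minimal sketch of the same loop extended with a quote flag might look like the following. The class name `QuoteAwareSplitter` is hypothetical, and the sketch assumes quotes and parentheses are balanced and that quotes are not escaped.

```java
import java.util.ArrayList;
import java.util.List;

final class QuoteAwareSplitter {

    // Splits on commas that are outside single quotes and outside parentheses.
    // Quotes and parentheses are kept in the returned fields.
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;           // current parenthesis nesting depth
        boolean inQuote = false; // true while inside '...'

        for (char c : input.toCharArray()) {
            if (c == '\'') {
                inQuote = !inQuote;
            } else if (!inQuote) {
                if (c == '(') {
                    depth++;
                } else if (c == ')' && depth > 0) {
                    depth--;
                } else if (c == ',' && depth == 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                    continue; // the separator itself is not part of any field
                }
            }
            current.append(c);
        }
        parts.add(current.toString()); // the last field has no trailing comma
        return parts;
    }

    public static void main(String[] args) {
        for (String part : split("Hello, 'my,',friend,(how ,are, you),(,)")) {
            System.out.println("[" + part + "]");
        }
    }
}
```

The fields still carry their surrounding quotes, parentheses and leading whitespace; producing exactly the output shown in the question would need an extra trimming step on each field.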
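Answer 1 (score: 3)
A Java CSV parser library is better suited to this task than a regular expression: http://sourceforge.net/projects/javacsv/
Answer 2 (score: 1)
Assuming there are no nested parentheses, you can split on: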
",(?=(?:[^']*'[^']*')*[^']*$)(?=(?:[^()]*\\([^()]*\\))*[^()]*$)"
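It splits on a comma only when the rest of the string after that comma contains an even number of ' characters and only complete pairs of parentheses.
This is a fragile solution, but it may be good enough.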
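To see the pattern in action, here is a small sketch that passes it to `String.split` on the input from the question. The class name `LookaheadSplitDemo` and the constant `SEPARATOR` are made up for illustration; the pattern itself is copied unchanged from this answer.

```java
final class LookaheadSplitDemo {

    // Pattern from the answer: a comma followed by an even number of quotes
    // and only complete (non-nested) parenthesis pairs up to the end of input.
    static final String SEPARATOR =
            ",(?=(?:[^']*'[^']*')*[^']*$)(?=(?:[^()]*\\([^()]*\\))*[^()]*$)";

    static String[] split(String input) {
        return input.split(SEPARATOR);
    }

    public static void main(String[] args) {
        for (String part : split("Hello, 'my,',friend,(how ,are, you),(,)")) {
            System.out.println("[" + part + "]");
        }
    }
}
```

Each field keeps its quotes, parentheses and leading whitespace. Because both lookaheads rescan the remainder of the string at every comma, this is likely to be slow on long inputs, and it does not handle nested parentheses.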
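Answer 3 (score: 1)
As some of the comments and @Balthus's answer suggest, this is best done with a CSV parser. You need to do a little smart regex replacement to prepare the input string for parsing. Consider code like this: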
// CsvReader is from the javacsv library (com.csvreader.CsvReader); readRecord() can throw IOException.
String str = "Hello, 'my,',friend,(how ,are, you),(,)"; // input string
// Prepare the string for the CSV parser: replace each left/right bracket or ' with a "
CsvReader reader = CsvReader.parse(str.replaceAll("[(')]", "\""));
reader.readRecord(); // read the CSV input
for (int i = 0; i < reader.getColumnCount(); i++)
    System.out.printf("col[%d]: [%s]%n", i, reader.get(i));
Output:
col[0]: [Hello]
col[1]: [my,]
col[2]: [friend]
col[3]: [how ,are, you]
col[4]: [,]
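The key step is the replacement that turns every quote and bracket into a CSV double quote. The sketch below isolates that step using only the standard library, so the intermediate string handed to the parser is visible. The class and method names (`CsvPreprocessDemo`, `toCsvLine`) are hypothetical.

```java
final class CsvPreprocessDemo {

    // Replaces every (, ) and ' with a double quote so that a CSV parser
    // can treat each quoted or parenthesised run as one quoted field.
    static String toCsvLine(String input) {
        return input.replaceAll("[(')]", "\"");
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine("Hello, 'my,',friend,(how ,are, you),(,)"));
        // prints: Hello, "my,",friend,"how ,are, you",","
    }
}
```

Since all three characters collapse into the same double quote, this presumably only works when parentheses are not nested and no quote appears inside a parenthesised group; otherwise the double quotes in the prepared line no longer pair up.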
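Answer 4 (score: 0)
I also needed to split on commas outside quotes and brackets.
After searching through all the related answers on SO, I realized that this case calls for a lexer, so I wrote a generic implementation for myself. It takes the separator, multiple quote characters and multiple bracket pairs, each given as a regular expression.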
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static List<String> split(String string, String regex, String[] quotesRegex, String[] leftBracketsRegex,
        String[] rightBracketsRegex) {
    if (leftBracketsRegex.length != rightBracketsRegex.length) {
        throw new IllegalArgumentException("Bracket count mismatch, left: " + leftBracketsRegex.length + ", right: "
                + rightBracketsRegex.length);
    }
    // Prepare all delimiters in a fixed order: separator, quotes, left brackets, right brackets.
    String[] delimiters = new String[1 + quotesRegex.length + leftBracketsRegex.length + rightBracketsRegex.length];
    delimiters[0] = regex;
    System.arraycopy(quotesRegex, 0, delimiters, 1, quotesRegex.length);
    System.arraycopy(leftBracketsRegex, 0, delimiters, 1 + quotesRegex.length, leftBracketsRegex.length);
    System.arraycopy(rightBracketsRegex, 0, delimiters, 1 + quotesRegex.length + leftBracketsRegex.length,
            rightBracketsRegex.length);
    // Build delimiter regex. Each delimiter gets its own capturing group, so the
    // delimiter regexes themselves must not contain capturing groups.
    StringBuilder delimitersRegexBuilder = new StringBuilder("(?:");
    boolean first = true;
    for (String delimiter : delimiters) {
        if (delimiter.endsWith("\\") && !delimiter.endsWith("\\\\")) {
            throw new IllegalArgumentException("Delimiter contains trailing single \\: " + delimiter);
        }
        if (first) {
            first = false;
        } else {
            delimitersRegexBuilder.append("|");
        }
        delimitersRegexBuilder
                .append("(")
                .append(delimiter)
                .append(")");
    }
    delimitersRegexBuilder.append(")");
    String delimitersRegex = delimitersRegexBuilder.toString();
    // Scan.
    int pendingQuoteIndex = -1; // index of the currently open quote type, or -1 if none
    Deque<Integer> bracketStack = new ArrayDeque<>(); // indices of the currently open bracket types
    StringBuilder pendingSegmentBuilder = new StringBuilder();
    List<String> segmentList = new ArrayList<>();
    Matcher matcher = Pattern.compile(delimitersRegex).matcher(string);
    int matcherIndex = 0;
    while (matcher.find()) {
        pendingSegmentBuilder.append(string.substring(matcherIndex, matcher.start()));
        // Find which delimiter matched by looking for the non-null group.
        int delimiterIndex = -1;
        for (int i = 1; i <= matcher.groupCount(); ++i) {
            if (matcher.group(i) != null) {
                delimiterIndex = i - 1;
                break;
            }
        }
        if (delimiterIndex < 1) {
            // The separator regex matched.
            if (pendingQuoteIndex == -1 && bracketStack.isEmpty()) {
                segmentList.add(pendingSegmentBuilder.toString());
                pendingSegmentBuilder.setLength(0);
            } else {
                pendingSegmentBuilder.append(matcher.group());
            }
        } else {
            delimiterIndex -= 1;
            pendingSegmentBuilder.append(matcher.group());
            if (delimiterIndex < quotesRegex.length) {
                // Quote.
                if (pendingQuoteIndex == -1) {
                    pendingQuoteIndex = delimiterIndex;
                } else if (pendingQuoteIndex == delimiterIndex) {
                    pendingQuoteIndex = -1;
                }
                // Ignore unpaired quotes.
            } else if (pendingQuoteIndex == -1) {
                delimiterIndex -= quotesRegex.length;
                if (delimiterIndex < leftBracketsRegex.length) {
                    // Left bracket
                    bracketStack.push(delimiterIndex);
                } else {
                    delimiterIndex -= leftBracketsRegex.length;
                    // Right bracket. peek() returns null on an empty stack, so keep the
                    // result boxed and check it to avoid a NullPointerException on unboxing.
                    Integer topBracket = bracketStack.peek();
                    // Ignore unbalanced brackets.
                    if (topBracket != null && delimiterIndex == topBracket) {
                        bracketStack.pop();
                    }
                }
            }
        }
        matcherIndex = matcher.end();
    }
    pendingSegmentBuilder.append(string.substring(matcherIndex, string.length()));
    segmentList.add(pendingSegmentBuilder.toString());
    // Drop trailing empty segments.
    while (segmentList.size() > 0 && segmentList.get(segmentList.size() - 1).isEmpty()) {
        segmentList.remove(segmentList.size() - 1);
    }
    return segmentList;
}