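Given the following string:
"foo bar-baz-zzz"
I want to split it on the characters " " and "-", preserving their values, while getting every combination of the input.
I would like to get a two-dimensional array containing
{{"foo", "bar", "baz", "zzz"}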
,{"foo bar", "baz", "zzz"}
,{"foo", "bar-baz", "zzz"}
,{"foo bar-baz", "zzz"}
,{"foo", "bar", "baz-zzz"}
,{"foo bar", "baz-zzz"}
,{"foo", "bar-baz-zzz"}
,{"foo bar-baz-zzz"}}
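Is there any built-in method in Java to split a string this way? Maybe in a library like Apache Commons? Or do I have to write a wall of for-loops?
Answer 0 (score: 6)
Below is a working recursive solution. I used a List<List<String>> rather than a two-dimensional array to keep things simple. The code is a bit ugly and could probably be tidied up a little.
Example output: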
$ java Main foo bar-baz-zzz
Processing: foo bar-baz-zzz
[foo, bar, baz, zzz]
[foo, bar, baz-zzz]
[foo, bar-baz, zzz]
[foo, bar-baz-zzz]
[foo bar, baz, zzz]
[foo bar, baz-zzz]
[foo bar-baz, zzz]
[foo bar-baz-zzz]
Code:
import java.util.*;

public class Main {
    public static void main(String[] args) {
        // First build a single string from the command line args.
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = Arrays.asList(args).iterator();
        while (it.hasNext()) {
            sb.append(it.next());
            if (it.hasNext()) {
                sb.append(' ');
            }
        }
        process(sb.toString());
    }

    protected static void process(String str) {
        System.err.println("Processing: " + str);
        List<List<String>> results = new LinkedList<List<String>>();
        // Invoke the recursive method that does the magic.
        process(str, 0, results, new LinkedList<String>(), new StringBuilder());
        for (List<String> result : results) {
            System.err.println(result);
        }
    }

    protected static void process(String str, int pos, List<List<String>> resultsSoFar,
                                  List<String> currentResult, StringBuilder sb) {
        if (pos == str.length()) {
            // Base case: Reached end of string so add buffer contents to current result
            // and add current result to resultsSoFar.
            currentResult.add(sb.toString());
            resultsSoFar.add(currentResult);
        } else {
            // Step case: Inspect character at pos and then make recursive call.
            char c = str.charAt(pos);
            if (c == ' ' || c == '-') {
                // When we encounter a ' ' or '-' we recurse twice; once where we treat
                // the character as a delimiter and once where we treat it as a 'normal'
                // character.
                List<String> copy = new LinkedList<String>(currentResult);
                copy.add(sb.toString());
                process(str, pos + 1, resultsSoFar, copy, new StringBuilder());
                sb.append(c);
                process(str, pos + 1, resultsSoFar, currentResult, sb);
            } else {
                sb.append(c);
                process(str, pos + 1, resultsSoFar, currentResult, sb);
            }
        }
    }
}
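Answer 1 (score: 4)
Here is a much shorter version, written recursively. I apologize for only being able to write it in Python. I like how concise it is; surely someone will be able to produce a Java version.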
def rec(h,t):
    if len(t)<2: return [[h+t]]
    if (t[0]!=' ' and t[0]!='-'): return rec(h+t[0], t[1:])
    return rec(h+t[0], t[1:]) + [ [h]+x for x in rec('',t[1:])]
Result:
>>> rec('',"foo bar-baz-zzz")
[['foo bar-baz-zzz'], ['foo bar-baz', 'zzz'], ['foo bar', 'baz-zzz'], ['foo bar', 'baz', 'zzz'], ['foo', 'bar-baz-zzz'], ['foo', 'bar-baz', 'zzz'], ['foo', 'bar', 'baz-zzz'], ['foo', 'bar', 'baz', 'zzz']]
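For anyone who wants the Java version asked for above, a rough port of rec might look like the following. This is only a sketch: the class name RecSplit is made up for illustration, the delimiters are hard-coded to ' ' and '-' as in the question, and it builds the whole result eagerly just like the Python does.

```java
import java.util.ArrayList;
import java.util.List;

final class RecSplit {
    // Port of rec(h, t): h is the segment built so far, t is the unread remainder.
    static List<List<String>> rec(String h, String t) {
        List<List<String>> out = new ArrayList<>();
        if (t.length() < 2) {
            // Base case: whatever is left joins the current segment.
            List<String> only = new ArrayList<>();
            only.add(h + t);
            out.add(only);
            return out;
        }
        char c = t.charAt(0);
        String rest = t.substring(1);
        // Always explore the branch where c stays inside the current segment.
        out.addAll(rec(h + c, rest));
        if (c == ' ' || c == '-') {
            // Delimiter: also explore the branch where the segment ends here.
            for (List<String> tail : rec("", rest)) {
                List<String> split = new ArrayList<>();
                split.add(h);
                split.addAll(tail);
                out.add(split);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (List<String> split : rec("", "foo bar-baz-zzz")) {
            System.out.println(split);
        }
    }
}
```

The results come out in the same order as the Python list: the unsplit string first, the fully split one last.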
Answer 2 (score: 3)
Here is a class that lazily returns the lists of split values:
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Split implements Iterator<List<String>> {
    private Split kid;                 private final Pattern pattern;
    private String subsequence;        private final Matcher matcher;
    private boolean done = false;      private final String sequence;
    public Split(Pattern pattern, String sequence) {
        this.pattern = pattern;        matcher = pattern.matcher(sequence);
        this.sequence = sequence;
    }
    @Override public List<String> next() {
        if (done) { throw new IllegalStateException(); }
        while (true) {
            if (kid == null) {
                // Advance to the next delimiter; the text after it is the last
                // element, and the text before it is split recursively by kid.
                if (matcher.find()) {
                    subsequence = sequence.substring(matcher.end());
                    kid = new Split(pattern, sequence.substring(0, matcher.start()));
                } else { break; }
            } else {
                if (kid.hasNext()) {
                    List<String> next = kid.next();
                    next.add(subsequence);
                    return next;
                } else { kid = null; }
            }
        }
        // No delimiters left: the final result is the unsplit sequence.
        done = true;
        List<String> list = new ArrayList<String>();
        list.add(sequence);
        return list;
    }
    @Override public boolean hasNext() { return !done; }
    @Override public void remove() { throw new UnsupportedOperationException(); }
}
(Forgive the code formatting; it is meant to avoid nested scrollbars.)
For a sample invocation:
Pattern pattern = Pattern.compile(" |-");
String str = "foo bar-baz-zzz";
Split split = new Split(pattern, str);
while (split.hasNext()) {
System.out.println(split.next());
}
...it emits:
[foo, bar-baz-zzz]
[foo, bar, baz-zzz]
[foo bar, baz-zzz]
[foo, bar-baz, zzz]
[foo, bar, baz, zzz]
[foo bar, baz, zzz]
[foo bar-baz, zzz]
[foo bar-baz-zzz]
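I suppose the implementation could be improved.
Answer 3 (score: 1)
Why do you need that?
Note that for a given string of N tokens, you want to obtain an array of ca. N * 2^N strings (there are 2^(N-1) ways to split it, each holding up to N strings). If this is not done in a safe way, it can consume a lot of memory...
I suppose you will need to iterate through all of them, right? If so, it is better to create a class that keeps the original string and gives you a different way of splitting the line each time you ask. That saves a lot of memory and scales better.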
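One way to read that suggestion is an iterator that stores only the original string, the delimiter positions, and a counter, and builds each split on demand. The sketch below is an assumption about what such a class could look like, not code from this answer: the name LazySplits is invented, the delimiters are fixed to ' ' and '-', and a long is used as the counter, so it gives up beyond 62 delimiters.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class LazySplits implements Iterator<List<String>> {
    private final String source;
    private final List<Integer> cuts = new ArrayList<>(); // indices of ' ' and '-'
    private final long limit;                             // 2^(number of delimiters)
    private long mask = 0;                                // bit i set => cut at cuts.get(i)

    LazySplits(String source) {
        this.source = source;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == ' ' || c == '-') {
                cuts.add(i);
            }
        }
        if (cuts.size() > 62) {
            throw new IllegalArgumentException("too many delimiters for a long mask");
        }
        limit = 1L << cuts.size();
    }

    @Override public boolean hasNext() {
        return mask < limit;
    }

    @Override public List<String> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Build only the split selected by the current mask.
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < cuts.size(); i++) {
            if ((mask & (1L << i)) != 0) {
                parts.add(source.substring(start, cuts.get(i)));
                start = cuts.get(i) + 1;
            }
        }
        parts.add(source.substring(start));
        mask++;
        return parts;
    }
}
```

Only one split is held in memory at a time. For "foo bar-baz-zzz" the first call to next() returns [foo bar-baz-zzz] and the eighth and last returns [foo, bar, baz, zzz].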
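Answer 4 (score: 0)
There is no library method for this.
To accomplish it, tokenize the string while keeping the delimiters (in your case " " and "-"), then treat each delimiter as being associated with a binary flag and build all combinations from the values of the flags.
In your case you have 3 delimiters: " ", "-" and "-", so you have 3 binary flags. You will end up with 2^3 = 8 ways of splitting the string.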
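The flag idea can be sketched roughly as follows. java.util.StringTokenizer with returnDelims set to true is a real JDK class that hands back the delimiters as tokens; everything else (the class name FlagSplit, the method names, the use of an int as the flag set, hence the limit of 30 delimiters) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class FlagSplit {
    private static boolean isDelimiter(String piece) {
        return piece.equals(" ") || piece.equals("-");
    }

    static List<List<String>> allSplits(String input) {
        // returnDelims = true keeps each " " and "-" as its own one-character token.
        StringTokenizer tokenizer = new StringTokenizer(input, " -", true);
        List<String> pieces = new ArrayList<>();
        int delimiterCount = 0;
        while (tokenizer.hasMoreTokens()) {
            String piece = tokenizer.nextToken();
            if (isDelimiter(piece)) {
                delimiterCount++;
            }
            pieces.add(piece);
        }
        if (delimiterCount > 30) {
            throw new IllegalArgumentException("too many delimiters for an int flag set");
        }

        List<List<String>> results = new ArrayList<>();
        // Each value of flags is one combination: bit d set means "split at delimiter d".
        for (int flags = 0; flags < (1 << delimiterCount); flags++) {
            List<String> parts = new ArrayList<>();
            StringBuilder current = new StringBuilder();
            int d = 0;
            for (String piece : pieces) {
                boolean delimiter = isDelimiter(piece);
                boolean cut = delimiter && ((flags >> d) & 1) == 1;
                if (delimiter) {
                    d++;
                }
                if (cut) {
                    // Flag set: the delimiter ends the current part and is dropped.
                    parts.add(current.toString());
                    current.setLength(0);
                } else {
                    // Flag clear (or ordinary text): keep it inside the current part.
                    current.append(piece);
                }
            }
            parts.add(current.toString());
            results.add(parts);
        }
        return results;
    }

    public static void main(String[] args) {
        for (List<String> split : allSplits("foo bar-baz-zzz")) {
            System.out.println(split);
        }
    }
}
```

With the three delimiters of "foo bar-baz-zzz" the loop runs for flags 0 through 7, giving the 8 splits; flags = 0 leaves the string whole and flags = 7 splits at every delimiter.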