拆分字符串 - 笛卡尔方式

时间:2009-08-25 14:46:59

标签: java algorithm string split cartesian-product

给出以下字符串:

"foo bar-baz-zzz"

我想将它分成字符“”和“ - ”,保留它们的值,但得到所有输入组合。

我想获得一个包含

的二维数组
{{"foo", "bar", "baz", "zzz"}
,{"foo bar", "baz", "zzz"}
,{"foo", "bar-baz", "zzz"}
,{"foo bar-baz", "zzz"}
,{"foo", "bar", "baz-zzz"}
,{"foo bar", "baz-zzz"}
,{"foo", "bar-baz-zzz"}
,{"foo bar-baz-zzz"}}

Java中是否有任何内置方法以这种方式拆分字符串?也许在像Apache Commons这样的库中?或者我是否必须写一个for-loops墙?

5 个答案:

答案 0 :(得分:6)

以下是有效的递归解决方案。我使用List<List<String>>而不是二维数组来简化操作。代码有点难看,可能会稍微整理一下。

示例输出:

$ java Main foo bar-baz-zzz
Processing: foo bar-baz-zzz
[foo, bar, baz, zzz]
[foo, bar, baz-zzz]
[foo, bar-baz, zzz]
[foo, bar-baz-zzz]
[foo bar, baz, zzz]
[foo bar, baz-zzz]
[foo bar-baz, zzz]
[foo bar-baz-zzz]

代码:

import java.util.*;

public class Main {
  public static void main(String[] args) {
    // First build a single string from the command line args.
    StringBuilder sb = new StringBuilder();
    Iterator<String> it = Arrays.asList(args).iterator();
    while (it.hasNext()) {
      sb.append(it.next());

      if (it.hasNext()) {
        sb.append(' ');
      }
    }

    process(sb.toString());
  }

  protected static void process(String str) {
    System.err.println("Processing: " + str);
    List<List<String>> results = new LinkedList<List<String>>();

    // Invoke the recursive method that does the magic.
    process(str, 0, results, new LinkedList<String>(), new StringBuilder());

    for (List<String> result : results) {
      System.err.println(result);
    }
  }

  protected static void process(String str, int pos, List<List<String>> resultsSoFar, List<String> currentResult, StringBuilder sb) {
    if (pos == str.length()) {
      // Base case: Reached end of string so add buffer contents to current result
      // and add current result to resultsSoFar.
      currentResult.add(sb.toString());
      resultsSoFar.add(currentResult);
    } else {
      // Step case: Inspect character at pos and then make recursive call.
      char c = str.charAt(pos);

      if (c == ' ' || c == '-') {
        // When we encounter a ' ' or '-' we recurse twice; once where we treat
        // the character as a delimiter and once where we treat it as a 'normal'
        // character.
        List<String> copy = new LinkedList<String>(currentResult);
        copy.add(sb.toString());
        process(str, pos + 1, resultsSoFar, copy, new StringBuilder());

        sb.append(c);
        process(str, pos + 1, resultsSoFar, currentResult, sb);
      } else {
        sb.append(c);
        process(str, pos + 1, resultsSoFar, currentResult, sb);
      }
    }
  }
}

答案 1 :(得分:4)

这是一个更短的版本,以递归方式编写。我为只能用Python写它而道歉。我喜欢它是多么简洁;肯定有人能够制作Java版本。

def rec(h,t):
  if len(t)<2: return [[h+t]]
  if (t[0]!=' ' and t[0]!='-'): return rec(h+t[0], t[1:])
  return rec(h+t[0], t[1:]) + [ [h]+x for x in rec('',t[1:])]

结果:

>>> rec('',"foo bar-baz-zzz")
[['foo bar-baz-zzz'], ['foo bar-baz', 'zzz'], ['foo bar', 'baz-zzz'], ['foo bar'
, 'baz', 'zzz'], ['foo', 'bar-baz-zzz'], ['foo', 'bar-baz', 'zzz'], ['foo', 'bar
', 'baz-zzz'], ['foo', 'bar', 'baz', 'zzz']]

答案 2 :(得分:3)

这是一个懒惰地返回分割值列表的类:

public class Split implements Iterator<List<String>> {
  private Split kid;                 private final Pattern pattern;
  private String subsequence;        private final Matcher matcher;
  private boolean done = false;      private final String sequence;
  public Split(Pattern pattern, String sequence) {
    this.pattern = pattern;          matcher = pattern.matcher(sequence);
    this.sequence = sequence;
  }

  @Override public List<String> next() {
    if (done) { throw new IllegalStateException(); }
    while (true) {
      if (kid == null) {
        if (matcher.find()) {
          subsequence = sequence.substring(matcher.end());
          kid = new Split(pattern, sequence.substring(0, matcher.start()));
        } else { break; }
      } else {
        if (kid.hasNext()) {
          List<String> next = kid.next();
          next.add(subsequence);
          return next;
        } else { kid = null; }
      }
    }
    done = true;
    List<String> list = new ArrayList<String>();
    list.add(sequence);
    return list;
  }
  @Override public boolean hasNext() { return !done; }
  @Override public void remove() { throw new UnsupportedOperationException(); }
}

(原谅代码格式化 - 它是为了避免嵌套的滚动条)。

对于样本调用:

Pattern pattern = Pattern.compile(" |-");
String str = "foo bar-baz-zzz";
Split split = new Split(pattern, str);
while (split.hasNext()) {
  System.out.println(split.next());
}

......它会发出:

[foo, bar-baz-zzz]
[foo, bar, baz-zzz]
[foo bar, baz-zzz]
[foo, bar-baz, zzz]
[foo, bar, baz, zzz]
[foo bar, baz, zzz]
[foo bar-baz, zzz]
[foo bar-baz-zzz]

我想可以改进实施。

答案 3 :(得分:1)

你为什么需要那个?

请注意,对于给定的N个标记字符串,您希望获得一个ca N * 2 ^ N个字符串的数组。如果没有以安全的方式完成,这可以消耗大量的内存......

我想你可能需要通过这一切迭代,对吧?如果是这样的话,最好创建一个保留原始字符串的类,并在每次提问时给你不同的方法来分割行。这样可以节省大量内存并获得更好的可扩展性。

答案 4 :(得分:0)

没有库方法。

要实现这一点,您应该通过保留分隔符来标记字符串(在您的情况下使用“ - ”),然后您应该将分隔符视为与二进制标志关联,并根据标志的值构建所有组合。

在你的情况下,你有3个分隔符:“”,“ - ”和“ - ”,所以你有3个二进制标志。您将在字符串中得到2 ^ 3 = 8个值。