Question

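I have a string and an array of words, and I need to write code that finds all substrings of the string that contain every word in the array, in any order. The string contains no special characters or digits, and words are separated by single spaces.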

For example:

Given string:

aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc

Words in the array:

aaaa
bbbb
cccc

Sample output:

aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb    

aaaa aaaa aaaa aaaa cccc bbbb    

aaaa cccc bbbb bbbb bbbb bbbb    

cccc bbbb bbbb bbbb bbbb aaaa  

aaaa cccc bbbb

I have implemented this with for loops, but it is very inefficient.

How can I do this more efficiently?

My code:

    for(int i=0;i<str_arr.length;i++)
    {
        if( (str_arr.length - i) >= words.length)
        {
            String res = check(i);
            if(!res.equals(""))
            {
                System.out.println(res);
                System.out.println("");
            }
            reset_all();
        }
        else
        {
            break;
        }
    }

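    // Builds the substring that starts at word i and ends as soon as all_complete()
    // reports that every array word has been seen; returns "" if that never happens.
    // Relies on fields and helpers not shown here: str_arr, num_words, the maps t and h,
    // has_word() and all_complete().
    public static String check(int i)
    {
        String res = "";
        num_words = 0;

        for(int j=i;j<str_arr.length;j++)
        {
            if(has_word(str_arr[j]))
            {
                // record that this array word has been seen (case-insensitively)
                t.put(str_arr[j].toLowerCase(), 1);
                h.put(str_arr[j].toLowerCase(), 1);

                res = res + str_arr[j]; //+ " ";

                // stop as soon as every array word has been seen
                if(all_complete())
                {
                    return res;
                }

                res = res + " ";
            }
            else
            {
                res = res + str_arr[j] + " ";
            }

        }
        res = "";
        return res;
    }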
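The check(i) loop above restarts the scan at every start index, so in the worst case it does quadratic work in the number of words in the string. A minimal sketch of one way to get the same per-start shortest substrings in a single pass is a sliding window (two pointers): the right end only ever moves forward, because the shortest match starting at word i+1 cannot end before the shortest match starting at word i. This is a swapped-in technique, not the asker's code or either answer's; the class name `ShortestWindows` is hypothetical, and the sketch assumes `has_word` and `all_complete` mean "is an array word" and "every array word has been seen", compared case-insensitively. It needs Java 8+ (`Map.merge`, `String.join`).

```java
import java.util.*;

class ShortestWindows { // hypothetical name
    // For each start index i, finds the shortest run of tokens beginning at i that
    // contains every target word (the per-start result check(i) builds), in one pass.
    static List<String> shortestWindows(String text, Collection<String> words) {
        String[] tokens = text.split(" ");
        Set<String> targets = new HashSet<>();
        for (String w : words) targets.add(w.toLowerCase()); // case-insensitive, like toLowerCase() in check()
        List<String> result = new ArrayList<>();
        if (targets.isEmpty()) return result;

        Map<String, Integer> counts = new HashMap<>(); // target occurrences inside tokens[i..r)
        int covered = 0; // distinct targets whose count is > 0
        int r = 0;       // exclusive right end of the window; it never moves left
        for (int i = 0; i < tokens.length; i++) {
            // Extend the right end until every target is present or the tokens run out.
            while (covered < targets.size() && r < tokens.length) {
                String t = tokens[r++].toLowerCase();
                if (targets.contains(t) && counts.merge(t, 1, Integer::sum) == 1) covered++;
            }
            // If the rest of the string lacks some word, no later start can succeed.
            if (covered < targets.size()) break;
            result.add(String.join(" ", Arrays.copyOfRange(tokens, i, r)));
            // Drop tokens[i] from the window before advancing the left end.
            String left = tokens[i].toLowerCase();
            if (targets.contains(left) && counts.merge(left, -1, Integer::sum) == 0) covered--;
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc";
        for (String s : shortestWindows(text, Arrays.asList("aaaa", "bbbb", "cccc"))) {
            System.out.println(s);
        }
    }
}
```

For the example string this prints ten substrings, one per start position, from `aaaa aaaa aaaa aaaa cccc bbbb` down to `aaaa bbbb cccc`. Each token enters and leaves the window at most once, so the bookkeeping is linear; building the output strings costs extra in proportion to their total length.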

Answer 1

My first approach would be something like the following pseudocode:

    for word in string {
      if word in array {
        for each stored potential substring {
          if word wasn't already found {
            remove word from notAlreadyFoundList
            if notAlreadyFoundList is empty {
              use starting pos and ending pos to save our substring
            }
          }
        }
        store position and array-word as potential substring
      }
    }

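This should perform reasonably well, since you traverse the string only once (each array word is still compared against every stored potential substring).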

[Edit]

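Here is an implementation of my pseudocode; try it and see whether it performs better or worse. It reports a matching substring as soon as the last missing word is found. If you really want all matches, change the lines marked //ALLMATCHES: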

    import java.util.*;

    class SubStringFinder {
        String textString = "aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc";
        Set<String> words = new HashSet<String>(Arrays.asList("aaaa", "bbbb", "cccc"));

        public static void main(String[] args) {
            new SubStringFinder();
        }

        public SubStringFinder() {
            // open potential matches, each started at an earlier array word
            List<PotentialMatch> matches = new ArrayList<PotentialMatch>();
            for (String textPart : textString.split(" ")) {
                if (words.contains(textPart)) {
                    // extend every open potential match with this word
                    for (Iterator<PotentialMatch> matchIterator = matches.iterator(); matchIterator.hasNext();) {
                        PotentialMatch match = matchIterator.next();
                        String result = match.tryMatch(textPart);
                        if (result != null) {
                            System.out.println("Match found: \"" + result + "\"");
                            matchIterator.remove(); //ALLMATCHES - remove this line
                        }
                    }
                    // start a new potential match at this word
                    Set<String> unfound = new HashSet<String>(words);
                    unfound.remove(textPart);
                    matches.add(new PotentialMatch(unfound, textPart));
                } // ALLMATCHES - add these lines
                // else {
                //     matches.add(new PotentialMatch(new HashSet<String>(words), textPart));
                // }
            }
        }

        // A substring that starts at an array word, plus the array words it still lacks
        class PotentialMatch {
            Set<String> unfoundWords;
            StringBuilder stringPart;

            public PotentialMatch(Set<String> unfoundWords, String part) {
                this.unfoundWords = unfoundWords;
                this.stringPart = new StringBuilder(part);
            }

            // Appends the next word; returns the substring once no array words are missing
            public String tryMatch(String part) {
                this.stringPart.append(' ').append(part);
                unfoundWords.remove(part);
                if (unfoundWords.isEmpty()) {
                    return this.stringPart.toString();
                }
                return null;
            }
        }
    }

Answer 2

Here is another approach:

    import java.util.*;
    import java.util.regex.*;

    class RegexCombFinder { // wrapper class name is arbitrary

        public static void main(String[] args) {
            // init
            List<String> result = new ArrayList<String>();
            String string = "aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc";
            String[] words = { "aaaa", "bbbb", "cccc" };
            // build one regex per ordering of the words, e.g. "(aaaa ?)+(bbbb ?)+(cccc )*cccc"
            // and "(aaaa ?)+(cccc ?)+(bbbb )*bbbb" (each word is wrapped in \Q...\E by Pattern.quote)
            List<String> regexps = findCombs(Arrays.asList(words));
            // compile each regex and collect its matches
            for (String regexp : regexps) {
                Pattern p = Pattern.compile(regexp);
                Matcher m = p.matcher(string);
                while (m.find()) {
                    result.add(m.group());
                }
            }
            System.out.println(result);
        }

        private static List<String> findCombs(List<String> words) {
            if (words.size() == 1) {
                // last word: zero or more "word " followed by a final "word"
                words.set(0, "(" + Pattern.quote(words.get(0)) + " )*" + Pattern.quote(words.get(0)));
                return words;
            }
            List<String> list = new ArrayList<String>();
            for (String word : words) {
                // put this word first, then recurse on the remaining words
                List<String> tail = new LinkedList<String>(words);
                tail.remove(word);
                for (String s : findCombs(tail)) {
                    list.add("(" + Pattern.quote(word) + " ?)+" + s);
                }
            }
            return list;
        }
    }

This outputs:

[aaaa bbbb cccc, aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb, cccc bbbb bbbb bbbb bbbb aaaa]

I know the result is not complete: you only get the combinations that occur, fully expanded, but you do get all of them.
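A further caveat with the patterns `findCombs` builds: because the space in `(word ?)+` is optional and nothing anchors a pattern to whole tokens, a pattern can also match run-together or partial tokens (for example all of `aaaabbbbcccc`, or `aaaa bbbb cccc` inside `xaaaa bbbb cccc`). The sketch below is one way to tighten this, assuming tokens are separated by single spaces: each run is written as `word( word)*` with a mandatory space, and the lookarounds `(?<!\S)` and `(?!\S)` force a match to start and end on token boundaries. `BoundedRunFinder` and its methods are hypothetical names, and like the original it still reports only the fully expanded runs.

```java
import java.util.*;
import java.util.regex.*;

class BoundedRunFinder { // hypothetical name
    // One or more consecutive copies of `word`, separated by single spaces.
    static String runOf(String word) {
        String q = Pattern.quote(word);
        return q + "(?: " + q + ")*";
    }

    // Pattern for one ordering of the words; (?<!\S) and (?!\S) make the
    // match start and end on whole space-separated tokens.
    static Pattern patternFor(List<String> order) {
        StringJoiner body = new StringJoiner(" ", "(?<!\\S)", "(?!\\S)");
        for (String w : order) body.add(runOf(w));
        return Pattern.compile(body.toString());
    }

    // Collects every ordering of `rest` appended to `prefix` (n! orderings in total).
    static void permutations(List<String> prefix, List<String> rest, List<List<String>> out) {
        if (rest.isEmpty()) {
            out.add(new ArrayList<>(prefix));
            return;
        }
        for (int i = 0; i < rest.size(); i++) {
            List<String> remaining = new ArrayList<>(rest);
            prefix.add(remaining.remove(i));
            permutations(prefix, remaining, out);
            prefix.remove(prefix.size() - 1);
        }
    }

    static List<String> find(String text, List<String> words) {
        List<List<String>> orders = new ArrayList<>();
        permutations(new ArrayList<>(), words, orders);
        List<String> result = new ArrayList<>();
        for (List<String> order : orders) {
            Matcher m = patternFor(order).matcher(text);
            while (m.find()) result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(find("aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc",
                Arrays.asList("aaaa", "bbbb", "cccc")));
    }
}
```

On the example string this yields the same three substrings as the answer's code, while `aaaabbbbcccc` and `xaaaa bbbb cccc` produce no matches.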

Find the substrings of a string that contain all the words in an array
