How do I find all overlapping phrases between two strings in Java?

Time: 2014-12-02 04:58:28

Tags: java string string-matching

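Suppose I have two strings:

  1. I like chicken salad, it's my favorite food.

  2. This book contains tons of recipes on making all sorts of food, including cakes, chicken salad, etc.

The overlapping phrases between these two strings are: chicken, salad, chicken salad, food.

What is the best way to find the overlapping phrases between two strings? Assume the syntax and semantics are both clean, and that the first string is always much shorter than the second.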

4 Answers:

Answer 0 (score: 4)

You can try something like this:

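import java.util.ArrayList;
import java.util.List;

public class OverlappingWords {
    public static void main(String[] args) {
        List<String> al = new ArrayList<String>();
        String one = "I like chicken salad, it's my favorite food.";
        // Strip periods and commas, then split the first string into words.
        String result = one.replaceAll("[.,]", "");
        String[] tokens = result.split(" ");
        String second = "This book contains tons of recipes on making all sorts of food, including cakes, chicken salad, etc.";
        System.out.println(result);
        for (int i = 0; i < tokens.length; i++) {
            // indexOf matches substrings, so a token is also kept when it
            // only appears inside a longer word of the second string.
            if (second.indexOf(tokens[i]) >= 0) {
                al.add(tokens[i]);
            }
        }
        System.out.println(al);
    }
}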
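One caveat with the `indexOf` check: it matches substrings, so a token such as "cake" would be reported because the second string contains "cakes". A minimal sketch of a whole-word check follows, using the regex word boundary `\b` together with `Pattern.quote`. The class and method names (`WholeWordMatch`, `containsWord`, `overlappingWords`) are illustrative and not part of the answer, and the sketch assumes every token starts and ends with a letter, digit, or underscore so that `\b` applies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class WholeWordMatch {
    // True if `word` occurs in `text` delimited by regex word boundaries (\b).
    // Pattern.quote keeps any regex metacharacters in the word literal.
    static boolean containsWord(String text, String word) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text).find();
    }

    // Words of `first` (periods and commas stripped) that occur as whole words in `second`.
    static List<String> overlappingWords(String first, String second) {
        List<String> found = new ArrayList<>();
        for (String token : first.replaceAll("[.,]", "").split("\\s+")) {
            if (!token.isEmpty() && containsWord(second, token)) {
                found.add(token);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String first = "I like chicken salad, it's my favorite food.";
        String second = "This book contains tons of recipes on making all sorts of food, "
                + "including cakes, chicken salad, etc.";
        System.out.println(second.indexOf("cake") >= 0);   // true: substring of "cakes"
        System.out.println(containsWord(second, "cake"));  // false: not a whole word
        System.out.println(overlappingWords(first, second)); // [chicken, salad, food]
    }
}
```

Compiling a pattern per token is acceptable for short inputs like these; for long inputs, tokenizing the second string once would likely be cheaper.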

Answer 1 (score: 1)

I tried this approach. It seems you only need salad, chicken, chicken salad, and food as the overlapping phrases.

import java.util.HashSet;
import java.util.Set;

public class OverlappingPhrases {
    public static void main(String[] args) {
        String firstSentence = "I like chicken salad, it's my favorite food";
        String secondSentence = "This book contains tons of recipes on making all sorts of food, including cakes, chicken salad, etc";
        String[] firstSentenceWords = firstSentence.replaceAll("[.,]", "").split(" ");
        Set<String> overlappingPhrases = new HashSet<String>();
        String lastPhrase = "";
        for (String word : firstSentenceWords) {
            // Extend the current run of consecutive words with this word.
            if (lastPhrase.isEmpty()) {
                lastPhrase = word;
            } else {
                lastPhrase = lastPhrase + " " + word;
            }
            if (secondSentence.contains(word)) {
                overlappingPhrases.add(word);
                if (secondSentence.contains(lastPhrase)) {
                    overlappingPhrases.add(lastPhrase);
                } else {
                    // The run no longer matches, so restart it from this word.
                    lastPhrase = word;
                }
            } else {
                lastPhrase = "";
            }
        }
        System.out.println(overlappingPhrases);
    }
}

The overlappingPhrases set contains [chicken salad, chicken, salad, food]

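Answer 2 (score: 0)

First, I think you can use a brute-force algorithm. Split the short string into words, and split the long string into words the same way:

String short_words[] = short_string.split(" ");
String long_words[] = long_string.split(" ");

Next, iterate over the words in the short_words array and check whether each one is in the long_words array. The time complexity of this is poor, though: O(m * n), where m and n are the word counts of the two strings. Second, I think you could use a hash function to do this.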
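The hash-based idea can be sketched as follows: put the words of the long string into a `HashSet` once, then look up each word of the short string in expected constant time, for roughly O(m + n) work overall. The class and method names here are hypothetical, and stripping periods and commas before splitting is an assumption borrowed from the other answers, not something this answer specifies.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

final class HashedWordLookup {
    // Assumption: only '.' and ',' need stripping, and words are whitespace-separated.
    static String[] words(String text) {
        return text.replaceAll("[.,]", "").trim().split("\\s+");
    }

    static Set<String> commonWords(String shortString, String longString) {
        // Build the lookup table once from the long string.
        Set<String> longWords = new HashSet<>(Arrays.asList(words(longString)));
        // LinkedHashSet keeps first-seen order and drops duplicates.
        Set<String> common = new LinkedHashSet<>();
        for (String word : words(shortString)) {
            if (!word.isEmpty() && longWords.contains(word)) {
                common.add(word);
            }
        }
        return common;
    }

    public static void main(String[] args) {
        String shortString = "I like chicken salad, it's my favorite food.";
        String longString = "This book contains tons of recipes on making all sorts of food, "
                + "including cakes, chicken salad, etc.";
        System.out.println(commonWords(shortString, longString)); // [chicken, salad, food]
    }
}
```

This finds single shared words only; multi-word phrases such as "chicken salad" need a comparison of consecutive words on top of it.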

Answer 3 (score: 0)

A method that meets your requirements:

import java.util.ArrayList;
import java.util.List;

public class PhraseOverlap {
    public static void overlappingPhrases() {
        List<String> list = new ArrayList<>();
        String string1 = "I like chicken salad, it's my favorite food.";
        String string2 = "This book contains tons of recipes on making all sorts of food, including cakes, chicken salad, etc.";
        String[] words = string1.replaceAll("[.,]", "").split(" ");
        System.out.println(string1 + "\n" + string2);
        for (int i = 0; i < words.length; i++) {
            if (string2.indexOf(words[i]) < 0) {
                continue;
            }
            list.add(words[i]);
            // Extend the phrase word by word while it still occurs in string2.
            String tmp = words[i];
            for (int j = i + 1; j < words.length && string2.indexOf(tmp + " " + words[j]) >= 0; j++) {
                tmp = tmp + " " + words[j];
            }
            // Record the longest extension, including one that reaches the last word.
            if (!tmp.equals(words[i])) {
                list.add(tmp);
            }
        }
        System.out.println("Overlapping phrases: " + list);
    }

    public static void main(String[] args) {
        overlappingPhrases();
    }
}

Output (after the two input strings are echoed):

Overlapping phrases: [chicken, chicken salad, salad, food]
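The method above records each matching word plus only the longest phrase starting at that word, and it matches by substring. If every shared word sequence is wanted, one option is to compare the two token arrays directly. The sketch below is a brute-force variant under the same punctuation assumption (only periods and commas are stripped); `PhraseEnumerator` and `commonPhrases` are hypothetical names, not part of the answer.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class PhraseEnumerator {
    // Assumption: only '.' and ',' need stripping, and words are whitespace-separated.
    static String[] tokenize(String text) {
        String cleaned = text.replaceAll("[.,]", "").trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    // Every contiguous word sequence that appears in both strings, in first-seen order.
    static Set<String> commonPhrases(String first, String second) {
        String[] a = tokenize(first);
        String[] b = tokenize(second);
        Set<String> phrases = new LinkedHashSet<>();
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < b.length; j++) {
                // Grow the match from (i, j) one word at a time, recording each prefix.
                StringBuilder phrase = new StringBuilder();
                for (int k = 0; i + k < a.length && j + k < b.length
                        && a[i + k].equals(b[j + k]); k++) {
                    if (k > 0) {
                        phrase.append(' ');
                    }
                    phrase.append(a[i + k]);
                    phrases.add(phrase.toString());
                }
            }
        }
        return phrases;
    }

    public static void main(String[] args) {
        String first = "I like chicken salad, it's my favorite food.";
        String second = "This book contains tons of recipes on making all sorts of food, "
                + "including cakes, chicken salad, etc.";
        System.out.println(commonPhrases(first, second));
        // [chicken, chicken salad, salad, food]
    }
}
```

For a three-word shared run such as "a b c", this reports a, a b, a b c, b, b c, and c. The nested loops are quadratic in the word counts before the extension step, which should be fine when the first string is much shorter than the second, as the question assumes.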