Getting combinations of up to five consecutive words

Time: 2014-04-01 10:22:18

Tags: java

I am trying to get every run of up to five consecutive words. I have this input:

  

Pacific Ocean is the largest of the Earth's oceanic divisions

The output should be:

 Pacific
 Pacific Ocean
 Pacific Ocean is
 Pacific Ocean is the
 Pacific Ocean is the largest
 Ocean
 Ocean is
 Ocean is the
 Ocean is the largest
 Ocean is the largest of
 is
 is the
 is the largest
 is the largest of
 is the largest of the
 the
 the largest
 the largest of
 the largest of the
 the largest of the Earth's
 largest
 largest of
 largest of the
 largest of the Earth's
 largest of the Earth's oceanic
 of
 of the
 of the Earth's
 of the Earth's oceanic
 of the Earth's oceanic divisions
 the
 the Earth's
 the Earth's oceanic
 the Earth's oceanic divisions
 Earth's
 Earth's oceanic
 Earth's oceanic divisions
 oceanic
 oceanic divisions
 divisions

My attempt:

public void getComb(String line) {
    String words[] = line.split(" ");
    int count = 0;

    for (int i = 0; i < words.length; i++) {
        String word = "";
        int m = i;
        while (count < 5) {
            count++;
            word += " " + words[m];
            System.out.println(word);
            m++;
        }
    }
}

But the output is wrong! It prints only:

 Pacific
 Pacific Ocean
 Pacific Ocean is
 Pacific Ocean is the
 Pacific Ocean is the largest

How can I fix this?

4 answers:

Answer 0: (score: 4)

Use a nested for loop instead of the while loop, and advance the starting word in the outer loop:

public static void getComb(String line) {
    String words[] = line.split(" ");

    for (int i = 0; i < words.length; i++) {
        String word = "";

        for (int w = i; w < ((i + 5 < words.length) ? (i + 5) : words.length); w++) {
            word += " " + words[w];
            System.out.println(word);
        }
    }
}

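Note the ((i + 5 < words.length) ? (i + 5) : words.length) in the condition of the inner for loop. It is needed so that you do not access elements outside the array when fewer than five words remain; without it you would get an ArrayIndexOutOfBoundsException.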
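The same bound can also be written with Math.min, which some readers find easier to scan than the ternary. The following is only a sketch of that variant, not the answerer's code: the class name WordCombinations, the method name combinations, and the maxLen parameter are made up for illustration, and it returns the phrases in a list (without the leading space) instead of printing them.

```java
import java.util.ArrayList;
import java.util.List;

class WordCombinations {
    // Hypothetical helper: returns every run of 1..maxLen consecutive words,
    // grouped by starting word, in the same order as the desired output.
    static List<String> combinations(String line, int maxLen) {
        String[] words = line.trim().split("\\s+");
        List<String> result = new ArrayList<>();
        for (int start = 0; start < words.length; start++) {
            // Clamp the end index so the tail of the sentence never reads past the array.
            int end = Math.min(start + maxLen, words.length);
            StringBuilder phrase = new StringBuilder();
            for (int w = start; w < end; w++) {
                if (w > start) {
                    phrase.append(' ');
                }
                phrase.append(words[w]);
                result.add(phrase.toString());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "Pacific Ocean is the largest of the Earth's oceanic divisions";
        for (String phrase : combinations(input, 5)) {
            System.out.println(phrase);
        }
    }
}
```

For the ten-word input from the question this yields 40 phrases, matching the length of the expected output listed in the question.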

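Answer 1: (score: 2)

Move the count = 0 initialization inside the outer loop so it is reset for every starting word. In your version count reaches 5 during the first iteration and is never reset, so the while loop never runs again. Also bound m by words.length to avoid running past the end of the array: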

public void getComb(String line) {
    String words[] = line.split(" ");

    for (int i = 0; i < words.length; i++) {
        int count = 0;   // RESET COUNT
        String word = "";
        int m = i;
        while (count < 5 && m < words.length) { // NO EXCEPTION with 'm' limit
            count++;
            word += " " + words[m];
            System.out.println(word);
            m++;
        }
    }
}

Answer 2: (score: 1)

Formally, you want to find the n-grams of sizes 1, 2, 3, 4 and 5 in the string. The ShingleFilter class from the Apache Lucene library can be used for this. From the JavaDoc:

  

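A ShingleFilter constructs shingles (token n-grams) from a token stream. In other words, it creates combinations of tokens as a single token. For example, the sentence "please divide this sentence into shingles" might be tokenized into shingles "please divide", "divide this", "this sentence", "sentence into", and "into shingles".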
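To make the term concrete, here is a dependency-free sketch of what word shingles of a fixed size look like. It deliberately does not use Lucene and is not how ShingleFilter is implemented; the class name Shingles and the method ngrams are hypothetical, and splitting on whitespace is an assumption about tokenization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Shingles {
    // Hypothetical helper: returns all word n-grams made of exactly n consecutive words.
    static List<String> ngrams(String sentence, int n) {
        String[] words = sentence.trim().split("\\s+");
        List<String> grams = new ArrayList<>();
        // Stop once fewer than n words remain, so only full-size n-grams are produced.
        for (int start = 0; start + n <= words.length; start++) {
            grams.add(String.join(" ", Arrays.copyOfRange(words, start, start + n)));
        }
        return grams;
    }

    public static void main(String[] args) {
        String sentence = "please divide this sentence into shingles";
        // Sizes 1 through 5, as asked for in the question.
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + "-grams: " + ngrams(sentence, n));
        }
    }
}
```

Collecting sizes 1 through 5 gives the same set of phrases as the desired output in the question, only grouped by length rather than by starting word.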

Answer 3: (score: 1)

Try the following, a modified version of another answer's code:

public void getComb(String line)
{
    String words[] = line.split(" ");

    for(int i=0;i<words.length;i++)
    {
        int count=0;   //******* RESET COUNT *****//
        String word = "";
        int m=i;
        while(count<5 && m < words.length)   // bound by the array length, not a hard-coded 10
        {
            count++;
            word += " "+words[m];
            System.out.println(word);
            m++;
        }
    }
}