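I'm trying to print every run of up to five consecutive words, starting at each word in turn. I have this input:
Pacific Ocean is the largest of the Earth's oceanic divisions
The output should be: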
Pacific
Pacific Ocean
Pacific Ocean is
Pacific Ocean is the
Pacific Ocean is the largest
Ocean
Ocean is
Ocean is the
Ocean is the largest
Ocean is the largest of
is
is the
is the largest
is the largest of
is the largest of the
the
the largest
the largest of
the largest of the
the largest of the Earth's
largest
largest of
largest of the
largest of the Earth's
largest of the Earth's oceanic
of
of the
of the Earth's
of the Earth's oceanic
of the Earth's oceanic divisions
the
the Earth's
the Earth's oceanic
the Earth's oceanic divisions
Earth's
Earth's oceanic
Earth's oceanic divisions
oceanic
oceanic divisions
divisions
My attempt:
public void getComb(String line) {
    String words[] = line.split(" ");
    int count = 0;
    for (int i = 0; i < words.length; i++) {
        String word = "";
        int m = i;
        while (count < 5) {
            count++;
            word += " " + words[m];
            System.out.println(word);
            m++;
        }
    }
}
But the output is wrong! It prints only:
Pacific
Pacific Ocean
Pacific Ocean is
Pacific Ocean is the
Pacific Ocean is the largest
How can I fix this?
Answer 0 (score: 4)
Use nested for loops instead of the while loop, and advance the starting word in the outer loop:
public static void getComb(String line) {
    String words[] = line.split(" ");
    for (int i = 0; i < words.length; i++) {
        String word = "";
        for (int w = i; w < ((i + 5 < words.length) ? (i + 5) : words.length); w++) {
            word += " " + words[w];
            System.out.println(word);
        }
    }
}
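Note the ((i + 5 < words.length) ? (i + 5) : words.length) in the condition of the inner for loop. It is needed so that you don't access elements outside the array when fewer than five words remain; without it you would get an ArrayIndexOutOfBoundsException.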
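The same bound can also be written with Math.min, which some readers find easier to scan than the ternary. Below is a minimal sketch of that variant, not the answer's own code: the class WordRuns, the method prefixes and the maxWords parameter are names made up for illustration. It returns the phrases as a list instead of printing them, and it builds each phrase without the leading space that word += " " + words[w] produces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration; not part of the original answer.
final class WordRuns {

    // Returns every run of 1..maxWords consecutive words, grouped by starting word.
    static List<String> prefixes(String line, int maxWords) {
        String[] words = line.trim().split("\\s+");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            // Stop after maxWords words or at the end of the array, whichever comes first.
            int end = Math.min(i + maxWords, words.length);
            StringBuilder phrase = new StringBuilder();
            for (int w = i; w < end; w++) {
                if (w > i) {
                    phrase.append(' '); // separator only between words, so no leading space
                }
                phrase.append(words[w]);
                result.add(phrase.toString());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "Pacific Ocean is the largest of the Earth's oceanic divisions";
        for (String phrase : prefixes(input, 5)) {
            System.out.println(phrase);
        }
    }
}
```

Returning a list keeps the loop logic separate from the printing, so the caller decides what to do with the phrases.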
Answer 1 (score: 2)
Move the count = 0 statement inside the outer loop, so the counter is reset for every starting word:
public void getComb(String line) {
    String words[] = line.split(" ");
    for (int i = 0; i < words.length; i++) {
        int count = 0; // RESET COUNT
        String word = "";
        int m = i;
        while (count < 5 && m < words.length) { // NO EXCEPTION with 'm' limit
            count++;
            word += " " + words[m];
            System.out.println(word);
            m++;
        }
    }
}
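Answer 2 (score: 1)
Formally, you want to find the n-grams of size 1, 2, 3, 4 and 5 in the string. The ShingleFilter class from the Apache Lucene library can be used for this. From the JavaDoc:
A ShingleFilter constructs shingles (token n-grams) from a token stream. In other words, it creates combinations of tokens as a single token. For example, the sentence "please divide this sentence into shingles" might be tokenized into shingles "please divide", "divide this", "this sentence", "sentence into", and "into shingles".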
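To make the idea of a shingle concrete without pulling in Lucene, here is a plain-Java sketch that reproduces the two-word shingles from the JavaDoc example. It does not use ShingleFilter or any other Lucene API. The names ShingleSketch and shingles are made up for illustration, and the sketch assumes tokens are separated by whitespace only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of word shingles; this is not Lucene's implementation.
final class ShingleSketch {

    // Returns every run of exactly `size` consecutive tokens (assumes size >= 1).
    static List<String> shingles(String text, int size) {
        String[] tokens = text.trim().split("\\s+");
        List<String> result = new ArrayList<>();
        // The last valid start index is tokens.length - size.
        for (int start = 0; start + size <= tokens.length; start++) {
            String[] window = Arrays.copyOfRange(tokens, start, start + size);
            result.add(String.join(" ", window));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String shingle : shingles("please divide this sentence into shingles", 2)) {
            System.out.println(shingle);
        }
    }
}
```

Calling shingles with each size from 1 to 5 yields the same phrases the question asks for, grouped by size rather than by starting word.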
Answer 3 (score: 1)
Try the following method, a modified version of 安迪尼定's answer:
public void getComb(String line)
{
    String words[] = line.split(" ");
    for(int i=0;i<words.length;i++)
    {
        int count=0; //******* RESET COUNT *****//
        String word = "";
        int m=i;
        // Bound m by the array length rather than a fixed word count
        while(count<5 && m < words.length)
        {
            count++;
            word += " "+words[m];
            System.out.println(word);
            m++;
        }
    }
}