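I am trying to remove punctuation from a string but keep the spaces, because I need to be able to tell the individual words apart. The end goal is to find the length of each word in the string.
I set up a for loop that measures the length of a word until it reaches a space, but this counts punctuation marks as letters. I know I have to change the variable in the if statement so that it reflects the length of the substring between i and indexOf in the string.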
for(int i=0; i > stringLength - 1;){
original.substring(i, original.indexOf(' '));
if(i > minLength)
Answer 0 (score: 0)
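If all you need is the length of each word, count characters as you scan the string and treat punctuation and spaces as word boundaries, instead of taking substrings inside the if statement: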
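int cnt = 0; // length of the word currently being scanned
for (int i = 0; i < original.length(); i++) {
    // Punctuation and spaces both end the current word
    if (",;:.?! ".indexOf(original.charAt(i)) > -1) {
        if (cnt > 0) {
            System.out.println(cnt);
            cnt = 0;
        }
    } else {
        cnt++;
    }
}
// Print the last word when the string does not end with a delimiter
if (cnt > 0) {
    System.out.println(cnt);
}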
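As a rough sketch, the same counting idea can be wrapped in a reusable method that returns the lengths instead of printing them. The class name WordLengthScanner and the method wordLengths are hypothetical, and using Character.isLetterOrDigit in place of a fixed delimiter string is an assumption: it treats every other character as a word boundary.

```java
import java.util.ArrayList;
import java.util.List;

class WordLengthScanner {
    // Hypothetical helper: returns the length of each word, in order of appearance.
    // Assumption: a "word" is a run of letters or digits; anything else is a boundary.
    static List<Integer> wordLengths(String original) {
        List<Integer> lengths = new ArrayList<>();
        int cnt = 0; // length of the word currently being scanned
        for (int i = 0; i < original.length(); i++) {
            if (Character.isLetterOrDigit(original.charAt(i))) {
                cnt++;
            } else if (cnt > 0) {
                // A boundary character closes the current word
                lengths.add(cnt);
                cnt = 0;
            }
        }
        // Flush the last word when the string does not end with a boundary
        if (cnt > 0) {
            lengths.add(cnt);
        }
        return lengths;
    }

    public static void main(String[] args) {
        // prints [5, 5, 3, 3, 3]
        System.out.println(wordLengths("Hello, world! How are you"));
    }
}
```

Note that under this assumption an apostrophe is also a boundary, so "don't" would be counted as two words of lengths 3 and 1.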
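Answer 1 (score: 0)
While it may be tempting to throw a bunch of for loops and if statements at the problem, a regular expression is more concise: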
Pattern.compile("[.,; ]+").splitAsStream(input)
A complete example:
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class Counting {
    public static void main(String... args) {
        String text = "This is a string. With some punctuation, but I only care about words.";
        // Split on runs of '.', ',', ';' or spaces, then pair each word with its length
        String wordsWithLengths = Pattern.compile("[.,; ]+")
                .splitAsStream(text)
                .map(word -> word + " => " + word.length())
                .collect(Collectors.joining("\n"));
        System.out.println(wordsWithLengths);
    }
}
Output:
This => 4
is => 2
a => 1
string => 6
With => 4
some => 4
punctuation => 11
but => 3
I => 1
only => 4
care => 4
about => 5
words => 5
Also, if you want to count how many words have more than N characters, you can do this:
import java.util.regex.Pattern;

public class CountingWords {
    public static void main(String... args) {
        String text = "This is a string. With some punctuation, but I only care about words.";
        int threshold = 5;
        // Keep only the words longer than the threshold and count them
        long amountOfWords = Pattern.compile("[.,; ]+")
                .splitAsStream(text)
                .filter(word -> word.length() > threshold)
                .count();
        System.out.println("There are " + amountOfWords + " words with more than " + threshold + " characters");
    }
}
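The pattern "[.,; ]+" used above only covers four separator characters, so marks such as '!', '?' or ':' would stay attached to the words. A minimal sketch of a broader variant follows, assuming any ASCII punctuation or whitespace should separate words. The class name BroadSplit and the method words are hypothetical. The filter is there because splitting can produce an empty leading string when the text starts with a separator.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class BroadSplit {
    // Assumption: any ASCII punctuation (\p{Punct}) or whitespace (\s) separates words.
    private static final Pattern SEPARATORS = Pattern.compile("[\\p{Punct}\\s]+");

    // Hypothetical helper: returns the words of the text with all separators removed.
    static List<String> words(String text) {
        return SEPARATORS.splitAsStream(text)
                // Drop the empty string produced when the text starts with a separator
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (String word : words("\"Wait!\" she said: is this right?")) {
            System.out.println(word + " => " + word.length());
        }
    }
}
```

With this input the words are Wait, she, said, is, this and right. Because \p{Punct} includes the apostrophe and the hyphen, contractions and hyphenated words are split as well; whether that is desirable depends on how a word is defined for the task.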