Removing punctuation in Java

Time: 2016-11-06 03:56:10

Tags: java string char

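I am trying to remove punctuation from a string while keeping the spaces, because I need to be able to tell the individual words apart. The end goal is to find the length of each word in the string.

I set up a for loop that measures a word's length until it reaches a space, but this counts punctuation marks as letters. I know I have to change the variable in the if statement so that it reflects the length of the substring between i and indexOf.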

for(int i=0; i > stringLength - 1;){
original.substring(i, original.indexOf(' '));
if(i > minLength)

2 answers:

Answer 0 (score: 0)

If you need the length of each word, do the following; otherwise, perform your own operation inside the inner if statement instead of printing:

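int cnt = 0; // length of the word currently being scanned
for (int i = 0; i < original.length(); i++) {
    // A delimiter (punctuation or space) ends the current word
    if (",;:.?! ".indexOf(original.charAt(i)) > -1) {
        if (cnt > 0) {
            System.out.println(cnt);
            cnt = 0;
        }
    } else {
        cnt++;
    }
}
// Print the last word if the string does not end with a delimiter
if (cnt > 0) {
    System.out.println(cnt);
}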
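
The counting loop in this answer can also be wrapped in a method that returns the lengths instead of printing them. The following is only a sketch under a few assumptions: the class name WordLengths and the method wordLengths are hypothetical, and Character.isLetterOrDigit is swapped in for the fixed delimiter string, so any character that is not a letter or digit ends a word. It also records a final word that is not followed by a delimiter.

```java
import java.util.ArrayList;
import java.util.List;

public class WordLengths {
    // Hypothetical helper (not part of the original answer): returns the
    // length of every word in the input. Assumption: any character that is
    // not a letter or digit separates words.
    public static List<Integer> wordLengths(String original) {
        List<Integer> lengths = new ArrayList<>();
        int cnt = 0; // length of the word currently being scanned
        for (int i = 0; i < original.length(); i++) {
            if (Character.isLetterOrDigit(original.charAt(i))) {
                cnt++;
            } else if (cnt > 0) {
                // A separator closes the current word
                lengths.add(cnt);
                cnt = 0;
            }
        }
        // Record the last word if the string does not end with a separator
        if (cnt > 0) {
            lengths.add(cnt);
        }
        return lengths;
    }

    public static void main(String[] args) {
        // Prints [5, 5, 3, 3, 3]
        System.out.println(wordLengths("Hello, world! How are you"));
    }
}
```

Note that under this assumption an apostrophe also splits a word, so "don't" would be counted as two words of lengths 3 and 1.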

Answer 1 (score: 0)

While it may be tempting to throw a bunch of fors and ifs at this, using a regular expression is more concise:

Pattern.compile("[.,; ]+").splitAsStream(input)

A complete example:

import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class Counting {
    public static void main(String... args) {
        String text = "This is a string. With some punctuation, but I only care about words.";

        String wordsWithLengths = Pattern.compile("[.,; ]+")
                .splitAsStream(text)
                .map(word -> word + " => " + word.length())
                .collect(Collectors.joining("\n"));

        System.out.println(wordsWithLengths);
    }
}

Output:

This => 4
is => 2
a => 1
string => 6
With => 4
some => 4
punctuation => 11
but => 3
I => 1
only => 4
care => 4
about => 5
words => 5
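
The character class [.,; ] only covers the delimiters listed in it, so a "?" or "!" would still be counted as part of a word. If the goal is literally to remove punctuation while keeping the spaces, as the question asks, one possible variation is to delete everything matched by \p{Punct} first and then split on whitespace. This is a sketch, not the answer's own code: the class StripPunctuation and its two methods are hypothetical names, and it assumes that "punctuation" means the ASCII punctuation characters that \p{Punct} matches.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class StripPunctuation {
    // Assumption: "punctuation" means the ASCII punctuation matched by \p{Punct}
    private static final Pattern PUNCT = Pattern.compile("\\p{Punct}");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical helper: removes punctuation but leaves whitespace untouched
    public static String stripPunctuation(String input) {
        return PUNCT.matcher(input).replaceAll("");
    }

    // Hypothetical helper: length of each whitespace-separated word
    // after the punctuation has been removed
    public static List<Integer> wordLengths(String input) {
        return WHITESPACE.splitAsStream(stripPunctuation(input).trim())
                .filter(word -> !word.isEmpty()) // empty input yields one empty token
                .map(String::length)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "Wait... what?! Yes: it works.";
        System.out.println(stripPunctuation(text)); // Wait what Yes it works
        System.out.println(wordLengths(text));      // [4, 4, 3, 2, 5]
    }
}
```

Because the punctuation is deleted rather than used as a separator, "don't" becomes "dont" and counts as a single word of length 4.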

Also, if you want to count how many words have more than N characters, you can:

import java.util.regex.Pattern;

public class CountingWords {
    public static void main(String... args) {
        String text = "This is a string. With some punctuation, but I only care about words.";

        int threshold = 5;
        long amountOfWords = Pattern.compile("[.,; ]+")
                .splitAsStream(text)
                .filter( word -> word.length() > threshold)
                .count();

        System.out.println("There are " + amountOfWords +  " words with more than " + threshold + " characters");
    }
}