Java Scanner: splitting a string into sentences

Asked: 2014-01-07 11:36:48

Tags: java regex java.util.scanner

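I'm trying to split a block of text into individual sentences based on punctuation, i.e. [.?!]. However, the Scanner also seems to split at the end of every line, even though I've specified a delimiter pattern. How can I fix this? Thanks!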

this is a text file. yes the
deliminator works
no it does not. why not?

Scanner scanner = new Scanner(fileInputStream);
scanner.useDelimiter("[.?!]");
while (scanner.hasNext()) {
  String line = scanner.next();
  System.out.println(line);
}

1 Answer:

Answer 0 (score: 5)

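I don't believe the Scanner is splitting on newlines. Your "line" variable simply contains the newline characters that were in the text between two delimiters, and println prints them as-is, which is why you get that output. For example, you could replace those newlines with spaces:

(I'm reading the same input text you provided from a file, so there is some extra file-reading code, but you get the idea.)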

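// Needs: java.io.File, java.io.FileNotFoundException, java.util.Scanner
try {
    File file = new File("assets/test.txt");
    Scanner scanner = new Scanner(file);
    // Tokens end only at '.', '?' or '!', never at line breaks
    scanner.useDelimiter("[.?!]");
    while (scanner.hasNext()) {
        String sentence = scanner.next();
        // Replace each line break (Unix or Windows style) with a space
        sentence = sentence.replaceAll("\\r?\\n", " ");
        // uncomment for nicer output
        //sentence = sentence.trim();
        System.out.println(sentence);
    }
    scanner.close();
} catch (FileNotFoundException e) {
    e.printStackTrace();
}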

The result is:

this is a text file
 yes the deliminator works no it does not
 why not

If I uncomment the trim line, it looks a bit nicer:

this is a text file
yes the deliminator works no it does not
why not
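
For quick experimentation, here is a minimal self-contained sketch of the same idea that reads from a String instead of a file. The class name SentenceSplitter and the helper splitSentences are hypothetical names chosen for illustration, not part of the original code. As a small variation on the snippet above, it collapses every run of whitespace (not only line breaks) into one space and skips tokens that end up empty, such as the trailing newline after the last "?".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class SentenceSplitter {

    // Hypothetical helper: splits text on '.', '?' or '!' and normalizes
    // the whitespace (including line breaks) left inside each token.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            // Only these three characters end a token; newlines do not.
            scanner.useDelimiter("[.?!]");
            while (scanner.hasNext()) {
                // Collapse runs of whitespace to one space, then trim the ends.
                String sentence = scanner.next().replaceAll("\\s+", " ").trim();
                // Skip tokens that were only whitespace, e.g. a trailing newline.
                if (!sentence.isEmpty()) {
                    sentences.add(sentence);
                }
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Same sample text as in the question, inlined as an assumption
        // so that no input file is needed.
        String text = "this is a text file. yes the\n"
                + "deliminator works\n"
                + "no it does not. why not?\n";
        for (String sentence : splitSentences(text)) {
            System.out.println(sentence);
        }
    }
}
```

With the sample text from the question, this should print the same three lines as the trimmed output above. Note that the delimiter characters themselves are consumed, so the sentences come back without their closing punctuation.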