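I am trying to split a block of text into individual sentences based on punctuation, i.e. [.?!]. However, the Scanner also seems to split at the end of every line, even though I have specified a particular pattern. How can I fix this? Thanks!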
this is a text file. yes the
deliminator works
no it does not. why not?
Scanner scanner = new Scanner(fileInputStream);
scanner.useDelimiter("[.?!]");
while (scanner.hasNext()) {
    String line = scanner.next();
    System.out.println(line);
}
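Answer 0 (score: 5)
I don't believe the Scanner is splitting on line breaks. Your "line" variable simply contains the line breaks that sit inside each token, and printing them is what produces that output. For example, you can replace those line breaks with spaces:
(I'm reading the same input text you provided from a file, so there is some extra file-reading code, but you get the picture.)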
try {
    File file = new File("assets/test.txt");
    Scanner scanner = new Scanner(file);
    scanner.useDelimiter("[.?!]");
    while (scanner.hasNext()) {
        String sentence = scanner.next();
        // replace line breaks inside the token with a single space
        sentence = sentence.replaceAll("\\r?\\n", " ");
        // uncomment for nicer output
        //sentence = sentence.trim();
        System.out.println(sentence);
    }
    scanner.close();
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
Here is the result:
this is a text file
 yes the deliminator works no it does not
 why not
If I uncomment the trim line, it looks a bit nicer:
this is a text file
yes the deliminator works no it does not
why not
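For reference, here is a minimal, self-contained sketch of the same idea that reads from a String instead of a file, so it can be run as is. The class name `SentenceSplitter` and the helper `splitSentences` are hypothetical names chosen for illustration. Unlike the snippet above, it collapses every run of whitespace with `\\s+`, always trims, and skips tokens that end up empty (such as the lone line break after the final "?"); those are assumptions about what you want, not something the Scanner requires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class SentenceSplitter {

    // Splits text on '.', '?' or '!' and returns the cleaned-up sentences.
    public static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("[.?!]");
            while (scanner.hasNext()) {
                // Collapse each whitespace run (including line breaks) into one space,
                // then strip the leading space left behind after the delimiter.
                String sentence = scanner.next().replaceAll("\\s+", " ").trim();
                if (!sentence.isEmpty()) {
                    sentences.add(sentence);
                }
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "this is a text file. yes the\n"
                + "deliminator works\n"
                + "no it does not. why not?\n";
        for (String sentence : splitSentences(text)) {
            System.out.println(sentence);
        }
    }
}
```

Run with the sample input from the question, this prints the three sentences on separate lines, with no leading spaces and no trailing blank line.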