I have a file that contains some text, with each piece of text followed by a number. The file looks like this:
to Polyxena. Achilles appears in the in the novel The Firebrand by Marion
the firebrand 14852520
fantasy novelist David Gemmell omic book hero Captain Marvel is endowed with the courage of Achilles, as well
captain marvel 403585
the city its central theme and
corfu 45462
What I want is to group all of the text up to and including each number. For example:
" to Polyxena. Achilles appears in the in the novel The Firebrand by Marion the firebrand 14852520"
" fantasy novelist David Gemmell omic book hero Captain Marvel is endowed with the courage of Achilles, as well captain marvel 403585"
I noticed that every group of text starts with a space, but I am having trouble grouping them. I wrote this:
String line;
String s = " ";
char whiteSpace = s.charAt(0);
ArrayList<String> lines = new ArrayList<>();
BufferedReader in = new BufferedReader(new FileReader(args[0]));
while((line = in.readLine()) != null)
{
if (whiteSpace == line.charAt(0)){ //start of sentence
lines.add(line);
}
}
in.close();
Answer 0 (score: 1)
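You can follow this algorithm: append each line to a buffer, and whenever a line ends with a number, add the buffer's contents to the list and reset the buffer.
Something like this: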
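import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Sample input; each text line keeps its trailing space so the joined result reads correctly.
String text = " to Polyxena. Achilles appears in the in the novel The Firebrand by Marion \n" +
"the firebrand 14852520\n" +
" fantasy novelist David Gemmell omic book hero Captain Marvel is endowed with the courage of Achilles, as well \n" +
"captain marvel 403585\n" +
" the city its central theme and \n" +
"corfu 45462";
Scanner scanner = new Scanner(text);
List<String> lines = new ArrayList<>();
StringBuilder buffer = new StringBuilder();
// hasNextLine() pairs with nextLine(); hasNext() only checks for a remaining token.
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
buffer.append(line);
// A line that ends with digits closes the current group.
if (line.matches(".*\\d+$")) {
lines.add(buffer.toString());
buffer.setLength(0); // reset the buffer for the next group
}
}
scanner.close();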
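The same idea can be applied to the file from the question instead of a hard-coded string. The following is only a sketch under a few assumptions: the class name `LineGrouper` and the method `group` are made up for illustration, the file is read as UTF-8, and a single space is inserted between joined lines when the previous line did not already end in whitespace (the sample above relies on trailing spaces, which a real file may not have). Text left in the buffer after the last number is dropped, as in the snippet above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is not from the original answer.
final class LineGrouper {
    // A line closes a group when it ends with one or more digits.
    private static final Pattern ENDS_WITH_NUMBER = Pattern.compile(".*\\d+$");

    static List<String> group(List<String> input) {
        List<String> groups = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        for (String line : input) {
            // Assumption: join lines with one space unless the buffer
            // already ends in whitespace.
            if (buffer.length() > 0
                    && !Character.isWhitespace(buffer.charAt(buffer.length() - 1))) {
                buffer.append(' ');
            }
            buffer.append(line);
            if (ENDS_WITH_NUMBER.matcher(line).matches()) {
                groups.add(buffer.toString());
                buffer.setLength(0); // start collecting the next group
            }
        }
        return groups;
    }

    public static void main(String[] args) throws IOException {
        // Read the file named by the first argument, or fall back to a small sample.
        List<String> input = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList(" the city its central theme and ", "corfu 45462");
        for (String g : group(input)) {
            System.out.println(g);
        }
    }
}
```

Keeping the grouping logic in a method that takes a `List<String>` makes it easy to try on a few lines in memory before pointing it at a real file.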