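Read a file and group its text

Time: 2016-10-29 16:23:21

Tags: java text filereader

I have a file containing some text, where each block of text ends with a number. The file looks like this: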

 to Polyxena. Achilles appears in the in the novel The Firebrand by Marion 
the firebrand   14852520
 fantasy novelist David Gemmell omic book hero Captain Marvel is endowed with the courage of Achilles, as well 
captain marvel  403585
 the city its central theme and 
corfu   45462

What I want is to group all the text up to and including each number. For example:

" to Polyxena. Achilles appears in the in the novel The Firebrand by Marion the firebrand   14852520" 

" fantasy novelist David Gemmell omic book hero Captain Marvel is endowed with the courage of Achilles, as well captain marvel  403585"

I noticed that each group of text starts with a space, but I'm having trouble grouping the lines. I coded this:

String line;
String s = " ";
char whiteSpace = s.charAt(0);

ArrayList<String> lines = new ArrayList<>();
BufferedReader in = new BufferedReader(new FileReader(args[0]));
while((line = in.readLine()) != null)
{   
    if (whiteSpace == line.charAt(0)){ //start of sentence
        lines.add(line);            
    }
}
in.close();
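The loop above only keeps the first line of each group (the continuation lines such as "the firebrand   14852520" are never added), and `line.charAt(0)` throws a `StringIndexOutOfBoundsException` on an empty line. The leading-space observation can still drive the grouping if every other line is appended to the current group. This is a minimal sketch, assuming every group's first line starts with a space and no continuation line does; the class name `GroupByLeadingSpace` and the method `group` are hypothetical:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class GroupByLeadingSpace {

    // Starts a new group whenever a line begins with a space; every other
    // line is appended to the current group.
    // Assumption: no continuation line starts with a space.
    public static List<String> group(List<String> rawLines) {
        List<String> groups = new ArrayList<>();
        StringBuilder current = null;
        for (String line : rawLines) {
            if (line.isEmpty()) {
                continue; // skip blank lines instead of calling charAt(0) on them
            }
            if (line.charAt(0) == ' ' || current == null) {
                if (current != null) {
                    groups.add(current.toString()); // close the previous group
                }
                current = new StringBuilder();
            }
            current.append(line);
        }
        if (current != null) {
            groups.add(current.toString()); // flush the last group
        }
        return groups;
    }

    public static void main(String[] args) throws IOException {
        List<String> rawLines = new ArrayList<>();
        // Read the file named on the command line, as in the question.
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = in.readLine()) != null) {
                rawLines.add(line);
            }
        }
        group(rawLines).forEach(System.out::println);
    }
}
```

Try-with-resources closes the reader even if reading fails, which the original `in.close()` call does not guarantee.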

1 answer:

Answer 0 (score: 1)

You can follow this algorithm:

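  • Create an empty buffer
  • For each line:
    • Append the line to the buffer
    • If the line ends with a number:
      • Add the buffer's contents to the list
      • Clear the buffer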

Something like this:

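import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class GroupText {
    public static void main(String[] args) {
        String text = " to Polyxena. Achilles appears in the in the novel The Firebrand by Marion \n" +
                "the firebrand   14852520\n" +
                " fantasy novelist David Gemmell omic book hero Captain Marvel is endowed with the courage of Achilles, as well \n" +
                "captain marvel  403585\n" +
                " the city its central theme and \n" +
                "corfu   45462";
        Scanner scanner = new Scanner(text);

        List<String> lines = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();

        // hasNextLine() pairs with nextLine(); hasNext() checks for a token instead
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            buffer.append(line);              // accumulate the current group
            if (line.matches(".*\\d+$")) {    // a trailing number closes the group
                lines.add(buffer.toString());
                buffer.setLength(0);          // start a fresh group
            }
        }
        scanner.close();

        lines.forEach(System.out::println);
    }
}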
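The answer's snippet reads from a string literal, whereas the question reads the file named in `args[0]`. Below is a minimal sketch of the same buffer algorithm applied to a file, assuming the file is UTF-8 (the charset `Files.readAllLines(Path)` uses) and that only the last line of each group ends with a number; the class name `GroupUntilNumber` and the method `group` are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class GroupUntilNumber {

    // Digits followed only by optional whitespace at the end of the line.
    // Assumption: only the final line of each group ends with a number.
    private static final Pattern TRAILING_NUMBER = Pattern.compile("\\d+\\s*$");

    public static List<String> group(List<String> rawLines) {
        List<String> groups = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        for (String line : rawLines) {
            buffer.append(line);                         // accumulate the current group
            if (TRAILING_NUMBER.matcher(line).find()) {  // a trailing number closes it
                groups.add(buffer.toString());
                buffer.setLength(0);                     // start a fresh group
            }
        }
        // Text after the last number is not emitted, as in the answer above.
        return groups;
    }

    public static void main(String[] args) throws IOException {
        // Reads the file named on the command line, as in the question.
        List<String> rawLines = Files.readAllLines(Paths.get(args[0]));
        group(rawLines).forEach(System.out::println);
    }
}
```

Using `find()` with `\d+\s*$` also accepts a number followed by trailing whitespace, which `line.matches(".*\\d+$")` rejects; precompiling the `Pattern` avoids recompiling the regex on every line.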