Question

I use Java 8 streams to process files, but so far I have always processed them line by line.

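What I want is a function that takes a BufferedReader br, reads a specific number of words (separated by "\\s+"), and leaves the BufferedReader at the exact position where that number of words was reached.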

Right now I have a version that reads the file line by line:

    final int[] wordCount = {20};
    br
          .lines()
          .map(l -> l.split("\\s+"))
          .flatMap(Arrays::stream)
          .filter(s -> {
              //Process s
              if(--wordCount[0] == 0) return true;
              return false;
          }).findFirst();

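This obviously leaves the input stream positioned at the start of the line after the one containing the 20th word.
Is there a way to get a stream that reads less than a line from the input stream?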

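Edit
I am parsing a file in which the first word holds the number of words that follow. I read that word and then read the given number of words. The file contains multiple such sections, and each section is parsed by the function described above.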

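After reading all the helpful comments, it is clear to me that a Scanner is the right choice for this problem, and that in Java 9 the Scanner class will provide stream features (stream-producing methods such as Scanner.tokens()). Using streams the way I described gives no guarantee that the reader will be at a specific position after the stream's terminal operation (see the API docs). That makes streams the wrong choice for parsing a structure where you parse only one section at a time and have to keep track of the position.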
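The Scanner approach mentioned in the edit above could look roughly like the following. This is a minimal sketch, not code from the original post: the class name SectionScanner and the methods readSection and readAll are hypothetical, and it assumes every section starts with an integer count. Scanner's default delimiter is whitespace, which matches the "\\s+" splitting in the question. Note that Scanner buffers input internally, so the position is tracked by the Scanner rather than by the underlying BufferedReader; the same Scanner instance has to be reused for every section.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class SectionScanner {

    /** Reads one section: a count token followed by that many words. */
    static List<String> readSection(Scanner in) {
        int count = in.nextInt();      // first token of a section is its word count
        List<String> words = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            words.add(in.next());      // throws NoSuchElementException if input ends early
        }
        return words;                  // the scanner now sits right after the last word
    }

    /** Reads sections until the next token is no longer an integer count. */
    static List<List<String>> readAll(Scanner in) {
        List<List<String>> sections = new ArrayList<>();
        while (in.hasNextInt()) {
            sections.add(readSection(in));
        }
        return sections;
    }

    public static void main(String[] args) {
        String text = "5 a section of five words 3 three words\n"
                + "section 2 short section 7 this section contains a lot\n"
                + "of words";
        try (Scanner in = new Scanner(new BufferedReader(new StringReader(text)))) {
            readAll(in).forEach(System.out::println);
        }
    }
}
```

Because readSection stops consuming as soon as the count is reached, a section may end in the middle of a line and the next call continues from exactly that point.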

Answer 1

Regarding your original problem: I assume that your file looks like this:

5 a section of five words 3 three words
section 2 short section 7 this section contains a lot 
of words

And you want to get output like this:

[a, section, of, five, words]
[three, words, section]
[short, section]
[this, section, contains, a, lot, of, words]

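In general, the Stream API is poorly suited to this kind of problem; writing a plain old loop looks like the better solution here. If you still want to see a Stream API based solution, I can suggest the StreamEx library, which contains the headTail() method and lets you write custom stream-transformation logic easily. Here is how your problem could be solved using headTail: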

import java.util.ArrayList;
import java.util.List;
import one.util.streamex.StreamEx;

/* Transform Stream of words like 2, a, b, 3, c, d, e to
   Stream of lists like [a, b], [c, d, e] */
public static StreamEx<List<String>> records(StreamEx<String> input) {
    return input.headTail((count, tail) -> 
        makeRecord(tail, Integer.parseInt(count), new ArrayList<>()));
}

private static StreamEx<List<String>> makeRecord(StreamEx<String> input, int count, 
                                                 List<String> buf) {
    return input.headTail((head, tail) -> {
        buf.add(head);
        return buf.size() == count 
                ? records(tail).prepend(buf)
                : makeRecord(tail, count, buf);
    });
}

Usage example:

String s = "5 a section of five words 3 three words\n"
        + "section 2 short section 7 this section contains a lot\n"
        + "of words";
Reader reader = new StringReader(s);
Stream<List<String>> stream = records(StreamEx.ofLines(reader)
               .flatMap(Pattern.compile("\\s+")::splitAsStream));
stream.forEach(System.out::println);

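The result is exactly the desired output shown above. Replace reader with your BufferedReader or FileReader to read from the input file. The stream of records is lazy: at most one record is held by the stream at a time, and if you short-circuit, the rest of the input will not be read (although, of course, the current file line will be read to the end). The solution looks recursive, but it does not eat stack or heap, so it also works for huge files.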

Explanation:

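The headTail() method takes a two-argument lambda, which is executed at most once, during the outer stream's terminal operation, when a stream element is requested. The lambda receives the first stream element (head) and a stream containing all the other original elements (tail). The lambda should return a new stream, which is then used in place of the original one. In records we have: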

return input.headTail((count, tail) -> 
    makeRecord(tail, Integer.parseInt(count), new ArrayList<>()));

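The first element of the input is count: convert it to a number, create an empty ArrayList, and call makeRecord for the tail. Here is the implementation of the makeRecord helper method: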

return input.headTail((head, tail) -> {

The first stream element is head; add it to the current buffer:

    buf.add(head);

Has the target buffer size been reached?

    return buf.size() == count

If so, call records again for the tail (to process the next record, if any) and prepend a single element to the resulting stream: the current buffer.

            ? records(tail).prepend(buf)

Otherwise, call makeRecord again for the tail (to add more elements to the buffer).

            : makeRecord(tail, count, buf);
});
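For comparison, the same head/accumulate logic can be written as a plain loop over an iterator of words, using only the standard library. This is a rough sketch and not how StreamEx implements headTail(); the class name RecordIterator is hypothetical. It also differs at the edges, as far as I can tell from the code above: a count of 0 yields an empty list here, and a truncated final record throws NoSuchElementException, whereas the headTail version never completes such a record and emits nothing for it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class RecordIterator implements Iterator<List<String>> {
    private final Iterator<String> words;

    RecordIterator(Iterator<String> words) {
        this.words = words;
    }

    @Override
    public boolean hasNext() {
        return words.hasNext();
    }

    @Override
    public List<String> next() {
        if (!words.hasNext()) {
            throw new NoSuchElementException();
        }
        // Corresponds to records(): the head element is the record size.
        int count = Integer.parseInt(words.next());
        List<String> buf = new ArrayList<>(count);
        // Corresponds to makeRecord(): keep taking heads until the buffer is full.
        while (buf.size() < count) {
            buf.add(words.next());
        }
        return buf;
    }

    public static void main(String[] args) {
        Iterator<String> words = Arrays.asList("2 a b 3 c d e".split("\\s+")).iterator();
        // prints [a, b] and then [c, d, e]
        new RecordIterator(words).forEachRemaining(System.out::println);
    }
}
```

Like the stream version, this holds at most one record at a time and only consumes as many words as the records actually requested.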

Java 8 Streams: reading a file word by word
