Java 8 provides a way to create a Stream from the lines of a file; in that case, forEach steps through the file line by line. I have a text file in the following format:
bunch of lines with text
$$$$
bunch of lines with text
$$$$
I need everything up to each $$$$ to become a single element of the Stream. In other words, I need a Stream of Strings, where each String contains the content that precedes a $$$$.
What is the best way to do this (with the least overhead)?
Answer 0 (score: 2)
I couldn't come up with a solution that processes the lines lazily, and I'm not sure that is even possible. My solution produces an ArrayList. If you need a Stream, just call stream() on it.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

public class DelimitedFile {

    public static void main(String[] args) throws IOException {
        List<String> lines = lines(Paths.get("delimited.txt"), "$$$$");
        for (int i = 0; i < lines.size(); i++) {
            System.out.printf("%d:%n%s%n", i, lines.get(i));
        }
    }

    public static List<String> lines(Path path, String delimiter) throws IOException {
        return Files.lines(path)
                .collect(ArrayList::new, new BiConsumer<ArrayList<String>, String>() {
                    boolean add = true; // true when the next line should start a new element

                    @Override
                    public void accept(ArrayList<String> lines, String line) {
                        if (delimiter.equals(line)) {
                            add = true; // the next non-delimiter line starts a new element
                        } else if (add) {
                            lines.add(line);
                            add = false;
                        } else {
                            // append the line to the current element
                            int i = lines.size() - 1;
                            lines.set(i, lines.get(i) + '\n' + line);
                        }
                    }
                }, ArrayList::addAll);
    }
}
File contents:

bunch of lines with text
bunch of lines with text2
bunch of lines with text3
$$$$
2bunch of lines with text
2bunch of lines with text2
$$$$
3bunch of lines with text
3bunch of lines with text2
3bunch of lines with text3
3bunch of lines with text4
$$$$
Output:

0:
bunch of lines with text
bunch of lines with text2
bunch of lines with text3
1:
2bunch of lines with text
2bunch of lines with text2
2:
3bunch of lines with text
3bunch of lines with text2
3bunch of lines with text3
3bunch of lines with text4
Edit:

I finally came up with a solution that generates the Stream lazily:
public static Stream<String> lines(Path path, String delimiter) throws IOException {
    Stream<String> lines = Files.lines(path);
    Iterator<String> iterator = lines.iterator();
    return StreamSupport.stream(Spliterators.spliteratorUnknownSize(new Iterator<String>() {
        String nextLine;

        @Override
        public boolean hasNext() {
            if (nextLine != null) {
                return true;
            }
            // skip delimiter lines and buffer the first line of the next chunk
            while (iterator.hasNext()) {
                String line = iterator.next();
                if (!delimiter.equals(line)) {
                    nextLine = line;
                    return true;
                }
            }
            lines.close(); // no more lines: close the underlying stream
            return false;
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            StringBuilder sb = new StringBuilder(nextLine);
            nextLine = null;
            // accumulate lines until the next delimiter or the end of the file
            while (iterator.hasNext()) {
                String line = iterator.next();
                if (delimiter.equals(line)) {
                    break;
                }
                sb.append('\n').append(line);
            }
            return sb.toString();
        }
    }, Spliterator.ORDERED | Spliterator.NONNULL | Spliterator.IMMUTABLE), false);
}
This actually happens to be very similar to the implementation of BufferedReader.lines() (which Files.lines(Path) uses internally). It might reduce the overhead to skip both of those methods and use Files.newBufferedReader(Path) and BufferedReader.readLine() directly.
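For illustration, a minimal sketch of that last suggestion, reading the chunks eagerly with Files.newBufferedReader(Path) and BufferedReader.readLine() (the method name readChunks and the eager List result are my assumptions, not part of the answer above):

// Sketch only; needs java.io.BufferedReader, java.nio.file.Files/Path and java.util.ArrayList/List.
public static List<String> readChunks(Path path, String delimiter) throws IOException {
    List<String> chunks = new ArrayList<>();
    try (BufferedReader reader = Files.newBufferedReader(path)) {
        StringBuilder current = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            if (delimiter.equals(line)) {
                if (current.length() > 0) {
                    chunks.add(current.toString()); // finish the current chunk
                    current.setLength(0);
                }
            } else {
                if (current.length() > 0) {
                    current.append('\n');
                }
                current.append(line);
            }
        }
        if (current.length() > 0) { // trailing chunk without a closing delimiter
            chunks.add(current.toString());
        }
    }
    return chunks;
}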
Answer 1 (score: 0)

You can try
List<String> list = new ArrayList<>();
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
    list = stream
            .filter(line -> !line.equals("$$$$"))
            .collect(Collectors.toList());
} catch (IOException e) {
    e.printStackTrace();
}
Answer 2 (score: 0)

A similar but shorter answer already exists; however, the following is type-safe and keeps no extra state:
Path path = Paths.get("... .txt");
try {
    List<StringBuilder> glist = Files.lines(path, StandardCharsets.UTF_8)
            .collect(() -> new ArrayList<StringBuilder>(),
                    (list, line) -> {
                        if (list.isEmpty() || list.get(list.size() - 1).toString().endsWith("$$$$\n")) {
                            list.add(new StringBuilder());
                        }
                        list.get(list.size() - 1).append(line).append('\n');
                    },
                    (list1, list2) -> {
                        if (!list1.isEmpty() && !list1.get(list1.size() - 1).toString().endsWith("$$$$\n")
                                && !list2.isEmpty()) {
                            // Merge last of list1 and first of list2:
                            list1.get(list1.size() - 1).append(list2.remove(0).toString());
                        }
                        list1.addAll(list2);
                    });
    glist.forEach(sb -> System.out.printf("------------------%n%s%n", sb));
} catch (IOException ex) {
    Logger.getLogger(App.class.getName()).log(Level.SEVERE, null, ex);
}
Instead of .endsWith("$$$$\n"), it would be better to use:
.matches("(^|\n)\\$\\$\\$\\$\n")
Answer 3 (score: 0)

Here is a solution based on this previous work:
public class ChunkSpliterator extends Spliterators.AbstractSpliterator<List<String>> {
    private final Spliterator<String> source;
    private final Predicate<String> delimiter;
    private final Consumer<String> getChunk;
    private List<String> current;

    ChunkSpliterator(Spliterator<String> lineSpliterator, Predicate<String> mark) {
        super(lineSpliterator.estimateSize(), ORDERED | NONNULL);
        source = lineSpliterator;
        delimiter = mark;
        getChunk = s -> {
            if (current == null) current = new ArrayList<>();
            current.add(s);
        };
    }

    public boolean tryAdvance(Consumer<? super List<String>> action) {
        // pull lines until the chunk ends with a delimiter line or the source is exhausted
        while (current == null || !delimiter.test(current.get(current.size() - 1)))
            if (!source.tryAdvance(getChunk)) return lastChunk(action);
        current.remove(current.size() - 1); // drop the delimiter line itself
        action.accept(current);
        current = null;
        return true;
    }

    private boolean lastChunk(Consumer<? super List<String>> action) {
        if (current == null) return false;
        action.accept(current);
        current = null;
        return true;
    }

    public static Stream<List<String>> toChunks(
            Stream<String> lines, Predicate<String> splitAt, boolean parallel) {
        return StreamSupport.stream(
                new ChunkSpliterator(lines.spliterator(), splitAt),
                parallel);
    }
}
which you can use like
try (Stream<String> lines = Files.lines(pathToYourFile)) {
    ChunkSpliterator.toChunks(
            lines,
            Pattern.compile("^\\Q$$$$\\E$").asPredicate(),
            false)
        /* chain your stream operations, e.g.
        .forEach(s -> { s.forEach(System.out::print); System.out.println(); })
        */;
}
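Since toChunks yields a Stream<List<String>> while the question asks for a Stream of Strings, one possible follow-up (my own sketch, not part of the answer above) is to join each chunk's lines:

try (Stream<String> lines = Files.lines(pathToYourFile)) {
    ChunkSpliterator.toChunks(
            lines,
            Pattern.compile("^\\Q$$$$\\E$").asPredicate(),
            false)
        .map(chunk -> String.join("\n", chunk)) // join each chunk's lines into one String
        .forEach(System.out::println);
}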
Answer 4 (score: 0)

You can use a Scanner as an iterator and create a stream from it:
private static Stream<String> recordStreamOf(Readable source) {
    Scanner scanner = new Scanner(source);
    // the delimiter is a regular expression, so the dollar signs must be escaped
    scanner.useDelimiter("\\$\\$\\$\\$");
    return StreamSupport
            .stream(Spliterators.spliteratorUnknownSize(scanner, Spliterator.ORDERED | Spliterator.NONNULL), false)
            .onClose(scanner::close);
}
This preserves the linefeeds in the chunks for further filtering or splitting.
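A usage sketch (the file name and the trim() of leading and trailing newlines are my assumptions, not part of the answer above):

// Scanner accepts any Readable, so a BufferedReader from Files.newBufferedReader works here.
try (Stream<String> records = recordStreamOf(Files.newBufferedReader(Paths.get("delimited.txt")))) {
    records.map(String::trim)
           .forEach(System.out::println);
}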