Question

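I have source files in Cp1250 encoding. All of these files are in the dirName directory or its subdirectories. I would like to merge them into a single UTF-8 file by appending their contents. Unfortunately, I get an empty line at the beginning of the result file.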

public static void processDir(String dirName, String resultFileName) {
    try {
        File resultFile = new File(resultFileName);
        BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(resultFile), "utf-8"));
        Files.walk(Paths.get(dirName)).filter(Files::isRegularFile).forEach((path) -> {
            try {
                Files.readAllLines(path, Charset.forName("Windows-1250")).stream().forEach((line) -> {
                    try {
                        bw.newLine();
                        bw.write(line);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                });
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        bw.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

The cause is that I don't know how to detect the first file in the stream.

I came up with a very stupid solution that does not rely on streams, so it is unsatisfactory:

public static void processDir(String dirName, String resultFileName) {
        try {
            File resultFile = new File(resultFileName);
            BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(resultFile), "utf-8"));
            Files.walk(Paths.get(dirName)).filter(Files::isRegularFile).forEach((path) -> {
                try {
                    Files.readAllLines(path, Charset.forName("Windows-1250")).stream().forEach((line) -> {
                        try {
                            if(resultFile.length() != 0){
                                bw.newLine();
                            }
                            bw.write(line);
                            if(resultFile.length() == 0){
                                bw.flush();
                            }
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    });
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            bw.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

I could also use a static boolean, but that would be total nonsense.

Answer 1

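You can use flatMap to create a stream of all lines of all files, then use flatMap again to interleave it with the line separator, then use skip(1) to skip the leading separator, like this: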

public static void processDir(String dirName, String resultFileName) {
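    // Files.walk holds open directory handles, so it is closed here together with the writer
    try(BufferedWriter bw = Files.newBufferedWriter(Paths.get(resultFileName));
        Stream<Path> paths = Files.walk(Paths.get(dirName))) {
        paths.filter(Files::isRegularFile)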
            .flatMap(path -> {
                try {
                    return Files.lines(path, Charset.forName("Windows-1250"));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            })
            .flatMap(line -> Stream.of(System.lineSeparator(), line))
            .skip(1)
            .forEach(line -> {
                try {
                    bw.write(line);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}

In general, the flatMap + skip combination can help solve many similar problems.
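
To illustrate the same flatMap + skip idea outside of file handling, here is a minimal sketch that places a separator between list items. The class name InterleaveDemo and the method interleave are made up for this illustration and are not part of the answer's code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class InterleaveDemo {
    /** Joins items with a separator by emitting (separator, item) pairs and dropping the first separator. */
    static String interleave(List<String> items, String separator) {
        return items.stream()
                .flatMap(item -> Stream.of(separator, item)) // sep, a, sep, b, sep, c
                .skip(1)                                      // drop the leading separator
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.println(interleave(Arrays.asList("a", "b", "c"), ", "));
    }
}
```

For plain string joining, Collectors.joining(separator) would do the same job directly; the point of the sketch is only that skip(1) removes the unwanted leading element without any "is this the first one?" state.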

Also note the Files.newBufferedWriter method, which is a simpler way to create a BufferedWriter. And don't forget about try-with-resources.
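
As a small, hedged sketch of that combination: the snippet below writes a string through Files.newBufferedWriter inside try-with-resources and reads the bytes back. WriterDemo and roundTrip are hypothetical names, and the temporary file exists only so that the example is self-contained.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class WriterDemo {
    /** Writes text to a temporary file as UTF-8 and returns what ended up on disk. */
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("writer-demo", ".txt");
            try {
                try (BufferedWriter bw = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                    bw.write(text);
                } // bw is flushed and closed here, even if write() had thrown
                return new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```

The charset argument is spelled out here for clarity; the overload without a charset, as used in the answer, encodes as UTF-8 as well.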

Answer 2

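Rethink your strategy. If you want to join files and neither remove nor convert line terminators, there is no reason to process lines. It seems the only reason you are writing code that processes lines is the urge to bring lambda expressions and streams into the solution, and the only possibility the current API offers is to process streams of lines. But obviously, they are not the right tool for the job: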

// CREATE and TRUNCATE_EXISTING below need: import static java.nio.file.StandardOpenOption.*;
public static void processDir(String dirName, String resultFileName) throws IOException {
    Charset cp1250 = Charset.forName("Windows-1250");
    CharBuffer buffer=CharBuffer.allocate(8192);
    try(BufferedWriter bw
          =Files.newBufferedWriter(Paths.get(resultFileName), CREATE, TRUNCATE_EXISTING)) {
        Files.walkFileTree(Paths.get(dirName), new SimpleFileVisitor<Path>() {
            @Override public FileVisitResult visitFile(
                             Path path, BasicFileAttributes attrs) throws IOException {
                try(BufferedReader r=Files.newBufferedReader(path, cp1250)) {
                    while(r.read(buffer)>0) {
                        bw.write(buffer.array(), buffer.arrayOffset(), buffer.position());
                        buffer.clear();
                    }
                }
                return FileVisitResult.CONTINUE;
            }
        });
        // no explicit bw.close() needed: try-with-resources closes the writer
    }
}

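Note how this solution resolves the problem of your first attempt. You don't have to deal with line terminators here; this code doesn't even waste resources trying to find them in the input. All it does is perform a charset conversion on chunks of input data and write them to the target. The performance difference can be significant.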
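
The chunk-wise conversion can be seen in isolation in the following sketch, which works on in-memory bytes instead of files. TranscodeDemo and transcode are names invented for this illustration, and a plain char[] stands in for the CharBuffer used above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class TranscodeDemo {
    /** Re-encodes input from one charset to another in fixed-size chunks, never looking for line ends. */
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        char[] chunk = new char[8192];
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(input), from);
             Writer w = new OutputStreamWriter(out, to)) {
            int n;
            while ((n = r.read(chunk)) != -1) { // -1 signals end of input
                w.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // the writer was flushed when the try block closed it
    }

    public static void main(String[] args) {
        byte[] cp1250 = {(byte) 0x9E, '\r', '\n', (byte) 0x9A}; // two Cp1250 letters around CRLF
        byte[] utf8 = transcode(cp1250, Charset.forName("windows-1250"), StandardCharsets.UTF_8);
        System.out.println(utf8.length + " bytes after conversion");
    }
}
```

Line terminators pass through as ordinary characters, so whatever the input used (here CRLF) is preserved unchanged.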

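Further, the code isn't cluttered with catching exceptions that you can't handle. If an IOException occurs at any point of the operation, all pending resources are properly closed and the exception is forwarded to the caller.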
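
A tiny sketch can make that closing behaviour visible. It uses two dummy Closeable resources and a simulated failure; CloseOrderDemo and run are hypothetical names for this illustration only.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class CloseOrderDemo {
    /** Records what happens when the body of a try-with-resources statement throws. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        try {
            try (Closeable outer = () -> log.add("outer closed");
                 Closeable inner = () -> log.add("inner closed")) {
                throw new IOException("simulated failure");
            }
        } catch (IOException e) {
            // stands in for the caller that receives the forwarded exception
            log.add("caller saw: " + e.getMessage());
        }
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Both resources are closed, in reverse order of declaration, before the exception reaches the catch block.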

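Sure, it uses just a good old inner class instead of a lambda expression. But it is no less readable than your attempt. If it still bothers you that no lambda expression is involved, you may check this question & answer for a way to use them again.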

How to detect the first file in a stream
