Question

First snippet:

// code is a private "global variable" for the class
// SourceCodeBuilder is a class that uses StringBuilder()
// basically it is based on String(s), formatted and with many appends depending on the "loc()" calls (see below)
private SourceCodeBuilder code = new SourceCodeBuilder();

[...]

    // create "file.txt" and call algorithm
    fileOut = new FileWriter("file.txt");

    for (int i=0; i<x; i++) {
        algorithm();
    }

where algorithm() is a method like this:

private void algorithm () {
    for (int i=0; i<y; i++) {
        code.loc("some text");
        code.loc("other text");
        ...
    }

    // after "building" the code value I wrote it on the file
    fileOut.write(code.toString());
    fileOut.flush();
    code.free(); // empties the code variable, so the next call to algorithm() starts with code set to ""
                 // (this frees a lot of memory); it basically calls StringBuilder's setLength(0)
}

When I run all of this on a large text file, execution takes about 4500 ms and uses less than 60 MB of memory.

Then I tried different code. Second snippet:

private SourceCodeBuilder code = new SourceCodeBuilder();

[...]

    // create "file.txt" and call algorithm
    fileOut = new FileWriter("file.txt");

    for (int i=0; i<x; i++) {
        algorithm();
    }

    fileOut.write(code.toString());
    fileOut.flush();
    fileOut.close();

This time algorithm() is a method like this:

private void algorithm () {
    for (int i=0; i<y; i++) {
        code.loc("some text");
        code.loc("other text");
        ...
    }
}

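It needs more than 250 MB of memory (which is fine, because I never call free() on the code variable, so it keeps appending to the same variable), but surprisingly it takes more than 5300 ms to execute. That is about 16% slower than the first snippet, and I can't explain to myself why.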

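In the first snippet I write a small piece of text to "file.txt" many times. In the second I write one very large piece of text to "file.txt", but only once, and I use much more memory. With the second snippet I expected more memory consumption, but not more CPU time, since the first snippet is the one performing more I/O operations.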

Conclusion: the first snippet is faster than the second, even though the first performs more I/O operations. Why? Am I missing something?

Answer 1

Every system call has an overhead, which you can reduce by using a BufferedWriter (or a buffered reader or stream). That is why you use buffering.

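In the second case you buffer the entire content before writing it. In the first case you write the file a bit at a time, which results in more system calls and therefore more overhead.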

If you were generating the file's content faster, you might find that almost all of the time is spent in system calls.

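The reason you stream data in blocks (using buffering) is so that you don't use so much memory. That said, there is a point beyond which a larger buffer slows you down rather than helping.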
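A minimal sketch of the block-wise approach, under some assumptions: the class and method names (ChunkedWriteSketch, writeChunks) are hypothetical, a plain StringBuilder stands in for the asker's SourceCodeBuilder, and the 64 KiB buffer size is an arbitrary illustrative choice, not a tuned value. Each chunk is built, handed to a BufferedWriter, and then the builder is reset, so memory stays bounded while the writer batches the underlying system calls.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class ChunkedWriteSketch {

    // Builds `chunks` pieces of text and writes each one through a BufferedWriter.
    // Returns the total number of characters written.
    static long writeChunks(Path target, int chunks, int linesPerChunk) throws IOException {
        long charsWritten = 0;
        StringBuilder code = new StringBuilder(); // stand-in for SourceCodeBuilder (assumption)
        // 64 KiB buffer: an illustrative size only
        try (BufferedWriter out = new BufferedWriter(new FileWriter(target.toFile()), 64 * 1024)) {
            for (int i = 0; i < chunks; i++) {
                for (int j = 0; j < linesPerChunk; j++) {
                    code.append("some text\n");
                    code.append("other text\n");
                }
                out.write(code.toString()); // goes into the writer's buffer, not straight to disk
                charsWritten += code.length();
                code.setLength(0);          // reuse the builder for the next chunk
            }
        } // try-with-resources flushes and closes the writer
        return charsWritten;
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("file", ".txt");
        long written = writeChunks(target, 100, 1000);
        System.out.println("wrote " + written + " chars to " + target);
        Files.delete(target);
    }
}
```

There is no explicit flush() after each chunk here; flushing on every call would hand each chunk to the operating system immediately and give up part of the benefit of buffering.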

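In your case, I suspect you are writing to a StringBuilder or a StringWriter (which uses a StringBuffer), and that has to be copied each time it is resized on the way up to the size you eventually need. This creates some GC overhead, which results in more copying.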

Answer 2

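When you slowly fill a large in-memory buffer, the time required grows non-linearly, because the buffer has to be reallocated multiple times, each time copying the entire content to a new location in memory. That takes time, especially when the buffer is 200 MB or more. If you preallocate the buffer, your process will probably run faster.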
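To make the reallocation point concrete, here is a small sketch that counts how often a StringBuilder's capacity changes while it is being filled, with and without a preallocated capacity. The names PreallocationSketch and countResizes are hypothetical, and the piece size and count are arbitrary. It only counts reallocations; it does not measure how much time they cost, which is what a profiler would tell you.

```java
final class PreallocationSketch {

    // Appends `pieces` copies of `piece` to `sb` and counts how many times the
    // builder's capacity changed, i.e. how often its backing array was
    // reallocated and the existing content copied over.
    static int countResizes(StringBuilder sb, String piece, int pieces) {
        int resizes = 0;
        int capacity = sb.capacity();
        for (int i = 0; i < pieces; i++) {
            sb.append(piece);
            if (sb.capacity() != capacity) {
                capacity = sb.capacity();
                resizes++;
            }
        }
        return resizes;
    }

    public static void main(String[] args) {
        String piece = "some text\n";
        int pieces = 1_000_000; // arbitrary illustrative size

        // Default constructor: starts small and grows on demand.
        int grown = countResizes(new StringBuilder(), piece, pieces);

        // Capacity sized up front to hold the final content.
        int preallocated = countResizes(
                new StringBuilder(piece.length() * pieces), piece, pieces);

        System.out.println("default capacity: " + grown + " resizes");
        System.out.println("preallocated:     " + preallocated + " resizes");
    }
}
```

Preallocating only helps if you can estimate the final size in advance. In the asker's second snippet the total depends on x and y, so any estimate would have to come from those values.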

However, all of the above is just my guess. You should profile the application to find out exactly where the extra time goes.

How can multiple writes to a file be faster than a single write?
