Question

I have a wrapper around BufferedReader that reads in files one after another, so that it presents an uninterrupted stream across multiple files:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.zip.GZIPInputStream;

/**
 * reads in a whole bunch of files such that when one ends it moves to the
 * next file.
 * 
 * @author isaak
 *
 */
class LogFileStream implements FileStreamInterface{
    private ArrayList<String> fileNames;
    private BufferedReader br;
    private boolean done = false;

    /**
    * 
    * @param files an array list of files to read from, order matters.
    * @throws IOException
    */
    public LogFileStream(ArrayList<String> files) throws IOException {
        fileNames = new ArrayList<String>();
        for (int i = 0; i < files.size(); i++) {
            fileNames.add(files.get(i));
        }
        setFile();
    }

    /**
     * advances the file that this class is reading from.
     * 
     * @throws IOException
     */
    private void setFile() throws IOException {
        if (fileNames.size() == 0) {
            this.done = true;
            return;
        }
        if (br != null) {
            br.close();
        }
        //if the file is a .gz file do a little extra work.
        //otherwise read it in with a standard file Reader
        //in either case, set the buffer size to 128kb
        if (fileNames.get(0).endsWith(".gz")) {
            InputStream fileStream = new FileInputStream(fileNames.get(0));
            InputStream gzipStream = new GZIPInputStream(fileStream);
            // TODO this probably needs to be modified to work well on any
            // platform, UTF-8 is standard for debian/novastar though.
            Reader decoder = new InputStreamReader(gzipStream, "UTF-8");
            // note that the buffer size is set to 128kb instead of the standard
            // 8kb.
            br = new BufferedReader(decoder, 131072);
            fileNames.remove(0);
        } else {
            FileReader filereader = new FileReader(fileNames.get(0));
            br = new BufferedReader(filereader, 131072);
            fileNames.remove(0);
        }
    }

    /**
     * returns true if there are more lines available to read.
     * @return true if there are more lines available to read.
     */
    public boolean hasMore() {
        return !done;
    }

    /**
      * Gets the next line from the correct file.
      * @return the next line from the files, if there isn't one it returns null
      * @throws IOException
      */
    public String nextLine() throws IOException {
        if (done == true) {
            return null;
        }
        String line = br.readLine();
        if (line == null) {
            setFile();
            return nextLine();
        }
        return line;
    }
}

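If I construct this object on a large set of files (300MB worth of files) and then print nextLine() over and over in a while loop, performance degrades continuously until there is no more RAM to use. This happens even when I'm reading files of roughly 500kb each and running in a virtual machine with 32MB of memory.

I want this code to run on massive data sets (hundreds of gigabytes of files), and it is a component of a program that needs to run with 32MB of memory or less.

The files are mostly labeled CSV files, which is why they are compressed on disk with Gzip. This reader needs to handle both gzip and uncompressed files.

Correct me if I'm wrong, but once a file has been read through and its lines have been handed out, shouldn't the data from that file, the objects related to it, and everything else be eligible for garbage collection?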

Answer 1

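With Java 8, GZIP support has moved from Java code to native zlib usage.

Non-closed GZIP streams leak native memory (I really do mean "native" memory, not "heap" memory), and this is far from easy to diagnose. Depending on how the application uses such streams, the operating system may reach its memory limit quite fast.

The symptom is that the memory usage of the operating system process is not consistent with the JVM memory usage reported by Native Memory Tracking: https://docs.oracle.com/javase/8/docs/technotes/guides/vm/nmt-8.html

You can find the full story at http://www.evanjones.ca/java-native-leak-bug.html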
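As an illustration of the point about non-closed GZIP streams, here is a minimal sketch that reads a gzip file inside a try-with-resources block, so the GZIPInputStream (and the Inflater it owns) is closed even if reading fails. The class name GzipLineCounter and the method countLines are hypothetical names chosen for this example, and UTF-8 is assumed as in the question's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of the question's code.
class GzipLineCounter {
    /** Counts the lines in a gzip-compressed text file (UTF-8 assumed). */
    static long countLines(Path gzFile) throws IOException {
        // Each resource is declared separately so that it is closed even if
        // constructing a later one throws. They are closed in reverse order.
        try (InputStream file = Files.newInputStream(gzFile);
             InputStream gzip = new GZIPInputStream(file);
             BufferedReader br = new BufferedReader(
                     new InputStreamReader(gzip, StandardCharsets.UTF_8))) {
            long count = 0;
            while (br.readLine() != null) {
                count++;
            }
            return count;
        }
    }
}
```

A caller would use it as GzipLineCounter.countLines(Paths.get("some.log.gz")), where the file name is only a placeholder.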

Answer 2

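The last call to setFile does not close your BufferedReader, so you are leaking resources.

Indeed, in nextLine you read the first file until the end. When the end is reached, you call setFile, which checks whether there are more files to process. However, if there are no more files, it returns immediately without closing the last BufferedReader that was in use.

Furthermore, if you do not process all of the files, you will still have a resource in use.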
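One possible way to address both points is sketched below: the current reader is closed before the "no more files" check, and the class implements Closeable so a caller that stops early can still release the open file. This is a sketch under assumptions, not the asker's final code: the class name ClosingLogFileStream is hypothetical, the FileStreamInterface from the question is left out because its definition is not shown, and UTF-8 is assumed for uncompressed files as well (the original used FileReader with the platform default).

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.zip.GZIPInputStream;

// Hypothetical rework of the question's LogFileStream.
class ClosingLogFileStream implements Closeable {
    private final Deque<String> fileNames;
    private BufferedReader br;
    private boolean done = false;

    ClosingLogFileStream(List<String> files) throws IOException {
        fileNames = new ArrayDeque<>(files);
        setFile();
    }

    /** Closes the reader that is currently open, if any. */
    private void closeCurrent() throws IOException {
        if (br != null) {
            br.close();
            br = null;
        }
    }

    /** Advances to the next file, closing the previous reader first. */
    private void setFile() throws IOException {
        closeCurrent(); // runs before the emptiness check, so the last reader is closed too
        String next = fileNames.poll();
        if (next == null) {
            done = true;
            return;
        }
        InputStream in = new FileInputStream(next);
        try {
            if (next.endsWith(".gz")) {
                in = new GZIPInputStream(in);
            }
            // Same 128kb buffer size as in the question.
            br = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8), 131072);
        } catch (IOException e) {
            in.close(); // do not leak the file handle if wrapping fails
            throw e;
        }
    }

    boolean hasMore() {
        return !done;
    }

    String nextLine() throws IOException {
        // A loop instead of recursion, so a long run of empty files
        // cannot grow the call stack.
        while (!done) {
            String line = br.readLine();
            if (line != null) {
                return line;
            }
            setFile();
        }
        return null;
    }

    /** Releases the open file even if not all files were processed. */
    @Override
    public void close() throws IOException {
        closeCurrent();
        fileNames.clear();
        done = true;
    }
}
```

Because the class is Closeable, it can itself be used in a try-with-resources statement, which covers the case where the caller stops before all files are read.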

Answer 3

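There is at least one leak in your code: the method setFile() does not close the last BufferedReader, because the if (fileNames.size() == 0) check comes before the if (br != null) check.

However, this could only produce the described effect if LogFileStream is instantiated multiple times.

It would also be better to use a LinkedList instead of an ArrayList, since fileNames.remove(0) is more "expensive" on an ArrayList than on a LinkedList. You could instantiate it with the following single line in the constructor: fileNames = new LinkedList<>(files);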
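The queue suggestion could look roughly like the sketch below. The names FileNameQueueDemo and drain are hypothetical and exist only to show the access pattern: poll() removes the head of a LinkedList in constant time, whereas remove(0) on an ArrayList shifts every remaining element.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

// Hypothetical demo class, not part of the question's code.
class FileNameQueueDemo {
    /** Drains the queue front to back and returns the names in visiting order. */
    static String drain(List<String> files) {
        Queue<String> fileNames = new LinkedList<>(files);
        StringBuilder order = new StringBuilder();
        String next;
        // poll() returns null once the queue is empty, which replaces the
        // fileNames.size() == 0 check from the question.
        while ((next = fileNames.poll()) != null) {
            order.append(next).append(' ');
        }
        return order.toString().trim();
    }

    public static void main(String[] args) {
        // Placeholder file names; nothing is opened here.
        System.out.println(drain(Arrays.asList("a.log", "b.log.gz", "c.log")));
    }
}
```

For a list of file names this cost is likely small compared with the I/O itself, so the change is more about clarity than about the memory problem.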

Answer 4

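Every once in a while, you could close() the BufferedReader (a reader has no flush() method; flushing applies only to writers). Closing releases the reader's buffer and the underlying file, so do it every time you use the setFile() method: just before each call such as br = new BufferedReader(decoder, 131072), close() the previous BufferedReader.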

Answer 5

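The GC can only do its work once you have closed your connection/reader. If you are using Java 7 or above, you may want to consider the try-with-resources statement, which is a better way to handle IO operations: https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html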
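To show what try-with-resources guarantees, the following small sketch wraps a reader that records whether close() was called. TrackingReader and TryWithResourcesDemo are hypothetical names used only for this illustration; an in-memory StringReader stands in for a real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical demo classes, not part of the question's code.
class TrackingReader extends StringReader {
    boolean closed = false;

    TrackingReader(String text) {
        super(text);
    }

    @Override
    public void close() {
        closed = true; // remember that close() was reached
        super.close();
    }
}

class TryWithResourcesDemo {
    /** Counts lines; the reader is closed when the try block exits. */
    static int countLines(TrackingReader source) throws IOException {
        // Closing the BufferedReader also closes the reader it wraps,
        // whether the block ends normally or with an exception.
        try (BufferedReader br = new BufferedReader(source)) {
            int count = 0;
            while (br.readLine() != null) {
                count++;
            }
            return count;
        }
    }
}
```

After countLines returns, the closed flag on the TrackingReader is true without any explicit close() call in the method body.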

Why does my BufferedReader code leak memory?
