Question

I'm building an application that uploads data to a server. The data is very large, up to 60-70 GB. I'm using Java because I need it to run in any browser.

My approach looks like this:

InputStream s = new FileInputStream(file);
byte[] chunk = new byte[20000000];
s.read(chunk);
s.close();
client.postToServer(chunk);

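Right now it uses a lot of memory, climbing steadily to about 1 GB. It is very noticeable when the garbage collector kicks in, with gaps of 5-6 seconds between chunks.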

Is there a way to improve performance and keep the memory footprint at a reasonable level?

Edit

This is not my real code. There is a lot more going on, such as computing a CRC, checking the return value of InputStream.read, and so on.

Answer 1

You need to think about reusing the buffer, along these lines:

int size = 64*1024; // 64KiB
byte[] chunk = new byte[size];
int read = -1;
for( read = s.read(chunk); read != -1; read = s.read(chunk)) {
  /*
   * I do hope you have some API call like the thing below, or at least one with a wrapper object that 
   * exposes partially filled buffers. Because read might not be the size of the entire buffer if there
   * are less than that amount of bytes available in the input stream until the end of the file...
   */
  client.postToServer(chunk, 0, read);
}

Answer 2

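The first step is to reuse the buffer, if you aren't doing so already. Reading a huge file does not normally require a lot of memory unless you keep all of it in memory.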

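Also: why are you using such a large buffer? There is nothing to gain from it (unless you have a very fast network connection and hard disk). Reducing it to about 64 KB should have no negative effect on performance and may make things easier for Java's GC.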
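To make the buffer-reuse advice concrete, here is a minimal sketch that streams data through a single 64 KiB buffer and updates a CRC-32 as it goes, since the question mentions computing a CRC. The `ChunkSink` interface and the `upload` method are hypothetical stand-ins for the asker's `client.postToServer`, which is not shown in the thread. It also assumes the sink has finished with the buffer by the time `post` returns; otherwise the next read would overwrite data still in flight.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

public class ChunkedUpload {

    /** Hypothetical stand-in for the client in the question; not a real API. */
    interface ChunkSink {
        void post(byte[] buffer, int offset, int length) throws IOException;
    }

    /**
     * Streams everything from in to sink through one reusable buffer and
     * returns the CRC-32 of all bytes that were sent.
     */
    static long upload(InputStream in, ChunkSink sink, int bufferSize) throws IOException {
        byte[] chunk = new byte[bufferSize]; // allocated once, reused for every read
        CRC32 crc = new CRC32();
        int read;
        while ((read = in.read(chunk)) != -1) {
            // read may be smaller than the buffer, so only use bytes [0, read)
            crc.update(chunk, 0, read);
            sink.post(chunk, 0, read);
        }
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[200_000]; // in-memory stand-in for the file
        long[] total = {0};
        try (InputStream in = new ByteArrayInputStream(data)) {
            long crc = upload(in, (buf, off, len) -> total[0] += len, 64 * 1024);
            System.out.println("sent " + total[0] + " bytes, crc32=" + Long.toHexString(crc));
        }
    }
}
```

With a real file, the `ByteArrayInputStream` would be replaced by a `FileInputStream`. Heap use then stays close to the buffer size regardless of how large the file is.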

Answer 3

You could try tuning the garbage collector (http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html, http://www.petefreitag.com/articles/gctuning/)
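Before changing any collector settings, it can help to measure how much time the JVM actually spends in garbage collection. The sketch below is one possible way to do that with the standard `java.lang.management` API. The `GcStats` class and the allocation loop are illustrative assumptions that mimic the question's pattern of allocating a fresh 20 MB chunk per iteration; they are not taken from the thread.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {

    /** Total milliseconds the JVM reports having spent in GC so far. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalGcMillis();

        // Illustrative workload: a new 20 MB array per iteration, as in the question.
        for (int i = 0; i < 50; i++) {
            byte[] chunk = new byte[20_000_000];
            chunk[0] = 1; // touch the array so the allocation is actually used
        }

        System.out.println("GC time during loop: " + (totalGcMillis() - before) + " ms");
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount() + " collections");
        }
    }
}
```

Running the same measurement around the real upload loop, before and after switching to a reused buffer, shows whether GC tuning is still needed at all.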

Reducing the memory footprint when a Java application reads a huge file in chunks
