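I'm writing an application that processes a large number of integers stored in binary files (up to 50 MB each). It needs to be as fast as possible. The main performance bottleneck is disk access time: I do a lot of reading from disk, so making reads faster usually makes the whole application faster.
Until now, I assumed that the fewer chunks I split the file into (i.e. the fewer reads I issue and the larger each read is), the faster my application would run. My reasoning was that an HDD is very slow at seeking, i.e. at positioning the head at the start of a block, because of its mechanical nature. Once it has found the start of the block, though, the actual read should be fairly fast.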
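The reasoning above can be written down as a rough cost model: each separate read pays one seek, plus a transfer cost proportional to the number of bytes read. The sketch below is only an illustration of that assumption; the seek time and transfer rate in `main` are made-up example figures, not measurements, and the model deliberately ignores the OS page cache and read-ahead, which is where real disks diverge from it.

```java
/** Rough HDD cost model: every read call pays one seek plus a size-proportional transfer. */
public class ReadCostModel {
    /**
     * @param totalBytes bytes to read in total
     * @param chunks     number of separate read calls the bytes are split into
     * @param seekMillis assumed cost of positioning the head, paid once per read call
     * @param bytesPerMs assumed sustained transfer rate
     * @return estimated total time in milliseconds
     */
    public static double estimateMillis(long totalBytes, int chunks, double seekMillis, double bytesPerMs) {
        if (chunks <= 0) {
            throw new IllegalArgumentException("chunks must be positive");
        }
        double seekCost = chunks * seekMillis;          // grows with the number of reads
        double transferCost = totalBytes / bytesPerMs;  // independent of how the bytes are split
        return seekCost + transferCost;
    }

    public static void main(String[] args) {
        long mega = 1024 * 1024;
        // Illustrative assumptions only: ~10 ms per seek, ~100 MB/s transfer rate.
        double bytesPerMs = 100.0 * mega / 1000.0;
        System.out.printf("1 x 1 MB:     %.1f ms%n", estimateMillis(mega, 1, 10.0, bytesPerMs));
        System.out.printf("10 x ~100 KB: %.1f ms%n", estimateMillis(mega, 10, 10.0, bytesPerMs));
    }
}
```

Under this model, splitting one read into ten can only add seek cost, which is exactly why the measurement that follows looks surprising.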
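Well, that was until I ran this test:
Old test removed; it was flawed because of HDD caching.
NEW TEST (the disk cache can't help here, because the file is large (1 GB) and I read from random positions in it):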
import java.io.FileInputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Random;

public class BlockSizeTest {
    public static void main(String[] args) throws IOException {
        int mega = 1024 * 1024;
        int giga = 1024 * 1024 * 1024;
        byte[] bigBlock = new byte[mega];
        int hundredKilo = mega / 10;
        byte[][] smallBlocks = new byte[10][hundredKilo];
        String location = "C:\\Users\\Vladimir\\Downloads\\boom.avi"; // the ~1 GB test file
        Random rand = new Random();
        long bigBufferTotalReadTime = 0;
        long smallBufferTotalReadTime = 0;

        // Big buffer: 100 single 1 MB reads, each from a random offset
        for (int j = 0; j < 100; j++) {
            // keep the whole 1 MB inside the first 1 GB so reads are not cut short at end of file
            long position = rand.nextInt(giga - mega);
            RandomAccessFile raf = new RandomAccessFile(location, "r");
            raf.seek(position);
            FileInputStream f = new FileInputStream(raf.getFD()); // shares raf's file position
            long start = System.currentTimeMillis();
            f.read(bigBlock); // note: read() may return fewer bytes than requested
            long end = System.currentTimeMillis();
            bigBufferTotalReadTime += end - start;
            f.close();
        }

        // Small buffers: 100 runs of ten consecutive ~100 KB reads, each run from a random offset
        for (int j = 0; j < 100; j++) {
            long position = rand.nextInt(giga - mega);
            RandomAccessFile raf = new RandomAccessFile(location, "r");
            raf.seek(position);
            FileInputStream f = new FileInputStream(raf.getFD());
            long start = System.currentTimeMillis();
            for (int i = 0; i < 10; i++) {
                f.read(smallBlocks[i]);
            }
            long end = System.currentTimeMillis();
            smallBufferTotalReadTime += end - start;
            f.close();
        }

        System.out.println("Average performance of small buffer: " + (smallBufferTotalReadTime / 100));
        System.out.println("Average performance of big buffer: " + (bigBufferTotalReadTime / 100));
    }
}
Results: small buffer average: 35 ms; big buffer average: 40 ms?! (I tried this on both Linux and Windows, and in both cases the larger block size led to a longer read time. Why?)
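After running this test many times, I realized that, for some mysterious reason, reading one big block takes longer on average than sequentially reading 10 smaller blocks. I thought this might be Windows being too clever and optimizing something in its file system, so I ran the same code on Linux and, to my surprise, got the same result.
I have no idea why this happens. Can anyone give me a hint? And what would be the best block size in this case?
Kind regards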
Answer 0 (score: 1)
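After the data has been read once, it will be in the disk cache, so a second read of it should be much faster. You need to run the test you expect to be faster first. ;)
If you have 50 MB of memory to spare, you should be able to read the whole file in one go.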
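Besides simply swapping the order of the two tests, one alternative (not something the answer proposes) is to alternate which variant runs first on every round, so neither one systematically benefits from data the other has just pulled into the cache. A minimal sketch; the two `Runnable` arguments are hypothetical stand-ins for "one 1 MB read" and "ten ~100 KB reads":

```java
/** Times two workloads, swapping which one runs first on every round. */
public class AlternatingBenchmark {
    /** Returns the total elapsed nanoseconds as {timeOfA, timeOfB}. */
    public static long[] timeAlternating(Runnable a, Runnable b, int rounds) {
        long totalA = 0;
        long totalB = 0;
        for (int round = 0; round < rounds; round++) {
            if (round % 2 == 0) { // even rounds: a, then b
                totalA += time(a);
                totalB += time(b);
            } else {              // odd rounds: b, then a
                totalB += time(b);
                totalA += time(a);
            }
        }
        return new long[] {totalA, totalB};
    }

    private static long time(Runnable task) {
        long start = System.nanoTime(); // nanoTime is meant for measuring elapsed time
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long[] totals = timeAlternating(
                () -> { /* hypothetical: one 1 MB read at a random offset */ },
                () -> { /* hypothetical: ten ~100 KB reads at a random offset */ },
                100);
        System.out.println("a avg: " + totals[0] / 100 + " ns, b avg: " + totals[1] / 100 + " ns");
    }
}
```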
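Reading the whole file at once could look like the sketch below. It assumes the file is a flat sequence of 4-byte integers in big-endian order; the question does not state the format, so pass `ByteOrder.LITTLE_ENDIAN` if the data was written that way. `Files.readAllBytes` loads the file in one call, and `ByteBuffer.asIntBuffer` views the bytes as ints without issuing a read per integer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Loads a whole binary file into memory and views it as 32-bit integers. */
public class WholeFileInts {
    /** Decodes consecutive 4-byte ints; trailing bytes that don't form a full int are ignored. */
    public static int[] decodeInts(byte[] data, ByteOrder order) {
        IntBuffer ints = ByteBuffer.wrap(data).order(order).asIntBuffer();
        int[] result = new int[ints.remaining()];
        ints.get(result);
        return result;
    }

    /** Reads the entire file (fine for ~50 MB), then decodes it. */
    public static int[] readAllInts(Path file, ByteOrder order) throws IOException {
        byte[] data = Files.readAllBytes(file); // loads the whole file into memory in one call
        return decodeInts(data, order);
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]); // hypothetical: path of the binary file to load
        int[] values = readAllInts(file, ByteOrder.BIG_ENDIAN); // assumed byte order
        System.out.println("Read " + values.length + " ints");
    }
}
```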
package com.google.code.java.core.files;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class FileReadingMain {
    public static void main(String... args) throws IOException {
        File temp = File.createTempFile("deleteme", "zeros");
        FileOutputStream fos = new FileOutputStream(temp);
        fos.write(new byte[50 * 1024 * 1024]);
        fos.close();

        for (int i = 0; i < 3; i++)
            for (int blockSize = 1024 * 1024; blockSize >= 512; blockSize /= 2) {
                readFileNIO(temp, blockSize);
                readFile(temp, blockSize);
            }
    }

    private static void readFile(File temp, int blockSize) throws IOException {
        long start = System.nanoTime();
        byte[] bytes = new byte[blockSize];
        int r;
        for (r = 0; System.nanoTime() - start < 2e9; r++) {
            FileInputStream fis = new FileInputStream(temp);
            while (fis.read(bytes) > 0) ;
            fis.close();
        }
        long time = System.nanoTime() - start;
        System.out.printf("IO: Reading took %.3f ms using %,d byte blocks%n", time / r / 1e6, blockSize);
    }

    private static void readFileNIO(File temp, int blockSize) throws IOException {
        long start = System.nanoTime();
        ByteBuffer bytes = ByteBuffer.allocateDirect(blockSize);
        int r;
        for (r = 0; System.nanoTime() - start < 2e9; r++) {
            FileChannel fc = new FileInputStream(temp).getChannel();
            while (fc.read(bytes) > 0) {
                bytes.clear();
            }
            fc.close();
        }
        long time = System.nanoTime() - start;
        System.out.printf("NIO: Reading took %.3f ms using %,d byte blocks%n", time / r / 1e6, blockSize);
    }
}
On my laptop this prints:
NIO: Reading took 57.255 ms using 1,048,576 byte blocks
IO: Reading took 112.943 ms using 1,048,576 byte blocks
NIO: Reading took 48.860 ms using 524,288 byte blocks
IO: Reading took 78.002 ms using 524,288 byte blocks
NIO: Reading took 41.474 ms using 262,144 byte blocks
IO: Reading took 61.744 ms using 262,144 byte blocks
NIO: Reading took 41.336 ms using 131,072 byte blocks
IO: Reading took 56.264 ms using 131,072 byte blocks
NIO: Reading took 42.184 ms using 65,536 byte blocks
IO: Reading took 64.700 ms using 65,536 byte blocks
NIO: Reading took 41.595 ms using 32,768 byte blocks <= fastest for NIO
IO: Reading took 49.385 ms using 32,768 byte blocks <= fastest for IO
NIO: Reading took 49.676 ms using 16,384 byte blocks
IO: Reading took 59.731 ms using 16,384 byte blocks
NIO: Reading took 55.596 ms using 8,192 byte blocks
IO: Reading took 74.191 ms using 8,192 byte blocks
NIO: Reading took 77.148 ms using 4,096 byte blocks
IO: Reading took 84.943 ms using 4,096 byte blocks
NIO: Reading took 104.242 ms using 2,048 byte blocks
IO: Reading took 112.768 ms using 2,048 byte blocks
NIO: Reading took 177.214 ms using 1,024 byte blocks
IO: Reading took 185.006 ms using 1,024 byte blocks
NIO: Reading took 303.164 ms using 512 byte blocks
IO: Reading took 316.487 ms using 512 byte blocks
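It appears the optimal read size may be around 32 KB. Note: because the file is entirely in the disk cache here, this may not be the optimal size for a file that actually has to be read from disk.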
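Applied to the original problem, a 32 KB read size might look like the sketch below: a `BufferedInputStream` with a 32 KB buffer underneath a `DataInputStream`, so the file is fetched in chunks of up to 32 KB while the code consumes one int at a time. The buffer size comes from the run above, the big-endian format is what `DataInputStream.readInt` reads, and summing the values is a hypothetical stand-in for the real processing; as the note says, the best size for an uncached file should be measured on your own disk.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

/** Reads a file of 32-bit ints through a 32 KB buffer. */
public class BufferedIntReader {
    // 32 KB was the fastest size in the run above; re-measure on your own hardware
    static final int BUFFER_SIZE = 32 * 1024;

    /** Sums all big-endian 32-bit ints in the file; trailing bytes that don't form a full int are ignored. */
    public static long sumInts(String path) throws IOException {
        long intCount = new File(path).length() / 4;
        long sum = 0;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(path), BUFFER_SIZE))) {
            for (long i = 0; i < intCount; i++) {
                sum += in.readInt(); // served from the buffer, which is refilled in reads of up to 32 KB
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        // hypothetical usage: pass the path of the binary file as the first argument
        System.out.println("Sum: " + sumInts(args[0]));
    }
}
```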
Answer 1 (score: 1)
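As already mentioned, your test is hopelessly flawed because it reads the same data each time.
I could go on, but you would probably get more out of reading this article, and then looking at this example of how to use FileChannel.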
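The linked example is not reproduced here. As a separate sketch of the same API, the code below reads a file of 32-bit ints through a `FileChannel` into one reusable direct `ByteBuffer`, using `flip()` to drain it and `compact()` to carry a partial int over to the next read. The big-endian format and the summing task are assumptions for illustration, and `bufferSize` is a parameter so you can try the sizes from the table above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/** Sums the 32-bit ints in a file using a FileChannel and a reusable direct buffer. */
public class ChannelIntReader {
    public static long sumInts(Path file, int bufferSize) throws IOException {
        if (bufferSize < 4) {
            throw new IllegalArgumentException("buffer must hold at least one int");
        }
        ByteBuffer buf = ByteBuffer.allocateDirect(bufferSize).order(ByteOrder.BIG_ENDIAN); // assumed byte order
        long sum = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            while (ch.read(buf) != -1) {
                buf.flip();                    // switch from filling to draining
                while (buf.remaining() >= 4) { // consume only complete ints
                    sum += buf.getInt();
                }
                buf.compact();                 // keep any partial int for the next read
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        // hypothetical usage: file path as the first argument, 32 KB buffer
        System.out.println("Sum: " + sumInts(Paths.get(args[0]), 32 * 1024));
    }
}
```

A buffer size that is not a multiple of 4 still works, because `compact()` moves the leftover bytes of a split int to the front of the buffer before the next read.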