如果我将文件中的字节读入字节[],我发现当数组大约为1 MB而不是128 KB时,FileInputStream的性能会更差。在我测试的2个工作站上,它几乎是128 KB的两倍。那是为什么?
import java.io.*;
public class ReadFileInChuncks
{
public static void main(String[] args) throws IOException
{
byte[] buffer1 = new byte[1024*128];
byte[] buffer2 = new byte[1024*1024];
String path = "some 1 gb big file";
readFileInChuncks(path, buffer1, false);
readFileInChuncks(path, buffer1, true);
readFileInChuncks(path, buffer2, true);
readFileInChuncks(path, buffer1, true);
readFileInChuncks(path, buffer2, true);
}
public static void readFileInChuncks(String path, byte[] buffer, boolean report) throws IOException
{
long t = System.currentTimeMillis();
InputStream is = new FileInputStream(path);
while ((readToArray(is, buffer)) != 0) {}
if (report)
System.out.println((System.currentTimeMillis()-t) + " ms");
}
public static int readToArray(InputStream is, byte[] buffer) throws IOException
{
int index = 0;
while (index != buffer.length)
{
int read = is.read(buffer, index, buffer.length - index);
if (read == -1)
break;
index += read;
}
return index;
}
}
输出
422 ms
717 ms
422 ms
718 ms
请注意,这是对已发布问题的重新定义。另一方受到无关讨论的污染。我会将另一个标记为删除。
编辑:重复,真的吗?我确定可以制作一些更好的代码来证明我的观点,但this没有回答我的问题
Edit2:我在每个5 KB到1000 KB之间的缓冲区运行测试 Win7 / JRE 1.8.0_25和糟糕的性能从精确的508 KB开始,随后都是。对不起的图表军团很抱歉,x是缓冲区大小,y是毫秒
答案 0 :(得分:8)
TL; DR 性能下降是由内存分配引起的,而不是由文件读取问题引起的。
典型的基准测试问题:您对一件事情进行基准测试,但实际上衡量另一件事情。
首先,当我使用RandomAccessFile
,FileChannel
和ByteBuffer.allocateDirect
重写代码时,阈值已经消失。对于128K和1M缓冲区,文件读取性能大致相同。
与直接ByteBuffer不同,I / O FileInputStream.read
无法将数据直接加载到Java字节数组中。它需要先将数据放入某个本机缓冲区,然后使用JNI SetByteArrayRegion
函数将其复制到Java。
因此,我们必须查看FileInputStream.read
的本机实现。它归结为io_util.c中的以下代码:
if (len == 0) {
return 0;
} else if (len > BUF_SIZE) {
buf = malloc(len);
if (buf == NULL) {
JNU_ThrowOutOfMemoryError(env, NULL);
return 0;
}
} else {
buf = stackBuf;
}
此处BUF_SIZE == 8192.如果缓冲区大于此保留堆栈区域,则malloc
分配临时缓冲区。在Windows上malloc
通常通过HeapAlloc WINAPI调用实现。
接下来,我在没有文件I / O的情况下单独测量了HeapAlloc
+ HeapFree
个调用的性能。结果很有意思:
128K: 5 μs
256K: 10 μs
384K: 15 μs
512K: 20 μs
640K: 25 μs
768K: 29 μs
896K: 33 μs
1024K: 316 μs <-- almost 10x leap
1152K: 356 μs
1280K: 399 μs
1408K: 436 μs
1536K: 474 μs
1664K: 511 μs
1792K: 553 μs
1920K: 592 μs
2048K: 628 μs
正如您所看到的,OS内存分配的性能在1MB边界处急剧变化。这可以通过用于小块和大块的不同分配算法来解释。
<强>更新强>
HeapCreate的文档证实了关于大于1MB的块的特定分配策略的想法(参见 dwMaximumSize 描述)。
此外,可以从堆分配的最大内存块对于32位进程略小于512 KB,对于64位进程略小于1,024 KB。
...
分配大于固定大小堆限制的内存块的请求不会自动失败;相反,系统调用VirtualAlloc函数来获取大块所需的内存。
答案 1 :(得分:1)
最佳缓冲区大小会影响文件系统块大小,CPU缓存大小和缓存延迟。大多数操作系统使用块大小4096或8192,因此建议使用具有此大小或此值的多重性的缓冲区。
答案 2 :(得分:0)
我重写了测试以测试不同大小的缓冲区。
这是新代码:
public class ReadFileInChunks {
public static void main(String[] args) throws IOException {
String path = "C:\\\\tmp\\1GB.zip";
readFileInChuncks(path, new byte[1024 * 128], false);
for (int i = 1; i <= 1024; i+=10) {
readFileInChuncks(path, new byte[1024 * i], true);
}
}
public static void readFileInChuncks(String path, byte[] buffer, boolean report) throws IOException {
long t = System.currentTimeMillis();
InputStream is = new FileInputStream(path);
while ((readToArray(is, buffer)) != 0) {
}
if (report) {
System.out.println("buffer size = " + buffer.length/1024 + "kB , duration = " + (System.currentTimeMillis() - t) + " ms");
}
}
public static int readToArray(InputStream is, byte[] buffer) throws IOException {
int index = 0;
while (index != buffer.length) {
int read = is.read(buffer, index, buffer.length - index);
if (read == -1) {
break;
}
index += read;
}
return index;
}
}
以下是结果......
buffer size = 121kB , duration = 320 ms
buffer size = 131kB , duration = 330 ms
buffer size = 141kB , duration = 330 ms
buffer size = 151kB , duration = 323 ms
buffer size = 161kB , duration = 320 ms
buffer size = 171kB , duration = 320 ms
buffer size = 181kB , duration = 320 ms
buffer size = 191kB , duration = 310 ms
buffer size = 201kB , duration = 320 ms
buffer size = 211kB , duration = 310 ms
buffer size = 221kB , duration = 310 ms
buffer size = 231kB , duration = 310 ms
buffer size = 241kB , duration = 310 ms
buffer size = 251kB , duration = 310 ms
buffer size = 261kB , duration = 320 ms
buffer size = 271kB , duration = 310 ms
buffer size = 281kB , duration = 320 ms
buffer size = 291kB , duration = 310 ms
buffer size = 301kB , duration = 319 ms
buffer size = 311kB , duration = 320 ms
buffer size = 321kB , duration = 310 ms
buffer size = 331kB , duration = 320 ms
buffer size = 341kB , duration = 310 ms
buffer size = 351kB , duration = 320 ms
buffer size = 361kB , duration = 310 ms
buffer size = 371kB , duration = 320 ms
buffer size = 381kB , duration = 311 ms
buffer size = 391kB , duration = 310 ms
buffer size = 401kB , duration = 310 ms
buffer size = 411kB , duration = 320 ms
buffer size = 421kB , duration = 310 ms
buffer size = 431kB , duration = 310 ms
buffer size = 441kB , duration = 310 ms
buffer size = 451kB , duration = 320 ms
buffer size = 461kB , duration = 310 ms
buffer size = 471kB , duration = 310 ms
buffer size = 481kB , duration = 310 ms
buffer size = 491kB , duration = 310 ms
buffer size = 501kB , duration = 310 ms
buffer size = 511kB , duration = 320 ms
buffer size = 521kB , duration = 300 ms
buffer size = 531kB , duration = 310 ms
buffer size = 541kB , duration = 312 ms
buffer size = 551kB , duration = 311 ms
buffer size = 561kB , duration = 320 ms
buffer size = 571kB , duration = 310 ms
buffer size = 581kB , duration = 314 ms
buffer size = 591kB , duration = 320 ms
buffer size = 601kB , duration = 310 ms
buffer size = 611kB , duration = 310 ms
buffer size = 621kB , duration = 310 ms
buffer size = 631kB , duration = 310 ms
buffer size = 641kB , duration = 310 ms
buffer size = 651kB , duration = 310 ms
buffer size = 661kB , duration = 301 ms
buffer size = 671kB , duration = 310 ms
buffer size = 681kB , duration = 310 ms
buffer size = 691kB , duration = 310 ms
buffer size = 701kB , duration = 310 ms
buffer size = 711kB , duration = 300 ms
buffer size = 721kB , duration = 310 ms
buffer size = 731kB , duration = 310 ms
buffer size = 741kB , duration = 310 ms
buffer size = 751kB , duration = 310 ms
buffer size = 761kB , duration = 311 ms
buffer size = 771kB , duration = 310 ms
buffer size = 781kB , duration = 300 ms
buffer size = 791kB , duration = 300 ms
buffer size = 801kB , duration = 310 ms
buffer size = 811kB , duration = 310 ms
buffer size = 821kB , duration = 300 ms
buffer size = 831kB , duration = 310 ms
buffer size = 841kB , duration = 310 ms
buffer size = 851kB , duration = 300 ms
buffer size = 861kB , duration = 310 ms
buffer size = 871kB , duration = 310 ms
buffer size = 881kB , duration = 310 ms
buffer size = 891kB , duration = 304 ms
buffer size = 901kB , duration = 310 ms
buffer size = 911kB , duration = 310 ms
buffer size = 921kB , duration = 310 ms
buffer size = 931kB , duration = 299 ms
buffer size = 941kB , duration = 321 ms
buffer size = 951kB , duration = 310 ms
buffer size = 961kB , duration = 310 ms
buffer size = 971kB , duration = 310 ms
buffer size = 981kB , duration = 310 ms
buffer size = 991kB , duration = 295 ms
buffer size = 1001kB , duration = 339 ms
buffer size = 1011kB , duration = 302 ms
buffer size = 1021kB , duration = 610 ms
看起来某种阈值在大约1021kB大小的缓冲区中被击中。深入研究这一点,我看到......
buffer size = 1017kB , duration = 310 ms
buffer size = 1018kB , duration = 310 ms
buffer size = 1019kB , duration = 602 ms
buffer size = 1020kB , duration = 600 ms
因此,当达到此阈值时,看起来会出现某种加倍效应。我最初的想法是readToArray while循环的循环次数是达到阈值时的两倍,但事实并非如此,while循环只进行一次迭代,无论是300ms运行还是600ms运行。因此,让我们看看实际实现的io_utils.c
实现从磁盘读取数据的一些线索。
jint
readBytes(JNIEnv *env, jobject this, jbyteArray bytes,
jint off, jint len, jfieldID fid)
{
jint nread;
char stackBuf[BUF_SIZE];
char *buf = NULL;
FD fd;
if (IS_NULL(bytes)) {
JNU_ThrowNullPointerException(env, NULL);
return -1;
}
if (outOfBounds(env, off, len, bytes)) {
JNU_ThrowByName(env, "java/lang/IndexOutOfBoundsException", NULL);
return -1;
}
if (len == 0) {
return 0;
} else if (len > BUF_SIZE) {
buf = malloc(len);
if (buf == NULL) {
JNU_ThrowOutOfMemoryError(env, NULL);
return 0;
}
} else {
buf = stackBuf;
}
fd = GET_FD(this, fid);
if (fd == -1) {
JNU_ThrowIOException(env, "Stream Closed");
nread = -1;
} else {
nread = (jint)IO_Read(fd, buf, len);
if (nread > 0) {
(*env)->SetByteArrayRegion(env, bytes, off, nread, (jbyte *)buf);
} else if (nread == JVM_IO_ERR) {
JNU_ThrowIOExceptionWithLastError(env, "Read error");
} else if (nread == JVM_IO_INTR) {
JNU_ThrowByName(env, "java/io/InterruptedIOException", NULL);
} else { /* EOF */
nread = -1;
}
}
if (buf != stackBuf) {
free(buf);
}
return nread;
}
需要注意的一点是,BUF_SIZE设置为8192.倍增效果发生在上面。所以下一个罪魁祸首是IO_Read
方法。
windows/native/java/io/io_util_md.h:#define IO_Read handleRead
所以我们去了handleRead方法。
windows/native/java/io/io_util_md.c:handleRead(jlong fd, void *buf, jint len)
此方法将请求交给一个名为ReadFile的方法。
JNIEXPORT
size_t
handleRead(jlong fd, void *buf, jint len)
{
DWORD read = 0;
BOOL result = 0;
HANDLE h = (HANDLE)fd;
if (h == INVALID_HANDLE_VALUE) {
return -1;
}
result = ReadFile(h, /* File handle to read */
buf, /* address to put data */
len, /* number of bytes to read */
&read, /* number of bytes read */
NULL); /* no overlapped struct */
if (result == 0) {
int error = GetLastError();
if (error == ERROR_BROKEN_PIPE) {
return 0; /* EOF */
}
return -1;
}
return read;
}
现在,这就是小道冷落的地方。如果我找到ReadFile的代码,我会看一下并发回。
答案 3 :(得分:-1)
这可能是因为cpu cache,
cpu有自己的缓存内存,并且有一些修复大小可以通过在cmd上执行此命令来检查cpu缓存大小
wmic cpu get L2CacheSize
假设您有256k作为cpu缓存大小, 所以会发生什么是如果您读取256k或更小的块,当读取访问缓冲区时,写入缓冲区的内容仍然在CPU缓存中。如果你有超过256k的块,那么读取的最后256k是在CPU缓存中,所以当从头开始读取时,必须从主存中检索内容。