Java ByteBuffer performance issue

Asked: 2011-10-12 10:35:38

Tags: java performance nio bytebuffer

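While processing multi-gigabyte files I noticed something odd: reading from a file through a FileChannel into a reused ByteBuffer allocated with allocateDirect seems to be much slower than reading from a MappedByteBuffer. In fact, it is even slower than reading into byte arrays with regular read calls!

I expected it to be (almost) as fast as reading from a MappedByteBuffer: my ByteBuffer is allocated with allocateDirect, so the read should end up directly in my buffer without any intermediate copies.

My question now is: what am I doing wrong? Or is ByteBuffer + FileChannel really slower than regular I/O or mmap?

In the example code below I also added some code that converts what is read into long values, since that is what my real code does constantly. I would expect the ByteBuffer getLong() method to be much faster than my own byte shuffler.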
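To make the comparison concrete, here is a minimal sketch showing that the hand-written shuffling and getLong() decode the same value, because a ByteBuffer is big-endian by default. The class name GetLongDemo and the helper viaByteBuffer are hypothetical and exist only for this illustration.

```java
import java.nio.ByteBuffer;

class GetLongDemo {
    // Same big-endian assembly as byteArrayToLong in the question, written as a loop.
    static long byteArrayToLong(byte[] in, int offset) {
        long value = 0;
        for (int i = 0; i < 8; i++) {
            value = (value << 8) | (in[offset + i] & 0xffL);
        }
        return value;
    }

    // Hypothetical helper: absolute getLong on a heap buffer wrapping the same array.
    // A freshly created ByteBuffer uses big-endian order.
    static long viaByteBuffer(byte[] in, int offset) {
        return ByteBuffer.wrap(in).getLong(offset);
    }

    public static void main(String[] args) {
        byte[] record = new byte[24];
        for (int i = 0; i < record.length; i++) {
            record[i] = (byte) (i * 7 + 1);
        }
        // Compare the two decoders on each of the three longs in a 24-byte record.
        for (int offset = 0; offset < 24; offset += 8) {
            System.out.println(byteArrayToLong(record, offset) == viaByteBuffer(record, offset));
        }
    }
}
```

This only demonstrates equivalence of the results; it says nothing about which variant is faster.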

Test results (seconds):
mmap: 3.828
bytebuffer: 55.097
regular i/o: 38.175

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.MappedByteBuffer;

class testbb {
    static final int size = 536870904, n = size / 24;

    static public long byteArrayToLong(byte [] in, int offset) {
        return ((((((((long)(in[offset + 0] & 0xff) << 8) | (long)(in[offset + 1] & 0xff)) << 8 | (long)(in[offset + 2] & 0xff)) << 8 | (long)(in[offset + 3] & 0xff)) << 8 | (long)(in[offset + 4] & 0xff)) << 8 | (long)(in[offset + 5] & 0xff)) << 8 | (long)(in[offset + 6] & 0xff)) << 8 | (long)(in[offset + 7] & 0xff);
    }

    public static void main(String [] args) throws IOException {
        long start;
        RandomAccessFile fileHandle;
        FileChannel fileChannel;

        // create file
        fileHandle = new RandomAccessFile("file.dat", "rw");
        byte [] buffer = new byte[24];
        for(int index=0; index<n; index++)
            fileHandle.write(buffer);
        fileChannel = fileHandle.getChannel();

        // mmap()
        MappedByteBuffer mbb = fileChannel.map(FileChannel.MapMode.READ_WRITE, 0, size);
        byte [] buffer1 = new byte[24];
        start = System.currentTimeMillis();
        for(int index=0; index<n; index++) {
                mbb.position(index * 24);
                mbb.get(buffer1, 0, 24);
                long dummy1 = byteArrayToLong(buffer1, 0);
                long dummy2 = byteArrayToLong(buffer1, 8);
                long dummy3 = byteArrayToLong(buffer1, 16);
        }
        System.out.println("mmap: " + (System.currentTimeMillis() - start) / 1000.0);

        // bytebuffer
        ByteBuffer buffer2 = ByteBuffer.allocateDirect(24);
        start = System.currentTimeMillis();
        for(int index=0; index<n; index++) {
            buffer2.rewind();
            fileChannel.read(buffer2, index * 24);
            buffer2.rewind();   // need to rewind it to be able to use it
            long dummy1 = buffer2.getLong();
            long dummy2 = buffer2.getLong();
            long dummy3 = buffer2.getLong();
        }
        System.out.println("bytebuffer: " + (System.currentTimeMillis() - start) / 1000.0);

        // regular i/o
        byte [] buffer3 = new byte[24];
        start = System.currentTimeMillis();
        for(int index=0; index<n; index++) {
                fileHandle.seek(index * 24);
                fileHandle.read(buffer3);
                // Convert the bytes that were just read into buffer3 (not the mmap loop's buffer1).
                long dummy1 = byteArrayToLong(buffer3, 0);
                long dummy2 = byteArrayToLong(buffer3, 8);
                long dummy3 = byteArrayToLong(buffer3, 16);
        }
        System.out.println("regular i/o: " + (System.currentTimeMillis() - start) / 1000.0);
    }
}

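Since loading large sections and then processing them is not an option (I will be reading data all over the place), I think I should stick with a MappedByteBuffer. Thank you all for your suggestions.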

4 answers:

Answer 0: (score: 10)

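I believe you are just doing micro-optimization, which might just not matter (www.codinghorror.com).

Below is a version with larger buffers and the redundant seek / setPosition calls removed.

  • When I enable "native byte ordering" (this is actually unsafe if the machine uses a different endian convention than the data):
mmap: 1.358
bytebuffer: 0.922
regular i/o: 1.387
  • When I comment out the order statement and use the default big-endian ordering:
mmap: 1.336
bytebuffer: 1.62
regular i/o: 1.467
  • Your original code:
mmap: 3.262
bytebuffer: 106.676
regular i/o: 90.903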
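The endianness caveat in the first bullet can be illustrated with a small sketch. It writes a long in big-endian order and reads the same bytes back in either order; the class name ByteOrderDemo and the method writeBigReadWith are hypothetical names chosen for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderDemo {
    // Writes a long as big-endian bytes, then reads the same bytes back using readOrder.
    static long writeBigReadWith(long value, ByteOrder readOrder) {
        ByteBuffer buf = ByteBuffer.allocate(8); // big-endian by default
        buf.putLong(0, value);
        buf.order(readOrder);
        return buf.getLong(0);
    }

    public static void main(String[] args) {
        long value = 0x0102030405060708L;
        // Reading with the order the data was written in returns the original value.
        System.out.println(Long.toHexString(writeBigReadWith(value, ByteOrder.BIG_ENDIAN)));
        // Reading with the other order returns the byte-reversed value.
        System.out.println(Long.toHexString(writeBigReadWith(value, ByteOrder.LITTLE_ENDIAN)));
        System.out.println("native order on this machine: " + ByteOrder.nativeOrder());
    }
}
```

With all-zero test data, as in the benchmark, the order makes no difference to the values, only to the timing. With real data, native order is only correct when the file was written in the byte order of the machine reading it.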

Here is the code:

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.MappedByteBuffer;

class Testbb2 {
    /** Buffer a whole lot of long values at the same time. */
    static final int BUFFSIZE = 0x800 * 8; // 16384 bytes (2048 longs)
    static final int DATASIZE = 0x8000 * BUFFSIZE;

    static public long byteArrayToLong(byte [] in, int offset) {
        return ((((((((long)(in[offset + 0] & 0xff) << 8) | (long)(in[offset + 1] & 0xff)) << 8 | (long)(in[offset + 2] & 0xff)) << 8 | (long)(in[offset + 3] & 0xff)) << 8 | (long)(in[offset + 4] & 0xff)) << 8 | (long)(in[offset + 5] & 0xff)) << 8 | (long)(in[offset + 6] & 0xff)) << 8 | (long)(in[offset + 7] & 0xff);
    }

    public static void main(String [] args) throws IOException {
        long start;
        RandomAccessFile fileHandle;
        FileChannel fileChannel;

        // Sanity check - this way the convert-to-long loops don't need extra bookkeeping like BUFFSIZE / 8.
        if ((DATASIZE % BUFFSIZE) > 0 || (DATASIZE % 8) > 0) {
            throw new IllegalStateException("DATASIZE should be a multiple of 8 and BUFFSIZE!");
        }

        int pos;
        int nDone;

        // create file
        File testFile = new File("file.dat");
        fileHandle = new RandomAccessFile("file.dat", "rw");

        if (testFile.exists() && testFile.length() >= DATASIZE) {
            System.out.println("File exists");
        } else {
            testFile.delete();
            System.out.println("Preparing file");
            byte [] buffer = new byte[BUFFSIZE];
            pos = 0;
            nDone = 0;
            while (pos < DATASIZE) {
                fileHandle.write(buffer);
                pos += buffer.length;
            }

            System.out.println("File prepared");
        } 
        fileChannel = fileHandle.getChannel();

        // mmap()
        MappedByteBuffer mbb = fileChannel.map(FileChannel.MapMode.READ_WRITE, 0, DATASIZE);
        byte [] buffer1 = new byte[BUFFSIZE];
        mbb.position(0);
        start = System.currentTimeMillis();
        pos = 0;
        while (pos < DATASIZE) {
            mbb.get(buffer1, 0, BUFFSIZE);
            // This assumes BUFFSIZE is a multiple of 8.
            for (int i = 0; i < BUFFSIZE; i += 8) {
                long dummy = byteArrayToLong(buffer1, i);
            }
            pos += BUFFSIZE;
        }
        System.out.println("mmap: " + (System.currentTimeMillis() - start) / 1000.0);

        // bytebuffer
        ByteBuffer buffer2 = ByteBuffer.allocateDirect(BUFFSIZE);
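        // Uncomment the next line to make getLong() use the platform's native byte order:
//        buffer2.order(ByteOrder.nativeOrder());
        // No-op: order() without an argument only returns the current order (big-endian by default).
        buffer2.order();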
        fileChannel.position(0);
        start = System.currentTimeMillis();
        pos = 0;
        nDone = 0;
        while (pos < DATASIZE) {
            buffer2.rewind();
            fileChannel.read(buffer2);
            buffer2.rewind();   // need to rewind it to be able to use it
            // This assumes BUFFSIZE is a multiple of 8.
            for (int i = 0; i < BUFFSIZE; i += 8) {
                long dummy = buffer2.getLong();
            }
            pos += BUFFSIZE;
        }
        System.out.println("bytebuffer: " + (System.currentTimeMillis() - start) / 1000.0);

        // regular i/o
        fileHandle.seek(0);
        byte [] buffer3 = new byte[BUFFSIZE];
        start = System.currentTimeMillis();
        pos = 0;
        boolean eof = false;
        while (pos < DATASIZE && !eof) {
            // Fill the buffer completely; read() may return fewer bytes than requested.
            int filled = 0;
            while (filled < BUFFSIZE) {
                nDone = fileHandle.read(buffer3, filled, BUFFSIZE - filled);
                if (nDone == -1) {
                    eof = true;
                    break;
                }
                filled += nDone;
            }
            // This assumes BUFFSIZE is a multiple of 8.
            for (int i = 0; i < BUFFSIZE; i += 8) {
                long dummy = byteArrayToLong(buffer3, i);
            }
            pos += filled;
        }
        System.out.println("regular i/o: " + (System.currentTimeMillis() - start) / 1000.0);
    }
}

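Answer 1: (score: 5)

Reading into a direct byte buffer is faster, but getting the data out of it into the JVM is slower. A direct byte buffer is intended for cases where you are just copying data without actually looking at it in Java code. Then the data does not have to cross the native->JVM boundary at all, so it is quicker than using, for example, a byte[] array or a normal ByteBuffer, where the data would have to cross that boundary twice during the copy.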
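As an illustration of the copy-only use case described here, the following sketch moves a file through one reusable direct buffer without Java code ever inspecting the bytes. The class name DirectCopy, the method copy, and the 64 KiB buffer size are assumptions made for this example, not part of the original answer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class DirectCopy {
    // Copies src to dst through a single direct buffer and returns the number of bytes written.
    static long copy(Path src, Path dst) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024); // buffer size is an arbitrary assumption
        long total = 0;
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (in.read(buf) != -1) {
                buf.flip();                    // switch from filling to draining
                while (buf.hasRemaining()) {
                    total += out.write(buf);   // write() may drain only part of the buffer
                }
                buf.clear();                   // ready for the next read
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("src", ".dat");
        Path dst = Files.createTempFile("dst", ".dat");
        Files.write(src, new byte[100_000]);
        System.out.println("copied " + copy(src, dst) + " bytes");
        Files.delete(src);
        Files.delete(dst);
    }
}
```

The benchmark in the question is the opposite case: every byte is pulled into Java code to build long values, which is where the answer expects a direct buffer to lose its advantage.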

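Answer 2: (score: 2)

When a loop iterates more than 10,000 times, it can trigger the whole method to be compiled to native code. However, your later loops have not run yet at that point, so they cannot be optimized to the same degree. To avoid this problem, put each loop in its own method and run the test again.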
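A minimal sketch of that restructuring, assuming in-memory data instead of a file to keep it short: each candidate lives in its own method, and both are timed over several rounds so that later rounds run already-compiled code. The names WarmupHarness, sumWithGetLong, and sumWithShifts are hypothetical.

```java
import java.nio.ByteBuffer;

class WarmupHarness {
    // Candidate 1: decode every long with ByteBuffer.getLong().
    static long sumWithGetLong(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        long sum = 0;
        while (buf.remaining() >= 8) {
            sum += buf.getLong();
        }
        return sum;
    }

    // Candidate 2: decode every long with manual shifts (big-endian, like the question's helper).
    static long sumWithShifts(byte[] data) {
        long sum = 0;
        for (int off = 0; off + 8 <= data.length; off += 8) {
            long v = 0;
            for (int i = 0; i < 8; i++) {
                v = (v << 8) | (data[off + i] & 0xffL);
            }
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] data = new byte[8 * 1024 * 1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        long sink = 0; // results are accumulated and printed so the work is actually used
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            sink += sumWithGetLong(data);
            long t1 = System.nanoTime();
            sink += sumWithShifts(data);
            long t2 = System.nanoTime();
            System.out.printf("round %d: getLong %.1f ms, shifts %.1f ms%n",
                    round, (t1 - t0) / 1e6, (t2 - t1) / 1e6);
        }
        System.out.println("checksum: " + sink);
    }
}
```

The first round or two typically include interpretation and compilation time, so the later rounds are the ones worth comparing.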

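Additionally, you may want to set the ByteBuffer's order with order(ByteOrder.nativeOrder()) to avoid all the byte swapping when you call getLong, and to read more than 24 bytes at a time, because reading very small portions generates far more system calls. Try reading 32 * 1024 bytes at a time.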
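The two suggestions in this paragraph could be combined roughly as follows. This is a sketch under assumptions: the class ChunkedRead, its methods, and the chunk size of 32760 bytes (the largest multiple of the 24-byte record size below 32 * 1024) are illustrative choices, not something the answer specifies.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChunkedRead {
    static final int RECORD = 24;
    static final int CHUNK = RECORD * 1365; // 32760 bytes, just under 32 * 1024

    // Sums every long in the file, issuing one read per CHUNK bytes instead of one per record.
    static long sumLongs(Path file, ByteOrder order) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(CHUNK).order(order);
        long sum = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            while (ch.read(buf) != -1) {
                buf.flip();
                while (buf.remaining() >= 8) {
                    sum += buf.getLong();
                }
                buf.compact(); // keeps a partially read long, if any, for the next read
            }
        }
        return sum;
    }

    // Hypothetical helper that writes the values 0..count-1 as longs in the given byte order.
    static void writeLongs(Path file, int count, ByteOrder order) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(count * 8).order(order);
        for (int i = 0; i < count; i++) {
            buf.putLong(i);
        }
        Files.write(file, buf.array());
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("chunked", ".dat");
        writeLongs(file, 30_000, ByteOrder.nativeOrder());
        System.out.println("sum = " + sumLongs(file, ByteOrder.nativeOrder()));
        Files.delete(file);
    }
}
```

Passing the byte order as a parameter makes it easy to time native order against big-endian on the same file, provided the file was written in the order being read.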

I would also try getLong directly on the MappedByteBuffer with native byte order. This is likely to be the fastest.
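A sketch of that last suggestion, assuming a file small enough to map in one piece (a single mapping is limited to Integer.MAX_VALUE bytes). The class name MappedGetLong and the method sumLongs are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedGetLong {
    // Sums every long in the file by calling getLong() on the mapping itself,
    // with no intermediate byte[] and no manual shifting.
    static long sumLongs(Path file, ByteOrder order) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mbb = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            mbb.order(order);
            long sum = 0;
            while (mbb.remaining() >= 8) {
                sum += mbb.getLong();
            }
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped", ".dat");
        file.toFile().deleteOnExit(); // deleting a still-mapped file can fail on some platforms
        ByteBuffer data = ByteBuffer.allocate(8 * 1000).order(ByteOrder.nativeOrder());
        for (int i = 0; i < 1000; i++) {
            data.putLong(i);
        }
        Files.write(file, data.array());
        System.out.println("sum = " + sumLongs(file, ByteOrder.nativeOrder()));
    }
}
```

Whether this really is the fastest variant depends on the JVM and the platform, so it is worth timing it alongside the other loops.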

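Answer 3: (score: 0)

MappedByteBuffer will always be the fastest, because the operating system associates the OS-level disk buffer with your process's memory space. Reading into an allocated direct buffer, by comparison, first loads the block into the OS buffer and then copies the contents of the OS buffer into the allocated in-process buffer.

Your test code also does a lot of very small (24-byte) reads. If your actual application does the same, you will get an even bigger performance boost from mapping the file, because each of those reads is a separate kernel call. You should see several times the performance by mapping.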
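For the access pattern described in the question (24-byte records read from all over the file), a mapping allows each record to be fetched with absolute gets and no per-record read call. The sketch below assumes big-endian data and a file that fits into a single mapping; the names MappedRecords, map, readRecord, and writeRecords are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedRecords {
    static final int RECORD = 24; // three longs per record, as in the question

    // Maps the whole file read-only. The mapping remains valid after the channel is closed.
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    // Reads record number index with absolute gets; the buffer's position is never touched.
    static void readRecord(ByteBuffer buf, int index, long[] out) {
        int base = index * RECORD;
        out[0] = buf.getLong(base);
        out[1] = buf.getLong(base + 8);
        out[2] = buf.getLong(base + 16);
    }

    // Hypothetical helper: record i holds the longs (i, 2i, 3i) in big-endian order.
    static void writeRecords(Path file, int count) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(count * RECORD);
        for (int i = 0; i < count; i++) {
            buf.putLong(i).putLong(2L * i).putLong(3L * i);
        }
        Files.write(file, buf.array());
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("records", ".dat");
        file.toFile().deleteOnExit();
        writeRecords(file, 1000);
        MappedByteBuffer mbb = map(file);
        long[] rec = new long[3];
        for (int index : new int[] {999, 0, 512, 7}) { // deliberately out of order
            readRecord(mbb, index, rec);
            System.out.println(index + ": " + rec[0] + ", " + rec[1] + ", " + rec[2]);
        }
    }
}
```

Because the gets are absolute, the same mapping can serve lookups at arbitrary offsets without any seek or position bookkeeping.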

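As for the direct buffer being slower than the java.io reads: you don't give any numbers, but I would expect a slight degradation, because the getLong() calls need to cross the JNI boundary.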