Question

I use Java 1.5 on an embedded Linux device and want to read a binary file containing 2 MB of int values. (Currently 4-byte big-endian, but I am free to choose the format.)

Using a DataInputStream wrapped around a BufferedInputStream and calling dis.readInt(), these 500 000 calls take 17 s, whereas reading the file into one big byte buffer takes 5 s.
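
For reference, the per-value approach described above might look roughly like the following sketch. The class name `ReadIntBaseline`, the method signature, and the 64 KB stream buffer size are assumptions made for illustration; the question does not show this code.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical helper, not taken from the question.
final class ReadIntBaseline {
    // Reads numInts big-endian 4-byte ints with one readInt() call per value.
    static int[] readInts(String path, int numInts) throws IOException {
        DataInputStream dis = new DataInputStream(
                new BufferedInputStream(new FileInputStream(path), 64 * 1024)); // buffer size is an assumption
        try {
            int[] result = new int[numInts];
            for (int i = 0; i < numInts; i++) {
                result[i] = dis.readInt(); // throws EOFException if the file is too short
            }
            return result;
        } finally {
            dis.close(); // try/finally instead of try-with-resources, to stay Java 1.5 compatible
        }
    }
}
```

Each value costs one method call plus four single-byte reads from the buffered stream, which is the overhead the question is trying to get rid of.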

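How can I read that file into one huge int[] faster?

The reading process should not use more than 512 KB of memory.

The code below, which uses nio, is not faster than the readInt() approach from java.io.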

    // assume I already know that there are 500 000 ints to read:
    int numInts = 500000;
    // the result should end up in this array
    int[] result = new int[numInts];
    int cnt = 0;

    RandomAccessFile aFile = new RandomAccessFile("filename", "r");
    FileChannel inChannel = aFile.getChannel();

    ByteBuffer buf = ByteBuffer.allocate(512 * 1024);

    int bytesRead = inChannel.read(buf); //read into buffer.

    while (bytesRead != -1) {

      buf.flip();  //make buffer ready for get()

      while(buf.hasRemaining() && cnt < numInts){
       // probably slow here since called 500 000 times
          result[cnt] = buf.getInt();
          cnt++;
      }

      buf.clear(); //make buffer ready for writing
      bytesRead = inChannel.read(buf);
    }


    aFile.close();
    inChannel.close();

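Update: evaluation of the answers:

On the PC, the memory map with the IntBuffer approach was the fastest in my setup. On the embedded device, which has no JIT, java.io DataInputStream.readInt() was a bit faster (17 s, compared with 20 s for the memory map with IntBuffer).

Final conclusion: a significant speed-up is easier to achieve through an algorithmic change. (Smaller file for initialization.)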

Answer 1

I don't know whether this will be faster than what Alexander provided, but you could try mapping the file.

    try (FileInputStream stream = new FileInputStream(filename)) {
        FileChannel inChannel = stream.getChannel();

        ByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
        int[] result = new int[500000];

        buffer.order( ByteOrder.BIG_ENDIAN );
        IntBuffer intBuffer = buffer.asIntBuffer( );
        intBuffer.get(result);
    }
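
Note that try-with-resources requires Java 7, while the question targets Java 1.5. A sketch of the same mapping idea with try/finally could look like this; the class name `MappedIntReader` and the method signature are hypothetical, chosen only for illustration.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical helper: the mapping approach without Java 7 syntax.
final class MappedIntReader {
    static int[] readInts(String path, int numInts) throws IOException {
        FileInputStream stream = new FileInputStream(path);
        try {
            FileChannel channel = stream.getChannel();
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            mapped.order(ByteOrder.BIG_ENDIAN); // already the default, stated for clarity
            IntBuffer ints = mapped.asIntBuffer();
            int[] result = new int[numInts];
            ints.get(result); // bulk copy; BufferUnderflowException if the file holds fewer ints
            return result;
        } finally {
            stream.close(); // also closes the channel
        }
    }
}
```

The mapping itself lives outside the Java heap, so whether it fits a tight memory budget on an embedded device depends on the platform and would need to be measured.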

Answer 2

You can use IntBuffer from the nio package -> http://docs.oracle.com/javase/6/docs/api/java/nio/IntBuffer.html

    int[] intArray = new int[ 500000 ];

    IntBuffer intBuffer = IntBuffer.wrap( intArray );

    ...

Fill the buffer by calling inChannel.read(intBuffer).

Once the buffer is full, intArray will contain 500000 integers.

Edit

Realized that channels only support ByteBuffer.

    // assume I already know that there are 500 000 ints to read:
    int numInts = 500000;
    // the result should end up in this array
    int[] result = new int[numInts];

    // 4 bytes per int, direct buffer
    ByteBuffer buf = ByteBuffer.allocateDirect( numInts * 4 );

    // BIG_ENDIAN byte order
    buf.order( ByteOrder.BIG_ENDIAN );

    // Fill in the buffer
    while ( buf.hasRemaining( ) )
    {
       // Per EJP's suggestion check EOF condition
       if( inChannel.read( buf ) == -1 )
       {
           // Hit EOF
           throw new EOFException( );
       }
    }

    buf.flip( );

    // Create IntBuffer view
    IntBuffer intBuffer = buf.asIntBuffer( );

    // result will now contain all ints read from file
    intBuffer.get( result );
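
The direct buffer above holds the whole file (about 2 MB), which exceeds the 512 KB limit stated in the question. One possible variation, sketched here under that assumption, reads the file in fixed-size chunks and still converts each chunk with a bulk IntBuffer.get. The class name `ChunkedIntReader` and the `chunkBytes` parameter are hypothetical, and this sketch has not been timed on a JIT-less device.

```java
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;

// Hypothetical helper: chunked reads with bulk int conversion.
final class ChunkedIntReader {
    // chunkBytes must be at least 4, e.g. 512 * 1024.
    static int[] readInts(String path, int numInts, int chunkBytes) throws IOException {
        FileInputStream stream = new FileInputStream(path);
        try {
            FileChannel channel = stream.getChannel();
            ByteBuffer buf = ByteBuffer.allocate(chunkBytes);
            buf.order(ByteOrder.BIG_ENDIAN);
            int[] result = new int[numInts];
            int filled = 0;
            while (filled < numInts) {
                if (channel.read(buf) == -1) {
                    throw new EOFException("file ended after " + filled + " ints");
                }
                buf.flip();
                // Convert only whole ints; a read may stop in the middle of a value.
                int n = Math.min(buf.remaining() / 4, numInts - filled);
                buf.asIntBuffer().get(result, filled, n); // bulk copy instead of n getInt() calls
                filled += n;
                buf.position(buf.position() + n * 4); // the int view has its own position
                buf.compact(); // keep any leftover bytes for the next round
            }
            return result;
        } finally {
            stream.close();
        }
    }
}
```

A call such as `ChunkedIntReader.readInts("filename", 500000, 512 * 1024)` would keep the temporary buffer at 512 KB; the int[] result itself still needs its 2 MB.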

Answer 3

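I ran a fairly careful experiment using serialization/deserialization, comparing DataInputStream and ObjectInputStream, both based on a ByteArrayInputStream to avoid I/O effects. For a million ints, readObject took about 20 ms, versus about 116 ms for readInt. The serialization overhead on a million-int array was 27 bytes. This was on a 2013 MacBook Pro.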

Having said that, object serialization is kind of evil, and the data has to have been written out by a Java program.
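
The answer does not include its test code. A minimal sketch of the serialization round trip it describes might look like this; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical helper: writes and reads an int[] as a single serialized object.
final class IntArraySerialization {
    static byte[] serialize(int[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(values.length * 4 + 64);
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(values); // the whole array goes out in one call
        out.close();
        return bytes.toByteArray();
    }

    static int[] deserialize(byte[] data) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data));
        try {
            return (int[]) in.readObject(); // one call instead of one readInt() per value
        } finally {
            in.close();
        }
    }
}
```

Wrapping `deserialize` in `System.nanoTime()` measurements would give a rough timing, and `serialize(values).length - 4 * values.length` shows the header overhead on a given JVM. Timings will differ from the figures above depending on hardware and on whether a JIT is present.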

Fastest way to read a huge number of ints from a binary file