Strange method call optimization issue

Time: 2014-06-18 08:20:50

Tags: java performance optimization

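I have been investigating why DataInputStream.readByte() is so slow and ran into an interesting result that I cannot explain. I am using jdk1.7.0_40 on Windows 7 64-bit.

Suppose we have a huge byte array and read data from it. Let's compare 4 ways of reading this array byte by byte: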

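  1. Reading via a simple loop
  2. Reading via ByteArrayInputStream -> DataInputStream
  3. Reading via ByteArrayInputStream -> our own DataInputStream implementation (MyDataInputStream)
  4. Reading via ByteArrayInputStream -> a standalone copy of DataInputStream's readByte() method

I got the following results (after many iterations of the test loop):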

  • Loop took approx. 312446094 ns
  • DataInputStream took approx. 2555898090 ns
  • MyDataInputStream took approx. 2630664298 ns
  • The copied readByte() method took approx. 309265568 ns

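In other words, we have a strange optimization issue: the same operation takes roughly 8 times longer when performed through an object method call than through the "native" implementation.

Question: why?

For reference, here is the test code: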

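    // Imports needed by the code below (the enclosing test class is omitted in the post).
    // org.junit.Test is an assumption; the post does not name the test framework.
    import java.io.ByteArrayInputStream;
    import java.io.Closeable;
    import java.io.DataInputStream;
    import java.io.EOFException;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Random;
    import org.junit.Test;

    @Test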
    public void testBytes1() throws IOException {
        byte[] bytes = new byte[1_000_000_000];
        Random r = new Random();
        for (int i = 0; i < bytes.length; i++)
            bytes[i] = (byte) r.nextInt();
    
        do {
            System.out.println();
    
            bytes[r.nextInt(1_000_000_000)] = (byte) r.nextInt();
    
            testLoop(bytes);
            testDis(bytes);
            testMyDis(bytes);
            testViaMethod(bytes);
        } while (true);
    }
    
    private void testDis(byte[] bytes) throws IOException {
        long time1 = System.nanoTime();
        long c = 0;
        try (ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
             DataInputStream dis = new DataInputStream(bais)) {
            for (int i = 0; i < bytes.length; i++) {
                c += dis.readByte();
            }
        }
        long time2 = System.nanoTime();
        System.out.println("Dis: \t\t\t\t" + (time2 - time1) + "\t\t\t\t" + c);
    }
    
    private void testMyDis(byte[] bytes) throws IOException {
        long time1 = System.nanoTime();
        long c = 0;
        try (ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
             MyDataInputStream dis = new MyDataInputStream(bais)) {
            for (int i = 0; i < bytes.length; i++) {
                c += dis.readByte();
            }
        }
        long time2 = System.nanoTime();
        System.out.println("My Dis: \t\t\t" + (time2 - time1) + "\t\t\t\t" + c);
    }
    
    private void testViaMethod(byte[] bytes) throws IOException {
        long time1 = System.nanoTime();
        long c = 0;
        try (ByteArrayInputStream bais = new ByteArrayInputStream(bytes)
        ) {
            for (int i = 0; i < bytes.length; i++) {
                c += readByte(bais);
            }
        }
        long time2 = System.nanoTime();
        System.out.println("Via method: \t\t" + (time2 - time1) + "\t\t\t\t" + c);
    }
    
    private void testLoop(byte[] bytes) {
        long time1 = System.nanoTime();
        long c = 0;
        for (int i = 0; i < bytes.length; i++) {
            c += bytes[i];
        }
        long time2 = System.nanoTime();
        System.out.println("Loop: \t\t\t\t" + (time2 - time1) + "\t\t\t\t" + c);
    }
    
    public final byte readByte(InputStream in) throws IOException {
        int ch = in.read();
        if (ch < 0)
            throw new EOFException();
        return (byte)(ch);
    }
    
    static class MyDataInputStream implements Closeable {
    
        InputStream in;
    
        MyDataInputStream(InputStream in) {
            this.in = in;
        }
    
        public final byte readByte() throws IOException {
            int ch = in.read();
            if (ch < 0)
                throw new EOFException();
            return (byte)(ch);
        }
    
        @Override
        public void close() throws IOException {
            in.close();
        }
    }
    

P.S. Update for those who doubt my results: here is the output when running with -XX:+PrintCompilation -verbose:gc -XX:CICompilerCount=1

         37    1             java.lang.String::hashCode (55 bytes)
         41    2             java.lang.String::charAt (29 bytes)
         43    3             java.lang.String::indexOf (70 bytes)
         49    4             java.lang.AbstractStringBuilder::ensureCapacityInternal (16 bytes)
         52    5             java.lang.AbstractStringBuilder::append (29 bytes)
        237    6             java.util.Random::nextInt (7 bytes)
        237    9     n       sun.misc.Unsafe::compareAndSwapLong (native)   
        238    7             java.util.concurrent.atomic.AtomicLong::get (5 bytes)
        238    8             java.util.concurrent.atomic.AtomicLong::compareAndSet (13 bytes)
        239   10             java.util.Random::next (47 bytes)
        239   11 %           fias.TestArrays::testBytes1 @ 15 (77 bytes)
       9645   11 %           fias.TestArrays::testBytes1 @ -2 (77 bytes)   made not entrant
    
       9646   12 %           fias.TestArrays::testLoop @ 10 (77 bytes)
       9964   12 %           fias.TestArrays::testLoop @ -2 (77 bytes)   made not entrant
    Loop:               318726397               -500090432
       9965   13             java.io.DataInputStream::readByte (23 bytes)
       9966   14  s          java.io.ByteArrayInputStream::read (36 bytes)
       9967   15 % !         fias.TestArrays::testDis @ 37 (279 bytes)
    Dis:                2684374258              -500090432
      12651   16             fias.TestArrays$MyDataInputStream::readByte (23 bytes)
      12652   17 % !         fias.TestArrays::testMyDis @ 37 (279 bytes)
    My Dis:             2675570541              -500090432
      15327   18             fias.TestArrays::readByte (20 bytes)
      15328   19 % !         fias.TestArrays::testViaMethod @ 23 (179 bytes)
    Via method:         2367507141              -500090432
    
      17694   20             fias.TestArrays::testLoop (77 bytes)
      17699   21 %           fias.TestArrays::testLoop @ 10 (77 bytes)
    Loop:               374525891               -500090567
      18069   22   !         fias.TestArrays::testDis (279 bytes)
    Dis:                2674626125              -500090567
      20745   23   !         fias.TestArrays::testMyDis (279 bytes)
    My Dis:             2671418683              -500090567
      23417   24   !         fias.TestArrays::testViaMethod (179 bytes)
    Via method:         2359181776              -500090567
    
    Loop:               315081855               -500090663
    Dis:                2558738649              -500090663
    My Dis:             2627056034              -500090663
    Via method:         311692727               -500090663
    
    Loop:               317813286               -500090778
    Dis:                2565161726              -500090778
    My Dis:             2630665760              -500090778
    Via method:         314594434               -500090778
    
    Loop:               313695660               -500090797
    Dis:                2568251556              -500090797
    My Dis:             2635236578              -500090797
    Via method:         311882312               -500090797
    
    Loop:               316781686               -500090929
    Dis:                2563535623              -500090929
    My Dis:             2638487613              -500090929
    Via method:         313170789               -500090929
    

UPD-2: @maaartinus provided a benchmark and results.

2 answers:

Answer 0 (score: 3)

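Surprisingly, the cause is the try-with-resources statement on MyDataInputStream / DataInputStream.

If we move the initialization of the wrapper inside the try block, performance is on par with the loop / method call: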

    private void testMyDis(byte[] bytes) throws IOException {
        final long time1 = System.nanoTime();
        long c = 0;
        try (ByteArrayInputStream bais = new ByteArrayInputStream(bytes)) {
            final MyDataInputStream dis = new MyDataInputStream(bais);
            for (int i = 0; i < bytes.length; i++) {
                c += dis.readByte();
            }
        }
        final long time2 = System.nanoTime();
        System.out.println("My Dis: \t\t\t" + (time2 - time1) + "\t\t\t\t" + c);
    }

I think that with this unnecessary resource, the JIT is unable to apply Range Check Elimination.
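To check this observation independently, a minimal self-contained sketch follows. It runs the same read loop once with the wrapper declared as a second resource and once with the wrapper created inside the try body. The class name `ResourcePlacementDemo`, the method names, the fixed seed, and the smaller array size are assumptions for illustration and do not come from the original post. Timings depend on the JVM version and flags, so no expected numbers are given.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.Random;

// Hypothetical demo class; names and sizes are illustrative assumptions.
final class ResourcePlacementDemo {

    // Variant from the question: the wrapper is a second try-with-resources resource.
    static long sumWrapperAsResource(byte[] bytes) throws IOException {
        long c = 0;
        try (ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
             DataInputStream dis = new DataInputStream(bais)) {
            for (int i = 0; i < bytes.length; i++) {
                c += dis.readByte();
            }
        }
        return c;
    }

    // Variant from this answer: only the underlying stream is a resource.
    static long sumWrapperInBody(byte[] bytes) throws IOException {
        long c = 0;
        try (ByteArrayInputStream bais = new ByteArrayInputStream(bytes)) {
            DataInputStream dis = new DataInputStream(bais);
            for (int i = 0; i < bytes.length; i++) {
                c += dis.readByte();
            }
        }
        return c;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = new byte[50_000_000]; // smaller than the original 1_000_000_000
        new Random(42).nextBytes(bytes);
        // Repeat several rounds so that later rounds run JIT-compiled code.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long a = sumWrapperAsResource(bytes);
            long t1 = System.nanoTime();
            long b = sumWrapperInBody(bytes);
            long t2 = System.nanoTime();
            System.out.println("resource: " + (t1 - t0) + " ns, body: " + (t2 - t1)
                    + " ns, sums equal: " + (a == b));
        }
    }
}
```

Both methods return the same checksum, so any difference in the printed times comes from how the loop is compiled rather than from the work done. For trustworthy numbers a dedicated benchmark harness is a better choice than hand-rolled timing like this.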

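Answer 1 (score: -1)

The answer is in the test itself. The extra cost comes from the function calls. We are usually encouraged to write short, clean functions rather than long ones, on the assumption that a function call costs very little. But a call still costs more than a direct memory access.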

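In this case, for testLoop, we can estimate the cost of one memory read at about 0.3 ns (312446094 ns for 1,000,000,000 bytes, including the integer operations such as i++ and c +=). The other variants add 2 layers of function calls and take about 2.6 ns per byte, so each call layer costs on the order of 1 ns. Realistically, a function call is still very fast.

The only catch is that each run makes 2 000 000 000 function calls, which is a really large number.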
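The per-byte figures can be derived directly from the timings reported in the question. The sketch below uses those timings as constants; the class name `PerByteCost` and its methods are hypothetical, and the per-call number rests on the assumption that the whole gap is call overhead.

```java
// Hypothetical helper; the timings are the ones reported in the question.
final class PerByteCost {
    static final long ELEMENTS = 1_000_000_000L; // array length used in the test
    static final long LOOP_NS = 312_446_094L;    // plain loop
    static final long DIS_NS = 2_555_898_090L;   // DataInputStream

    // Average cost of one iteration in nanoseconds.
    static double nsPerByte(long totalNs) {
        return (double) totalNs / ELEMENTS;
    }

    // Extra cost per call layer, ASSUMING the whole gap is call overhead.
    static double nsPerCallLayer(long totalNs, long baselineNs, int layers) {
        return (nsPerByte(totalNs) - nsPerByte(baselineNs)) / layers;
    }

    public static void main(String[] args) {
        System.out.printf("loop: %.2f ns/byte%n", nsPerByte(LOOP_NS));
        System.out.printf("dis:  %.2f ns/byte%n", nsPerByte(DIS_NS));
        System.out.printf("per call layer: %.2f ns%n", nsPerCallLayer(DIS_NS, LOOP_NS, 2));
        System.out.printf("ratio: %.1fx%n", (double) DIS_NS / LOOP_NS);
    }
}
```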

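Here is another test case that demonstrates the function call cost: it does not use any stream and just reads each byte through additional function calls.

Add the following function: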

    public final long getByte(long c, byte value, int dep) {
        if (dep > 0) {
            return getByte(c, value, dep - 1);
        }
        return c + value;
    }

Then call it in testLoop like this:

    c = getByte(c, bytes[i], 2);

The final cost then rises to the same level:

    Loop:               4044010718              -499870245
    Dis:                5182272442              -499870245
    My Dis:             5228065271              -499870245
    Via method:         655108198               -499870245
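The experiment above can be reproduced in isolation with a sketch like the following. Only `getByte` comes from this answer (made static here for convenience); the class name `CallDepthDemo`, the `sumDirect` and `sumViaCalls` helpers, the seed, and the array size are assumptions for illustration. Timings vary by JVM, so none are stated.

```java
import java.util.Random;

// Hypothetical standalone version of the experiment; only getByte is from the answer.
final class CallDepthDemo {

    // Recurses dep times, then adds value to c (dep + 1 calls in total).
    static long getByte(long c, byte value, int dep) {
        if (dep > 0) {
            return getByte(c, value, dep - 1);
        }
        return c + value;
    }

    // Baseline: direct array access, as in testLoop.
    static long sumDirect(byte[] bytes) {
        long c = 0;
        for (int i = 0; i < bytes.length; i++) {
            c += bytes[i];
        }
        return c;
    }

    // Same sum, but every element goes through the recursive helper.
    static long sumViaCalls(byte[] bytes, int dep) {
        long c = 0;
        for (int i = 0; i < bytes.length; i++) {
            c = getByte(c, bytes[i], dep);
        }
        return c;
    }

    public static void main(String[] args) {
        byte[] bytes = new byte[50_000_000]; // smaller than the original array
        new Random(42).nextBytes(bytes);
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long a = sumDirect(bytes);
            long t1 = System.nanoTime();
            long b = sumViaCalls(bytes, 2);
            long t2 = System.nanoTime();
            System.out.println("direct: " + (t1 - t0) + " ns, via calls: " + (t2 - t1)
                    + " ns, sums equal: " + (a == b));
        }
    }
}
```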