Why does inlining Math.max make the code more than 200 times slower?

Date: 2016-07-19 19:34:42

Tags: java performance

I recently started benchmarking some Java code to get the best performance out of my program, and noticed something strange. Namely, I benchmarked the following methods:

private static final int n = 10000;

public static void test0(){
    int m = 0;

    for(int i = 0; i < n; ++i){
        m = Math.max(i, m);
    }
}

public static void test1(){
    int m = 0;

    for(int i = 0; i < n; ++i){
        m = ((i >= m) ? i : m);
    }
}

and got these results:

          | Test 0          | Test 1
----------+-----------------+-----------------
Average   | 51.77 ns        | 13956.63 ns
Best      | 0.00 ns         | 6514.00 ns
Worst     | 25.45 ms        | 60.50 ms
Tries     | 16971233        | 16971233

After searching SO (e.g. "Is Math.max(a,b) or (a>b)?a:b faster in Java?"), I was convinced that test1 should not be this much slower.

The methods were tested in random order on 8 threads over 30 seconds, and every benchmark I ran gave similar results. I am using jdk1.8.0_45.

So why is test1 more than 200 times slower than test0?

4 answers:

Answer 0 (score: 3)

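Since Math.max is a static method with no side effects, the compiler may figure out that the code does nothing and optimize it by simply not executing it!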

The variable m is local to the method, and assigning to it does not matter because it is never read.

You need to make sure the code changes something observable, so that the compiler cannot aggressively optimize it away.

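For example, you can simply print the value of m at the end of the test, make m a class variable that can be read later, or even sum up the results, as I originally suggested in a comment.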
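A minimal sketch of those suggestions, assuming the only goal is to give each loop's result a consumer. The class name KeepAlive and the field sink are illustrative, not from the original. This only stops the loop from being discarded as dead code; the JIT is still free to simplify the loop itself.

```java
// Sketch: make the loop results observable so the JIT cannot treat the loops as dead code.
class KeepAlive {
    private static final int n = 10000;

    // A static "sink" field: a value stored here may be read later, so it must be computed.
    static int sink;

    // Returning m gives the result a consumer (the caller).
    static int test0() {
        int m = 0;
        for (int i = 0; i < n; ++i) {
            m = Math.max(i, m);
        }
        return m;
    }

    static int test1() {
        int m = 0;
        for (int i = 0; i < n; ++i) {
            m = (i >= m) ? i : m;
        }
        return m;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int r = 0; r < 1000; r++) {
            sum += test0();   // accumulate the results, as suggested above
            sink = test1();   // or publish the result through a field
        }
        // Printing uses both values, so neither computation can be dropped.
        System.out.println(sum + " " + sink); // prints "9999000 9999"
    }
}
```

For real measurements, JMH's Blackhole (org.openjdk.jmh.infra.Blackhole) serves the same purpose more robustly.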

Answer 1 (score: 2)

Math.max(a,b) can be optimized very aggressively, and quite obviously, into a native instruction of the processor.

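For the ternary, the straightforward translation into processor instructions is a compare plus a jump, and the jump in particular is expensive.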

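To optimize the ternary, the JIT (just-in-time compiler) has to recognize that the code expresses a maximum and that the native instruction is the best choice.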

The JIT may eventually recognize this, but until it does, the code will be slower.
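To make the "compare plus jump versus a single instruction" distinction concrete, here is a sketch of a branch-free int maximum in plain Java, next to the branchy ternary. It only illustrates the idea of selecting a value without control flow (roughly what a conditional-move style instruction does in hardware); it is not a claim about what HotSpot actually emits, and the class and method names are hypothetical.

```java
// Sketch: branchy vs. branch-free maximum of two ints.
class BranchlessMax {
    // Branchy form: the straightforward translation is a compare followed by a conditional jump.
    static int ternaryMax(int a, int b) {
        return (a >= b) ? a : b;
    }

    // Branch-free form: the difference is computed in long so it cannot overflow.
    // mask = d >> 63 is 0 when a >= b and -1 (all bits set) when a < b,
    // so a - (d & mask) evaluates to a or to b using only arithmetic and bit operations.
    static int branchlessMax(int a, int b) {
        long d = (long) a - b;
        long mask = d >> 63;
        return (int) (a - (d & mask));
    }

    public static void main(String[] args) {
        int[] values = {Integer.MIN_VALUE, -7, -1, 0, 1, 42, Integer.MAX_VALUE};
        for (int a : values) {
            for (int b : values) {
                int expected = Math.max(a, b);
                if (branchlessMax(a, b) != expected || ternaryMax(a, b) != expected) {
                    throw new AssertionError(a + ", " + b);
                }
            }
        }
        System.out.println("all pairs agree with Math.max");
    }
}
```

Whether a hand-written rewrite like this is faster depends on the JIT and the CPU; the point is only that the ternary, taken literally, involves control flow, while a native max does not.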

Answer 2 (score: 1)

Just to confirm (to some extent) what Jean Logeart said in his answer: after adding a plain main that calls both methods, like

class MaxOpt
{
    public static void main(String args[])
    {
        for (int i=0; i<100000; i++)
        {
            runTests();
        }
    }

    private static void runTests()
    {
        test0();
        test1();
    }

    private static final int n = 10000;

    public static void test0(){
        int m = 0;

        for(int i = 0; i < n; ++i){
            m = Math.max(i, m);
        }
    }

    public static void test1(){
        int m = 0;

        for(int i = 0; i < n; ++i){
            m = ((i >= m) ? i : m);
        }
    }
}

and running it in a HotSpot VM with the disassembler plugin (hsdis), using
-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:+PrintInlining -XX:+PrintAssembly

the final machine code of the test0 method (on Win7/64 with Java 1.8.0_92) is

Decoding compiled method 0x0000000002925510:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} {0x000000001bd10458} 'test0' '()V' in 'MaxOpt'
  #           [sp+0x20]  (sp of caller)
  0x0000000002925640: sub    $0x18,%rsp
  0x0000000002925647: mov    %rbp,0x10(%rsp)    ;*synchronization entry
                        ; - MaxOpt::test0@-1 (line 20)

  0x000000000292564c: add    $0x10,%rsp
  0x0000000002925650: pop    %rbp
  0x0000000002925651: test   %eax,-0x26f5657(%rip)        # 0x0000000000230000
                        ;   {poll_return}
  0x0000000002925657: retq   
  0x0000000002925658: hlt    
  0x0000000002925659: hlt    
  0x000000000292565a: hlt    
  0x000000000292565b: hlt    
  0x000000000292565c: hlt    
  0x000000000292565d: hlt    
  0x000000000292565e: hlt    
  0x000000000292565f: hlt    
[Exception Handler]
  ...   

Yes, it basically does nothing.

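Surprisingly, for test1 the JIT apparently performs some odd loop unrolling, but it does not seem to detect that the method is useless and free of side effects (it could be optimized away, but it is not).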

The resulting assembly (with test1 inlined into runTests) is rather large:

Decoding compiled method 0x0000000002926290:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} {0x000000001bd103a8} 'runTests' '()V' in 'MaxOpt'
  #           [sp+0x30]  (sp of caller)
  0x00000000029263c0: mov    %eax,-0x6000(%rsp)
  0x00000000029263c7: push   %rbp
  0x00000000029263c8: sub    $0x20,%rsp         ;*synchronization entry
                        ; - MaxOpt::runTests@-1 (line 13)

  0x00000000029263cc: xor    %r8d,%r8d
  0x00000000029263cf: mov    $0x1,%r11d
  0x00000000029263d5: data32 data32 nopw 0x0(%rax,%rax,1)
                        ;*iload_1
                        ; - MaxOpt::test1@11 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029263e0: cmp    %r8d,%r11d
  0x00000000029263e3: jl     0x000000000292650c  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029263e9: mov    %r11d,%r8d
  0x00000000029263ec: inc    %r8d               ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029263ef: cmp    %r11d,%r8d
  0x00000000029263f2: jl     0x000000000292652d  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029263f8: mov    %r11d,%r9d
  0x00000000029263fb: add    $0x2,%r9d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029263ff: cmp    %r8d,%r9d
  0x0000000002926402: jl     0x000000000292650f  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x0000000002926408: mov    %r11d,%r8d
  0x000000000292640b: add    $0x3,%r8d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x000000000292640f: cmp    %r9d,%r8d
  0x0000000002926412: jl     0x0000000002926518  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x0000000002926418: mov    %r11d,%r9d
  0x000000000292641b: add    $0x4,%r9d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x000000000292641f: cmp    %r8d,%r9d
  0x0000000002926422: jl     0x000000000292650f  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x0000000002926428: mov    %r11d,%r8d
  0x000000000292642b: add    $0x5,%r8d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x000000000292642f: cmp    %r9d,%r8d
  0x0000000002926432: jl     0x0000000002926518  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x0000000002926438: mov    %r11d,%r9d
  0x000000000292643b: add    $0x6,%r9d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x000000000292643f: cmp    %r8d,%r9d
  0x0000000002926442: jl     0x000000000292650f  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x0000000002926448: mov    %r11d,%r8d
  0x000000000292644b: add    $0x7,%r8d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x000000000292644f: cmp    %r9d,%r8d
  0x0000000002926452: jl     0x0000000002926518  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x0000000002926458: mov    %r11d,%r9d
  0x000000000292645b: add    $0x8,%r9d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x000000000292645f: cmp    %r8d,%r9d
  0x0000000002926462: jl     0x000000000292650f  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x0000000002926468: mov    %r11d,%r8d
  0x000000000292646b: add    $0x9,%r8d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x000000000292646f: cmp    %r9d,%r8d
  0x0000000002926472: jl     0x0000000002926518  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x0000000002926478: mov    %r11d,%r9d
  0x000000000292647b: add    $0xa,%r9d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x000000000292647f: cmp    %r8d,%r9d
  0x0000000002926482: jl     0x000000000292650f  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x0000000002926488: mov    %r11d,%r8d
  0x000000000292648b: add    $0xb,%r8d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x000000000292648f: cmp    %r9d,%r8d
  0x0000000002926492: jl     0x0000000002926518  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x0000000002926498: mov    %r11d,%r9d
  0x000000000292649b: add    $0xc,%r9d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x000000000292649f: cmp    %r8d,%r9d
  0x00000000029264a2: jl     0x000000000292650f  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264a4: mov    %r11d,%r8d
  0x00000000029264a7: add    $0xd,%r8d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264ab: cmp    %r9d,%r8d
  0x00000000029264ae: jl     0x0000000002926518  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264b0: mov    %r11d,%r9d
  0x00000000029264b3: add    $0xe,%r9d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264b7: cmp    %r8d,%r9d
  0x00000000029264ba: jl     0x000000000292650f  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264bc: mov    %r11d,%r8d
  0x00000000029264bf: add    $0xf,%r8d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264c3: cmp    %r9d,%r8d
  0x00000000029264c6: jl     0x0000000002926518  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264c8: add    $0x10,%r11d        ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264cc: cmp    $0x2701,%r11d
  0x00000000029264d3: jl     0x00000000029263e0  ;*if_icmpge
                        ; - MaxOpt::test1@8 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264d9: cmp    $0x2710,%r11d
  0x00000000029264e0: jge    0x0000000002926500
  0x00000000029264e2: xchg   %ax,%ax            ;*iload_1
                        ; - MaxOpt::test1@11 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264e4: cmp    %r8d,%r11d
  0x00000000029264e7: jl     0x0000000002926532  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264e9: mov    %r11d,%r10d
  0x00000000029264ec: inc    %r10d              ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264ef: cmp    $0x2710,%r10d
  0x00000000029264f6: jge    0x0000000002926500  ;*if_icmpge
                        ; - MaxOpt::test1@8 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264f8: mov    %r11d,%r8d
  0x00000000029264fb: mov    %r10d,%r11d
  0x00000000029264fe: jmp    0x00000000029264e4
  0x0000000002926500: add    $0x20,%rsp
  0x0000000002926504: pop    %rbp
  0x0000000002926505: test   %eax,-0x26f650b(%rip)        # 0x0000000000230000
                        ;   {poll_return}
  0x000000000292650b: retq   
  0x000000000292650c: mov    %r11d,%r9d
  0x000000000292650f: mov    %r9d,%r11d
  0x0000000002926512: mov    %r8d,%r9d
  0x0000000002926515: mov    %r11d,%r8d
  0x0000000002926518: mov    $0xffffff65,%edx
  0x000000000292651d: mov    %r8d,0x4(%rsp)
  0x0000000002926522: mov    %r9d,0x8(%rsp)
  0x0000000002926527: callq  0x00000000028557a0  ; OopMap{off=364}
                        ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)
                        ;   {runtime_call}
  0x000000000292652c: int3                      ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x000000000292652d: mov    %r11d,%r9d
  0x0000000002926530: jmp    0x0000000002926518
  0x0000000002926532: mov    %r8d,%r9d
  0x0000000002926535: mov    %r11d,%r8d
  0x0000000002926538: jmp    0x0000000002926518
  0x000000000292653a: hlt    
  0x000000000292653b: hlt    
  0x000000000292653c: hlt    
  0x000000000292653d: hlt    
  0x000000000292653e: hlt    
  0x000000000292653f: hlt    
[Exception Handler]
 ...

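I wonder why this method is not optimized away. Perhaps that is worth asking as a separate question. My gut feeling is that it is a side effect of the artificial test setup, involving the final constant n and the use of the loop variable inside the expression. In similar (more realistic) setups, such useless, side-effect-free methods are usually eliminated quite reliably.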

Answer 3 (score: 0)

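When a method runs for the first time, the JVM has not yet had a chance to compile it with its just-in-time compiler, so it runs in the interpreter. This trips up many microbenchmarks. Instead of having main() call test1() once, make main() call test1() many times and time repeated batches of calls. You will see that later batches of test1() calls run faster: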

public class Test
{
    public static void main(String[] args)
    {
        System.out.println("test0");
        for (int i = 0; i < 10; i++)
        {
            long t = System.currentTimeMillis();
            for (int j = 0; j < 100000; j++)
            {
                test0();
            }
            t = System.currentTimeMillis() - t;
            System.out.println(t);
        }
        System.out.println("test1");
        for (int i = 0; i < 10; i++)
        {
            long t = System.currentTimeMillis();
            for (int j = 0; j < 100000; j++)
            {
                test1();
            }
            t = System.currentTimeMillis() - t;
            System.out.println(t);
        }
    }

    private static final int n = 10000;
    private static int z = 10000; // sink: adding m here keeps the loops from being eliminated as dead code

    private static void test0()
    {
        int m = 0;

        for(int i = 0; i < n; ++i)
        {
            m = Math.max(i, m);
        }

        z += m;
    }

    private static void test1()
    {
        int m = 0;

        for(int i = 0; i < n; ++i)
        {
            m = ((i >= m) ? i : m);
        }

        z += m;
    }
}
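The same warm-up idea can be sketched with System.nanoTime(), which is intended for measuring elapsed time, plus an explicit warm-up phase before anything is reported. The round and call counts, the class name WarmupDemo and the helper averageNanos are arbitrary assumptions for illustration; for serious measurements a harness such as JMH is the more reliable choice.

```java
// Sketch: warm up first, then report average nanoseconds per call.
class WarmupDemo {
    private static final int n = 10000;
    static int sink; // consumes results so the loops are not dead code

    static int test1() {
        int m = 0;
        for (int i = 0; i < n; ++i) {
            m = (i >= m) ? i : m;
        }
        return m;
    }

    // Calls test1() `calls` times and returns the average elapsed nanoseconds per call.
    static double averageNanos(int calls) {
        long start = System.nanoTime();
        for (int j = 0; j < calls; j++) {
            sink += test1();
        }
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / calls;
    }

    public static void main(String[] args) {
        // Warm-up rounds: give the JIT a chance to compile test1() before measuring.
        for (int w = 0; w < 5; w++) {
            averageNanos(10_000);
        }
        // Measured rounds.
        for (int r = 0; r < 5; r++) {
            System.out.printf("round %d: %.1f ns/call%n", r, averageNanos(10_000));
        }
        System.out.println("sink = " + sink);
    }
}
```

The printed timings depend on the machine and JVM, so no expected values are given here; the useful signal is how the numbers change once warm-up is separated from measurement.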