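I recently started benchmarking some Java code to get the best performance out of my program, and I noticed something strange. Namely, I benchmarked the following methods: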
private static final int n = 10000;
public static void test0(){
    int m = 0;
    for(int i = 0; i < n; ++i){
        m = Math.max(i, m);
    }
}
public static void test1(){
    int m = 0;
    for(int i = 0; i < n; ++i){
        m = ((i >= m) ? i : m);
    }
}
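and got these results:
          | Test 0          | Test 1          |
----------+-----------------+-----------------+-
Average:  |        51,77 ns |     13956,63 ns |
Best:     |         0,00 ns |      6514,00 ns |
Worst:    |        25,45 ms |        60,50 ms |
Tries:    |        16971233 |        16971233 |
After searching SO (namely "Is Math.max(a,b) or (a>b)?a:b faster in Java?"), I was convinced that test1 should not be that much slower.
The methods were tested in random order on 8 threads for 30 seconds, and every benchmark run I did gave similar results.
I am using jdk1.8.0_45.
So why is test1 more than 200 times slower than test0?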
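Answer 0 (score: 3)
Since Math.max is a static function, the compiler may figure out that the code does nothing at all and simply optimize the execution by not executing it!
The variable m is local to the function. Assigning to it does not help, because it is never read.
You need to make sure the execution modifies something in some way, so that the compiler does not optimize it away so aggressively.
For instance, you could simply print the value of m at the end of the test, or make m a class variable that can be accessed later, or even sum up the results, as I originally suggested in the comments.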
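A minimal sketch of that suggestion, assuming the two methods are changed to return m and the caller sums the results into a static field. The class name KeepAlive and the field sink are made up for illustration; they are not part of the original benchmark.

```java
class KeepAlive {
    private static final int n = 10000;

    // Hypothetical sink: a static field, so the JIT cannot treat the writes as dead.
    static long sink = 0;

    // Same loop as test0, but the result escapes through the return value.
    static int test0() {
        int m = 0;
        for (int i = 0; i < n; ++i) {
            m = Math.max(i, m);
        }
        return m;
    }

    // Same loop as test1, also returning its result.
    static int test1() {
        int m = 0;
        for (int i = 0; i < n; ++i) {
            m = ((i >= m) ? i : m);
        }
        return m;
    }

    public static void main(String[] args) {
        for (int run = 0; run < 1000; run++) {
            sink += test0();
            sink += test1();
        }
        // Printing the sum means the results are actually observed.
        System.out.println(sink);
    }
}
```

The JIT may still simplify the loops themselves, but it can no longer drop the result, so both methods are at least measured on equal terms.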
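Answer 1 (score: 2)
Math.max(a,b) can be optimized very aggressively, and quite obviously, into a native processor instruction.
For the ternary, the straightforward translation into processor instructions is a compare plus a jump, and the jump in particular is costly.
To optimize the ternary, the JIT (just-in-time compiler) must recognize that the code expresses a max and that a native instruction is the best choice.
The JIT may eventually recognize this, but until then the ternary will be slower.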
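Both forms compute the same value for int arguments, so any timing gap comes from how the code is compiled rather than from what it computes. A quick check, as a sketch only; the class name MaxEquivalence and the helper ternaryMax are made up for illustration.

```java
class MaxEquivalence {
    // The ternary from test1, wrapped in a hypothetical helper.
    static int ternaryMax(int a, int b) {
        return (a >= b) ? a : b;
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -1, 0, 1, 9999, Integer.MAX_VALUE};
        // Compare both forms on every pair of sample values.
        for (int a : samples) {
            for (int b : samples) {
                if (Math.max(a, b) != ternaryMax(a, b)) {
                    throw new AssertionError(a + ", " + b);
                }
            }
        }
        System.out.println("same results for all sample pairs");
    }
}
```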
Answer 2 (score: 1)
Just to confirm (to some extent) what Jean Logeart said in his answer: when adding a trivial main that calls both methods, as in
class MaxOpt
{
    public static void main(String args[])
    {
        for (int i=0; i<100000; i++)
        {
            runTests();
        }
    }
    private static void runTests()
    {
        test0();
        test1();
    }
    private static final int n = 10000;
    public static void test0(){
        int m = 0;
        for(int i = 0; i < n; ++i){
            m = Math.max(i, m);
        }
    }
    public static void test1(){
        int m = 0;
        for(int i = 0; i < n; ++i){
            m = ((i >= m) ? i : m);
        }
    }
}
and running it in a HotSpot disassembler VM with
-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:+PrintInlining -XX:+PrintAssembly
then (on Win7/64 with Java 1.8.0_92) the final machine code of the test0 method will be
Decoding compiled method 0x0000000002925510:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} {0x000000001bd10458} 'test0' '()V' in 'MaxOpt'
# [sp+0x20] (sp of caller)
0x0000000002925640: sub $0x18,%rsp
0x0000000002925647: mov %rbp,0x10(%rsp) ;*synchronization entry
; - MaxOpt::test0@-1 (line 20)
0x000000000292564c: add $0x10,%rsp
0x0000000002925650: pop %rbp
0x0000000002925651: test %eax,-0x26f5657(%rip) # 0x0000000000230000
; {poll_return}
0x0000000002925657: retq
0x0000000002925658: hlt
0x0000000002925659: hlt
0x000000000292565a: hlt
0x000000000292565b: hlt
0x000000000292565c: hlt
0x000000000292565d: hlt
0x000000000292565e: hlt
0x000000000292565f: hlt
[Exception Handler]
...
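So yes, it basically does nothing at all.
Surprisingly, for test1 the JIT apparently does some odd loop unrolling, but it does not seem to detect that the method is useless and free of side effects (it could be optimized away, but it is not).
The resulting assembly is rather large: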
Decoding compiled method 0x0000000002926290:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} {0x000000001bd103a8} 'runTests' '()V' in 'MaxOpt'
# [sp+0x30] (sp of caller)
0x00000000029263c0: mov %eax,-0x6000(%rsp)
0x00000000029263c7: push %rbp
0x00000000029263c8: sub $0x20,%rsp ;*synchronization entry
; - MaxOpt::runTests@-1 (line 13)
0x00000000029263cc: xor %r8d,%r8d
0x00000000029263cf: mov $0x1,%r11d
0x00000000029263d5: data32 data32 nopw 0x0(%rax,%rax,1)
;*iload_1
; - MaxOpt::test1@11 (line 31)
; - MaxOpt::runTests@3 (line 14)
0x00000000029263e0: cmp %r8d,%r11d
0x00000000029263e3: jl 0x000000000292650c ;*if_icmplt
; - MaxOpt::test1@13 (line 31)
; - MaxOpt::runTests@3 (line 14)
0x00000000029263e9: mov %r11d,%r8d
0x00000000029263ec: inc %r8d ;*iinc
; - MaxOpt::test1@22 (line 30)
; - MaxOpt::runTests@3 (line 14)
0x00000000029263ef: cmp %r11d,%r8d
0x00000000029263f2: jl 0x000000000292652d ;*if_icmplt
; - MaxOpt::test1@13 (line 31)
; - MaxOpt::runTests@3 (line 14)
0x00000000029263f8: mov %r11d,%r9d
0x00000000029263fb: add $0x2,%r9d ;*iinc
; - MaxOpt::test1@22 (line 30)
; - MaxOpt::runTests@3 (line 14)
0x00000000029263ff: cmp %r8d,%r9d
0x0000000002926402: jl 0x000000000292650f ;*if_icmplt
; - MaxOpt::test1@13 (line 31)
; - MaxOpt::runTests@3 (line 14)
0x0000000002926408: mov %r11d,%r8d
0x000000000292640b: add $0x3,%r8d ;*iinc
; - MaxOpt::test1@22 (line 30)
; - MaxOpt::runTests@3 (line 14)
0x000000000292640f: cmp %r9d,%r8d
0x0000000002926412: jl 0x0000000002926518 ;*if_icmplt
; - MaxOpt::test1@13 (line 31)
; - MaxOpt::runTests@3 (line 14)
0x0000000002926418: mov %r11d,%r9d
0x000000000292641b: add $0x4,%r9d ;*iinc
; - MaxOpt::test1@22 (line 30)
; - MaxOpt::runTests@3 (line 14)
0x000000000292641f: cmp %r8d,%r9d
0x0000000002926422: jl 0x000000000292650f ;*if_icmplt
; - MaxOpt::test1@13 (line 31)
; - MaxOpt::runTests@3 (line 14)
0x0000000002926428: mov %r11d,%r8d
0x000000000292642b: add $0x5,%r8d ;*iinc
; - MaxOpt::test1@22 (line 30)
; - MaxOpt::runTests@3 (line 14)
0x000000000292642f: cmp %r9d,%r8d
0x0000000002926432: jl 0x0000000002926518 ;*if_icmplt
; - MaxOpt::test1@13 (line 31)
; - MaxOpt::runTests@3 (line 14)
0x0000000002926438: mov %r11d,%r9d
0x000000000292643b: add $0x6,%r9d ;*iinc
; - MaxOpt::test1@22 (line 30)
; - MaxOpt::runTests@3 (line 14)
0x000000000292643f: cmp %r8d,%r9d
0x0000000002926442: jl 0x000000000292650f ;*if_icmplt
; - MaxOpt::test1@13 (line 31)
; - MaxOpt::runTests@3 (line 14)
0x0000000002926448: mov %r11d,%r8d
0x000000000292644b: add $0x7,%r8d ;*iinc
; - MaxOpt::test1@22 (line 30)
; - MaxOpt::runTests@3 (line 14)
0x000000000292644f: cmp %r9d,%r8d
0x0000000002926452: jl 0x0000000002926518 ;*if_icmplt
; - MaxOpt::test1@13 (line 31)
; - MaxOpt::runTests@3 (line 14)
0x0000000002926458: mov %r11d,%r9d
0x000000000292645b: add $0x8,%r9d ;*iinc
; - MaxOpt::test1@22 (line 30)
; - MaxOpt::runTests@3 (line 14)
0x000000000292645f: cmp %r8d,%r9d
0x0000000002926462: jl 0x000000000292650f ;*if_icmplt
; - MaxOpt::test1@13 (line 31)
; - MaxOpt::runTests@3 (line 14)
0x0000000002926468: mov %r11d,%r8d
0x000000000292646b: add $0x9,%r8d ;*iinc
; - MaxOpt::test1@22 (line 30)
; - MaxOpt::runTests@3 (line 14)
0x000000000292646f: cmp %r9d,%r8d
0x0000000002926472: jl 0x0000000002926518 ;*if_icmplt
; - MaxOpt::test1@13 (line 31)
; - MaxOpt::runTests@3 (line 14)
0x0000000002926478: mov %r11d,%r9d
0x000000000292647b: add $0xa,%r9d ;*iinc
; - MaxOpt::test1@22 (line 30)
; - MaxOpt::runTests@3 (line 14)
0x000000000292647f: cmp %r8d,%r9d
0x0000000002926482: jl 0x000000000292650f ;*if_icmplt
; - MaxOpt::test1@13 (line 31)
; - MaxOpt::runTests@3 (line 14)
0x0000000002926488: mov %r11d,%r8d
0x000000000292648b: add $0xb,%r8d ;*iinc
; - MaxOpt::test1@22 (line 30)
; - MaxOpt::runTests@3 (line 14)
0x000000000292648f: cmp %r9d,%r8d
0x0000000002926492: jl 0x0000000002926518 ;*if_icmplt
; - MaxOpt::test1@13 (line 31)
; - MaxOpt::runTests@3 (line 14)
0x0000000002926498: mov %r11d,%r9d
0x000000000292649b: add $0xc,%r9d ;*iinc
; - MaxOpt::test1@22 (line 30)
; - MaxOpt::runTests@3 (line 14)
0x000000000292649f: cmp %r8d,%r9d
0x00000000029264a2: jl 0x000000000292650f ;*if_icmplt
; - MaxOpt::test1@13 (line 31)
; - MaxOpt::runTests@3 (line 14)
0x00000000029264a4: mov %r11d,%r8d
0x00000000029264a7: add $0xd,%r8d ;*iinc
; - MaxOpt::test1@22 (line 30)
; - MaxOpt::runTests@3 (line 14)
0x00000000029264ab: cmp %r9d,%r8d
0x00000000029264ae: jl 0x0000000002926518 ;*if_icmplt
; - MaxOpt::test1@13 (line 31)
; - MaxOpt::runTests@3 (line 14)
0x00000000029264b0: mov %r11d,%r9d
0x00000000029264b3: add $0xe,%r9d ;*iinc
; - MaxOpt::test1@22 (line 30)
; - MaxOpt::runTests@3 (line 14)
0x00000000029264b7: cmp %r8d,%r9d
0x00000000029264ba: jl 0x000000000292650f ;*if_icmplt
; - MaxOpt::test1@13 (line 31)
; - MaxOpt::runTests@3 (line 14)
0x00000000029264bc: mov %r11d,%r8d
0x00000000029264bf: add $0xf,%r8d ;*iinc
; - MaxOpt::test1@22 (line 30)
; - MaxOpt::runTests@3 (line 14)
0x00000000029264c3: cmp %r9d,%r8d
0x00000000029264c6: jl 0x0000000002926518 ;*if_icmplt
; - MaxOpt::test1@13 (line 31)
; - MaxOpt::runTests@3 (line 14)
0x00000000029264c8: add $0x10,%r11d ;*iinc
; - MaxOpt::test1@22 (line 30)
; - MaxOpt::runTests@3 (line 14)
0x00000000029264cc: cmp $0x2701,%r11d
0x00000000029264d3: jl 0x00000000029263e0 ;*if_icmpge
; - MaxOpt::test1@8 (line 30)
; - MaxOpt::runTests@3 (line 14)
0x00000000029264d9: cmp $0x2710,%r11d
0x00000000029264e0: jge 0x0000000002926500
0x00000000029264e2: xchg %ax,%ax ;*iload_1
; - MaxOpt::test1@11 (line 31)
; - MaxOpt::runTests@3 (line 14)
0x00000000029264e4: cmp %r8d,%r11d
0x00000000029264e7: jl 0x0000000002926532 ;*if_icmplt
; - MaxOpt::test1@13 (line 31)
; - MaxOpt::runTests@3 (line 14)
0x00000000029264e9: mov %r11d,%r10d
0x00000000029264ec: inc %r10d ;*iinc
; - MaxOpt::test1@22 (line 30)
; - MaxOpt::runTests@3 (line 14)
0x00000000029264ef: cmp $0x2710,%r10d
0x00000000029264f6: jge 0x0000000002926500 ;*if_icmpge
; - MaxOpt::test1@8 (line 30)
; - MaxOpt::runTests@3 (line 14)
0x00000000029264f8: mov %r11d,%r8d
0x00000000029264fb: mov %r10d,%r11d
0x00000000029264fe: jmp 0x00000000029264e4
0x0000000002926500: add $0x20,%rsp
0x0000000002926504: pop %rbp
0x0000000002926505: test %eax,-0x26f650b(%rip) # 0x0000000000230000
; {poll_return}
0x000000000292650b: retq
0x000000000292650c: mov %r11d,%r9d
0x000000000292650f: mov %r9d,%r11d
0x0000000002926512: mov %r8d,%r9d
0x0000000002926515: mov %r11d,%r8d
0x0000000002926518: mov $0xffffff65,%edx
0x000000000292651d: mov %r8d,0x4(%rsp)
0x0000000002926522: mov %r9d,0x8(%rsp)
0x0000000002926527: callq 0x00000000028557a0 ; OopMap{off=364}
;*if_icmplt
; - MaxOpt::test1@13 (line 31)
; - MaxOpt::runTests@3 (line 14)
; {runtime_call}
0x000000000292652c: int3 ;*if_icmplt
; - MaxOpt::test1@13 (line 31)
; - MaxOpt::runTests@3 (line 14)
0x000000000292652d: mov %r11d,%r9d
0x0000000002926530: jmp 0x0000000002926518
0x0000000002926532: mov %r8d,%r9d
0x0000000002926535: mov %r11d,%r8d
0x0000000002926538: jmp 0x0000000002926518
0x000000000292653a: hlt
0x000000000292653b: hlt
0x000000000292653c: hlt
0x000000000292653d: hlt
0x000000000292653e: hlt
0x000000000292653f: hlt
[Exception Handler]
...
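I wonder why this method is not optimized away. Maybe that is worth asking as a separate question. But my gut feeling is that it is a side effect of the artificial test setup, involving the final n and the use of the loop variable inside the expression. In similar (more realistic) setups, such useless, side-effect-free methods are usually eliminated quite reliably.
Answer 3 (score: 0)
When a method runs for the first time, the JVM has not yet had a chance to recompile it with its just-in-time compiler. This is a problem with many micro-benchmarks. Suppose main() calls test1(): make main() call test1() many times and measure the time of each test1() call. You will see that later calls to test1() run much faster!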
public class Test
{
    public static void main(String[] args)
    {
        System.out.println("test0");
        for (int i = 0; i < 10; i++)
        {
            long t = System.currentTimeMillis();
            for (int j = 0; j < 100000; j++)
            {
                test0();
            }
            t = System.currentTimeMillis() - t;
            System.out.println(t);
        }
        System.out.println("test1");
        for (int i = 0; i < 10; i++)
        {
            long t = System.currentTimeMillis();
            for (int j = 0; j < 100000; j++)
            {
                test1();
            }
            t = System.currentTimeMillis() - t;
            System.out.println(t);
        }
    }
    private static final int n = 10000;
    private static int z = 10000;
    private static void test0()
    {
        int m = 0;
        for(int i = 0; i < n; ++i)
        {
            m = Math.max(i, m);
        }
        z += m;
    }
    private static void test1()
    {
        int m = 0;
        for(int i = 0; i < n; ++i)
        {
            m = ((i >= m) ? i : m);
        }
        z += m;
    }
}
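To look at a single cold call versus a single call after warm-up, a variation along these lines could be used. This is only a sketch: the class name WarmUp, the helper timeBatch and the warm-up count of 20000 are assumptions chosen for illustration, and the printed timings will vary between machines and runs.

```java
class WarmUp {
    private static final int n = 10000;

    // Sink field so the loop result is used (same idea as z above).
    static int z = 0;

    static void test1() {
        int m = 0;
        for (int i = 0; i < n; ++i) {
            m = ((i >= m) ? i : m);
        }
        z += m;
    }

    // Hypothetical helper: runs test1 `calls` times and returns elapsed nanoseconds.
    static long timeBatch(int calls) {
        long start = System.nanoTime();
        for (int j = 0; j < calls; j++) {
            test1();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long cold = timeBatch(1);   // very first call, before any JIT compilation
        timeBatch(20000);           // warm-up phase (count is an arbitrary assumption)
        long warm = timeBatch(1);   // one call after warm-up
        System.out.println("first call:    " + cold + " ns");
        System.out.println("after warm-up: " + warm + " ns");
        System.out.println(z);      // read the sink so it is observed
    }
}
```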