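Inline code is slower than a function call/static function in Java

Date: 2012-08-30 16:38:47

Tags: java function inline

I have been running some tests to see how inlining function code (writing the function's algorithm out explicitly in the calling code) affects performance. I wrote a simple byte-array-to-int conversion, then wrapped it in a function, called it statically from another class, and called it statically from the class itself. The code is as follows: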

public class FunctionCallSpeed {
    public static final int numIter = 50000000;

    public static void main (String [] args) {
        byte [] n = new byte[4];

        long start;

        System.out.println("Function from Static Class =================");
        start = System.nanoTime();
        for (int i = 0; i < numIter; i++) {
            StaticClass.toInt(n);
        }
        System.out.println("Elapsed time: " + (double)(System.nanoTime() - start) / 1000000000 + "s");

        System.out.println("Function from Class ========================");
        start = System.nanoTime();
        for (int i = 0; i < numIter; i++) {
            toInt(n);
        }
        System.out.println("Elapsed time: " + (double)(System.nanoTime() - start) / 1000000000 + "s");

        int actual = 0;

        int len = n.length;

        System.out.println("Inline Function ============================");
        start = System.nanoTime();
        for (int i = 0; i < numIter; i++) {
            for (int j = 0; j < len; j++) {
                actual += n[len - 1 - j] << 8 * j;
            }
        }
        System.out.println("Elapsed time: " + (double)(System.nanoTime() - start) / 1000000000 + "s");
    }

    public static int toInt(byte [] num) {
        int actual = 0;

        int len = num.length;

        for (int i = 0; i < len; i++) {
            actual += num[len - 1 - i] << 8 * i;
        }

        return actual;
    }
}

The results are as follows:

Function from Static Class =================
Elapsed time: 0.096559931s
Function from Class ========================
Elapsed time: 0.015741711s
Inline Function ============================
Elapsed time: 0.837626286s

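Is there something odd going on in the bytecode? I have looked at the bytecode myself, but I am not very familiar with it and cannot make heads or tails of it.

Edit

I added assert statements so that the output is actually read, and then randomized the bytes being read, and the benchmark now behaves the way I expected. Thanks to Tomasz Nurkiewicz, who pointed me to the microbenchmark article. The resulting code is: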

public class FunctionCallSpeed {
public static final int numIter = 50000000;

public static void main (String [] args) {
    byte [] n;

    long start, end;
    int checker, calc;

    end = 0;
    System.out.println("Function from Object =================");
    for (int i = 0; i < numIter; i++) {
        checker = (int)(Math.random() * 65535);
        n = toByte(checker);
        start = System.nanoTime();
        calc = StaticClass.toInt(n);
        end += System.nanoTime() - start;
        assert calc == checker;
    }
    System.out.println("Elapsed time: " + (double)end / 1000000000 + "s");
    end = 0;
    System.out.println("Function from Class ==================");
    start = System.nanoTime();
    for (int i = 0; i < numIter; i++) {
        checker = (int)(Math.random() * 65535);
        n = toByte(checker);
        start = System.nanoTime();
        calc = toInt(n);
        end += System.nanoTime() - start;
        assert calc == checker;
    }
    System.out.println("Elapsed time: " + (double)end / 1000000000 + "s");


    int len = 4;
    end = 0;
    System.out.println("Inline Function ======================");
    start = System.nanoTime();
    for (int i = 0; i < numIter; i++) {
        calc = 0;
        checker = (int)(Math.random() * 65535);
        n = toByte(checker);
        start = System.nanoTime();
        for (int j = 0; j < len; j++) {
            calc += n[len - 1 - j] << 8 * j;
        }
        end += System.nanoTime() - start;
        assert calc == checker;
    }
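    // Note: unlike the two loops above, this prints the time since the last `start`, not the accumulated `end`.
    System.out.println("Elapsed time: " + (double)(System.nanoTime() - start) / 1000000000 + "s");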
}

public static byte [] toByte(int val) {
    byte [] n = new byte[4];

    for (int i = 0; i < 4; i++) {
        n[i] = (byte)((val >> 8 * i) & 0xFF);
    }
    return n;
}

public static int toInt(byte [] num) {
    int actual = 0;

    int len = num.length;

    for (int i = 0; i < len; i++) {
        actual += num[len - 1 - i] << 8 * i;
    }

    return actual;
}
}

Results:

Function from Static Class =================
Elapsed time: 9.276437031s
Function from Class ========================
Elapsed time: 9.225660708s
Inline Function ============================
Elapsed time: 5.9512E-5s

4 answers:

Answer 0 (score: 5)

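It is always hard to make guarantees about what the JIT is doing, but if I had to guess, it noticed that the function's return value is never used and optimized much of the work away.

If you actually use the function's return value, I bet the timings will change.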
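As a rough illustration of "using the return value", here is a minimal sketch. The class name `ConsumeResult` and the helper `sumOfCalls` are made up for this example; `toInt` is the same conversion as in the question. Every result is added to a sum that is printed at the end, so the calls have an observable effect.

```java
public class ConsumeResult {
    // Same conversion as toInt in the question.
    static int toInt(byte[] num) {
        int actual = 0;
        int len = num.length;
        for (int i = 0; i < len; i++) {
            actual += num[len - 1 - i] << 8 * i;
        }
        return actual;
    }

    // Hypothetical helper: adds up every return value so none of them is discarded.
    static long sumOfCalls(byte[] n, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += toInt(n);
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] n = {0, 0, 1, 2};
        long start = System.nanoTime();
        long sum = sumOfCalls(n, 50_000_000);
        long elapsed = System.nanoTime() - start;
        // Printing the sum keeps the computed values live.
        System.out.println("sum=" + sum + " elapsed=" + elapsed / 1e9 + "s");
    }
}
```

This does not guarantee the JIT leaves the loop alone (the input never changes, so it may still simplify things), but the result can no longer be dropped as simply unused.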

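Answer 1 (score: 3)

You have several problems, but the main one is that you are testing a single run of one piece of optimized code. That is bound to give you mixed results. I suggest running each test for 2 seconds and ignoring the first 10,000 iterations or so.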
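A minimal sketch of that suggestion, under some assumptions of mine: the class `WarmupTimer`, the helper `averageNanos`, and the batch size of 1,000 calls between clock reads are all invented for illustration. The warm-up calls are untimed, and the timed part runs until roughly the requested duration has passed.

```java
public class WarmupTimer {
    // Written on every call so the results are not simply unused.
    static int sink;

    // Same conversion as toInt in the question.
    static int toInt(byte[] num) {
        int actual = 0;
        int len = num.length;
        for (int i = 0; i < len; i++) {
            actual += num[len - 1 - i] << 8 * i;
        }
        return actual;
    }

    // Hypothetical helper: runs warmupIters untimed calls, then times calls for
    // about durationNanos and returns the average nanoseconds per call.
    static double averageNanos(byte[] n, int warmupIters, long durationNanos) {
        for (int i = 0; i < warmupIters; i++) {
            sink += toInt(n);
        }
        long iterations = 0;
        long elapsed;
        long start = System.nanoTime();
        do {
            // Read the clock once per batch (an assumed batch size of 1,000)
            // so that nanoTime() itself is not what gets measured.
            for (int i = 0; i < 1000; i++) {
                sink += toInt(n);
            }
            iterations += 1000;
            elapsed = System.nanoTime() - start;
        } while (elapsed < durationNanos);
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        // Ignore the first 10,000 iterations, then measure for about 2 seconds.
        double avg = averageNanos(new byte[4], 10_000, 2_000_000_000L);
        System.out.println("average: " + avg + " ns per call (sink=" + sink + ")");
    }
}
```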

If you do not keep the result of a loop, the entire loop can be discarded after some random interval.

Break each test out into its own method:

public class FunctionCallSpeed {
    public static final int numIter = 50000000;
    private static int dontOptimiseAway;

    public static void main(String[] args) {
        byte[] n = new byte[4];

        for (int i = 0; i < 10; i++) {
            test1(n);
            test2(n);
            test3(n);
            System.out.println();
        }
    }

    private static void test1(byte[] n) {
        System.out.print("from Static Class: ");
        long start = System.nanoTime();
        for (int i = 0; i < numIter; i++) {
            dontOptimiseAway = FunctionCallSpeed.toInt(n);
        }
        System.out.print((System.nanoTime() - start) / numIter + "ns ");
    }

    private static void test2(byte[] n) {
        long start;
        System.out.print("from Class: ");
        start = System.nanoTime();
        for (int i = 0; i < numIter; i++) {
            dontOptimiseAway = toInt(n);
        }
        System.out.print((System.nanoTime() - start) / numIter + "ns ");
    }

    private static void test3(byte[] n) {
        long start;
        int actual = 0;

        int len = n.length;

        System.out.print("Inlined: ");
        start = System.nanoTime();
        for (int i = 0; i < numIter; i++) {
            for (int j = 0; j < len; j++) {
                actual += n[len - 1 - j] << 8 * j;
            }
            dontOptimiseAway = actual;
        }
        System.out.print((System.nanoTime() - start) / numIter + "ns ");
    }

    public static int toInt(byte[] num) {
        int actual = 0;

        int len = num.length;

        for (int i = 0; i < len; i++) {
            actual += num[len - 1 - i] << 8 * i;
        }

        return actual;
    }
}

prints

from Class: 7ns Inlined: 11ns from Static Class: 9ns 
from Class: 6ns Inlined: 8ns from Static Class: 8ns 
from Class: 6ns Inlined: 9ns from Static Class: 6ns

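This suggests that the code is slightly more efficient when the inner loop is optimized separately.

However, if I use an optimized byte-to-int conversion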

public static int toInt(byte[] num) {
    return num[0] + (num[1] << 8) + (num[2] << 16) + (num[3] << 24);
}

all the tests report

from Static Class: 0ns from Class: 0ns Inlined: 0ns 
from Static Class: 0ns from Class: 0ns Inlined: 0ns 
from Static Class: 0ns from Class: 0ns Inlined: 0ns 

because the JIT has realized that the test is not doing anything useful. ;)
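One way to make the loop do something the JIT cannot trivially skip is to change the input on every iteration and fold each result into a value that is reported. The sketch below is only an illustration: `VaryingInput` and `checksum` are hypothetical names, and `toInt` is the shortened conversion shown above.

```java
public class VaryingInput {
    // The shortened conversion from this answer.
    static int toInt(byte[] num) {
        return num[0] + (num[1] << 8) + (num[2] << 16) + (num[3] << 24);
    }

    // Hypothetical helper: the input changes each iteration, so the call is
    // not loop-invariant, and every result feeds the returned sum.
    static long checksum(int iterations) {
        byte[] n = new byte[4];
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            n[0] = (byte) i; // different input each time around
            sum += toInt(n);
        }
        return sum;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long sum = checksum(50_000_000);
        long elapsed = System.nanoTime() - start;
        System.out.println("checksum=" + sum + " elapsed=" + elapsed / 1e9 + "s");
    }
}
```

The measured time now includes the array write as well, so it is not a pure measurement of `toInt`, but it should be harder for the compiler to reduce the whole loop to nothing.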

Answer 2 (score: 3)

I ported your test case to caliper:

import com.google.caliper.SimpleBenchmark;

public class ToInt extends SimpleBenchmark {

    private byte[] n;
    private int total;

    @Override
    protected void setUp() throws Exception {
        n = new byte[4];
    }

    public int timeStaticClass(int reps) {
        for (int i = 0; i < reps; i++) {
            total += StaticClass.toInt(n);
        }
        return total;
    }

    public int timeFromClass(int reps) {
        for (int i = 0; i < reps; i++) {
            total += toInt(n);
        }
        return total;
    }

    public int timeInline(int reps) {
        for (int i = 0; i < reps; i++) {
            int actual = 0;
            int len = n.length;
            for (int i1 = 0; i1 < len; i1++) {
                actual += n[len - 1 - i1] << 8 * i1;
            }
            total += actual;
        }
        return total;
    }

    public static int toInt(byte[] num) {
        int actual = 0;
        int len = num.length;
        for (int i = 0; i < len; i++) {
            actual += num[len - 1 - i] << 8 * i;
        }
        return actual;
    }
}

class StaticClass {
    public static int toInt(byte[] num) {
        int actual = 0;

        int len = num.length;

        for (int i = 0; i < len; i++) {
            actual += num[len - 1 - i] << 8 * i;
        }

        return actual;
    }

}

It does indeed look like the inlined version is the slowest, while the two static versions are almost identical (as expected):

[Image: caliper results]

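The reasons are hard to pin down. I can think of two factors:

  • The JVM is better at micro-optimizations when code blocks are as small and as easy to reason about as possible. When the function is inlined by hand, the code as a whole becomes more complex and the JVM gives up. With the smaller toInt() function, the JIT is smarter.

  • Cache locality: somehow the JVM performs better with two small chunks of code (the loop and the method) than with one bigger one.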

Answer 3 (score: 0)

Your test is flawed. The second test benefits from the first test having already run. You need to run each test case in its own JVM invocation.
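A simple way to arrange that is to have the program run only the case named on the command line, and then start the JVM once per case. The following is a sketch with invented names (`SingleCaseRunner`, `runCase`, and the case labels "call" and "inline" are mine):

```java
public class SingleCaseRunner {
    // Written on every iteration so the results are not simply unused.
    static int sink;

    // Same conversion as toInt in the question.
    static int toInt(byte[] num) {
        int actual = 0;
        int len = num.length;
        for (int i = 0; i < len; i++) {
            actual += num[len - 1 - i] << 8 * i;
        }
        return actual;
    }

    // Hypothetical helper: runs only the named case and returns elapsed nanoseconds.
    static long runCase(String name, byte[] n, int iterations) {
        long start = System.nanoTime();
        switch (name) {
            case "call":
                for (int i = 0; i < iterations; i++) {
                    sink += toInt(n);
                }
                break;
            case "inline":
                for (int i = 0; i < iterations; i++) {
                    int actual = 0;
                    int len = n.length;
                    for (int j = 0; j < len; j++) {
                        actual += n[len - 1 - j] << 8 * j;
                    }
                    sink += actual;
                }
                break;
            default:
                throw new IllegalArgumentException("unknown case: " + name);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // One case per JVM: pass the case name as the first argument.
        String name = args.length > 0 ? args[0] : "call";
        long elapsed = runCase(name, new byte[4], 50_000_000);
        System.out.println(name + ": " + elapsed / 1e9 + "s (sink=" + sink + ")");
    }
}
```

You would then run `java SingleCaseRunner call` and `java SingleCaseRunner inline` as two separate commands, so neither case inherits the other's compiled code or profiling data.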