这个基准比较Java 8u40和8u31是否正确乘以BigInteger值?

时间:2015-04-09 12:17:47

标签: java performance biginteger jmh

我看到JDK-8055494 : Add C2 x86 intrinsic for BigInteger::multiplyToLen() method已修复为8u40更新,但我没想到它会加速BigInteger乘法2倍-3倍。

以下是在两个JVM更新中通过不同公式计算阶乘的基准测试结果:

8u31

[info] Benchmark                             (n)   Mode  Cnt       Score       Error   Units
[info] JavaFactorial.recursion              1000  thrpt    5      13.994 ±     0.175  ops/ms
[info] JavaFactorial.recursion             10000  thrpt    5       0.202 ±     0.054  ops/ms
[info] JavaFactorial.recursionPar           1000  thrpt    5      12.066 ±     8.011  ops/ms
[info] JavaFactorial.recursionPar          10000  thrpt    5       0.253 ±     0.055  ops/ms
[info] JavaFactorial.split                  1000  thrpt    5      18.255 ±     2.656  ops/ms
[info] JavaFactorial.split                 10000  thrpt    5       0.286 ±     0.063  ops/ms

8u40

[info] Benchmark                             (n)   Mode  Cnt       Score       Error   Units
[info] JavaFactorial.recursion              1000  thrpt    5      33.704 ±     0.445  ops/ms
[info] JavaFactorial.recursion             10000  thrpt    5       0.428 ±     0.199  ops/ms
[info] JavaFactorial.recursionPar           1000  thrpt    5      38.170 ±     0.433  ops/ms
[info] JavaFactorial.recursionPar          10000  thrpt    5       0.557 ±     0.030  ops/ms
[info] JavaFactorial.split                  1000  thrpt    5      46.447 ±    11.582  ops/ms
[info] JavaFactorial.split                 10000  thrpt    5       0.586 ±     0.154  ops/ms

我是否应该期望这种JDK改进能够更快地运行my code,还是这是另一个休闲基准?

修改

如何验证我的发现以证明这种加速是由于C2固有的?

经过测试的功能代码:

@State(Scope.Benchmark)
@Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(1)
public class JavaFactorial {
    @Param({"10", "100", "1000", "10000"})
    public int n;

    private static ForkJoinPool pool = new ForkJoinPool();

    @Benchmark
    public BigInteger loop() {
        return n > 20 ? loop(1, n) : BigInteger.valueOf(fastLoop(1, n));
    }

    @Benchmark
    public BigInteger recursion() {
        return n > 20 ? recursion(1, n) : BigInteger.valueOf(fastLoop(1, n));
    }

    @Benchmark
    public BigInteger recursionPar() {
        return n > 20 ? recursePar(1, n) : BigInteger.valueOf(fastLoop(1, n));
    }

    @Benchmark
    public BigInteger split() {
        return n > 180 ? split(n) : (n > 20 ? recursion(1, n) : BigInteger.valueOf(fastLoop(1, n)));
    }

    private long fastLoop(final int n1, int n2) {
        long p = n1;
        while (n2 > n1) {
            p = p * n2;
            n2--;
        }
        return p;
    }

    private BigInteger loop(int n1, final int n2) {
        final long l = Long.MAX_VALUE >> (32 - Integer.numberOfLeadingZeros(n2));
        long p = 1;
        BigInteger r = BigInteger.ONE;
        while (n1 <= n2) {
            if (p <= l) {
                p *= n1;
            } else {
                r = r.multiply(BigInteger.valueOf(p));
                p = n1;
            }
            n1++;
        }
        return r.multiply(BigInteger.valueOf(p));
    }

    private BigInteger recursion(final int n1, final int n2) {
        if (n2 - n1 < 65) {
            return loop(n1, n2);
        }
        final int nm = (n1 + n2) >> 1;
        return recursion(nm + 1, n2).multiply(recursion(n1, nm));
    }

    private BigInteger recursePar(final int n1, final int n2) {
        if (n2 - n1 < 700) {
            return recursion(n1, n2);
        }
        final int nm = (n1 + n2) >> 1;
        RecursiveTask<BigInteger> t = new RecursiveTask<BigInteger>() {
            protected BigInteger compute() {
                return recursePar(nm + 1, n2);
            }
        };
        if (ForkJoinTask.getPool() == pool) {
            t.fork();
        } else {
            pool.execute(t);
        }
        return recursePar(n1, nm).multiply(t.join());
    }

    private BigInteger loop2(int n1, final int n2) {
        final long l = Long.MAX_VALUE >> (32 - Integer.numberOfLeadingZeros(n2));
        long p = 1;
        BigInteger r = BigInteger.ONE;
        while (n1 <= n2) {
            if (p <= l) {
                p *= n1;
            } else {
                r = r.multiply(BigInteger.valueOf(p));
                p = n1;
            }
            n1 += 2;
        }
        return r.multiply(BigInteger.valueOf(p));
    }

    private BigInteger recursion2(final int n1, final int n2) {
        if (n2 - n1 < 65) {
            return loop2(n1, n2);
        }
        final int nm = ((n1 + n2) >> 1) | 1;
        return recursion2(nm, n2).multiply(recursion2(n1, nm - 2));
    }

    private BigInteger split(int n) {
        int i = 31 - Integer.numberOfLeadingZeros(n), s = -n, o = 1;
        BigInteger p = BigInteger.ONE, r = BigInteger.ONE;
        while (i >= 0) {
            int h = n >> i;
            int o1 = (h - 1) | 1;
            if (o < o1) {
                p = p.multiply(recursion2(o + 2, o1));
                r = r.multiply(p);
            }
            o = o1;
            s += h;
            i--;
        }
        return r.shiftLeft(s);
    }
}

2 个答案:

答案 0 :(得分:2)

在调查了可用的JVM选项列表后,我发现其中一个被添加到8u40,可以准确地打开/关闭所需的内在。

以下是8u40的-XX:-UseMultiplyToLenIntrinsic与8u31非常接近的结果:

    [info] JavaFactorial.recursion              1000  thrpt    5      12.521 ±      3.352  ops/ms
    [info] JavaFactorial.recursion             10000  thrpt    5       0.217 ±      0.069  ops/ms
    [info] JavaFactorial.recursionPar           1000  thrpt    5      14.268 ±      8.319  ops/ms
    [info] JavaFactorial.recursionPar          10000  thrpt    5       0.286 ±      0.015  ops/ms
    [info] JavaFactorial.split                  1000  thrpt    5      18.768 ±      4.321  ops/ms
    [info] JavaFactorial.split                 10000  thrpt    5       0.255 ±      0.076  ops/ms

答案 1 :(得分:1)

这个基准确实表明任何大量使用BigInteger乘法的代码都会受益。

您问题的真正答案是为您自己的代码编写基准测试,并在8u318u40下运行。 希望您的代码更快,因此请对您的代码进行基准测试并对其进行测试。如果JRE代码或其他人的代码更快,这表明您的代码可能会受益,但您必须尝试并查看。