Question

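[Short answer: bad benchmarking methodology. You'd think I'd have figured that out by now.]

The problem is stated as "find a fast way to compute x^y, where x and y are positive integers". A typical "fast" algorithm looks like this: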

public long fastPower(int x, int y) {
  // Replaced my code with the "better" version described below,
  // but this version isn't measurably faster than what I had before
  long base = x; // otherwise, we may overflow at x *= x.
  long result = y % 2 == 1 ? x : 1;
  while (y > 1) {
    base *= base;
    y >>= 1;
    if (y % 2 == 1) result *= base;
  }

  return result;
}

I wanted to see how much faster this is than, say, calling Math.pow(), or using a naive approach such as multiplying x by itself y times, like this:

public long naivePower(int x, int y) {
  long result = 1;
  for (int i = 0; i < y; i++) {
    result *= x;
  }
  return result;
}

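Edit: OK, it has been (rightly) pointed out to me that my benchmarking code was not consuming the result, which completely throws everything off. Once I started consuming the result, I still see the naive approach being about 25% faster than the "fast" approach.

Original text:

I was surprised to find that the naive approach was 4x faster than the "fast" version, which was itself about 3x faster than the Math.pow() version.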

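My test uses 10,000,000 trials (and then 100 million, just to make absolutely sure the JIT has time to warm up), each with random values (to prevent the calls from being optimized away) where 2 <= x <= 3 and 25 <= y <= 29. I chose a narrow range of values that would not produce a result greater than 2^63, but that is biased toward larger exponents in an attempt to give the "fast" version an advantage. I pre-generate 10,000 pseudorandom numbers to remove that part of the code from the timing.

I understand that for small exponents the naive version could be expected to be faster. The "fast" version has two branches instead of one, and will generally do twice as many arithmetic/storage operations as the naive one - but I would expect that for large exponents this would still let the fast approach save half as many operations in the best case, and be about the same in the worst case.

Does anyone know why the naive approach would be so much faster than the "fast" version, even with data biased toward the "fast" version (i.e. larger exponents)? Does the extra branch in that code account for that much of a difference at runtime?

Benchmarking code (yes, I know I should be using some framework for "official" benchmarks, but this is a toy problem) - updated to warm up and to consume the results: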

PowerIf[] powers = new PowerIf[] {
  new EasyPower(), // just calls Math.pow() and cast to int
  new NaivePower(),
  new FastPower()
};

Random rand = new Random(0); // same seed for each run
int randCount = 10000;
int[] bases = new int[randCount];
int[] exponents = new int[randCount];
for (int i = 0; i < randCount; i++) {
  bases[i] = 2 + rand.nextInt(2);
  exponents[i] = 25 + rand.nextInt(5);
}

int count = 1000000000;

for (int trial = 0; trial < powers.length; trial++) {
  long total = 0;
  for (int i = 0; i < count; i++) { // warm up
    final int x = bases[i % randCount];
    final int y = exponents[i % randCount];
    total += powers[trial].power(x, y);
  }
  long start = System.currentTimeMillis();
  for (int i = 0; i < count; i++) {
    final int x = bases[i % randCount];
    final int y = exponents[i % randCount];
    total += powers[trial].power(x, y);
  }
  long end = System.currentTimeMillis();
  System.out.printf("%25s: %d ms%n", powers[trial].toString(), (end - start)); 
  System.out.println(total);
}

Produces the output:

                EasyPower: 7908 ms
-407261252961037760
               NaivePower: 1993 ms
-407261252961037760
                FastPower: 2394 ms
-407261252961037760

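Playing with the parameters for the random numbers and the trials does change the output characteristics, but the ratios between the tests are always consistent with those shown.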

Answer 1

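There are two issues with your fastPower:

It is better to replace y % 2 == 0 with (y & 1) == 0; bitwise operations are faster.
Your code always decrements y and performs an extra multiplication, including the cases when y is even. It is better to put this part into an else clause.

Anyway, I guess that your benchmarking approach is not perfect. A 4x performance difference sounds weird and cannot be explained without seeing the complete code.

After applying the improvements above, I verified with a JMH benchmark that fastPower is indeed faster than naivePower, by a factor of 1.3x to 2x.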

package bench;

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class FastPow {
    @Param("3")
    int x;
    @Param({"25", "28", "31", "32"})
    int y;

    @Benchmark
    public long fast() {
        return fastPower(x, y);
    }

    @Benchmark
    public long naive() {
        return naivePower(x, y);
    }

    public static long fastPower(long x, int y) {
        long result = 1;
        while (y > 0) {
            if ((y & 1) == 0) {
                x *= x;
                y >>>= 1;
            } else {
                result *= x;
                y--;
            }
        }
        return result;
    }

    public static long naivePower(long x, int y) {
        long result = 1;
        for (int i = 0; i < y; i++) {
            result *= x;
        }
        return result;
    }
}

Results:

Benchmark      (x)  (y)   Mode  Cnt    Score   Error   Units
FastPow.fast     3   25  thrpt   10  103,406 ± 0,664  ops/us
FastPow.fast     3   28  thrpt   10  103,520 ± 0,351  ops/us
FastPow.fast     3   31  thrpt   10   85,390 ± 0,286  ops/us
FastPow.fast     3   32  thrpt   10  115,868 ± 0,294  ops/us
FastPow.naive    3   25  thrpt   10   76,331 ± 0,660  ops/us
FastPow.naive    3   28  thrpt   10   69,527 ± 0,464  ops/us
FastPow.naive    3   31  thrpt   10   54,407 ± 0,231  ops/us
FastPow.naive    3   32  thrpt   10   56,127 ± 0,207  ops/us

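Note: integer multiplication is a very fast operation, sometimes even faster than an extra comparison. Don't expect a huge performance improvement for values that fit into a long. The advantage of the fast power algorithm will be evident on BigInteger with larger exponents.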
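To make the BigInteger remark concrete, here is a minimal sketch of both approaches on BigInteger. The class name BigPow and its method names are illustrative assumptions, not code from the question or from the benchmark above, and no timings are claimed for it.

```java
import java.math.BigInteger;

public class BigPow {
    // Naive: y multiplications, each on an ever-growing number.
    static BigInteger naivePower(BigInteger x, int y) {
        BigInteger result = BigInteger.ONE;
        for (int i = 0; i < y; i++) {
            result = result.multiply(x);
        }
        return result;
    }

    // Exponentiation by squaring: about log2(y) squarings plus one
    // multiplication per set bit of y.
    static BigInteger fastPower(BigInteger x, int y) {
        BigInteger result = BigInteger.ONE;
        while (y > 0) {
            if ((y & 1) != 0) {
                result = result.multiply(x);
            }
            y >>>= 1;
            if (y > 0) {
                x = x.multiply(x); // skip the final, unused squaring
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BigInteger three = BigInteger.valueOf(3);
        System.out.println(fastPower(three, 5)); // 243
        System.out.println(fastPower(three, 1000).equals(naivePower(three, 1000))); // true
    }
}
```

In real code BigInteger.pow(int) already exists and would normally be used; the hand-written version only makes the number of multiplications visible.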

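Update

Since the author has posted the benchmark, I must admit that the surprising performance results come from common benchmarking pitfalls. I have improved the benchmark while retaining the original methodology, and it now shows that FastPower is indeed faster than NaivePower, see here.

What are the key changes in the improved version?

Different algorithms should be tested separately, in different JVM instances, to prevent profile pollution.
A benchmark must be called several times to allow proper compilation/recompilation until a steady state is reached.
One benchmark trial should be placed in a separate method to avoid on-stack replacement issues.
y % 2 is replaced with y & 1, since HotSpot does not perform this optimization automatically.
The effect of unrelated operations in the main benchmark loop is minimized.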
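The points above can be sketched as a hand-rolled harness. This is not the improved benchmark the answer links to; the class PowBench, the Power interface (a stand-in for the question's PowerIf), the "fast" command-line switch, and the round and iteration counts are all assumptions made for illustration.

```java
import java.util.Random;

public class PowBench {
    // Hypothetical stand-in for the question's PowerIf interface.
    interface Power {
        long power(long x, int y);
    }

    static long naivePower(long x, int y) {
        long result = 1;
        for (int i = 0; i < y; i++) {
            result *= x;
        }
        return result;
    }

    static long fastPower(long x, int y) {
        long result = 1;
        while (y > 0) {
            if ((y & 1) == 0) { // y & 1 instead of y % 2
                x *= x;
                y >>>= 1;
            } else {
                result *= x;
                y--;
            }
        }
        return result;
    }

    // One round lives in its own method, so later rounds run the normally
    // compiled method rather than an on-stack-replaced loop inside main.
    static long runRound(Power p, int[] bases, int[] exponents, int iterations) {
        long total = 0;
        int idx = 0;
        for (int i = 0; i < iterations; i++) {
            total += p.power(bases[idx], exponents[idx]);
            if (++idx == bases.length) {
                idx = 0; // avoids i % bases.length in the hot loop
            }
        }
        return total; // returned and printed, so the work is consumed
    }

    public static void main(String[] args) {
        // Exactly one algorithm per JVM run, chosen on the command line.
        boolean fast = args.length > 0 && args[0].equals("fast");
        Power p = fast ? PowBench::fastPower : PowBench::naivePower;

        Random rand = new Random(0); // same inputs as in the question
        int[] bases = new int[10_000];
        int[] exponents = new int[10_000];
        for (int i = 0; i < bases.length; i++) {
            bases[i] = 2 + rand.nextInt(2);
            exponents[i] = 25 + rand.nextInt(5);
        }

        // Repeated rounds: early ones include compilation, later ones
        // should settle toward a steady state.
        for (int round = 0; round < 10; round++) {
            long start = System.nanoTime();
            long total = runRound(p, bases, exponents, 1_000_000);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("%s round %d: %d ms (checksum %d)%n",
                    fast ? "fast" : "naive", round, ms, total);
        }
    }
}
```

Run it once as `java PowBench fast` and once as `java PowBench naive`, so each JVM only ever sees one implementation at the call site inside runRound. The iteration count is kept small here so the sketch finishes quickly; a real measurement would need much longer rounds.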

Writing microbenchmarks by hand is a difficult task. That is why it is strongly recommended to use a proper benchmarking framework such as JMH.

Answer 2

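There is little point in trying to break down your results without being able to review and replicate your benchmark. They could be due to poor input selection or to erroneous benchmarking practices, such as running one test before another (thereby giving the JVM time to "warm up"), and so on. Please share your benchmarking code, not just your results.

I would suggest including Guava's LongMath.pow() (src) in your tests; it is a heavily used and well-benchmarked method. While you may be able to beat it with certain inputs, it is unlikely that you can improve on its runtime in the general case (and if you can, they would love to hear about it).

It is not surprising that Math.pow() performs worse than algorithms restricted to positive integers. Looking at the "fast" vs. "naive" implementations, the outcome clearly depends heavily on the inputs you choose, as Mike 'Pomax' Kamermans suggests. For small values of y the "naive" solution obviously has to do less work. But for larger values we save a good number of iterations with the "fast" implementation.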

Answer 3

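It seems to me that the first fastPower(base, exponent) from the question is wrong, if not giving wrong results. (The first version of intPower() below was buggy, as in giving wrong results, in addition to giving slightly misleading benchmark results.)
Because of the "formatting capabilities" of comments, here is yet another exponentiation by squaring, offered for argument as an answer: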

static public long intPower(int base, int exponent) {
    if (0 == base
        || 1 == base)
        return base;
    int y = exponent;
    if (y <= 0)
        // (y & 1) is used because y % 2 yields -1, not 1, for odd negative y
        return 0 == y ? 1 : -1 != base ? 0 : (y & 1) != 0 ? -1 : 1;
    long result = y % 2 == 1 ? base : 1,
        power = base;
    while (1 < y) {
        power *= power;
        y >>= 1; // easier to see termination after Type.SIZE iterations
        if (y % 2 == 1)
            result *= power;
    }
    return result;
}

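If you do microbenchmarking (what is the typical use of integer exponentiation?), do a proper warm-up if you use a framework at all. Never invest time in microbenchmark results from timing runs of less than 5 seconds per alternative.

One alternative, derived from Guava's LongMath.pow(b, e):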

public long power(int base, int k) {
    for (long accum = 1, b = base ;; k >>>= 1)
        switch (k) {
        case 0:
            return accum;
        case 1:
            return accum * b;
        default:
            if ((k&1) != 0) // guava uses conditional multiplicand
                accum *= b;
            b *= b;
        }
}

Answer 4

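The while loop runs log2(y) times, whereas the for loop runs y times, so depending on your input, one will run faster than the other.

In the worst case, each pass of the while loop runs:

a comparison (the while conditional),
a modulo,
a comparison, and in the worst case three more operations:
a multiplication,
a shift assignment,
another multiplication, and finally,
a decrement.

Whereas each pass of the naive for loop runs:

a comparison (the for conditional),
a multiplication,
an increment (the for iterator).

So you would expect the naive loop to be faster for small values of y: the smaller number of operations per pass in the for loop beats the log2 reduction of the "fast" approach for as long as the time lost to those extra operations exceeds the time gained by reducing y to log2(y) passes.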
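As a rough illustration of the trade-off described above, the following sketch counts loop passes only, not time. The class IterationCount and its methods are hypothetical; the while loop mirrors the shape of the question's fastPower (halving y until it reaches 1).

```java
public class IterationCount {
    // Passes through the naive for loop: one per unit of y.
    static int naiveIterations(int y) {
        int iterations = 0;
        for (int i = 0; i < y; i++) {
            iterations++;
        }
        return iterations;
    }

    // Passes through a while loop shaped like the question's fastPower:
    // y is halved until it reaches 1, i.e. floor(log2(y)) passes.
    static int fastIterations(int y) {
        int iterations = 0;
        while (y > 1) {
            y >>= 1;
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        for (int y : new int[] {1, 2, 4, 8, 16, 25, 29}) {
            System.out.printf("y=%2d naive=%2d fast=%d%n",
                    y, naiveIterations(y), fastIterations(y));
        }
        // e.g. for y=25: naive=25, fast=4
    }
}
```

Each "fast" pass does more work than a naive pass, so the pass counts only bound the comparison; which version wins for a given y still has to be measured.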

"Fast" integer powers in Java
