Explicit vs. intrinsic locking degradation in Java

Date: 2014-03-06 12:16:35

Tags: java multithreading concurrency locking benchmarking

I have two almost identical classes that compute the next Fibonacci number. The only difference is that one uses intrinsic locking while the other uses explicit locking. The intrinsic-locking implementation is much faster than the explicit-locking one, and also faster than the STM and lock-free implementations.
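The actual classes live in the linked repository; as a rough illustration of the difference being measured, the two variants might look like this minimal sketch (class and field names here are hypothetical, not the repo's):

```java
import java.math.BigInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: both generators advance a (prev, curr) pair,
// differing only in how the critical section is guarded.
class IntrinsicFib {
    private BigInteger prev = BigInteger.ZERO;
    private BigInteger curr = BigInteger.ONE;

    // Intrinsic locking: the object's monitor via `synchronized`.
    synchronized BigInteger next() {
        BigInteger next = prev.add(curr);
        prev = curr;
        curr = next;
        return prev;
    }
}

class ExplicitFib {
    private final ReentrantLock lock = new ReentrantLock();
    private BigInteger prev = BigInteger.ZERO;
    private BigInteger curr = BigInteger.ONE;

    // Explicit locking: java.util.concurrent.locks.ReentrantLock.
    BigInteger next() {
        lock.lock();
        try {
            BigInteger next = prev.add(curr);
            prev = curr;
            curr = next;
            return prev;
        } finally {
            lock.unlock();
        }
    }
}
```

Both produce the same sequence (1, 1, 2, 3, 5, ...); only the lock acquisition mechanism differs.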

Impl/Threads      | 1      | 2      | 4      | 8      | 16     |
---               | ---    | ---    | ---    | ---    | ---    |
IntrinsicLocking  | 4959   | 3132   | 3458   | 3059   | 3402   |
ExplicitLocking   | 4112   | 5348   | 6478   | 12895  | 13492  |
STM               | 5193   | 5210   | 4899   | 5259   | 6733   |
LockFree          | 4362   | 3601   | 3660   | 4732   | 4923   |

The table shows the average time of a single computation of the next Fibonacci number. Tested on Java 7 and 8. The code is on GitHub: https://github.com/f0y/fibench/tree/master/src/main/java/fibonacci/mdl
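The LockFree row presumably relies on a compare-and-swap retry loop rather than any lock; a minimal sketch under that assumption (the name `LockFreeFib` is hypothetical, the real class is in the linked repo):

```java
import java.math.BigInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical lock-free generator: the whole (prev, curr) state is
// held in one immutable object and swapped atomically with CAS.
class LockFreeFib {
    private static final class Pair {
        final BigInteger prev, curr;
        Pair(BigInteger prev, BigInteger curr) {
            this.prev = prev;
            this.curr = curr;
        }
    }

    private final AtomicReference<Pair> state =
            new AtomicReference<>(new Pair(BigInteger.ZERO, BigInteger.ONE));

    BigInteger next() {
        while (true) {
            Pair old = state.get();
            Pair upd = new Pair(old.curr, old.prev.add(old.curr));
            if (state.compareAndSet(old, upd)) {
                return upd.prev; // the value this call produced
            }
            // CAS lost the race: another thread advanced the state; retry.
        }
    }
}
```

Under contention, failed CAS attempts waste the `BigInteger` addition they computed, which is one plausible reason a lock-free variant need not beat a contended lock here.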

Can someone explain why the intrinsic-locking implementation wins?

1 Answer:

Answer 0 (score: 2)

This benchmark is broken on so many levels that it does not yet make sense to discuss the results.

Here is a simple JMH benchmark:

package fibonacci.bench;

import fibonacci.mdl.ExplicitLocking;
import fibonacci.mdl.FibonacciGenerator;
import fibonacci.mdl.IntrinsicLocking;
import fibonacci.mdl.LockFree;
import fibonacci.mdl.STM;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.GenerateMicroBenchmark;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

import java.math.BigInteger;

/*
   Implementation notes:

   * This benchmark does not exhibit a steady state, which means
     we can not do timed runs. Instead, we have to time single
     invocations; therefore preset benchmark mode.

   * Each iteration should start from the pristine state,
     therefore we reinitialize in @Setup(Iteration).

   * Since we are interested in performance beyond the first
     invocation, we have to call several times and aggregate
     the time; this is why we have batchSize > 1. Note the
     performance might be different depending on given batch
     size.

   * Since we have to provide warmup, we do many iterations.

   * The performance differs from run to run, because we are
     partly measuring non-deterministic thread scheduling.
     JMH handles this for us, hence multiple forks.

   * We don't want the profiles for different FibonacciGenerator
     implementations to mix up. JMH already takes care of that
     for us by forking each test.
     */

@BenchmarkMode(Mode.SingleShotTime)
@Warmup(iterations = 100, batchSize = JmhBench.BATCH_SIZE)
@Measurement(iterations = 100, batchSize = JmhBench.BATCH_SIZE)
@State(Scope.Benchmark)
public class JmhBench {

    public static final int BATCH_SIZE = 50000;

    private FibonacciGenerator explicitLock;
    private IntrinsicLocking intrinsicLock;
    private LockFree lockFree;
    private STM stm;

    @Setup(Level.Iteration)
    public void setup() {
        explicitLock = new ExplicitLocking();
        intrinsicLock = new IntrinsicLocking();
        lockFree = new LockFree();
        stm = new STM();
    }

    @GenerateMicroBenchmark
    public BigInteger stm() {
        return stm.next();
    }

    @GenerateMicroBenchmark
    public BigInteger explicitLock() {
        return explicitLock.next();
    }

    @GenerateMicroBenchmark
    public BigInteger intrinsicLock() {
        return intrinsicLock.next();
    }

    @GenerateMicroBenchmark
    public BigInteger lockFree() {
        return lockFree.next();
    }

}

On my Linux x86_64, JDK 8b129, with four threads:

Benchmark                      Mode   Samples         Mean   Mean error    Units
f.b.JmhBench.explicitLock        ss       100     1010.921       31.117       ms
f.b.JmhBench.intrinsicLock       ss       100     1121.355       20.386       ms
f.b.JmhBench.lockFree            ss       100     1848.635       83.700       ms
f.b.JmhBench.stm                 ss       100     1893.477       52.665       ms