I want to use CAS to improve my code, but I doubt that it will actually perform better, so I wrote a test. Here is the test code. Is this JMH code reliable?
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import org.openjdk.jmh.annotations.*;

@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.SampleTime)
@Warmup(iterations = 5)
@Measurement(iterations = 10, time = 5, timeUnit = TimeUnit.SECONDS)
@Threads(20)
@Fork(1)
@State(Scope.Benchmark)
public class CASBench {
    private int id = 24;
    private static Object[] lockObj;
    private static AtomicReference<Integer>[] locks;

    static {
        lockObj = new Object[100];
        for (int i = 0; i < lockObj.length; i++) {
            lockObj[i] = new Object();
        }
        locks = new AtomicReference[100]; // unchecked: generic array creation
        for (int i = 0; i < locks.length; i++) {
            locks[i] = new AtomicReference<Integer>(null);
        }
    }

    @Benchmark
    public void sync() throws Exception {
        int index = id % 100;
        synchronized (lockObj[index]) {
            test();
        }
    }

    @Benchmark
    public void cas() throws Exception {
        AtomicReference<Integer> lock = locks[id % 100];
        // Spin until the slot is free. id is autoboxed to the cached Integer for 24,
        // so the release below compares against the same reference.
        while (!lock.compareAndSet(null, id)) {
        }
        test();
        lock.compareAndSet(id, null);
    }

    public void test() throws Exception {
        int sum = 0;
        for (int i = 0; i < 100; i++) {
            sum += i;
        }
    }
}
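For reference, the cas() method above is a hand-rolled spin lock: a thread loops on compareAndSet until it owns the slot, and it never blocks. Below is a minimal sketch of the same idea as a small reusable class, assuming Java 9+ (for Thread.onSpinWait()); the class name SpinLock and the demo() check are hypothetical and not part of the question's code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical class: the busy-waiting idea from cas(), packaged as a reusable lock.
final class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        // Keep retrying the false -> true flip; the waiting thread burns CPU the
        // whole time instead of being parked.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait(); // Java 9+: hints to the CPU that this is a spin loop
        }
    }

    void unlock() {
        // Only the owner calls this, so a plain volatile write is enough to release.
        locked.set(false);
    }

    // Self-check: 4 threads each increment a plain int 100_000 times, guarded by the lock.
    static int demo() {
        SpinLock lock = new SpinLock();
        int[] counter = {0};
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 100_000; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 400000 when mutual exclusion holds
    }
}
```

Unlike synchronized, every waiter here keeps a CPU busy for as long as it waits, which matters when all 20 benchmark threads compete for the same slot (id is always 24, so every thread uses locks[24]).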
These are my JMH results:
Benchmark Mode Cnt Score Error Units
CASBench.cas sample 25866638 0.014 ± 0.001 ms/op
CASBench.cas:cas·p0.00 sample ≈ 10⁻⁶ ms/op
CASBench.cas:cas·p0.50 sample ≈ 10⁻⁴ ms/op
CASBench.cas:cas·p0.90 sample 0.001 ms/op
CASBench.cas:cas·p0.95 sample 0.001 ms/op
CASBench.cas:cas·p0.99 sample 0.001 ms/op
CASBench.cas:cas·p0.999 sample 0.002 ms/op
CASBench.cas:cas·p0.9999 sample 38.164 ms/op
CASBench.cas:cas·p1.00 sample 813.695 ms/op
CASBench.sync sample 26257757 0.011 ± 0.001 ms/op
CASBench.sync:sync·p0.00 sample ≈ 10⁻⁶ ms/op
CASBench.sync:sync·p0.50 sample ≈ 10⁻⁴ ms/op
CASBench.sync:sync·p0.90 sample 0.001 ms/op
CASBench.sync:sync·p0.95 sample 0.001 ms/op
CASBench.sync:sync·p0.99 sample 0.005 ms/op
CASBench.sync:sync·p0.999 sample 1.883 ms/op
CASBench.sync:sync·p0.9999 sample 15.270 ms/op
CASBench.sync:sync·p1.00 sample 45.810 ms/op
Can I conclude that, in this case, synchronized is better?
Answer (score: 1)
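Your test is not correct, as far as I can tell. First, your benchmark methods should return a value, as shown in the samples here, or use Blackholes. As written, test() computes sum and then discards it, so the JIT is free to remove the loop as dead code, and the benchmark may end up measuring little more than the locking itself.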
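A minimal sketch of what that change could look like for the question's benchmark: test() returns the sum and both benchmark methods return it, so JMH consumes the value instead of the JIT being free to drop it. The JMH annotations are left out only so the sketch compiles on its own, and the class name CASBenchReturning is hypothetical. It also boxes id once, because the original release call only works while id stays inside the Integer cache. The loop still has constant bounds, so the JIT may simplify it; this removes the dead-code problem, not every benchmarking pitfall.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch: the question's benchmark bodies, changed to return the workload result.
// Keep the question's JMH annotations (@State, @Benchmark, ...) in the real class;
// they are omitted here only so this compiles without JMH on the classpath.
final class CASBenchReturning {
    private int id = 24; // same input as the question
    private static final Object[] LOCK_OBJ = new Object[100];
    @SuppressWarnings("unchecked")
    private static final AtomicReference<Integer>[] LOCKS = new AtomicReference[100];

    static {
        for (int i = 0; i < LOCK_OBJ.length; i++) {
            LOCK_OBJ[i] = new Object();
            LOCKS[i] = new AtomicReference<>(null);
        }
    }

    // Returns the sum instead of discarding it; JMH consumes a benchmark's return value.
    static int test() {
        int sum = 0;
        for (int i = 0; i < 100; i++) {
            sum += i;
        }
        return sum;
    }

    public int sync() {
        synchronized (LOCK_OBJ[id % 100]) {
            return test();
        }
    }

    public int cas() {
        AtomicReference<Integer> lock = LOCKS[id % 100];
        // Box once: compareAndSet compares references, and an id outside the
        // Integer cache (-128..127) would be re-boxed into a different object,
        // making the release CAS fail and leaving the lock held forever.
        Integer owner = id;
        while (!lock.compareAndSet(null, owner)) {
            // spin, as in the original
        }
        try {
            return test();
        } finally {
            lock.compareAndSet(owner, null); // same reference as the acquire
        }
    }
}
```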
There are two ways to test this: with contention and without it.
Let's look at what happens under contention first; it is easier to grasp:
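import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 5, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 5, timeUnit = TimeUnit.SECONDS)
@State(Scope.Benchmark) // all threads share one instance: one counter, one lock
public class Contention {
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .jvmArgs("-ea")
                .shouldFailOnError(true)
                .include(Contention.class.getSimpleName())
                .build();
        new Runner(opt).run();
    }

    private AtomicInteger atomic;
    private final Object lock = new Object();
    private int i = 0;

    @Setup
    public void setUp() {
        atomic = new AtomicInteger(0);
    }

    @Fork(1)
    @Threads(10)
    @Benchmark
    public int incrementAtomic() {
        return atomic.incrementAndGet();
    }

    @Fork(1)
    @Threads(10)
    @Benchmark
    public int incrementSync() {
        synchronized (lock) {
            // Read the result inside the lock; reading i after the block would be a data race.
            return ++i;
        }
    }
}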
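The code should be self-explanatory, but a few words of explanation:
@State(Scope.Benchmark)
If you change this to @State(Scope.Thread), each thread gets its own state object and therefore its own lock, so there is no contention at all and the synchronized numbers are skewed by biased locking. (Biased locking was disabled by default and deprecated in JDK 15 by JEP 374, so newer JVMs may not show this effect.)
In other words, if you run this code with
@State(Scope.Thread)
the two results are fairly close, like this: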
Benchmark Mode Cnt Score Error Units
casVSsynchronized.Contention.incrementAtomic avgt 5 36.526 ± 6.548 ns/op
casVSsynchronized.Contention.incrementSync avgt 5 23.655 ± 3.393 ns/op
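Running it with
@State(Scope.Benchmark)
shows a completely different picture. Under contention CAS does much better, roughly twice as fast as synchronized, as the results show: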
Benchmark Mode Cnt Score Error Units
casVSsynchronized.Contention.incrementAtomic avgt 5 212.997 ± 42.902 ns/op
casVSsynchronized.Contention.incrementSync avgt 5 457.896 ± 46.811 ns/op
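One reason CAS copes better with contention is that an atomic update never puts a thread to sleep: it reads the current value, computes the new one and attempts compareAndSet, retrying if another thread got there first. A contended synchronized block, by contrast, can inflate to a full monitor and park and unpark threads. The explicit loop below (casIncrement and demo are hypothetical helpers) shows the retry pattern. The JDK's own AtomicInteger.incrementAndGet delegates to Unsafe.getAndAddInt, which HotSpot typically compiles to a single atomic add instruction on x86, so treat this as an illustration of the technique rather than the JDK's exact code.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CasIncrement {
    // Hypothetical helper: an explicit CAS retry loop, equivalent in effect to
    // AtomicInteger.incrementAndGet().
    static int casIncrement(AtomicInteger counter) {
        while (true) {
            int current = counter.get();
            int next = current + 1;
            // Succeeds only if no other thread changed the value since get();
            // a failure means another thread made progress, so just retry.
            if (counter.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    // 8 threads x 50_000 increments on one shared (contended) counter.
    static int demo() {
        AtomicInteger counter = new AtomicInteger();
        Thread[] threads = new Thread[8];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 50_000; i++) {
                    casIncrement(counter);
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 400000: no increment is lost
    }
}
```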
Then I have a more involved test (which might deserve a more rigorous review from the JMH developers):
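import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 5, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 5, timeUnit = TimeUnit.SECONDS)
public class CASSync {
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .jvmArgs("-ea") // enables the asserts in the @TearDown methods
                .shouldFailOnError(true)
                .include(CASSync.class.getSimpleName())
                .build();
        new Runner(opt).run();
    }

    // Scope.Thread: every thread has its own holder, so there is no contention.
    // Level.Invocation rebuilds the state before every call. Caution: the JMH javadoc
    // says this level is meant for calls taking a millisecond or more, so its
    // overhead may distort nanosecond-scale scores like these.
    @State(Scope.Thread)
    public static class AtomicHolder {
        AtomicInteger i = null;

        @Setup(Level.Invocation)
        public void setUp() {
            i = new AtomicInteger(0);
        }

        @TearDown(Level.Invocation)
        public void tearDown() {
            assert i.intValue() == 1;
            i = null;
        }
    }

    @State(Scope.Thread)
    public static class SyncHolder {
        int i = 0;
        Object lock = null;

        @Setup(Level.Invocation)
        public void setUp() {
            lock = new Object(); // fresh monitor for every invocation
            i = 0;
        }

        @TearDown(Level.Invocation)
        public void tearDown() {
            assert i == 1;
            lock = null;
        }
    }

    @Benchmark
    @Fork(1)
    public boolean cas(AtomicHolder holder) {
        return holder.i.compareAndSet(0, 1);
    }

    @Benchmark
    @Fork(1)
    public boolean sync(SyncHolder holder) {
        synchronized (holder.lock) {
            ++holder.i;
        }
        return holder.i == 1;
    }
}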
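This test again has no contention at all (like the first one with Scope.Thread), but this time biased locking is taken out of the picture: every invocation gets a fresh lock object, so no lock stays biased across invocations. Results: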
Benchmark Mode Cnt Score Error Units
casVSsynchronized.CASSync.cas avgt 5 44.003 ± 1.343 ns/op
casVSsynchronized.CASSync.sync avgt 5 50.744 ± 1.370 ns/op
My conclusion: in a contended environment, CAS is better. For everything else, it is debatable.