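While experimenting with the JDK 8 Streams API, I decided to compare the performance of parallel and sequential streams. As a test, I estimate the value of pi by throwing random darts at the unit square and counting how many land inside the unit circle. I took the idea from an apache-spark example.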
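For reference, the estimator behind the dart-throwing approach can be written out as below. This is a sketch that assumes x and y are independent and uniform on [0, 1); the symbols N (number of darts) and count (darts inside the circle) are introduced here for illustration only.

```latex
% The region x^2 + y^2 < 1 inside the unit square is a quarter disk of area pi/4.
P\left(x^2 + y^2 < 1\right) = \frac{\pi}{4}
\quad\Longrightarrow\quad
\pi \approx \frac{4 \cdot \mathrm{count}}{N}
```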
Here is the code.
package org.sample;
import java.util.concurrent.TimeUnit;
import java.util.stream.IntStream;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@State(Scope.Benchmark)
public class MyBenchmark {

    // Number of random points per benchmark invocation.
    @Param({
        "1000000",
        "10000000"
    })
    int MAX_COUNT;

    @Benchmark
    public double parallelPiTest() {
        // Count the random points that land inside the unit circle, in parallel.
        long count = IntStream.range(1, MAX_COUNT).parallel().filter(i -> {
            double x = Math.random();
            double y = Math.random();
            return (x * x + y * y) < 1.0;
        }).count();
        double pi = 4 * count * 1.0 / MAX_COUNT;
        return pi;
    }

    @Benchmark
    public double sequentialPiTest() {
        // Same computation on a sequential stream.
        long count = IntStream.range(1, MAX_COUNT).filter(i -> {
            double x = Math.random();
            double y = Math.random();
            return (x * x + y * y) < 1.0;
        }).count();
        double pi = 4 * count * 1.0 / MAX_COUNT;
        return pi;
    }
}
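In a simple test on my 8-core machine (a Windows 7 laptop), the parallel version took almost 5 times as long as the sequential one, with CPU utilization close to 100% on all cores. The sequential version, on the other hand, used about 20% of a single core! Confused by these results, I benchmarked with both JMH (the code above) and JunitBenchmarks. The outcome was consistent: sequential execution was always about 5 times faster than parallel execution. I also tried 100 iterations, but the results were similar to the 5-iteration run shown below. Am I missing something basic here?
JMH benchmark results: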
C:\Users\local\lunaeeworkspace\benchmarktest>mvn clean install
"******::" C:\Progra~1\Java\jdk1.8.0_20
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Auto-generated JMH benchmark 1.0
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ benchmarktest ---
[INFO] Deleting C:\Users\local\lunaeeworkspace\benchmarktest\target
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ benchmarktest ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory C:\Users\local\lunaeeworkspace\benchmarktest\src\main\resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ benchmarktest ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 1 source file to C:\Users\local\lunaeeworkspace\benchmarktest\target\classes
[INFO]
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ benchmarktest ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory C:\Users\local\lunaeeworkspace\benchmarktest\src\test\resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ benchmarktest ---
[INFO] No sources to compile
[INFO]
[INFO] --- maven-surefire-plugin:2.17:test (default-test) @ benchmarktest ---
[INFO] No tests to run.
[INFO]
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ benchmarktest ---
[INFO] Building jar: C:\Users\local\lunaeeworkspace\benchmarktest\target\benchmarktest-1.0.jar
[INFO]
[INFO] --- maven-shade-plugin:2.2:shade (default) @ benchmarktest ---
[INFO] Including org.openjdk.jmh:jmh-core:jar:1.1 in the shaded jar.
[INFO] Including net.sf.jopt-simple:jopt-simple:jar:4.6 in the shaded jar.
[INFO] Including org.apache.commons:commons-math3:jar:3.2 in the shaded jar.
[INFO] Replacing C:\Users\local\lunaeeworkspace\benchmarktest\target\benchmarks.jar with C:\Users\local\lunaeeworkspace\benchmarktest\target\benchmarktest-1.0-shaded.jar
[INFO]
[INFO] --- maven-install-plugin:2.5.1:install (default-install) @ benchmarktest ---
[INFO] Installing C:\Users\local\lunaeeworkspace\benchmarktest\target\benchmarktest-1.0.jar to C:\Users\local\.m2\repository\org\sample\benchmarktest\1.0\benchmarktest-1.0.jar
[INFO] Installing C:\Users\local\lunaeeworkspace\benchmarktest\pom.xml to C:\Users\local\.m2\repository\org\sample\benchmarktest\1.0\benchmarktest-1.0.pom
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 16.070 s
[INFO] Finished at: 2014-09-15T13:25:03-07:00
[INFO] Final Memory: 22M/221M
[INFO] ------------------------------------------------------------------------
C:\Users\local\lunaeeworkspace\benchmarktest>java -jar target/benchmarks.jar
# VM invoker: C:\Program Files\Java\jre1.8.0_20\bin\java.exe
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.sample.MyBenchmark.parallelPiTest
# Parameters: (MAX_COUNT = 1000000)
# Run progress: 0.00% complete, ETA 00:00:40
# Fork: 1 of 1
# Warmup Iteration 1: 2810219990.000 ns/op
# Warmup Iteration 2: 679604930.000 ns/op
# Warmup Iteration 3: 708517299.500 ns/op
# Warmup Iteration 4: 613861141.500 ns/op
# Warmup Iteration 5: 747273386.500 ns/op
Iteration 1: 636085288.500 ns/op
Iteration 2: 726300915.500 ns/op
Iteration 3: 720032270.000 ns/op
Iteration 4: 758523073.500 ns/op
Iteration 5: 776964284.500 ns/op
Result: 723581166.400 ±(99.9%) 208666306.733 ns/op [Average]
Statistics: (min, avg, max) = (636085288.500, 723581166.400, 776964284.500), stdev = 54189977.210
Confidence interval (99.9%): [514914859.667, 932247473.133]
# VM invoker: C:\Program Files\Java\jre1.8.0_20\bin\java.exe
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.sample.MyBenchmark.parallelPiTest
# Parameters: (MAX_COUNT = 10000000)
# Run progress: 25.00% complete, ETA 00:00:52
# Fork: 1 of 1
# Warmup Iteration 1: 9589247518.000 ns/op
# Warmup Iteration 2: 8049867519.000 ns/op
# Warmup Iteration 3: 7864790757.000 ns/op
# Warmup Iteration 4: 7766442122.000 ns/op
# Warmup Iteration 5: 7723210219.000 ns/op
Iteration 1: 7525308107.000 ns/op
Iteration 2: 8067847130.000 ns/op
Iteration 3: 7647547652.000 ns/op
Iteration 4: 6964833740.000 ns/op
Iteration 5: 7471811305.000 ns/op
Result: 7535469586.800 ±(99.9%) 1523035846.762 ns/op [Average]
Statistics: (min, avg, max) = (6964833740.000, 7535469586.800, 8067847130.000), stdev = 395527572.797
Confidence interval (99.9%): [6012433740.038, 9058505433.562]
# VM invoker: C:\Program Files\Java\jre1.8.0_20\bin\java.exe
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.sample.MyBenchmark.sequentialPiTest
# Parameters: (MAX_COUNT = 1000000)
# Run progress: 50.00% complete, ETA 00:01:37
# Fork: 1 of 1
# Warmup Iteration 1: 208653523.167 ns/op
# Warmup Iteration 2: 171440852.571 ns/op
# Warmup Iteration 3: 176369103.714 ns/op
# Warmup Iteration 4: 172637171.571 ns/op
# Warmup Iteration 5: 168770237.714 ns/op
Iteration 1: 171262591.714 ns/op
Iteration 2: 168976818.714 ns/op
Iteration 3: 174889950.143 ns/op
Iteration 4: 171272031.714 ns/op
Iteration 5: 167857761.571 ns/op
Result: 170851830.771 ±(99.9%) 10391714.091 ns/op [Average]
Statistics: (min, avg, max) = (167857761.571, 170851830.771, 174889950.143), stdev = 2698695.149
Confidence interval (99.9%): [160460116.681, 181243544.862]
# VM invoker: C:\Program Files\Java\jre1.8.0_20\bin\java.exe
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.sample.MyBenchmark.sequentialPiTest
# Parameters: (MAX_COUNT = 10000000)
# Run progress: 75.00% complete, ETA 00:00:37
# Fork: 1 of 1
# Warmup Iteration 1: 1898167075.000 ns/op
# Warmup Iteration 2: 1734706264.000 ns/op
# Warmup Iteration 3: 1705265893.000 ns/op
# Warmup Iteration 4: 1704804614.000 ns/op
# Warmup Iteration 5: 1781362794.000 ns/op
Iteration 1: 1725992648.000 ns/op
Iteration 2: 1721125803.000 ns/op
Iteration 3: 1714455544.000 ns/op
Iteration 4: 1719110033.000 ns/op
Iteration 5: 1719564255.000 ns/op
Result: 1720049656.600 ±(99.9%) 15980153.846 ns/op [Average]
Statistics: (min, avg, max) = (1714455544.000, 1720049656.600, 1725992648.000), stdev = 4149995.207
Confidence interval (99.9%): [1704069502.754, 1736029810.446]
# Run complete. Total time: 00:02:10
Benchmark                         (MAX_COUNT)  Mode  Samples           Score     Score error  Units
o.s.MyBenchmark.parallelPiTest        1000000  avgt        5   723581166.400   208666306.733  ns/op
o.s.MyBenchmark.parallelPiTest       10000000  avgt        5  7535469586.800  1523035846.762  ns/op
o.s.MyBenchmark.sequentialPiTest      1000000  avgt        5   170851830.771    10391714.091  ns/op
o.s.MyBenchmark.sequentialPiTest     10000000  avgt        5  1720049656.600    15980153.846  ns/op
Answer 0 (score: 5)
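The problem is this: Math.random(). This method is synchronized, with all threads sharing one underlying generator, so you pay the overhead of parallelizing the task plus the overhead of contention, without getting any actual parallel processing.
If you use a ThreadLocalRandom instead, you should see the performance improve: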
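// requires: import java.util.concurrent.ThreadLocalRandom;
// Call current() inside the lambda so each worker thread gets its own generator.
ThreadLocalRandom r = ThreadLocalRandom.current();
double x = r.nextDouble(1);
double y = r.nextDouble(1);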
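To make the suggestion above concrete, here is a minimal standalone sketch of the same pi estimate with ThreadLocalRandom in place of Math.random(). The class name PiEstimate and the method estimatePi are hypothetical names chosen for this illustration; they are not part of the original benchmark, and the sketch leaves out the JMH harness.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

// Hypothetical helper class, introduced only for this illustration.
class PiEstimate {

    // Estimates pi from `samples` random points in the unit square,
    // on either a sequential or a parallel stream.
    static double estimatePi(int samples, boolean parallel) {
        IntStream indices = IntStream.range(0, samples);
        if (parallel) {
            indices = indices.parallel();
        }
        long inside = indices.filter(i -> {
            // current() is called inside the lambda so that each worker
            // thread uses its own generator instead of a shared one.
            ThreadLocalRandom r = ThreadLocalRandom.current();
            double x = r.nextDouble();
            double y = r.nextDouble();
            return x * x + y * y < 1.0;
        }).count();
        return 4.0 * inside / samples;
    }

    public static void main(String[] args) {
        System.out.println("sequential: " + estimatePi(10_000_000, false));
        System.out.println("parallel:   " + estimatePi(10_000_000, true));
    }
}
```

Both calls should print a value close to 3.14. To see whether the parallel version is actually faster on a given machine, the lambda body can be dropped into the two JMH benchmark methods from the question and measured there.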