Java 8 Stream parallel performance and CPU resource consumption seem very poor compared to serial

Time: 2014-09-16 18:33:54

Tags: java parallel-processing java-8 java-stream microbenchmark

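While trying out the JDK 8 streams feature, I decided to compare the performance of parallel and sequential streams. The test estimates the value of pi by throwing random darts at the unit square and counting how many of them land inside the unit circle. I took the idea from an Apache Spark example.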
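For reference, here is a sketch of the arithmetic behind the estimate, assuming x and y are independent and uniformly distributed on [0, 1). The symbols N (number of darts thrown) and count (number of darts inside the circle) are labels chosen for illustration:

```latex
P\left(x^2 + y^2 < 1\right)
  = \frac{\text{area of quarter disc}}{\text{area of unit square}}
  = \frac{\pi / 4}{1}
  = \frac{\pi}{4}
\quad\Longrightarrow\quad
\pi \approx 4 \cdot \frac{\text{count}}{N}
```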

Here is the code.

    package org.sample;

    import java.util.concurrent.TimeUnit;
    import java.util.stream.IntStream;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Fork;
    import org.openjdk.jmh.annotations.Measurement;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Param;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;
    import org.openjdk.jmh.annotations.Warmup;

    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
    @Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
    @Fork(1)
    @State(Scope.Benchmark)
    public class MyBenchmark {

        @Param({
            "1000000",
            "10000000"
        }) int MAX_COUNT;

        @Benchmark
        public double parallelPiTest() {
            long count = IntStream.range(1, MAX_COUNT).parallel().filter(i -> {
                double x= Math.random(); 
                double y= Math.random(); 
                return (x*x + y* y) < 1.0 ;
            }).count();
            double pi = 4 * count * 1.0 /MAX_COUNT;
            return pi;

        }

        @Benchmark
        public double sequentialPiTest() {
            long count = IntStream.range(1, MAX_COUNT).filter(i -> {
                double x= Math.random(); 
                double y= Math.random(); 
                return (x*x + y* y) < 1.0 ;
            }).count();
            double pi = 4 * count * 1.0 /MAX_COUNT;
            return pi;
        }
    }

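In a simple test on my 8-core machine (a Windows 7 laptop), parallel execution took almost 5 times as long as serial execution, with CPU utilization close to 100% on all cores. The serial version, on the other hand, used only about 20% of one core! Confused by these results, I benchmarked with both JMH (the code above) and JUnitBenchmarks. The results were consistent: serial execution was always about 5 times faster than parallel execution. I also tried 100 iterations, but the results were similar to the 5-iteration run shown below. Am I missing something basic here?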

JMH benchmark results:

    C:\Users\local\lunaeeworkspace\benchmarktest>mvn clean install
    "******::" C:\Progra~1\Java\jdk1.8.0_20
    [INFO] Scanning for projects...
    [INFO]
    [INFO] ------------------------------------------------------------------------
    [INFO] Building Auto-generated JMH benchmark 1.0
    [INFO] ------------------------------------------------------------------------
    [INFO]
    [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ benchmarktest ---
    [INFO] Deleting C:\Users\local\lunaeeworkspace\benchmarktest\target
    [INFO]
    [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ benchmarktest ---
    [INFO] Using 'UTF-8' encoding to copy filtered resources.
    [INFO] skip non existing resourceDirectory C:\Users\local\lunaeeworkspace\benchmarktest\src\main\resources
    [INFO]
    [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ benchmarktest ---
    [INFO] Changes detected - recompiling the module!
    [INFO] Compiling 1 source file to C:\Users\local\lunaeeworkspace\benchmarktest\target\classes
    [INFO]
    [INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ benchmarktest ---
    [INFO] Using 'UTF-8' encoding to copy filtered resources.
    [INFO] skip non existing resourceDirectory C:\Users\local\lunaeeworkspace\benchmarktest\src\test\resources
    [INFO]
    [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ benchmarktest ---
    [INFO] No sources to compile
    [INFO]
    [INFO] --- maven-surefire-plugin:2.17:test (default-test) @ benchmarktest ---
    [INFO] No tests to run.
    [INFO]
    [INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ benchmarktest ---
    [INFO] Building jar: C:\Users\local\lunaeeworkspace\benchmarktest\target\benchmarktest-1.0.jar
    [INFO]
    [INFO] --- maven-shade-plugin:2.2:shade (default) @ benchmarktest ---
    [INFO] Including org.openjdk.jmh:jmh-core:jar:1.1 in the shaded jar.
    [INFO] Including net.sf.jopt-simple:jopt-simple:jar:4.6 in the shaded jar.
    [INFO] Including org.apache.commons:commons-math3:jar:3.2 in the shaded jar.
    [INFO] Replacing C:\Users\local\lunaeeworkspace\benchmarktest\target\benchmarks.jar with C:\Users\local\lunaeeworkspace\benchmarktest\target\benchmarktest-1.0-shaded.jar
    [INFO]
    [INFO] --- maven-install-plugin:2.5.1:install (default-install) @ benchmarktest ---
    [INFO] Installing C:\Users\local\lunaeeworkspace\benchmarktest\target\benchmarktest-1.0.jar to C:\Users\local\.m2\repository\org\sample\benchmarktest\1.0\benchmarktest-1.0.jar
    [INFO] Installing C:\Users\local\lunaeeworkspace\benchmarktest\pom.xml to C:\Users\local\.m2\repository\org\sample\benchmarktest\1.0\benchmarktest-1.0.pom
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 16.070 s
    [INFO] Finished at: 2014-09-15T13:25:03-07:00
    [INFO] Final Memory: 22M/221M
    [INFO] ------------------------------------------------------------------------
    C:\Users\local\lunaeeworkspace\benchmarktest>java -jar target/benchmarks.jar
    # VM invoker: C:\Program Files\Java\jre1.8.0_20\bin\java.exe
    # VM options: <none>
    # Warmup: 5 iterations, 1 s each
    # Measurement: 5 iterations, 1 s each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: org.sample.MyBenchmark.parallelPiTest
    # Parameters: (MAX_COUNT = 1000000)

    # Run progress: 0.00% complete, ETA 00:00:40
    # Fork: 1 of 1
    # Warmup Iteration   1: 2810219990.000 ns/op
    # Warmup Iteration   2: 679604930.000 ns/op
    # Warmup Iteration   3: 708517299.500 ns/op
    # Warmup Iteration   4: 613861141.500 ns/op
    # Warmup Iteration   5: 747273386.500 ns/op
    Iteration   1: 636085288.500 ns/op
    Iteration   2: 726300915.500 ns/op
    Iteration   3: 720032270.000 ns/op
    Iteration   4: 758523073.500 ns/op
    Iteration   5: 776964284.500 ns/op


    Result: 723581166.400 ±(99.9%) 208666306.733 ns/op [Average]
      Statistics: (min, avg, max) = (636085288.500, 723581166.400, 776964284.500), stdev = 54189977.210
      Confidence interval (99.9%): [514914859.667, 932247473.133]


    # VM invoker: C:\Program Files\Java\jre1.8.0_20\bin\java.exe
    # VM options: <none>
    # Warmup: 5 iterations, 1 s each
    # Measurement: 5 iterations, 1 s each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: org.sample.MyBenchmark.parallelPiTest
    # Parameters: (MAX_COUNT = 10000000)

    # Run progress: 25.00% complete, ETA 00:00:52
    # Fork: 1 of 1
    # Warmup Iteration   1: 9589247518.000 ns/op
    # Warmup Iteration   2: 8049867519.000 ns/op
    # Warmup Iteration   3: 7864790757.000 ns/op
    # Warmup Iteration   4: 7766442122.000 ns/op
    # Warmup Iteration   5: 7723210219.000 ns/op
    Iteration   1: 7525308107.000 ns/op
    Iteration   2: 8067847130.000 ns/op
    Iteration   3: 7647547652.000 ns/op
    Iteration   4: 6964833740.000 ns/op
    Iteration   5: 7471811305.000 ns/op


    Result: 7535469586.800 ±(99.9%) 1523035846.762 ns/op [Average]
      Statistics: (min, avg, max) = (6964833740.000, 7535469586.800, 8067847130.000), stdev = 395527572.797
      Confidence interval (99.9%): [6012433740.038, 9058505433.562]


    # VM invoker: C:\Program Files\Java\jre1.8.0_20\bin\java.exe
    # VM options: <none>
    # Warmup: 5 iterations, 1 s each
    # Measurement: 5 iterations, 1 s each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: org.sample.MyBenchmark.sequentialPiTest
    # Parameters: (MAX_COUNT = 1000000)

    # Run progress: 50.00% complete, ETA 00:01:37
    # Fork: 1 of 1
    # Warmup Iteration   1: 208653523.167 ns/op
    # Warmup Iteration   2: 171440852.571 ns/op
    # Warmup Iteration   3: 176369103.714 ns/op
    # Warmup Iteration   4: 172637171.571 ns/op
    # Warmup Iteration   5: 168770237.714 ns/op
    Iteration   1: 171262591.714 ns/op
    Iteration   2: 168976818.714 ns/op
    Iteration   3: 174889950.143 ns/op
    Iteration   4: 171272031.714 ns/op
    Iteration   5: 167857761.571 ns/op


    Result: 170851830.771 ±(99.9%) 10391714.091 ns/op [Average]
      Statistics: (min, avg, max) = (167857761.571, 170851830.771, 174889950.143), stdev = 2698695.149
      Confidence interval (99.9%): [160460116.681, 181243544.862]


    # VM invoker: C:\Program Files\Java\jre1.8.0_20\bin\java.exe
    # VM options: <none>
    # Warmup: 5 iterations, 1 s each
    # Measurement: 5 iterations, 1 s each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: org.sample.MyBenchmark.sequentialPiTest
    # Parameters: (MAX_COUNT = 10000000)

    # Run progress: 75.00% complete, ETA 00:00:37
    # Fork: 1 of 1
    # Warmup Iteration   1: 1898167075.000 ns/op
    # Warmup Iteration   2: 1734706264.000 ns/op
    # Warmup Iteration   3: 1705265893.000 ns/op
    # Warmup Iteration   4: 1704804614.000 ns/op
    # Warmup Iteration   5: 1781362794.000 ns/op
    Iteration   1: 1725992648.000 ns/op
    Iteration   2: 1721125803.000 ns/op
    Iteration   3: 1714455544.000 ns/op
    Iteration   4: 1719110033.000 ns/op
    Iteration   5: 1719564255.000 ns/op


    Result: 1720049656.600 ±(99.9%) 15980153.846 ns/op [Average]
      Statistics: (min, avg, max) = (1714455544.000, 1720049656.600, 1725992648.000), stdev = 4149995.207
      Confidence interval (99.9%): [1704069502.754, 1736029810.446]


    # Run complete. Total time: 00:02:10

    Benchmark                           (MAX_COUNT)  Mode  Samples           Score     Score error  Units
    o.s.MyBenchmark.parallelPiTest          1000000  avgt        5   723581166.400   208666306.733  ns/op
    o.s.MyBenchmark.parallelPiTest         10000000  avgt        5  7535469586.800  1523035846.762  ns/op
    o.s.MyBenchmark.sequentialPiTest        1000000  avgt        5   170851830.771    10391714.091  ns/op
    o.s.MyBenchmark.sequentialPiTest       10000000  avgt        5  1720049656.600    15980153.846  ns/op

1 answer:

Answer 0 (score: 5)

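The problem is Math.random(). That method is synchronized: every call goes through a single generator shared by all threads. As a result you pay the overhead of splitting the work into parallel tasks plus the overhead of contention, without getting any actual parallel processing.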

If you use a ThreadLocalRandom instead, you should see a performance improvement:

    ThreadLocalRandom r = ThreadLocalRandom.current();
    double x = r.nextDouble(1);
    double y = r.nextDouble(1);
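For completeness, here is a minimal, self-contained sketch of how the snippet above could be dropped into the parallel pipeline from the question. The class name `PiEstimate` and the method `parallelPi` are hypothetical names chosen for illustration, and the sketch is a plain program rather than a JMH benchmark. It makes two assumptions worth flagging: `ThreadLocalRandom.current()` is called inside the lambda so that each worker thread obtains its own generator, and the range starts at 0 so that the number of samples matches the divisor.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

class PiEstimate {

    // Estimates pi by sampling `samples` random points in the unit square
    // and counting how many fall inside the unit circle.
    static double parallelPi(int samples) {
        long inside = IntStream.range(0, samples)
                .parallel()
                .filter(i -> {
                    // Look up the generator inside the lambda: current()
                    // returns the generator of the calling thread, so the
                    // worker threads do not share any state.
                    ThreadLocalRandom r = ThreadLocalRandom.current();
                    double x = r.nextDouble();
                    double y = r.nextDouble();
                    return x * x + y * y < 1.0;
                })
                .count();
        return 4.0 * inside / samples;
    }

    public static void main(String[] args) {
        // The printed value varies from run to run because the input is random.
        System.out.println(parallelPi(10_000_000));
    }
}
```

Whether this version actually beats the sequential one on a given machine is something to measure, for example by swapping the lambda body into the JMH benchmark from the question.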