Question

I am evaluating which of the following code snippets performs better in Java 8.

Snippet 1 (processing in the main thread):

public long doSequence() {
    DoubleStream ds = IntStream.range(0, 100000).asDoubleStream();
    long startTime = System.currentTimeMillis();
    final AtomicLong al = new AtomicLong();
    ds.forEach((num) -> {
        long n1 = new Double (Math.pow(num, 3)).longValue();
        long n2 = new Double (Math.pow(num, 2)).longValue();
        al.addAndGet(n1 + n2);
    });
    System.out.println("Sequence");
    System.out.println(al.get());
    long endTime = System.currentTimeMillis();
    return (endTime - startTime);
}

Snippet 2 (processing in parallel threads):

public long doParallel() {
    long startTime = System.currentTimeMillis();
    final AtomicLong al = new AtomicLong();
    DoubleStream ds = IntStream.range(0, 100000).asDoubleStream();
    ds.parallel().forEach((num) -> {
        long n1 = new Double (Math.pow(num, 3)).longValue();
        long n2 = new Double (Math.pow(num, 2)).longValue();
        al.addAndGet(n1 + n2);
    });
    System.out.println("Parallel");
    System.out.println(al.get());
    long endTime = System.currentTimeMillis();
    return (endTime - startTime);
}

Snippet 3 (parallel processing with threads from a thread pool):

public long doThreadPoolParallel() throws InterruptedException, ExecutionException {
    ForkJoinPool customThreadPool = new ForkJoinPool(4);
    DoubleStream ds = IntStream.range(0, 100000).asDoubleStream();
    long startTime = System.currentTimeMillis();
    final AtomicLong al = new AtomicLong();
    customThreadPool.submit(() -> ds.parallel().forEach((num) -> {
        long n1 = new Double (Math.pow(num, 3)).longValue();
        long n2 = new Double (Math.pow(num, 2)).longValue();
        al.addAndGet(n1 + n2);
    })).get();
    System.out.println("Thread Pool");
    System.out.println(al.get());
    long endTime = System.currentTimeMillis();
    return (endTime - startTime);
}

Here is the output:

Parallel
6553089257123798384
34 <-- 34 milliseconds

Thread Pool
6553089257123798384
23 <-- 23 milliseconds

Sequence
6553089257123798384
12 <-- 12 milliseconds!

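My expectations

1) Processing with the thread pool should take the least time, but it does not. (Note that I did not include the thread pool creation time, so it should be fast.)

2) I never expected the sequentially running code to be the fastest. What could be the reason for that?

I am using a quad-core processor.

Any help explaining the above is appreciated!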

Answer 1

Your comparison is flawed, mainly because it lacks a VM warm-up. When I simply repeat the executions, I get different results:

System.out.println(doParallel());
System.out.println(doThreadPoolParallel());
System.out.println(doSequence());
System.out.println("-------");
System.out.println(doParallel());
System.out.println(doThreadPoolParallel());
System.out.println(doSequence());
System.out.println("-------");
System.out.println(doParallel());
System.out.println(doThreadPoolParallel());
System.out.println(doSequence());

Results:

Parallel
6553089257123798384
65
Thread Pool
6553089257123798384
13
Sequence
6553089257123798384
14
-------
Parallel
6553089257123798384
9
Thread Pool
6553089257123798384
4
Sequence
6553089257123798384
8
-------
Parallel
6553089257123798384
8
Thread Pool
6553089257123798384
3
Sequence
6553089257123798384
8

As @Erwin pointed out in the comments, please see the answers to this question (rule 1 in this case) for advice on how to do this kind of benchmarking properly.
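To illustrate the warm-up point, here is a minimal sketch of a timing helper that runs a task several times untimed before measuring it. The class name `WarmupSketch`, the helper `bestNanos`, and the `LongStream` task are hypothetical stand-ins, not code from the question. A dedicated harness such as JMH is generally a better choice for real measurements.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

class WarmupSketch {
    // Hypothetical helper: runs the task untimed first so the JIT has a chance
    // to compile it, then returns the fastest timed run in nanoseconds.
    static long bestNanos(LongSupplier task, int warmupRuns, int timedRuns) {
        long sink = 0; // results are accumulated so the work is harder to optimize away
        for (int i = 0; i < warmupRuns; i++) {
            sink += task.getAsLong();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < timedRuns; i++) {
            long start = System.nanoTime();
            sink += task.getAsLong();
            best = Math.min(best, System.nanoTime() - start);
        }
        System.out.println("checksum: " + sink);
        return best;
    }

    public static void main(String[] args) {
        // Stand-in workload with the same arithmetic as the question's snippets.
        LongSupplier sequential =
                () -> LongStream.range(0, 100_000).map(n -> n * n * n + n * n).sum();
        System.out.println("best ns: " + bestNanos(sequential, 20, 10));
    }
}
```

Reporting the best (or median) of several timed runs, rather than a single run, reduces the influence of one-off costs such as class loading and JIT compilation.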

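The default parallelism of a parallel stream is not necessarily the same as that of a fork-join pool with as many threads as there are cores on your machine. Still, the difference between the results remained negligible when I switched from your custom pool to the common fork-join pool.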
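The difference in parallelism can be checked directly. The sketch below prints the core count, the common pool's parallelism, and that of a custom pool like the one in the question; the class name `PoolParallelismSketch` is hypothetical. By default the common pool is usually sized to one less than the number of available processors, because the thread that starts the stream operation also takes part in the work.

```java
import java.util.concurrent.ForkJoinPool;

class PoolParallelismSketch {
    // Parallelism of the common pool that parallel streams use by default.
    static int commonPoolParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        ForkJoinPool custom = new ForkJoinPool(4); // same size as in the question
        System.out.println("cores: " + cores);
        System.out.println("common pool parallelism: " + commonPoolParallelism());
        System.out.println("custom pool parallelism: " + custom.getParallelism());
        custom.shutdown();
    }
}
```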

Answer 2

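AtomicLong.addAndGet requires thread synchronization: every thread has to see the result of the previous addAndGet, which is why you can count on the total being correct.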

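Although this is not the traditional synchronization done with synchronized, it still has overhead. In JDK 7, addAndGet was implemented in Java code as a spin loop around compareAndSet. In JDK 8 it was turned into an intrinsic, which HotSpot implements on Intel platforms by emitting a LOCK XADD instruction.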
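The JDK 7 style can be pictured as a compare-and-set retry loop. The sketch below is an illustration written against the public AtomicLong API, not the JDK source itself; the class name `CasLoopSketch` and the method `addAndGetWithCasLoop` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

class CasLoopSketch {
    // Retries until no other thread has changed the value between the read
    // and the compareAndSet; under contention the loop may spin several times.
    static long addAndGetWithCasLoop(AtomicLong value, long delta) {
        for (;;) {
            long current = value.get();
            long next = current + delta;
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    public static void main(String[] args) {
        AtomicLong counter = new AtomicLong(5);
        System.out.println(addAndGetWithCasLoop(counter, 7)); // prints 12
    }
}
```

Either way, every update has to gain exclusive access to the same memory location, so threads that hit the counter on every iteration keep contending for it.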

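It requires cache synchronization between CPUs, which has a cost. It may even require flushing data to main memory and reading it back, which is very slow compared to code that does not need to do this.

Since this synchronization overhead occurs on every iteration in your test, it is quite possible that the overhead is larger than any performance gain from parallelization.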
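One way to avoid the per-iteration synchronization is to let the stream do the accumulation as a reduction, so partial sums are combined only when sub-tasks are merged. The following is a minimal sketch under that assumption, using a LongStream instead of the question's DoubleStream; the class name `ReductionSketch` and the method `sumOfPowers` are hypothetical.

```java
import java.util.stream.LongStream;

class ReductionSketch {
    // Sums n^3 + n^2 for n in [0, 100000) without any shared mutable state.
    // The long sum overflows and wraps around, just as the AtomicLong total does.
    static long sumOfPowers(boolean parallel) {
        LongStream numbers = LongStream.range(0, 100_000);
        if (parallel) {
            numbers = numbers.parallel();
        }
        return numbers.map(n -> n * n * n + n * n).sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfPowers(false)); // prints 6553089257123798384
        System.out.println(sumOfPowers(true));  // prints 6553089257123798384
    }
}
```

Both variants produce the same total as the snippets in the question. Whether the parallel version is actually faster for only 100,000 cheap elements still has to be measured after warm-up.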
