Question

I recently started using collectingAndThen and found that it takes noticeably longer than other ways of coding the same kind of task.

Here is my code:

        System.out.println("CollectingAndThen");
        Long t = System.currentTimeMillis();
        String personWithMaxAge = persons.stream()
                                        .collect(Collectors.collectingAndThen(
                                                                Collectors.maxBy(Comparator.comparing(Person::getAge)),
                                                                (Optional<Person> p) -> p.isPresent() ? p.get().getName() : "none"
                                                ));


        System.out.println("personWithMaxAge - "+personWithMaxAge + " time taken = "+(System.currentTimeMillis() - t));
        Long t2 = System.currentTimeMillis();
        String personWithMaxAge2 = persons.stream().sorted(Comparator.comparing(Person::getAge).reversed())
                                                    .findFirst().get().name;
        System.out.println("personWithMaxAge2 : "+personWithMaxAge2+ " time taken = "+(System.currentTimeMillis() - t2));

Here is the output:

CollectingAndThen
personWithMaxAge - Peter time taken = 17
personWithMaxAge2 : Peter time taken = 1

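This suggests that collectingAndThen takes comparatively longer.

So my question is: should I keep using collectingAndThen, or is there a better alternative?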

Answer 1

collectingAndThen adds a single action that is performed only once, at the end of the collection.

So

String personWithMaxAge = persons.stream()
    .collect(Collectors.collectingAndThen(
        Collectors.maxBy(Comparator.comparing(Person::getAge)),
        (Optional<Person> p) -> p.isPresent() ? p.get().getName() : "none"
    ));

is no different from

Optional<Person> p = persons.stream()
    .collect(Collectors.maxBy(Comparator.comparing(Person::getAge)));
String personWithMaxAge = p.isPresent() ? p.get().getName() : "none";

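The actual advantage of specifying the action inside the collector shows when you use the resulting collector as input to another collector, e.g. groupingBy(f1, collectingAndThen(collector, f2)).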
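As a rough illustration of that composition, here is a minimal sketch that picks the name of the oldest person per city. The Person class, its city field, and the method name oldestNameByCity are assumptions made up for this example; they are not the asker's actual code.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class OldestPerCity {
    // Hypothetical stand-in for the asker's Person class (the city field is an assumption).
    static class Person {
        final String name;
        final String city;
        final int age;

        Person(String name, String city, int age) {
            this.name = name;
            this.city = city;
            this.age = age;
        }

        String getName() { return name; }
        String getCity() { return city; }
        int getAge() { return age; }
    }

    // Groups by city, then reduces each group to the name of its oldest member.
    // The finisher runs once per group, after that group's maxBy has completed.
    static Map<String, String> oldestNameByCity(List<Person> persons) {
        return persons.stream().collect(Collectors.groupingBy(
            Person::getCity,
            Collectors.collectingAndThen(
                Collectors.maxBy(Comparator.comparingInt(Person::getAge)),
                (Optional<Person> p) -> p.map(Person::getName).orElse("none"))));
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
            new Person("Peter", "Berlin", 52),
            new Person("Anna", "Berlin", 31),
            new Person("Maria", "Madrid", 44));
        System.out.println(oldestNameByCity(persons));
    }
}
```

Without collectingAndThen, the map values would be Optional<Person> and would have to be unwrapped in a second pass over the map.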

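Since this is a single trivial action performed once at the end, it has no impact on performance. Moreover, a sorted-based solution is unlikely to be faster than a maxBy operation for any nontrivial input.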
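One way to see why sorting does more work, without timing anything, is to count comparator calls. The sketch below is only an illustration: the class and method names are made up for this example, and it counts comparisons rather than measuring time. For a sequential stream of n elements, maxBy should need n - 1 comparisons, while sorting a shuffled list needs considerably more.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ComparisonCount {
    // Builds the integers 0..n-1 in a reproducible shuffled order.
    static List<Integer> shuffledInts(int n, long seed) {
        List<Integer> list = new ArrayList<>(
            IntStream.range(0, n).boxed().collect(Collectors.toList()));
        Collections.shuffle(list, new Random(seed));
        return list;
    }

    // Number of comparator calls made by Collectors.maxBy on a sequential stream.
    static long comparisonsForMaxBy(List<Integer> data) {
        AtomicLong calls = new AtomicLong();
        Comparator<Integer> counting = (a, b) -> {
            calls.incrementAndGet();
            return Integer.compare(a, b);
        };
        data.stream().collect(Collectors.maxBy(counting));
        return calls.get();
    }

    // Number of comparator calls made by sorting in descending order and taking the first element.
    static long comparisonsForSorted(List<Integer> data) {
        AtomicLong calls = new AtomicLong();
        Comparator<Integer> counting = (a, b) -> {
            calls.incrementAndGet();
            return Integer.compare(a, b);
        };
        data.stream().sorted(counting.reversed()).findFirst();
        return calls.get();
    }

    public static void main(String[] args) {
        List<Integer> data = shuffledInts(1000, 42L);
        System.out.println("maxBy comparisons:  " + comparisonsForMaxBy(data));
        System.out.println("sorted comparisons: " + comparisonsForSorted(data));
    }
}
```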

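You are simply using a broken benchmark that violates several of the rules listed in "How do I write a correct micro-benchmark in Java?". Most notably, you are measuring the one-time initialization overhead of the first use of the Stream framework. Just swapping the order of the two operations will give you an entirely different result.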

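Still, there is no need to make the operation unnecessarily complicated. As said, the advantage of collectors that mirror existing Stream operations is that they can be combined with other collectors. If no such combination is needed, just use the straightforward code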

String personWithMaxAge = persons.stream()
    .max(Comparator.comparing(Person::getAge))
    .map(Person::getName).orElse("none");

This is simpler than using the collector and more efficient than the sorted-based solution.

Answer 2

TL;DR: the way you measured things is probably off.

I created a more valid performance test using JMH (the list of persons should be initialized in different ways to further increase confidence in the results).

The results of this test are unambiguous (100 persons in the list):

# Run complete. Total time: 00:02:41

Benchmark                        Mode  Cnt        Score       Error  Units
Benchmark.SO.collectingAndThen  thrpt    8  1412304,072 ± 53963,266  ops/s
Benchmark.SO.sortFindFirst      thrpt    8   331214,270 ±  7966,082  ops/s

collectingAndThen is about 4 times faster.

If you cut the list of persons down to 5, the numbers change completely:

Benchmark                        Mode  Cnt         Score        Error  Units
Benchmark.SO.collectingAndThen  thrpt    8  14529905,529 ± 423196,066  ops/s
Benchmark.SO.sortFindFirst      thrpt    8   7645716,643 ± 538730,614  ops/s

But collectingAndThen is still about 2 times faster.

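I suspect your test is off. There are many possible reasons, e.g. class loading, JIT compilation and other warm-up effects.

As @assylias pointed out in the comments, you should rely on a more carefully crafted micro-benchmark to measure the time of such "small" methods and avoid the side effects mentioned above. See: How do I write a correct micro-benchmark in Java?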

Answer 3

No, collectingAndThen is fine from an efficiency standpoint.

Consider this code, which generates a list of random integers:

List<Integer> list =
    new Random().ints(5).boxed().collect(Collectors.toList());

You can get the maximum value from this list using either of the following two approaches:

    list.stream().collect(Collectors.collectingAndThen(
        Collectors.maxBy(Comparator.naturalOrder()),
        (Optional<Integer> n) -> n.orElse(0)));

and

    // Sort in descending order so that the first element is the maximum.
    list.stream().sorted(Comparator.reverseOrder()).findFirst().get();

If you execute each approach just once, you may get timings like these (ideone):

collectAndThen 2.884806
sortFindFirst  1.898522

These times are in milliseconds.

But keep iterating and you will see the times change considerably. After 100 iterations:

collectAndThen 0.00792
sortFindFirst  0.010873

Still in milliseconds.
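A repeated measurement of this kind could be sketched as follows. This is not the code behind the numbers above: the class name WarmUpDemo and its helper methods are assumptions made for illustration, and a hand-rolled loop like this only demonstrates the warm-up effect. It is not a substitute for a proper harness such as JMH.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.function.Supplier;
import java.util.stream.Collectors;

class WarmUpDemo {
    // Maximum via the collector-based approach (0 for an empty list).
    static Integer maxViaCollector(List<Integer> list) {
        return list.stream().collect(Collectors.collectingAndThen(
            Collectors.maxBy(Comparator.<Integer>naturalOrder()),
            (Optional<Integer> n) -> n.orElse(0)));
    }

    // Maximum via a descending sort (0 for an empty list).
    static Integer maxViaSorted(List<Integer> list) {
        return list.stream().sorted(Comparator.reverseOrder()).findFirst().orElse(0);
    }

    // Wall-clock time of a single call, in milliseconds.
    static double millis(Supplier<Integer> task) {
        long start = System.nanoTime();
        task.get();
        return (System.nanoTime() - start) / 1_000_000.0;
    }

    public static void main(String[] args) {
        List<Integer> list = new Random().ints(5).boxed().collect(Collectors.toList());
        for (int i = 1; i <= 100; i++) {
            double a = millis(() -> maxViaCollector(list));
            double b = millis(() -> maxViaSorted(list));
            // Print only the first and the last run to show the warm-up effect.
            if (i == 1 || i == 100) {
                System.out.printf("run %3d  collectingAndThen %.6f ms  sortFindFirst %.6f ms%n", i, a, b);
            }
        }
    }
}
```

The first run typically includes class loading and interpretation overhead, so it says little about either approach. The exact figures will vary from machine to machine.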

So, as already said, you are simply not benchmarking the two approaches correctly.

Is the collectingAndThen method efficient enough?
