Question

Suppose you have a method like this that computes the maximum of a Collection for some ToIntFunction:

static <T> void foo1(Collection<? extends T> collection, ToIntFunction<? super T> function) {
    if (collection.isEmpty())
        throw new NoSuchElementException();
    int max = Integer.MIN_VALUE;
    T maxT = null;
    for (T t : collection) {
        int result = function.applyAsInt(t);
        if (result >= max) {
            max = result;
            maxT = t;
        }
    }
    // do something with maxT
}

With Java 8, this could be translated into

static <T> void foo2(Collection<? extends T> collection, ToIntFunction<? super T> function) {
    T maxT = collection.stream()
                       .max(Comparator.comparingInt(function))
                       .get();
    // do something with maxT
}

A disadvantage of the new version is that function.applyAsInt is invoked repeatedly for the same value of T. (Specifically, if the collection has size n, foo1 invokes applyAsInt n times, whereas foo2 invokes it 2n - 2 times.)
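A quick way to check the n versus 2n - 2 counts is to wrap the key function in a counter. The sketch below is an illustration, not code from the question. InvocationCount, maxLoop, maxStream and countingLength are hypothetical names, and maxLoop leaves out foo1's empty-collection check. It relies on Stream.max folding a sequential stream pairwise, so each of the n - 1 comparisons evaluates the key of both operands.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.ToIntFunction;

class InvocationCount {

    // Loop version (like foo1, minus the empty check): one applyAsInt call per element.
    static <T> T maxLoop(Collection<? extends T> collection, ToIntFunction<? super T> function) {
        int max = Integer.MIN_VALUE;
        T maxT = null;
        for (T t : collection) {
            int result = function.applyAsInt(t);
            if (result >= max) {
                max = result;
                maxT = t;
            }
        }
        return maxT;
    }

    // Stream version (like foo2): every comparison evaluates the key of both operands.
    static <T> T maxStream(Collection<? extends T> collection, ToIntFunction<? super T> function) {
        return collection.stream()
                .max(Comparator.comparingInt(function))
                .get();
    }

    // Key function (string length) that counts its own invocations.
    static ToIntFunction<String> countingLength(AtomicInteger calls) {
        return s -> {
            calls.incrementAndGet();
            return s.length();
        };
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bbb", "cc", "dddd", "e");
        AtomicInteger loopCalls = new AtomicInteger();
        AtomicInteger streamCalls = new AtomicInteger();
        String a = maxLoop(words, countingLength(loopCalls));
        String b = maxStream(words, countingLength(streamCalls));
        System.out.println(a + ": " + loopCalls.get() + " calls");   // dddd: 5 calls
        System.out.println(b + ": " + streamCalls.get() + " calls"); // dddd: 8 calls
    }
}
```

For the five words, the loop makes 5 calls and the stream version makes 8, which is 2 × 5 - 2.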

The disadvantages of the first approach are that the code is less clear and you can't modify it to use parallelism.

Suppose you wanted to do this using parallel streams and only invoke applyAsInt once per element. Can this be written in a simple way?

Answer 1

You can use a custom collector that keeps a running pair of the maximum value and the maximum element:

static <T> void foo3(Collection<? extends T> collection, ToIntFunction<? super T> function) {
    class Pair {
        int max = Integer.MIN_VALUE;
        T maxT = null;
    }
    T maxT = collection.stream().collect(Collector.of(
        Pair::new,
        (p, t) -> {
            int result = function.applyAsInt(t);
            if (result >= p.max) {
                p.max = result;
                p.maxT = t;
            }
        }, 
        (p1, p2) -> p2.max > p1.max ? p2 : p1,
        p -> p.maxT
    ));
    // do something with maxT
}

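One advantage is that this creates a single Pair intermediate object that is used throughout the collecting process (one per subtask when the stream runs in parallel). Each time an element is accepted, this holder is updated with the new maximum, so function.applyAsInt is called exactly once per element. The finisher operation just returns the maximum element and discards the maximum value. For an empty collection the result is null rather than an exception.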
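To check the once-per-element claim in the parallel case the question asks about, the collector can be run on parallelStream() with a counting key function. This is a sketch built on this answer's collector. CollectorMax and maxBy are hypothetical names, and it adds one field that is not in the answer, hasValue. That field stops an empty partial result from winning the merge when every key equals Integer.MIN_VALUE.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.ToIntFunction;
import java.util.stream.Collector;

class CollectorMax {

    // The custom collector from this answer, run on a parallel stream and returning maxT.
    // Returns null for an empty collection.
    static <T> T maxBy(Collection<? extends T> collection, ToIntFunction<? super T> function) {
        class Pair {
            boolean hasValue = false; // added: true once this partial result saw an element
            int max = Integer.MIN_VALUE;
            T maxT = null;
        }
        return collection.parallelStream().collect(Collector.of(
            Pair::new, // one Pair per subtask
            (p, t) -> {
                // The only call to the key function: once per element.
                int result = function.applyAsInt(t);
                if (!p.hasValue || result >= p.max) {
                    p.hasValue = true;
                    p.max = result;
                    p.maxT = t;
                }
            },
            // Merging partial results compares the stored maxima only.
            (p1, p2) -> !p2.hasValue ? p1 : !p1.hasValue ? p2 : (p2.max > p1.max ? p2 : p1),
            p -> p.maxT));
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            numbers.add(i);
        }
        Collections.shuffle(numbers, new Random(42));
        AtomicInteger calls = new AtomicInteger();
        Integer max = maxBy(numbers, n -> {
            calls.incrementAndGet();
            return n;
        });
        System.out.println(max + " after " + calls.get() + " calls"); // 9999 after 10000 calls
    }
}
```

The counter ends at 10000 for 10000 elements because the combiner merges Pair objects without calling the key function again.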

Answer 2

As I stated in the comments, I would suggest introducing an intermediate data structure like:

static <T> void foo2(Collection<? extends T> collection, ToIntFunction<? super T> function) {
  if (collection.isEmpty()) {
    throw new IllegalArgumentException();
  }
  class Pair {
    final T value;
    final int result;

    public Pair(T value, int result) {
      this.value = value;
      this.result = result;
    }

    public T getValue() {
      return value;
    }

    public int getResult() {
      return result;
    }
  }
  T maxT = collection.stream().map(t -> new Pair(t, function.applyAsInt(t)))
                     .max(Comparator.comparingInt(Pair::getResult)).get().getValue();
  // do something with maxT
}

Answer 3

Another way would be to use a memoized version of function:

static <T> void foo2(Collection<? extends T> collection, 
    ToIntFunction<? super T> function, T defaultValue) {

    T maxT = collection.parallelStream()
        .max(Comparator.comparingInt(ToIntMemoizer.memoize(function)))
        .<T>map(t -> t) // widen Optional<? extends T> to Optional<T> so that orElse accepts a T
        .orElse(defaultValue);

    // do something with maxT

}

The code for ToIntMemoizer.memoize(function) is as follows:

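import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToIntFunction;

public class ToIntMemoizer<T> {

    // One cached result per distinct input (compared with equals/hashCode).
    private final Map<T, Integer> cache = new ConcurrentHashMap<>();

    private ToIntMemoizer() {
    }

    private ToIntFunction<T> doMemoize(ToIntFunction<T> function) {
        // ToIntFunction's method is applyAsInt; its int result is boxed into the cache.
        return input -> cache.computeIfAbsent(input, function::applyAsInt);
    }

    public static <T> ToIntFunction<T> memoize(ToIntFunction<T> function) {
        return new ToIntMemoizer<T>().doMemoize(function);
    }
}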

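This uses a ConcurrentHashMap to cache already computed results. If you don't need to support parallelism, a plain HashMap works perfectly well. Either way, the cache relies on the elements' equals and hashCode, and ConcurrentHashMap does not accept null keys.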

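One disadvantage is that the function's results need to be boxed and unboxed. On the other hand, because the function is memoized, its result is computed only once for each distinct element of the collection. Whenever the function is called again with an equal input value, the result is returned from the cache.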
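The deduplication effect shows up when you count calls on a collection with repeated elements. The sketch below is a condensed variant of this answer. It assumes the cache can live inside the method instead of in a ToIntMemoizer instance, and MemoizedMax and maxMemoized are hypothetical names. The .<T>map(t -> t) step only widens Optional<? extends T> to Optional<T> so that orElse can accept defaultValue.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.ToIntFunction;

class MemoizedMax {

    static <T> T maxMemoized(Collection<? extends T> collection,
                             ToIntFunction<? super T> function, T defaultValue) {
        // Cache keyed by element: equal elements (equals/hashCode) share one entry.
        Map<T, Integer> cache = new ConcurrentHashMap<>();
        // computeIfAbsent on a ConcurrentHashMap applies the mapping function
        // at most once per key, even when several threads race on that key.
        ToIntFunction<T> memoized = t -> cache.computeIfAbsent(t, function::applyAsInt);
        return collection.parallelStream()
                .max(Comparator.comparingInt(memoized))
                .<T>map(t -> t) // Optional<? extends T> -> Optional<T>
                .orElse(defaultValue);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "a", "ccc", "bb", "ccc", "a");
        AtomicInteger calls = new AtomicInteger();
        String max = maxMemoized(words, s -> {
            calls.incrementAndGet();
            return s.length();
        }, "");
        System.out.println(max + " after " + calls.get() + " calls"); // ccc after 3 calls
    }
}
```

Seven elements with three distinct values lead to three calls of the underlying function, even though the comparator asks for keys many more times.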

Answer 4

If you don't mind using a third-party library, my StreamEx library optimizes all these cases with special methods like maxByInt, so you can simply use:

static <T> void foo3(Collection<? extends T> collection, ToIntFunction<? super T> function) {
    T maxT = StreamEx.of(collection).parallel()
                       .maxByInt(function)
                       .get();
    // do something with maxT
}

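The implementation uses reduce with a mutable container. This probably abuses the API a little, but it works fine for both sequential and parallel streams. Unlike the collect solution, it defers allocating the container until the first element is accumulated, so no container is allocated when a parallel subtask contains no elements. That happens quite often when there is a filtering operation upstream.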
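StreamEx's actual source is not reproduced here. The sketch below illustrates the general idea described above with plain JDK streams and makes no claim about how StreamEx structures it. It uses null as the reduce identity and allocates a mutable holder only when a subtask accumulates its first element. LazyReduceMax, Box and maxByInt are hypothetical names. Like the approach described, it mutates the accumulated value inside reduce, which stretches the Stream.reduce contract. It depends on every subtask owning its holder and on the combiner only choosing between holders.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.ToIntFunction;

class LazyReduceMax {

    // Mutable holder for the best key and element seen by one subtask.
    static final class Box<T> {
        int key;
        T value;

        Box(int key, T value) {
            this.key = key;
            this.value = value;
        }
    }

    static <T> T maxByInt(Collection<? extends T> collection, ToIntFunction<? super T> function) {
        Box<T> best = collection.parallelStream().<Box<T>>reduce(
            null, // identity means "no container yet"; never mutated, so safe to share
            (box, t) -> {
                int key = function.applyAsInt(t); // once per element
                if (box == null) {
                    return new Box<>(key, t); // allocate on the first accumulated element
                }
                if (key > box.key) { // update this subtask's own container in place
                    box.key = key;
                    box.value = t;
                }
                return box;
            },
            // Empty subtasks contribute null and never allocated anything.
            (b1, b2) -> b1 == null ? b2 : b2 == null ? b1 : (b2.key > b1.key ? b2 : b1));
        return best == null ? null : best.value;
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            numbers.add(i);
        }
        Collections.shuffle(numbers, new Random(7));
        AtomicInteger calls = new AtomicInteger();
        Integer max = maxByInt(numbers, n -> {
            calls.incrementAndGet();
            return n;
        });
        System.out.println(max + " after " + calls.get() + " calls"); // 9999 after 10000 calls
    }
}
```

An empty collection yields null here, whereas the StreamEx call in this answer returns an Optional that get() unwraps.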

How to efficiently compute the max value of a collection after applying some function
