Question

I'm trying to collect a stream while throwing away rarely used items, as in this example:

import java.util.*;
import java.util.function.Function;
import static java.util.stream.Collectors.*;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.containsInAnyOrder;
import org.junit.Test;

@Test
public void shouldFilterCommonlyUsedWords() {
    // given
    List<String> allWords = Arrays.asList(
       "call", "feel", "call", "very", "call", "very", "feel", "very", "any");

    // when
    Set<String> commonlyUsed = allWords.stream()
            .collect(groupingBy(Function.identity(), counting()))
            .entrySet().stream().filter(e -> e.getValue() > 2)
            .map(Map.Entry::getKey).collect(toSet());

    // then
    assertThat(commonlyUsed, containsInAnyOrder("call", "very"));
}

I feel it should be possible to do this more simply. Am I right?

Answer 1

There is no way around creating a Map, unless you are willing to accept a very high CPU complexity.
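To illustrate the trade-off, a Map-free version might look like the sketch below. It rescans the whole list for every element via Collections.frequency, so the work grows quadratically with the input size. The class and method names (FrequencyFilter, commonlyUsed) are made up for this example.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class FrequencyFilter {
    // Map-free variant: for each word, count its occurrences by scanning
    // the entire list again. Simple, but O(n^2) in the number of words.
    static Set<String> commonlyUsed(List<String> words, int moreThan) {
        return words.stream()
                .filter(w -> Collections.frequency(words, w) > moreThan)
                .collect(Collectors.toSet());
    }
}
```

For short lists this is harmless, but for anything sizeable the counting Map is the better choice.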

However, you can remove the second collect operation:

Map<String,Long> map = allWords.stream()
    .collect(groupingBy(Function.identity(), HashMap::new, counting()));
map.values().removeIf(l -> l<=2);
Set<String> commonlyUsed=map.keySet();

Note that in Java 8, HashSet still wraps a HashMap, so using the keySet() of a HashMap when you want a Set in the first place does not waste space, given the current implementation.

Of course, you can hide the post-processing in a Collector if that feels more "streamy".
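One way this could look, as a sketch: wrap the grouping collector in collectingAndThen and do the removeIf step in the finisher. The wrapper class and method names (CommonlyUsedCollector, commonlyUsed) are only there to make the example self-contained.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

public class CommonlyUsedCollector {
    static Set<String> commonlyUsed(List<String> allWords) {
        return allWords.stream().collect(collectingAndThen(
                // Count occurrences into a mutable HashMap.
                groupingBy(Function.identity(), HashMap::new, counting()),
                // Finisher: drop rare entries in place, then expose the keys.
                map -> {
                    map.values().removeIf(l -> l <= 2);
                    return map.keySet();
                }));
    }
}
```

The returned Set is the key view of the underlying HashMap, so no second collection is built.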

Answer 2

Some time ago I wrote an experimental distinct(atLeast) method for my library:

public StreamEx<T> distinct(long atLeast) {
    if (atLeast <= 1)
        return distinct();
    AtomicLong nullCount = new AtomicLong();
    ConcurrentHashMap<T, Long> map = new ConcurrentHashMap<>();
    return filter(t -> {
        if (t == null) {
            return nullCount.incrementAndGet() == atLeast;
        }
        return map.merge(t, 1L, (u, v) -> (u + v)) == atLeast;
    });
}

So my idea was to use it like this:

Set<String> commonlyUsed = StreamEx.of(allWords).distinct(3).toSet();

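This performs stateful filtering, which looks a bit ugly. I doubted whether such a feature would be useful, so I did not merge it into the master branch. Nevertheless, it does the job in a single stream pass. Probably I should revive it. Meanwhile, you can copy this code into a static method and use it like this: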

Set<String> commonlyUsed = distinct(allWords.stream(), 3).collect(Collectors.toSet());
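For reference, such a static method might look roughly like this. It is the same stateful filter as above with the stream passed in as a parameter; the enclosing class name StreamUtil is an assumption made for this sketch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Stream;

public class StreamUtil {
    // Lets each element through exactly once, at the moment its
    // occurrence count reaches atLeast.
    static <T> Stream<T> distinct(Stream<T> stream, long atLeast) {
        if (atLeast <= 1)
            return stream.distinct();
        // ConcurrentHashMap rejects null keys, so nulls are counted separately.
        AtomicLong nullCount = new AtomicLong();
        ConcurrentHashMap<T, Long> map = new ConcurrentHashMap<>();
        return stream.filter(t -> {
            if (t == null) {
                return nullCount.incrementAndGet() == atLeast;
            }
            return map.merge(t, 1L, Long::sum) == atLeast;
        });
    }
}
```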

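Update (2015/05/31): I added the distinct(atLeast) method to StreamEx 0.3.1. It is implemented using a custom spliterator. Benchmarks showed that for sequential streams this implementation is significantly faster than the stateful filtering described above, and in many cases it is also faster than the other solutions proposed in this topic. It also works fine if null is encountered in the stream (the groupingBy collector does not support null as a key, so groupingBy-based solutions will fail if null is encountered).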
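The null limitation of groupingBy is easy to check. The small demo below (class and method names are made up for this sketch) reports whether grouping a list by identity throws a NullPointerException, which is what Collectors.groupingBy does when the classifier returns null.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class NullKeyDemo {
    // Returns true if counting the words with groupingBy throws
    // NullPointerException, as happens when an element maps to a null key.
    static boolean groupingFails(List<String> words) {
        try {
            words.stream().collect(
                    Collectors.groupingBy(Function.identity(), Collectors.counting()));
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```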

Answer 3

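Personally I prefer Holger's solution (+1), but instead of removing elements from the groupingBy map, I would filter its entrySet and map the result to a Set in the finisher (it feels even more streamy to me):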

    Set<String> commonlyUsed = allWords.stream().collect(
            collectingAndThen(
                groupingBy(identity(), counting()), 
                (map) -> map.entrySet().stream().
                            filter(e -> e.getValue() > 2).
                            map(e -> e.getKey()).
                            collect(Collectors.toSet())));

Collect stream with grouping, counting and filtering operations
