Question

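I have a problem similar to the one described here, with two differences: first, I use the stream API, and second, I already have equals() and hashCode() methods. Within the stream, however, equality of blogs in this context differs from the equality defined in the Blog class.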

Collection<Blog> elements = x.stream()
    ... // a lot of filter and map stuff
    .peek(p -> sysout(p)) // a stream of Blog
    .? // how to remove duplicates - .distinct() doesn't work

I have a class, call it ContextBlogEqual, with an equality method:

public boolean equal(Blog a, Blog b);

Is there a way to remove all duplicate entries within my current stream approach, based on the ContextBlogEqual#equal method?

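I already thought about grouping, but that doesn't work either, because whether blogA and blogB are equal depends on more than one parameter. I also don't see how to use .reduce(..), because more than one element remains.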

Answer 1

Essentially, you either have to define hashCode so your data works with a hash table, or define a total order so it works with a binary search tree.

For hash tables, you need to declare a wrapper class that overrides equals and hashCode.
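A minimal sketch of the wrapper approach follows. The Blog fields and the rule "same name and id, time ignored" are assumptions borrowed for illustration, and BlogKey and distinctByWrapper are hypothetical names; substitute whatever your contextual equality really compares.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

class WrapperDistinctDemo {

    // Hypothetical stand-in for the question's Blog class.
    static class Blog {
        final String name;
        final int id;
        final long time;

        Blog(String name, int id, long time) {
            this.name = name;
            this.id = id;
            this.time = time;
        }

        @Override
        public String toString() {
            return name + ":" + id + ":" + time;
        }
    }

    // Wrapper whose equals/hashCode encode the contextual equality
    // (assumed here: same name and id, time ignored).
    static final class BlogKey {
        final Blog blog;

        BlogKey(Blog blog) {
            this.blog = blog;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof BlogKey)) return false;
            Blog other = ((BlogKey) o).blog;
            return blog.id == other.id && Objects.equals(blog.name, other.name);
        }

        @Override
        public int hashCode() {
            // Must use exactly the fields that equals() compares.
            return Objects.hash(blog.name, blog.id);
        }
    }

    static List<Blog> distinctByWrapper(List<Blog> blogs) {
        return blogs.stream()
                .map(BlogKey::new)   // wrap each Blog
                .distinct()          // now relies on BlogKey.equals/hashCode
                .map(k -> k.blog)    // unwrap again
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Blog> blogs = Arrays.asList(
                new Blog("foo", 1, 1234), new Blog("bar", 2, 1345),
                new Blog("foo", 1, 1345), new Blog("bar", 2, 1345));
        System.out.println(distinctByWrapper(blogs)); // [foo:1:1234, bar:2:1345]
    }
}
```

Because the wrapping and unwrapping are ordinary map steps, this stays an intermediate operation and fits into an existing pipeline in place of the plain .distinct() call.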

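For binary trees, you can define a Comparator<Blog> that respects your equality definition and adds an arbitrary but consistent ordering criterion. Then you can collect into a new TreeSet<Blog>(yourComparator).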
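The TreeSet variant could look roughly like the sketch below. Again, the Blog fields and the choice of name and id as the equality-defining fields are assumptions, and BY_NAME_AND_ID and distinctByComparator are hypothetical names. The comparator assumes non-null names.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

class TreeSetDistinctDemo {

    // Hypothetical stand-in for the question's Blog class.
    static class Blog {
        final String name;
        final int id;
        final long time;

        Blog(String name, int id, long time) {
            this.name = name;
            this.id = id;
            this.time = time;
        }

        @Override
        public String toString() {
            return name + ":" + id + ":" + time;
        }
    }

    // Compares only the fields that define contextual equality (assumed: name, id).
    // The TreeSet treats compare(a, b) == 0 as "a and b are duplicates".
    static final Comparator<Blog> BY_NAME_AND_ID =
            Comparator.comparing((Blog b) -> b.name).thenComparingInt(b -> b.id);

    static TreeSet<Blog> distinctByComparator(List<Blog> blogs) {
        return blogs.stream()
                .collect(Collectors.toCollection(() -> new TreeSet<>(BY_NAME_AND_ID)));
    }

    public static void main(String[] args) {
        List<Blog> blogs = Arrays.asList(
                new Blog("foo", 1, 1234), new Blog("bar", 2, 1345),
                new Blog("foo", 1, 1345), new Blog("bar", 2, 1345));
        System.out.println(distinctByComparator(blogs)); // [bar:2:1345, foo:1:1234]
    }
}
```

Note that the result is sorted by the comparator rather than kept in encounter order, and that the set keeps the first element it saw for each group of duplicates.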

Answer 2

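First, note that an equal(Blog, Blog) method is not enough for most scenarios, because you would need to compare all entries pairwise, which is inefficient. It is better to define a function that extracts a new key from the blog entry. For example, consider the following Blog class: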

static class Blog {
    final String name;
    final int id;
    final long time;

    public Blog(String name, int id, long time) {
        this.name = name;
        this.id = id;
        this.time = time;
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, id, time);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null || getClass() != obj.getClass())
            return false;
        Blog other = (Blog) obj;
        return id == other.id && time == other.time && Objects.equals(name, other.name);
    }

    public String toString() {
        return name+":"+id+":"+time;
    }
}

We have some test data:

List<Blog> blogs = Arrays.asList(new Blog("foo", 1, 1234), 
        new Blog("bar", 2, 1345), new Blog("foo", 1, 1345), 
        new Blog("bar", 2, 1345));
List<Blog> distinctBlogs = blogs.stream().distinct().collect(Collectors.toList());
System.out.println(distinctBlogs);

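Here distinctBlogs contains three entries: [foo:1:1234, bar:2:1345, foo:1:1345]. Suppose that is undesired because we don't want to compare the time field. The simplest way to create a new key is to use Arrays.asList: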

Function<Blog, Object> keyExtractor = b -> Arrays.asList(b.name, b.id);

The resulting keys already have proper equals and hashCode implementations.
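To see why a List works as a key, here is a small check. It relies on the List contract, under which two lists are equal when they contain equal elements in the same order, and equal lists have equal hash codes. The helper name key is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class ListKeyDemo {

    // Builds the same kind of key as keyExtractor: a List of the relevant fields.
    static List<Object> key(String name, int id) {
        return Arrays.asList(name, id);
    }

    public static void main(String[] args) {
        List<Object> k1 = key("foo", 1);
        List<Object> k2 = key("foo", 1);
        List<Object> k3 = key("bar", 2);

        System.out.println(k1.equals(k2));                  // true
        System.out.println(k1.hashCode() == k2.hashCode()); // true
        System.out.println(k1.equals(k3));                  // false
    }
}
```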

Now, if a terminal operation is acceptable to you, you can create a custom collector like this:

List<Blog> distinctByNameId = blogs.stream().collect(
        Collectors.collectingAndThen(Collectors.toMap(
                keyExtractor, Function.identity(), 
                (a, b) -> a, LinkedHashMap::new),
                map -> new ArrayList<>(map.values())));
System.out.println(distinctByNameId);

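Here we use keyExtractor to generate the keys, and the merge function is (a, b) -> a, which means that when a duplicate key appears, the previously added entry is kept. We use LinkedHashMap to preserve the order (omit this parameter if you don't care about order). Finally, we dump the map values into a new ArrayList. You can move this collector creation into a separate method and generalize it: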

public static <T> Collector<T, ?, List<T>> distinctBy(
        Function<? super T, ?> keyExtractor) {
    return Collectors.collectingAndThen(
        Collectors.toMap(keyExtractor, Function.identity(), (a, b) -> a, LinkedHashMap::new),
        map -> new ArrayList<>(map.values()));
}

With that, usage becomes simpler:

List<Blog> distinctByNameId = blogs.stream()
           .collect(distinctBy(b -> Arrays.asList(b.name, b.id)));

Answer 3

Basically, you need a helper method like this one:

static <T, U> Stream<T> distinct(
    Stream<T> stream, 
    Function<? super T, ? extends U> keyExtractor
) {
    final Map<U, String> seen = new ConcurrentHashMap<>();
    return stream.filter(t -> seen.put(keyExtractor.apply(t), "") == null);
}

It takes a Stream and returns a new Stream that contains only the elements whose keyExtractor values are distinct. An example:

class O {
    final int i;
    O(int i) {
        this.i = i;
    }
    @Override
    public String toString() {
        return "O(" + i + ")";
    }
}

distinct(Stream.of(new O(1), new O(1), new O(2)), o -> o.i)
    .forEach(System.out::println);

This yields

O(1)
O(2)

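Disclaimer

As commented by Tagir Valeev here and in this similar answer by Stuart Marks, this approach has flaws. The operation as implemented here...

is unstable for ordered parallel streams
is not optimal for sequential streams
violates the stateless-predicate requirement of Stream.filter()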

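Wrapping the above in your own library

You can of course extend Stream with your own functionality and implement this new distinct() function there, as jOOλ or Javaslang do, for example: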

Seq.of(new O(1), new O(1), new O(2))
   .distinct(o -> o.i)
   .forEach(System.out::println);

How to eliminate duplicate entries in a stream based on your own Equal class