Question

I'm looking for an elegant way to filter a list down to only its unique elements. An example:

   [1, 2, 2, 3, 1, 4]
-> [3, 4] // 1 and 2 occur more than once

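Most solutions I have found manually count the occurrences of all elements and then filter for the elements that occur exactly once.

That doesn't sound very elegant to me. Maybe there is a better solution, a best practice, or the name of a data structure that already solves this? I was also thinking about using streams, but I don't know how.

Note that I'm not asking to remove duplicates, i.e. [1, 2, 3, 4], but to keep only the unique elements, so [3, 4].

The order of the resulting list and the exact type of Collection don't matter to me.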

Answer 1

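I doubt there is a better approach than actually counting and then filtering for the elements that appear only once. At least, every approach I can think of does something like that under the hood.

It's also unclear what you mean by elegant: readability or performance? So I'll just dump a few approaches.

`Stream` counting

Here is a stream variant that counts the occurrences (into a Map) and then filters for elements that appear only once. It is basically the same as what you already described, or what a Bag does under the hood: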

List<E> result = elements.stream() // Stream<E>
    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting())) // Map<E, Long>
    .entrySet() // Set<Entry<E, Long>>
    .stream()  // Stream<Entry<E, Long>>
    .filter(entry -> entry.getValue() == 1)
    .map(Entry::getKey)
    .collect(Collectors.toList());

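It requires two full passes: one over the data set to count, and one over the resulting map to filter. Because it uses the Stream API, the operations can be parallelized (for example by switching to parallelStream()), so this may be fast if you have a lot of elements.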
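As a rough illustration of the multi-threading remark, here is a minimal sketch of a parallel variant. The class and method names (`ParallelSingletons`, `singletons`) are made up for this example, and it assumes the list contains no null elements, since `Collectors.groupingByConcurrent` collects into a `ConcurrentMap`. Whether it is actually faster depends on the data size and should be measured.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of any library.
class ParallelSingletons {
    // Returns the elements that occur exactly once; the result order is unspecified.
    static <E> List<E> singletons(List<E> elements) {
        // Count occurrences in parallel into a single concurrent map.
        ConcurrentMap<E, Long> counts = elements.parallelStream()
            .collect(Collectors.groupingByConcurrent(Function.identity(), Collectors.counting()));

        // Keep only the keys whose count is exactly one.
        return counts.entrySet().stream()
            .filter(entry -> entry.getValue() == 1L)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }
}
```

Calling `ParallelSingletons.singletons(List.of(1, 2, 2, 3, 1, 4))` yields a list containing 3 and 4 in some order.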

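Manual `Set`

Here is another approach that reduces iteration and lookup time by collecting into a Set manually, so that duplicates are identified as early as possible: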

Set<E> result = new HashSet<>();
Set<E> appeared = new HashSet<>();

for (E element : elements) {
    if (result.contains(element)) { // 2nd occurrence
        result.remove(element);
        appeared.add(element);
        continue;
    }
    if (appeared.contains(element)) { // >2nd occurrence
        continue;
    }

    result.add(element); // 1st occurrence
}

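As you can see, this needs only one iteration over elements instead of several.

This approach is elegant in the sense that it does not compute unnecessary information. For what you want, it is completely irrelevant how often exactly an element appears. We only care about "does it appear once or more than once?", not whether it appears 5 or 11 times.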
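The same single-pass idea can be written a little more compactly by relying on the boolean returned by `Set.add`, which is true only when the element was not already present. This is a sketch of a variant, not the code from the answer: it tracks every element seen so far rather than only the repeated ones, and the names `SingletonSets` and `singletons` are invented for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, not part of any library.
class SingletonSets {
    // Returns the elements that occur exactly once, using one pass over the input.
    static <E> Set<E> singletons(Iterable<E> elements) {
        Set<E> seen = new HashSet<>();   // every element encountered so far
        Set<E> result = new HashSet<>(); // elements encountered exactly once so far

        for (E element : elements) {
            if (seen.add(element)) {
                result.add(element);     // 1st occurrence
            } else {
                result.remove(element);  // 2nd or later occurrence
            }
        }
        return result;
    }
}
```

The trade-off is that `seen` holds all distinct elements, so it uses somewhat more memory than keeping only the repeated ones, in exchange for a single lookup per element.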

Answer 2

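You can use a Bag to count occurrences; an element is unique when getCount(element) returns 1.

A Bag is a collection that stores items together with the number of times each one occurs: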

public void whenAdded_thenCountIsKept() {
   Bag<Integer> bag = new HashBag<>(
   Arrays.asList(1, 2, 3, 3, 3, 1, 4));         
   assertThat(2, equalTo(bag.getCount(1)));
}

Or CollectionBag

The Apache Commons Collections library provides a decorator called CollectionBag. We can use it to make our bag collections compliant with the Java Collection contract.

And get the unique set:

bag.uniqueSet();

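This returns the Set of distinct elements in the Bag. Note that every distinct element appears in it once, so to keep only the elements that occur exactly once you still have to filter it by getCount(element) == 1.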

Answer 3

First group all elements together with their counts, then discard the groups that contain more than one element.

Map<String, Long> map = Stream.of("a", "b", "a", "a", "c", "d", "c")
            .collect(Collectors.groupingBy(Function.identity(), 
                     Collectors.counting()));
map.entrySet()
    .stream()
    .filter(e -> e.getValue() == 1L)
    .map(e -> e.getKey())
    .forEach(System.out::println);

Or in one go:

        Stream.of("a", "b", "a", "a", "c", "d", "c")
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet()
                .stream()
                .filter(e -> e.getValue() == 1L)
                .map(e -> e.getKey())
                .forEach(System.out::println);

Answer 4

The idea of using a map to accumulate frequency counts sounds good: it runs in roughly linear (O(n)) time and needs only O(n) extra space.
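For reference, the frequency-map idea can be sketched without streams as follows. This is only an illustration under the assumption that a plain `HashMap` is acceptable; the names `FrequencyMapSingletons` and `singletons` are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class, not part of any library.
class FrequencyMapSingletons {
    // Returns the elements that occur exactly once; O(n) expected time, O(n) extra space.
    static <E> Set<E> singletons(Iterable<E> elements) {
        Map<E, Integer> counts = new HashMap<>();
        for (E element : elements) {
            // Insert 1 for a new key, otherwise add 1 to the existing count.
            counts.merge(element, 1, Integer::sum);
        }
        // Removing through the values view also removes the corresponding map entries.
        counts.values().removeIf(count -> count > 1);
        return counts.keySet();
    }
}
```

The returned set is a view backed by the local map, which is fine here because nothing else holds a reference to that map.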

Here is an algorithm that needs zero extra space, at the cost of O(n^2) time (assuming a random-access list such as ArrayList):

public static <T> void retainSingletons(List<T> list)
{
    int i = 0;
    while (i < list.size()) {
        boolean foundDup = false;
        int j = i + 1;
        while (j < list.size()) {
            if (list.get(i).equals(list.get(j))) {
                list.remove(j);
                foundDup = true;
            } else {
                ++j;
            }
        }
        if (foundDup) {
            list.remove(i);
        } else {
            ++i;
        }
    }
}

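The idea is simple: move a slow pointer i along the list until it runs off the end. For each value of i, run a fast pointer j from i+1 to the end of the list, removing any list[j] that duplicates list[i]. Once j runs out, if any duplicate of list[i] was found and removed, remove list[i] as well.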

Answer 5

The following will work with Eclipse Collections:

IntList list = IntLists.mutable.with(1, 2, 2, 3, 1, 4);
IntSet unique = list.toBag().selectUnique();
System.out.println(unique);

Using an IntList avoids boxing the int values as Integer objects.

Note: I am a committer for Eclipse Collections.
