Python: find the sets that share elements with other sets

Time: 2017-02-10 13:29:51

Tags: python duplicates set

My data is a set of frozensets, for example

data = set([frozenset([1,2,3,4]), frozenset([3,4,5,6,7,8]), frozenset([100,200]), frozenset([1,1000, 2000])])

and the expected result is the frozensets that have elements in common with another frozenset, i.e.

result = set([frozenset([1,2,3,4]), frozenset([3,4,5,6,7,8]),  frozenset([1,1000, 2000])])

Here frozenset([100,200]) has been removed because it shares no elements with any of the other frozensets. What is an efficient way to achieve this?

3 answers:

Answer 0: (score: 2)

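You can build a dict that counts how many of the sets each element appears in, then drop every frozenset whose elements all have a count of 1. collections.Counter comes in handy for this.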

The advantage is that it runs in O(n) time, where n is the total number of elements across all the sets.

from collections import Counter

data = set([frozenset([1,2,3,4]), frozenset([3,4,5,6,7,8]), frozenset([100,200]), frozenset([1,1000, 2000])])
counts = Counter(elt for fs in data for elt in fs)
result = {fs for fs in data if any(counts[elt] > 1 for elt in fs)}

# {frozenset({1, 2, 3, 4}), frozenset({1000, 1, 2000}), frozenset({3, 4, 5, 6, 7, 8})}
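As a minimal sketch, the same counting idea can be wrapped in a function so it can be reused on other inputs. The name sets_sharing_elements is hypothetical and not part of the original answer. The sketch assumes the input holds each set only once (true for a set of frozensets); if the same set appeared twice in a list, its elements would each be counted twice.

```python
from collections import Counter

def sets_sharing_elements(sets):
    """Return the sets that share at least one element with another set.

    Hypothetical helper; assumes each set appears only once in `sets`.
    """
    # For every element, count how many of the sets contain it.
    counts = Counter(elt for s in sets for elt in s)
    # Keep a set if at least one of its elements occurs in more than one set.
    return {s for s in sets if any(counts[elt] > 1 for elt in s)}

data = {frozenset([1, 2, 3, 4]), frozenset([3, 4, 5, 6, 7, 8]),
        frozenset([100, 200]), frozenset([1, 1000, 2000])}
shared = sets_sharing_elements(data)
print(frozenset([100, 200]) in shared)  # False
```

Each element is visited once while counting and at most once while filtering, which is where the linear running time comes from.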

Answer 1: (score: 1)

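I would use a set comprehension with a check like this (for each item, test whether it has at least one element in common with some other item):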

data = set([frozenset([1,2,3,4]), frozenset([3,4,5,6,7,8]), frozenset([100,200]), frozenset([1,1000, 2000])])

new_data = {x for x in data if any(not x.isdisjoint(y) for y in data if y!=x)}

print(new_data)

Result:

{frozenset({1, 2, 3, 4}), frozenset({3, 4, 5, 6, 7, 8}), frozenset({1000, 1, 2000})}

There may be more efficient solutions, but at least the disjoint check is handled by an efficient built-in set routine.
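For readers unfamiliar with isdisjoint, here is a small illustration of the check the comprehension relies on. The variable names a, b and c are chosen for this example only.

```python
a = frozenset([1, 2, 3, 4])
b = frozenset([3, 4, 5, 6, 7, 8])
c = frozenset([100, 200])

# isdisjoint returns True only when the two sets have no element in common.
print(a.isdisjoint(b))  # False: 3 and 4 are in both
print(a.isdisjoint(c))  # True: nothing is shared

# "not isdisjoint" therefore means "shares at least one element".
overlaps = not a.isdisjoint(b)
```

Note that the comprehension compares each set against every other set, so the number of isdisjoint calls grows quadratically with the number of sets.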

Answer 2: (score: 0)

Here is my version. It has no particular advantage, but you may find it more readable.

data = set([frozenset([1,2,3,4]), frozenset([3,4,5,6,7,8]), frozenset([100,200]), frozenset([1,1000, 2000])])
result = set()

for item in data:
    for element in item:
        for other_item in data:
            if item != other_item and item not in result:
                if element in other_item:
                    result.add(item)
                    break

print(result)
# {frozenset({1, 2, 3, 4}), frozenset({1000, 1, 2000}), frozenset({3, 4, 5, 6, 7, 8})}
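One detail of the loop version: break only leaves the innermost loop, so the outer loops keep running and rely on the "item not in result" guard. A possible variant, offered only as a sketch, moves the search into a helper that returns as soon as a shared element is found. The helper name shares_element is hypothetical.

```python
def shares_element(item, others):
    """Return True if item has an element in common with a different set in others."""
    for other in others:
        if other == item:
            continue  # do not compare a set with itself
        for element in item:
            if element in other:
                return True  # return leaves both loops at once
    return False

data = {frozenset([1, 2, 3, 4]), frozenset([3, 4, 5, 6, 7, 8]),
        frozenset([100, 200]), frozenset([1, 1000, 2000])}
result = {item for item in data if shares_element(item, data)}
print(frozenset([100, 200]) in result)  # False
```

This keeps the explicit loops of the answer while stopping the search for an item at its first match.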