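My data is a set of frozensets, for example
data = set([frozenset([1,2,3,4]), frozenset([3,4,5,6,7,8]), frozenset([100,200]), frozenset([1,1000, 2000])])
and the expected result is the set of frozensets that share at least one element with another frozenset, i.e.
result = set([frozenset([1,2,3,4]), frozenset([3,4,5,6,7,8]), frozenset([1,1000, 2000])])
Here frozenset([100,200]) has been removed because it does not share any element with the other frozensets. What is an efficient way to achieve this?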
Answer 0 (score: 2)
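You can build a dict of the set elements that counts how many times each one occurs, and then drop every frozenset whose elements all have a count of 1. collections.Counter is handy for this.
The advantage of this approach is that it is O(n), where n is the total number of elements across all sets.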
from collections import Counter
data = set([frozenset([1,2,3,4]), frozenset([3,4,5,6,7,8]), frozenset([100,200]), frozenset([1,1000, 2000])])
counts = Counter(elt for fs in data for elt in fs)
result = {fs for fs in data if any(counts[elt] > 1 for elt in fs)}
# {frozenset({1, 2, 3, 4}), frozenset({1000, 1, 2000}), frozenset({3, 4, 5, 6, 7, 8})}
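The counting idea above can also be wrapped in a small helper. This is only a sketch: the function name keep_overlapping is made up for illustration, and it assumes the input is a set of distinct frozensets, as in the question. If the same frozenset could appear twice (for example in a list), its elements would be counted twice and it would be kept by mistake.

```python
from collections import Counter

def keep_overlapping(sets):
    """Return the frozensets that share at least one element with another one.

    Hypothetical helper; assumes `sets` holds distinct frozensets.
    """
    # A frozenset contributes each of its elements exactly once, so a count
    # above 1 means the element occurs in at least two different frozensets.
    counts = Counter(elt for fs in sets for elt in fs)
    return {fs for fs in sets if any(counts[elt] > 1 for elt in fs)}

data = {frozenset([1, 2, 3, 4]), frozenset([3, 4, 5, 6, 7, 8]),
        frozenset([100, 200]), frozenset([1, 1000, 2000])}
result = keep_overlapping(data)
```

Every element is visited once while counting and at most once while filtering, which is where the O(n) bound comes from.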
Answer 1 (score: 1)
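I would use a set comprehension with a similar check (for each item, check whether it has at least one element in common with at least one other item):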
data = set([frozenset([1,2,3,4]), frozenset([3,4,5,6,7,8]), frozenset([100,200]), frozenset([1,1000, 2000])])
new_data = {x for x in data if any(not x.isdisjoint(y) for y in data if y!=x)}
print(new_data)
Result:
{frozenset({1, 2, 3, 4}), frozenset({3, 4, 5, 6, 7, 8}), frozenset({1000, 1, 2000})}
There may be more efficient solutions, but at least the disjoint check is handled by the efficient built-in set routines.
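To make the cost of the pairwise check more concrete, here is the same comprehension as a function. The name keep_overlapping_pairwise is hypothetical. With k frozensets it makes up to roughly k*k isdisjoint calls in the worst case, although any() stops at the first overlap it finds.

```python
def keep_overlapping_pairwise(sets):
    """Keep each frozenset that intersects at least one *other* frozenset.

    Hypothetical helper; assumes `sets` is a set of distinct frozensets,
    so `y != x` is enough to skip comparing an item with itself.
    """
    return {
        x for x in sets
        # any() short-circuits on the first y that overlaps with x
        if any(not x.isdisjoint(y) for y in sets if y != x)
    }

data = {frozenset([1, 2, 3, 4]), frozenset([3, 4, 5, 6, 7, 8]),
        frozenset([100, 200]), frozenset([1, 1000, 2000])}
new_data = keep_overlapping_pairwise(data)
```

For a handful of sets this is likely fine. For many sets, a counting approach that visits each element once will probably scale better.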
Answer 2 (score: 0)
Here is my version. It has no particular advantage, but you may find it more readable.
data = set([frozenset([1,2,3,4]), frozenset([3,4,5,6,7,8]), frozenset([100,200]), frozenset([1,1000, 2000])])
result = set()
for item in data:
    for element in item:
        for other_item in data:
            # skip the item itself and items that were already kept
            if item != other_item and item not in result:
                if element in other_item:
                    result.add(item)
                    break
>>> print(result)
{frozenset({1, 2, 3, 4}), frozenset({1000, 1, 2000}), frozenset({3, 4, 5, 6, 7, 8})}