Question

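I would like to compute, in Python, all (distinct) intersections of a finite collection of finite sets of integers, implemented here as a list of lists. To avoid confusion, a formal definition is given at the end of the question.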

> A = [[0,1,2,3],[0,1,4],[1,2,4],[2,3,4],[0,3,4]]
> all_intersections(A) # desired output
[[], [0], [1], [2], [3], [4], [0, 1], [0, 3], [0, 4], [1, 2], [1, 4], [2, 3], [2, 4], [3, 4], [0, 1, 4], [0, 3, 4], [1, 2, 4], [2, 3, 4], [0, 1, 2, 3]]

I have an algorithm that does this iteratively, but it is rather slow (should I post it?). A test case would be

[[0, 1, 2, 3, 4, 9], [0, 1, 4, 5, 6, 10], [0, 2, 4, 5, 7, 11], [1, 3, 4, 6, 8, 12], [2, 3, 4, 7, 8, 13], [4, 5, 6, 7, 8, 14], [0, 1, 9, 10, 15, 16], [0, 2, 9, 11, 15, 17], [1, 3, 9, 12, 16, 18], [2, 3, 9, 13, 17, 18], [9, 15, 16, 17, 18, 19], [0, 5, 10, 11, 15, 20], [1, 6, 10, 12, 16, 21], [10, 15, 16, 19, 20, 21], [5, 6, 10, 14, 20, 21], [11, 15, 17, 19, 20, 22], [5, 7, 11, 14, 20, 22], [2, 7, 11, 13, 17, 22], [7, 8, 13, 14, 22, 23], [3, 8, 12, 13, 18, 23], [13, 17, 18, 19, 22, 23], [14, 19, 20, 21, 22, 23], [6, 8, 12, 14, 21, 23], [12, 16, 18, 19, 21, 23]]

This takes me about 2.5 seconds to compute.

Any ideas on how to do this fast?

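Formal definition (which is actually hard to write without LaTeX mode): let A = {A1, ..., An} be a finite set of finite sets Ai of non-negative integers. The output should then be the set {intersection of the sets in B : B a subset of A}.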

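So the formal algorithm would be to take every subset of A, intersect the sets in it, and collect the results. But that obviously takes forever.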
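For reference, the brute-force definition above can be sketched directly. This is only a correctness baseline for small inputs, not a proposed solution: it visits all 2**n - 1 non-empty subfamilies. The function name `all_intersections_bruteforce` is made up for this illustration, and restricting to non-empty subfamilies is an assumption based on the desired output shown earlier, which does not contain the union of all sets.

```python
from itertools import combinations

def all_intersections_bruteforce(lists):
    """Intersect every non-empty subfamily; exponential in len(lists)."""
    sets = [frozenset(s) for s in lists]
    found = set()
    for r in range(1, len(sets) + 1):
        for combo in combinations(sets, r):
            # frozenset.intersection accepts one or more sets.
            found.add(frozenset.intersection(*combo))
    # Canonical form: sorted sublists, ordered by size and then by contents.
    return sorted((sorted(s) for s in found), key=lambda x: (len(x), x))

A = [[0, 1, 2, 3], [0, 1, 4], [1, 2, 4], [2, 3, 4], [0, 3, 4]]
result = all_intersections_bruteforce(A)
```

On the five-set example this reproduces the desired output, which makes it handy for checking faster implementations.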

Many thanks!

Answer 1

Here is a recursive solution. It is almost instantaneous on your test example:

def allIntersections(frozenSets):
    if len(frozenSets) == 0:
        return []
    else:
        head = frozenSets[0]
        tail = frozenSets[1:]
        tailIntersections = allIntersections(tail)
        newIntersections = [head]
        newIntersections.extend(tailIntersections)
        newIntersections.extend(head & s for s in tailIntersections)
        return list(set(newIntersections))

def all_intersections(lists):
    sets = allIntersections([frozenset(s) for s in lists])
    return [list(s) for s in sets]

On edit: here is a cleaner, non-recursive implementation of the same idea.

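The problem is easiest if you define the intersection of an empty collection of sets to be the universal set, and an adequate universal set can be obtained by taking the union of all the input sets. This is a standard move in lattice theory, and it is dual to taking the union of an empty collection of sets to be the empty set. If you do not want the universal set, you can always throw it away: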

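def allIntersections(frozenSets):
    # The union of all inputs serves as the universal set, i.e. the
    # intersection of the empty collection. Calling union on an empty
    # frozenset instance also handles an empty input list.
    universalSet = frozenset().union(*frozenSets)
    intersections = set([universalSet])
    for s in frozenSets:
        # Intersect s with everything found so far, then merge the results.
        moreIntersections = set(s & t for t in intersections)
        intersections.update(moreIntersections)
    return intersections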

def all_intersections(lists):
    sets = allIntersections([frozenset(s) for s in lists])
    return [list(s) for s in sets]
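To get output in the format the question asks for, the non-recursive version can be wrapped so that it drops the universal set and sorts the result. The sketch below is one way to do that; the name `all_intersections_sorted` is hypothetical. It relies on the observation that the universal set is a genuine intersection only when it is itself one of the input sets, so it is discarded only when that is not the case.

```python
def all_intersections_sorted(lists):
    frozen = [frozenset(s) for s in lists]
    if not frozen:
        return []
    universal = frozenset().union(*frozen)
    intersections = {universal}
    for s in frozen:
        # The comprehension is fully built before the update is applied.
        intersections |= {s & t for t in intersections}
    # Keep the universal set only if it is one of the input sets.
    if universal not in frozen:
        intersections.discard(universal)
    # Sorted sublists, ordered by size and then by contents.
    return sorted((sorted(s) for s in intersections), key=lambda x: (len(x), x))

A = [[0, 1, 2, 3], [0, 1, 4], [1, 2, 4], [2, 3, 4], [0, 3, 4]]
result = all_intersections_sorted(A)
```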

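The reason this is so fast on your test example is that, although your collection has 24 sets and therefore 2**24 (about 16.8 million) potential intersections, there are in fact only 242 distinct intersections (241 if you do not count the empty intersection). So the number of intersections handled on each pass through the loop is at most a few hundred.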
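One way to see this effect is to record how many distinct intersections exist after each pass through the loop. The helper below, `intersection_counts` (a name invented for this sketch), does that for the same incremental algorithm and assumes a non-empty input; it is shown on the small five-set example from the question rather than the 24-set test case.

```python
def intersection_counts(lists):
    """Return the number of distinct intersections after each pass."""
    frozen = [frozenset(s) for s in lists]
    intersections = {frozenset().union(*frozen)}  # start with the universal set
    counts = [len(intersections)]
    for s in frozen:
        intersections |= {s & t for t in intersections}
        counts.append(len(intersections))
    return counts

A = [[0, 1, 2, 3], [0, 1, 4], [1, 2, 4], [2, 3, 4], [0, 3, 4]]
counts = intersection_counts(A)
```

For the small example the final count is 20: the 19 sets of the desired output plus the universal set. The work done in each pass is proportional to the previous count, which is why a small number of distinct intersections keeps the whole computation cheap.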

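It is possible to choose 24 sets so that all 2**24 possible intersections are in fact distinct, so it is easy to see that the worst-case behavior is exponential. But if, as in your test example, the number of distinct intersections is small, this approach lets you compute them quickly.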
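A concrete family that triggers the worst case can be sketched as follows. The construction is an illustration chosen here, not something taken from the answer: over the universe {0, ..., n-1}, let set i contain everything except element i. Intersecting a subfamily removes exactly the elements it indexes, so every subfamily gives a different result. The names `worst_case_family` and `count_intersections` are hypothetical, and n is assumed to be at least 2.

```python
def count_intersections(frozen_sets):
    """Count distinct intersections, including the universal set."""
    universal = frozenset().union(*frozen_sets)
    intersections = {universal}
    for s in frozen_sets:
        intersections |= {s & t for t in intersections}
    return len(intersections)

def worst_case_family(n):
    """n sets over {0, ..., n-1}; set i omits exactly element i."""
    universe = frozenset(range(n))
    return [universe - {i} for i in range(n)]

n = 10  # kept small on purpose: the count doubles with every extra set
count = count_intersections(worst_case_family(n))
```

With n = 10 the count is 2**10 = 1024, and it doubles with each additional set, so n = 24 would produce roughly 16.8 million intersections.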

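A potential optimization might be to sort the sets by increasing size before looping over them. Processing the smaller sets first might make empty intersections appear earlier, which would keep the total number of distinct intersections smaller until near the end of the loop.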
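The sorting idea amounts to a one-line change. The sketch below (function name `all_intersections_presorted` is made up for illustration) sorts the inputs by size before running the same loop. Whether this actually helps has not been measured here; the final result is the same regardless of processing order, so only the intermediate set sizes can differ.

```python
def all_intersections_presorted(lists):
    # Process smaller sets first; the final result does not depend on order.
    frozen = sorted((frozenset(s) for s in lists), key=len)
    if not frozen:
        return set()
    intersections = {frozenset().union(*frozen)}
    for s in frozen:
        intersections |= {s & t for t in intersections}
    return intersections

A = [[0, 1, 2, 3], [0, 1, 4], [1, 2, 4], [2, 3, 4], [0, 3, 4]]
result = all_intersections_presorted(A)
```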

Answer 2

An iterative solution that takes about 3.5 ms on my machine for your large test input:

from itertools import starmap, product
from operator import and_

def all_intersections(sets):
    # Convert to set of frozensets for uniquification/type correctness
    last = new = sets = set(map(frozenset, sets))
    # Keep going until further intersections add nothing to results
    while new:
        # Compute intersection of old values with newly found values
        new = set(starmap(and_, product(last, new)))
        last = sets.copy()  # Save off prior state
        new -= last         # Determine truly newly added values
        sets |= new         # Accumulate newly added values in complete set
    # No more intersections being generated, convert results to canonical
    # form, list of lists, where each sublist is displayed in order, and
    # the top level list is ordered first by size of sublist, then by contents
    return sorted(map(sorted, sets), key=lambda x: (len(x), x))

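Basically, it keeps computing pairwise intersections between the previous result set and the newly found intersections until a round produces nothing new, and then it is done.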

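Note: this is not actually the best solution. The recursive approach is algorithmically better by enough to win on the test data: John Coleman's solution, after adding sorting to the outer wrapper so that its output format matches, takes about 0.94 ms versus 3.5 ms for mine. I mainly posted this as an example of another way to solve the problem.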
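Timings like these depend on the machine and Python version, so it is worth measuring locally. The following is a minimal sketch of how one might do that with the standard `timeit` module; the function `incremental` is a stand-in written for this example (the same incremental idea as in Answer 1), and the small five-set input is used only to keep the snippet short.

```python
import timeit

def incremental(lists):
    """Incremental intersection closure, including the universal set."""
    frozen = [frozenset(s) for s in lists]
    intersections = {frozenset().union(*frozen)}
    for s in frozen:
        intersections |= {s & t for t in intersections}
    return intersections

A = [[0, 1, 2, 3], [0, 1, 4], [1, 2, 4], [2, 3, 4], [0, 3, 4]]

runs = 200
seconds = timeit.timeit(lambda: incremental(A), number=runs)
per_call_ms = 1000 * seconds / runs
print(f"{per_call_ms:.4f} ms per call (average over {runs} runs)")
```

Swapping in the 24-set test case and the other implementations gives a like-for-like comparison on your own hardware.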

How to get all intersections of sets in Python fast
