Efficient way to determine the union of sets

Time: 2017-05-15 19:31:11

Tags: python python-3.x set

I have a (very large) list of sets, each containing a pair of values, for example:

SetList = [{1,2},{2,3},{4,5},{5,6},{1,7}]

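I want to efficiently determine the disjoint sets of values implied by the transitivity of the relationships in the pairs above. For example, 1 is associated with 2 and 2 is associated with 3, so 1, 2 and 3 are associated. Similarly, 1 is associated with 7, so 1, 2, 3 and 7 are associated. In the list above, 4, 5 and 6 are associated with each other, but not with the remaining values. The result should look like this: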

DisjointSets = [{1,2,3,7},{4,5,6}]

Is there a simple and efficient way to do this that I am missing? Thanks!

2 answers:

Answer 0 (score: 5)

First convert the original list to tuples:

TupleList = [(1,2),(2,3),(4,5),(5,6),(1,7)]

I then used networkx (thanks to @user2357112):

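import networkx as nx

# Start from an empty undirected graph; each pair becomes an edge.
G = nx.Graph()
G.add_edges_from(TupleList)
# Each connected component is one set of transitively related values.
DisjointSets = list(nx.connected_components(G))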

Is this the most efficient way to solve the problem? Any other ideas?
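One other idea, for anyone who would rather avoid the networkx dependency, is a union-find (disjoint-set) structure. The following is only a minimal sketch and not part of the original answer: the function name `disjoint_sets` and the dictionary-based `parent` table are assumptions made for illustration, and it uses path compression without union by rank, so it has not been tuned or benchmarked against the graph approach.

```python
def disjoint_sets(groups):
    """Group values linked by overlapping sets (hypothetical helper)."""
    parent = {}

    def find(x):
        # Register unseen values as their own root.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    for group in groups:
        items = iter(group)
        first = next(items, None)
        if first is None:
            continue  # skip empty sets
        root = find(first)
        for other in items:
            # Attach the other value's root under this group's root.
            parent[find(other)] = root

    # Collect values by their final root.
    components = {}
    for value in parent:
        components.setdefault(find(value), set()).add(value)
    return list(components.values())


SetList = [{1, 2}, {2, 3}, {4, 5}, {5, 6}, {1, 7}]
print(sorted(sorted(c) for c in disjoint_sets(SetList)))
# [[1, 2, 3, 7], [4, 5, 6]]
```

The result is sorted before printing because the order of the returned components is not something this sketch guarantees.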

Answer 1 (score: 0)

The graph approach is probably faster than recursion, but for those interested in pure Python:

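def get_disjoints(lst):
    """Return the disjoint sets implied by the overlapping sets in lst."""
    def rec_disjoints(lst):
        if not lst:
            return disjoints
        chosen = lst.pop(0)
        merged = True
        # Repeat the scan until nothing new merges, so chains such as
        # {1,2}, {2,3}, {3,4} collapse fully regardless of their order.
        while merged:
            merged = False
            # Iterate over indices in reverse so deleting is safe.
            for i in reversed(range(len(lst))):
                if not chosen.isdisjoint(lst[i]):
                    chosen.update(lst[i])
                    del lst[i]
                    merged = True
        disjoints.append(chosen)
        return rec_disjoints(lst)

    disjoints = []
    # Work on copies so the caller's list and sets are left untouched.
    return rec_disjoints([set(s) for s in lst])

lst = [{1,2}, {2,3}, {4,5}, {5,6}, {1,7}]
get_disjoints(lst)
# [{1, 2, 3, 7}, {4, 5, 6}]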

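This takes advantage of the handy isdisjoint method. However, the combination of iteration, function calls and recursion hurts performance, and because there is one recursive call per resulting set, an input that yields many disjoint sets can hit Python's recursion limit.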
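For readers unfamiliar with the two set methods the function relies on, here is a small illustration. The variable names are made up for this example.

```python
a = {1, 2}
b = {2, 3}
c = {4, 5}

print(a.isdisjoint(b))  # False: both contain 2
print(a.isdisjoint(c))  # True: no shared element

# update() merges in place, which is how an overlapping set gets absorbed.
merged = set(a)
merged.update(b)
print(merged == {1, 2, 3})  # True
```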

Here is a robustness test that other contributors can use:

def test_disjoint(f):
    """Verify the function under test generates the expected disjoints."""
    def verify(lst1, lst2):
        actual, expected = lst1, lst2
        # A plain assert keeps the test free of third-party dependencies.
        assert actual == expected, (actual, expected)

    verify(f([{1,2}, {2,3}, {4,5}, {5,6}, {1,7}]),
             [{1,2,3,7}, {4,5,6}])
    verify(f([{4,5}, {5,6}, {1,7}]),
             [{4,5,6}, {1,7}])
    verify(f([{1,7}]),
             [{1,7}])
    verify(f([{1,2}, {2,3}, {4,5}, {5,6}, {1,7}, {10, 11}]),
             [{1,2,3,7}, {4,5,6}, {10,11}])
    verify(f([{4,5}, {5,6}, {1,7}, {10, 11}]),
             [{4,5,6}, {1,7}, {10,11}])
    verify(f([{1,2}, {4,5}, {6,7}]),
             [{1,2}, {4,5}, {6,7}])


test_disjoint(f=get_disjoints)
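Note that the test above compares lists, so it also checks the order in which the disjoint sets are returned. Other implementations (a graph library, for instance) may return the same sets in a different order. A possible order-insensitive comparison is sketched below; `as_component_set` and `same_components` are hypothetical helper names introduced only for this example.

```python
def as_component_set(components):
    """Convert an iterable of sets into a set of frozensets."""
    return {frozenset(c) for c in components}


def same_components(actual, expected):
    """Return True if both groupings contain the same sets, in any order."""
    return as_component_set(actual) == as_component_set(expected)


# Same sets, different order: treated as equal.
assert same_components([{4, 5, 6}, {1, 2, 3, 7}], [{1, 2, 3, 7}, {4, 5, 6}])
# An incompletely merged chain is still detected as different.
assert not same_components([{1, 2, 3}, {3, 4}], [{1, 2, 3, 4}])
```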