Merging lists with intersections

Time: 2015-01-06 05:34:25

Tags: python algorithm set

Given:

g=[[], [], [0, 2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7]]

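How can I compare every list in g so that lists sharing any common number are merged into a single set?

e.g.
0 in g[2] also appears in g[4], so they are merged into the set {0,2,3,7}

I tried the following, but it does not work: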

for i in g:
    for j in g:
        if k in i == l in j:
            m=set(i+j)

I want to build the largest possible sets.

3 Answers:

Answer 0: (score: 1)

Here is a quick list comprehension that lists all the intersecting sets:

sets = [set(i+j) for i in g for j in g if i!=j and (set(i) & set(j))]

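Note that every result appears twice, because each pair of lists is compared twice: once with a given list on the left and once with it on the right.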
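One way to avoid the duplicated results is to visit each unordered pair only once. The following is a minimal Python 3 sketch, assuming `itertools.combinations` is acceptable here; the names `pairs` and `unique_pairs` are illustrative and not part of the original answer.

```python
from itertools import combinations

g = [[], [], [0, 2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7]]

# combinations(g, 2) yields each unordered pair of lists exactly once,
# so no union is produced twice by swapping left and right.
pairs = [set(a) | set(b) for a, b in combinations(g, 2) if set(a) & set(b)]

# Different pairs can still produce equal unions; frozenset makes them hashable
# so that a set comprehension can collapse them.
unique_pairs = {frozenset(p) for p in pairs}
print(unique_pairs)
```

Keep in mind that this only unions pairs that intersect directly. For a chain such as [1, 2], [2, 3], [3, 4] it would not produce the single fully merged set {1, 2, 3, 4}.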

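Answer 1: (score: 1)

A much faster way: first build a list of sets from your items (s), then walk through the list and merge overlapping sets with the union function.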

s=map(set,g)
def find_intersection(m_list):
    for i,v in enumerate(m_list) : 
        for j,k in enumerate(m_list[i+1:],i+1):
           if v & k:
              m_list[i]=v.union(m_list.pop(j))
              return find_intersection(m_list)
    return m_list

Demo:

g=[[], [], [0, 2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7]]
s=map(set,g)
print find_intersection(s)

[set([0, 2, 3, 7]), set([1, 4, 5, 6])]

g=[[1,2,3],[3,4,5],[5,6],[6,7],[9,10],[10,11]]
s=map(set,g)
print find_intersection(s)

[set([1, 2, 3, 4, 5, 6, 7]), set([9, 10, 11])]

g=[[], [1], [0,2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7]]
s=map(set,g)
print find_intersection(s)

[set([1, 4, 5, 6]), set([0, 2, 3, 7])]
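The snippets above are Python 2 (`print` statement, `map` returning a list). As a rough Python 3 sketch of the same idea, the merge can also be written iteratively, which avoids one recursive call per merge. The function name `merge_overlapping` is hypothetical, and skipping empty lists is an assumption made here so that only non-empty groups are returned.

```python
def merge_overlapping(lists):
    """Merge lists that share at least one element into disjoint sets."""
    merged = []  # invariant: the sets in `merged` are pairwise disjoint
    for items in lists:
        current = set(items)
        if not current:
            continue  # an empty list shares nothing with any other list
        disjoint = []
        for group in merged:
            if group & current:
                current |= group        # absorb every group that overlaps
            else:
                disjoint.append(group)  # keep untouched groups as they are
        disjoint.append(current)
        merged = disjoint
    return merged


g = [[], [], [0, 2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7]]
print(merge_overlapping(g))
```

The order of the returned groups depends on the order in which the lists are absorbed, so compare results as collections of sets rather than by position.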

Benchmark against @Mark's answer:

from timeit import timeit


s1="""g=[[], [], [0, 2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7]]
sets = [set(i+j) for i in g for j in g if i!=j and (set(i) & set(j))]
    """
s2="""g=[[], [], [0, 2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7]]

s=map(set,g)

def find_intersection(m_list):
    for i,v in enumerate(m_list) : 
        for j,k in enumerate(m_list[i+1:],i+1):
           if v & k:
              m_list[i]=v.union(m_list.pop(j))
              return find_intersection(m_list)
    return m_list 
    """

print ' first: ' ,timeit(stmt=s1, number=100000)
print 'second : ',timeit(stmt=s2, number=100000)

first:  3.8284008503
second :  0.213887929916
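Note that s2 above only defines find_intersection; the function is never called inside the timed statement, so the second figure does not include the merge itself. Below is a hedged Python 3 sketch of a timing that does call both approaches. The wrapper name `pairwise_unions` and the repetition count are assumptions chosen for illustration, and no timings are quoted because they depend on the machine.

```python
from timeit import timeit

g = [[], [], [0, 2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7]]


def pairwise_unions(lists):
    # Answer 0: union of every ordered pair of distinct, intersecting lists.
    return [set(i + j) for i in lists for j in lists if i != j and (set(i) & set(j))]


def find_intersection(m_list):
    # Answer 1: merge the first overlapping pair found, then start over.
    for i, v in enumerate(m_list):
        for j, k in enumerate(m_list[i + 1:], i + 1):
            if v & k:
                m_list[i] = v.union(m_list.pop(j))
                return find_intersection(m_list)
    return m_list


# find_intersection mutates its argument, so a fresh list of sets is built on each run.
t_first = timeit(lambda: pairwise_unions(g), number=1000)
t_second = timeit(lambda: find_intersection([set(x) for x in g]), number=1000)
print("first: ", t_first)
print("second:", t_second)
```

The two functions do not compute the same thing (pairwise unions versus fully merged groups), so the comparison is indicative at best.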

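Answer 2: (score: 1)

If g or the elements of g are large, you can use Disjoint Sets to improve efficiency.

This data structure can be used to classify each element into the set it should belong to.

The first step is to build a Disjoint Sets collection in which each set of g is labeled with its index in g: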

g=[[], [], [0, 2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7],[99]]
g = map(set, g)
dss = CDisjointSets()
for i in xrange(len(g)):
    dss.MakeSet(i)

Then the sets are joined whenever their intersection is not empty:

for i in xrange(len(g)):
    for j in xrange(i+1, len(g)):
        if g[i].intersection(g[j]):
            dss.Join(i,j)

At this point dss gives you a common label for the sets of g that should be joined together:

print(dss)
  

parent(0) = 0
parent(1) = 1
parent(2) = 2
parent(3) = 3
parent(4) = 2
parent(5) = 3
parent(6) = 3
parent(7) = 7
parent(8) = 8
parent(9) = 2
parent(10) = 10

Now you just need to build the new sets by joining those that share the same label:

l2set = dict()
for i in xrange(len(g)):
    label = dss.FindLabel(i).getLabel()
    l2set[label] = l2set.get(label, set()).union(g[i])
print(l2set)

Resulting in:

{0: set([]), 1: set([]), 2: set([0, 2, 3, 7]), 3: set([1, 4, 5, 6]), 7: set([]), 8: set([]), 10: set([99])}
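The steps above compare every pair of sets in g. As an alternative sketch in Python 3 (not the implementation used in this answer), a union-find can be keyed on the elements themselves, so each list only links its own members and no pairwise intersections are needed. The function name `merge_with_union_find` and the dictionary-based parent table are assumptions made for illustration.

```python
from collections import defaultdict


def merge_with_union_find(lists):
    parent = {}  # maps each element to its parent; a root maps to itself

    def find(x):
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the walked path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link every element of a list to that list's first element.
    for items in lists:
        for item in items:
            union(items[0], item)

    # Collect the elements under their final root.
    groups = defaultdict(set)
    for x in parent:
        groups[find(x)].add(x)
    return list(groups.values())


g = [[], [], [0, 2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7], [99]]
print(merge_with_union_find(g))
```

Unlike the labeled output above, this variant returns no entries for the empty lists, because they contribute no elements.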

This is the Disjoint Sets implementation I used, but you can surely find another one with better syntax:

""" Disjoint Sets
    -------------
    Pablo Francisco Pérez Hidalgo
    December,2012. """
class CDisjointSets:

    #Class to represent each set
    class DSet:
        def __init__(self, label_value):
            self.__label = label_value
            self.rank = 1
            self.parent = self
        def getLabel(self):
            return self.__label

    #CDisjointSets Private attributes
    __sets = None

    #CDisjointSets Constructors and public methods.
    def __init__(self):
        self.__sets = {}

    def MakeSet(self, label):
        if label in self.__sets: #This check slows the operation a lot,
            return False         #it should be removed if it is sure that
                                 #two sets with the same label are not going
                                 #to be created.
        self.__sets[label] = self.DSet(label)

    #Pre: 'labelA' and 'labelB' are labels of existing disjoint sets.
    def Join(self, labelA, labelB):
        a = self.__sets[labelA]
        b = self.__sets[labelB]
        pa = self.Find(a)
        pb = self.Find(b)
        if pa == pb: 
            return #They are already joined
        parent = pa
        child = pb
        if pa.rank < pb.rank:
            parent = pb
            child = pa
        child.parent = parent
        parent.rank = max(parent.rank, child.rank+1)

    def Find(self,x):
        if x == x.parent:
            return x
        x.parent = self.Find(x.parent)
        return x.parent

    def FindLabel(self, label):
        return self.Find(self.__sets[label])

    def __str__(self):
        ret = ""
        for e in self.__sets:
            ret = ret + "parent("+self.__sets[e].getLabel().__str__()+") = "+self.FindLabel(e).parent.getLabel().__str__() + "\n"
        return ret