Find the members of every segment of the intersections of four sets (à la Venn diagram)

Posted: 2015-08-25 12:52:04

Tags: python algorithm

I have four sets of data:

A=range(10,20) 
B=range(5,17) 
C=range(15,25) 
D=range(18,30)
sets = [A, B, C, D]

What I want is the contents of the intersections, which amounts to getting every segment of the Venn diagram (this is the full case):

[Image: Venn diagram of the four sets]

Using the example above, the segments are filled as follows:

()  ----> set()
('A',)  ----> set()
('B',)  ----> {8, 9, 5, 6, 7}
('C',)  ----> set()
('D',)  ----> {25, 26, 27, 28, 29}
('A', 'B')  ----> {10, 11, 12, 13, 14}
('A', 'C')  ----> {17}
('A', 'D')  ----> set()
('B', 'C')  ----> set()
('B', 'D')  ----> set()
('C', 'D')  ----> {24, 20, 21, 22, 23}
('A', 'B', 'C')     ----> {16, 15}
('A', 'B', 'D')     ----> set()
('A', 'C', 'D')     ----> {18, 19}
('B', 'C', 'D')     ----> set()
('A', 'B', 'C', 'D')    ----> set()

These are the expected answers.
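Put differently, each segment in the listing holds the elements that belong to exactly the named sets and to none of the others. A minimal sketch of one way to compute that directly (not code from the question or the answers; `named` and `regions_by_signature` are illustrative names) records, for every element, the tuple of set names that contain it and uses that tuple as the key:

```python
from itertools import combinations

named = {
    'A': set(range(10, 20)),
    'B': set(range(5, 17)),
    'C': set(range(15, 25)),
    'D': set(range(18, 30)),
}

def regions_by_signature(named_sets):
    """Map every combination of names to the elements in exactly those sets."""
    names = sorted(named_sets)
    # start every combination (including the empty one) with an empty segment
    regions = {combo: set()
               for r in range(len(names) + 1)
               for combo in combinations(names, r)}
    for x in set().union(*named_sets.values()):
        # the sorted tuple of names whose set contains x is exactly x's segment
        signature = tuple(n for n in names if x in named_sets[n])
        regions[signature].add(x)
    return regions

for combo, members in regions_by_signature(named).items():
    print(combo, '---->', members)
```

Each element is visited once and tested against every set, so the work grows with (number of elements) × (number of sets), and the sets do not have to be contiguous ranges.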

I'm stuck with the code below, which only finds the intersection common to all of the given sets:

# only gives ACD members
test = [tuple([A[0],A[-1]]), tuple([C[0],C[-1]]), tuple([D[0],D[-1]])]
starts, ends = zip(*test)
result = range(max(starts), min(ends) + 1)
# Gives 18,19
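The same max-of-starts / min-of-stops idea can be wrapped in a small helper for any group of step-1 ranges. This is a hypothetical sketch (`common_range` is not from the question); using `r.start` and `r.stop` avoids the `+ 1` needed with `r[-1]`. Note that it still returns the plain overlap of the given ranges, which in general includes elements that also lie in other sets, so on its own it does not give the exclusive Venn segments.

```python
def common_range(*ranges):
    """Overlap of step-1 ranges, returned as a range.

    For half-open ranges [start, stop) the overlap starts at the largest
    start and stops at the smallest stop (the question's max/min idea).
    """
    start = max(r.start for r in ranges)
    stop = min(r.stop for r in ranges)
    # range(start, stop) is empty when stop <= start, i.e. no overlap
    return range(start, stop)

A = range(10, 20)
C = range(15, 25)
D = range(18, 30)
print(list(common_range(A, C, D)))  # [18, 19]
```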

How should I do this? Note that I'm not interested in drawing the diagram; what I care about is getting the members of each segment.

5 answers:

Answer 0 (score: 1)

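I wrote a blog post about this kind of problem, with a solution, here: http://paddy3118.blogspot.de/2013/07/set-divisionspartitions.html

You would need to expand the x..y syntax into sets of integers, but if output in that form is useful to you, you may want to feed your output into a function like this one: http://rosettacode.org/wiki/Range_extraction

P.S. That's a nice Venn diagram.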
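The Range extraction task linked above collapses a list of integers into runs of consecutive values. A minimal sketch of that idea, written here for illustration rather than copied from Rosetta Code (`extract_ranges` is a hypothetical name), assumes the convention that a run of three or more consecutive integers is written as `first-last` and shorter runs are listed individually:

```python
def extract_ranges(numbers):
    """Collapse integers into a compact string, e.g. [0, 1, 2, 4] -> '0-2,4'."""
    values = sorted(numbers)
    parts = []
    i = 0
    while i < len(values):
        j = i
        # extend the run while the next value is consecutive
        while j + 1 < len(values) and values[j + 1] == values[j] + 1:
            j += 1
        if j - i + 1 >= 3:
            parts.append(f"{values[i]}-{values[j]}")
        else:
            # runs of one or two values are listed individually
            parts.extend(str(v) for v in values[i:j + 1])
        i = j + 1
    return ",".join(parts)

print(extract_ranges({8, 9, 5, 6, 7}))  # 5-9
```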

Answer 1 (score: 1)

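Rather than an exponential approach, it is better to use a sweep-line algorithm: sorting the 2n range endpoints costs O(n log n) for n sets, plus time proportional to the length of the output. This works because every set here is a contiguous range.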

import string

A=range(10,20)
B=range(5,17)
C=range(15,25)
D=range(18,30)
sets = [A, B, C, D]

# one "enter" event at each range's start and one "leave" event at its stop;
# at equal positions False sorts before True, so leaves are handled first
events = []
for letter, set_ in zip(string.ascii_uppercase, sets):
    events.append((set_.start, True, letter))
    events.append((set_.stop, False, letter))
events.sort()

intersection = set()   # letters of the ranges covering the current position
intersections = []
last_t = None
for t, insert, letter in events:
    # emit the stretch [last_t, t) covered by the current group of ranges
    if t != last_t and intersection:
        intersections.append((''.join(sorted(intersection)), range(last_t, t)))
    last_t = t
    if insert:
        intersection.add(letter)
    else:
        intersection.remove(letter)
print(intersections)
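The sweep produces labels such as `'AB'` paired with `range` objects, and only for stretches that are actually covered. A hypothetical helper (`to_region_dict` is not part of the answer) can reshape that into the question's format, keyed by tuples of names, with an empty set for every segment the sweep never reports. It assumes single-letter set names, as produced by `string.ascii_uppercase` in the code above:

```python
from itertools import combinations

def to_region_dict(intersections, names):
    """Turn [('AB', range(10, 15)), ...] into {('A', 'B'): {10, ..., 14}, ...}.

    Every combination of `names` gets a key, so segments the sweep never
    produced map to an empty set, as in the question's listing.
    """
    regions = {
        combo: set()
        for r in range(len(names) + 1)
        for combo in combinations(names, r)
    }
    for label, span in intersections:
        # 'AB' -> ('A', 'B'); update() merges a label that occurs in several stretches
        regions[tuple(label)].update(span)
    return regions

# the list the sweep prints for the question's four ranges
sweep_output = [('B', range(5, 10)), ('AB', range(10, 15)), ('ABC', range(15, 17)),
                ('AC', range(17, 18)), ('ACD', range(18, 20)), ('CD', range(20, 25)),
                ('D', range(25, 30))]
for combo, members in to_region_dict(sweep_output, 'ABCD').items():
    print(combo, '---->', members)
```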

Answer 2 (score: 1)

import itertools

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return itertools.chain.from_iterable(itertools.combinations(s, r) for r in range(len(s)+1))

A = set(range(10,20))
B = set(range(5,17))
C = set(range(15,25))
D = set(range(18,30))

titles = powerset(['A', 'B', 'C', 'D'])
source = powerset([A, B, C, D])

for title, group in zip(titles, source):
    if group:
        # intersection of every set in this combination
        # (set.intersection returns a new set, so its result must be kept)
        res = set.intersection(*group)
    else:
        # the empty combination has nothing to intersect; print the union of all sets
        res = A | B | C | D
    print(title, ' = ', sorted(res))

Output = the intersection of each combination of sets:

()  =  [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
('A',)  =  [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
('B',)  =  [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
('C',)  =  [15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
('D',)  =  [18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
('A', 'B')  =  [10, 11, 12, 13, 14, 15, 16]
('A', 'C')  =  [15, 16, 17, 18, 19]
('A', 'D')  =  [18, 19]
('B', 'C')  =  [15, 16]
('B', 'D')  =  []
('C', 'D')  =  [18, 19, 20, 21, 22, 23, 24]
('A', 'B', 'C')  =  [15, 16]
('A', 'B', 'D')  =  []
('A', 'C', 'D')  =  [18, 19]
('B', 'C', 'D')  =  []
('A', 'B', 'C', 'D')  =  []
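The intersections of each combination (what this answer sets out to compute) overlap one another, so they are not the Venn segments the question asks for. The segment sizes can still be recovered from them by inclusion-exclusion: writing ∩T for the intersection of the sets named in T, and taking ∩() to be the union of all sets (the same convention as the `()` line), |segment(S)| = Σ over T ⊇ S of (-1)^(|T|-|S|) · |∩T|. A small sketch that checks this on the question's data (the names `named`, `inter`, `inter_size` and `region_size` are illustrative):

```python
from itertools import combinations

named = {
    'A': set(range(10, 20)),
    'B': set(range(5, 17)),
    'C': set(range(15, 25)),
    'D': set(range(18, 30)),
}
names = sorted(named)
universe = set().union(*named.values())

def inter(group):
    """Intersection of the named sets; the empty group gives the universe."""
    result = set(universe)
    for name in group:
        result &= named[name]
    return result

# |∩T| for every combination T of names (16 of them for four sets)
inter_size = {
    group: len(inter(group))
    for r in range(len(names) + 1)
    for group in combinations(names, r)
}

def region_size(group):
    """Count of elements in exactly the sets named in `group`, by inclusion-exclusion."""
    total = 0
    for T, size in inter_size.items():
        if set(group) <= set(T):  # T is a superset of group
            total += (-1) ** (len(T) - len(group)) * size
    return total

for group in inter_size:
    print(group, region_size(group))
```

This only yields counts; the members themselves still need a set difference, as in the next answer.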

Answer 3 (score: 1)

Here it is. The output is the set of elements belonging to each partition, and it works for any number of sets.

import itertools

def intersect(d):
    """
    d is an iterable collection of sets or frozensets
    returns the intersection of the sets in d"
    """
    res = set()
    try:
        res = set(d[0])
    except IndexError:
        pass
    for elt in d:
        elt = set(elt)
        res = res.intersection(elt)
    return res

A = frozenset(range(10,20))
B = frozenset(range(5,17))
C = frozenset(range(15,25))
D = frozenset(range(18,30))

titles = ('A','B','C','D')
data = (A, B, C, D)

dataset = set(data)
titles_comb, data_comb = [], []

for n in range(len(data)+1):
    titles_comb.append(list(itertools.combinations(titles, n)))
    data_comb.append(list(itertools.combinations(data, n)))

for title, dat in zip(titles_comb, data_comb):
    for t, d in zip(title, dat):
        # intersect(d): elements in the intersection of the sets (what we want, but it has overlap)
        # complement: sets from data that were not used in intersect(d) (the overlap we want to discard)
        result = intersect(d)
        complement = dataset.difference(set(d))
        # union of every set in the complement
        comp = set().union(*complement)
        print(t, "\t---->", result.difference(comp))

Output = the contents of each partition (excluding all other partitions):

()  ----> set()
('A',)  ----> set()
('B',)  ----> {8, 9, 5, 6, 7}
('C',)  ----> set()
('D',)  ----> {25, 26, 27, 28, 29}
('A', 'B')  ----> {10, 11, 12, 13, 14}
('A', 'C')  ----> {17}
('A', 'D')  ----> set()
('B', 'C')  ----> set()
('B', 'D')  ----> set()
('C', 'D')  ----> {24, 20, 21, 22, 23}
('A', 'B', 'C')     ----> {16, 15}
('A', 'B', 'D')     ----> set()
('A', 'C', 'D')     ----> {18, 19}
('B', 'C', 'D')     ----> set()
('A', 'B', 'C', 'D')    ----> set()
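One caveat with this approach: `dataset = set(data)` merges input sets that have identical contents, so with two equal inputs the complement loses one of them and the exclusive partitions come out wrong. A minimal variant, assuming the sets are supplied as a dict keyed by name (`venn_regions` is a hypothetical helper, not part of the answer), takes the complement by name instead:

```python
from itertools import combinations

def venn_regions(named_sets):
    """Map every combination of names to the elements in exactly those sets.

    named_sets: dict of name -> set or frozenset. The complement is built
    from the names not in the combination, so two inputs with identical
    contents are still treated as separate sets.
    """
    names = list(named_sets)
    regions = {}
    for r in range(len(names) + 1):
        for combo in combinations(names, r):
            if combo:
                first, *rest = combo
                # elements common to every set in the combination
                inside = set(named_sets[first]).intersection(*(named_sets[n] for n in rest))
            else:
                inside = set()  # empty combination: nothing, as in the output above
            # elements of any set outside the combination
            others = set().union(*(named_sets[n] for n in names if n not in combo))
            regions[combo] = inside - others
    return regions

data = {
    'A': frozenset(range(10, 20)),
    'B': frozenset(range(5, 17)),
    'C': frozenset(range(15, 25)),
    'D': frozenset(range(18, 30)),
}
for combo, members in venn_regions(data).items():
    print(combo, "\t---->", members)
```

For example, `venn_regions({'X': {1, 2}, 'Y': {1, 2}})` puts both elements under `('X', 'Y')` and leaves `('X',)` and `('Y',)` empty.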

Answer 4 (score: -1)

Have you tried using Python sets?
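For a single segment, Python's set operators are enough: intersect the sets that define the segment and subtract the union of the rest. A short sketch using the question's data (the variable name `only_a_c` is illustrative):

```python
A = set(range(10, 20))
B = set(range(5, 17))
C = set(range(15, 25))
D = set(range(18, 30))

# elements in both A and C but in neither B nor D: the ('A', 'C') segment
only_a_c = (A & C) - (B | D)
print(only_a_c)  # {17}
```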
