Union of multiple ranges in Python

Time: 2013-03-07 14:25:24

Tags: python range union

I have these ranges:

7,10
11,13
11,15
14,20
23,39

I need to take the union of the overlapping ranges to get non-overlapping ranges, so for the example above:

7,20
23,39

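I have already done this in Ruby: I pushed the start and end of each range into an array, sorted them, and then performed the union of the overlapping ranges. Is there a quick way to do this in Python?

Thanks

5 answers:

Answer 0 (score: 10)

Let's say that (7, 10) and (11, 13) result in (7, 13):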

a = [(7, 10), (11, 13), (11, 15), (14, 20), (23, 39)]
b = []
for begin,end in sorted(a):
    if b and b[-1][1] >= begin - 1:
        b[-1] = (b[-1][0], end)
    else:
        b.append((begin, end))

b is now

[(7, 20), (23, 39)]

Edit

As @CentAu correctly noticed, [(2,4), (1,6)] would return (1,4) instead of (1,6). Here is a new version that handles this case correctly:

a = [(7, 10), (11, 13), (11, 15), (14, 20), (23, 39)]
b = []
for begin,end in sorted(a):
    if b and b[-1][1] >= begin - 1:
        b[-1][1] = max(b[-1][1], end)
    else:
        b.append([begin, end])
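
Wrapped in a function, the corrected loop can be checked against both inputs discussed above. This is only a sketch: the name merge_ranges is made up for illustration, and it converts the result back to tuples, which the version above does not do.

```python
def merge_ranges(ranges):
    """Merge overlapping or adjacent integer ranges with inclusive ends.

    merge_ranges is a hypothetical name, not part of the answer above.
    """
    merged = []
    for begin, end in sorted(ranges):
        if merged and merged[-1][1] >= begin - 1:
            # Overlaps or touches the previous range: extend it, never shrink it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([begin, end])
    return [tuple(r) for r in merged]


print(merge_ranges([(7, 10), (11, 13), (11, 15), (14, 20), (23, 39)]))  # [(7, 20), (23, 39)]
print(merge_ranges([(2, 4), (1, 6)]))                                   # [(1, 6)]
```

Sorting first means each range only ever has to be compared with the last merged one.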

Answer 1 (score: 6)

Old question, but I would like to add this answer for future reference. sympy can be used to compute the union of intervals:

from sympy import Interval, Union
def union(data):
    """ Union of a list of intervals e.g. [(1,2),(3,4)] """
    intervals = [Interval(begin, end) for (begin, end) in data]
    u = Union(*intervals)
    return [list(u.args[:2])] if isinstance(u, Interval) \
       else list(u.args)

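When the result contains two or more intervals, the output of Union is a Union object, whereas when there is a single interval the output is an Interval object. That is the reason for the if in the return statement.

Examples: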

In [26]: union([(10, 12), (14, 16), (15, 22)])
Out[26]: [[10, 12], [14, 22]]

In [27]: union([(10, 12), (9, 16)])
Out[27]: [[9, 16]]
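
One thing to keep in mind: sympy's Interval models a closed interval of real numbers, so (7, 10) and (11, 13) are not merged by this approach, whereas the loop in answer 0 treats them as adjacent integer ranges. The sketch below uses only the standard library and is not sympy code; merge_intervals and its adjacent flag are hypothetical names used to contrast the two behaviours.

```python
def merge_intervals(data, adjacent=False):
    """Merge closed intervals given as (begin, end) pairs.

    adjacent=False: merge only intervals that overlap or share an endpoint
                    (real-number semantics).
    adjacent=True:  also merge integer ranges such as (7, 10) and (11, 13).
    The function name and the flag are illustrative assumptions.
    """
    slack = 1 if adjacent else 0
    merged = []
    for begin, end in sorted(data):
        if merged and begin - slack <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([begin, end])
    return merged


print(merge_intervals([(10, 12), (14, 16), (15, 22)]))       # [[10, 12], [14, 22]]
print(merge_intervals([(7, 10), (11, 13)]))                  # [[7, 10], [11, 13]]
print(merge_intervals([(7, 10), (11, 13)], adjacent=True))   # [[7, 13]]
```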

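Answer 2 (score: 1)

I tested the particular case where both (45, 46) and (45, 45) are present, and also cases that are unlikely to occur in your application: presence of (11, 6), presence of (-1, -5), presence of (-9, -5), presence of (-3, 10).
In any case, the results are correct for all of these cases, which is the point.

The algorithm: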

def yi(li):
    gen = (x for a,b in li for x in xrange(a,b+1))
    start = p = gen.next()
    for x in gen:
        if x>p+2:
            yield (start,p)
            start = p = x
        else:
            p = x
    yield (start,x)

If aff is set to True in the following code, the execution steps are displayed.

def yi(li):
    aff = 0
    gen = (x for a,b in li for x in xrange(a,b+1))
    start = p = gen.next()
    for x in gen:
        if aff:
            print ('start %s     p %d  p+2 %d     '
                   'x==%s' % (start,p,p+2,x))
        if x>p+2:
            if aff:
                print 'yield range(%d,%d)' % (start,p+1)
            yield (start,p)
            start = p = x
        else:
            p = x
    if aff:
        print 'yield range(%d,%d)' % (start,x+1)
    yield (start,x)



for li in ([(7,10),(23,39),(11,13),(11,15),(14,20),(45,46)],
           [(7,10),(23,39),(11,13),(11,15),(14,20),(45,46),(45,45)],
           [(7,10),(23,39),(11,13),(11,15),(14,20),(45,45)],

           [(7,10),(23,39),(11,13),(11,6),(14,20),(45,46)], 
           #1 presence of (11, 6)
           [(7,10),(23,39),(11,13),(-1,-5),(14,20),(45,45)], 
           #2  presence of (-1,-5)
           [(7,10),(23,39),(11,13),(-9,-5),(14,20),(45,45)], 
           #3  presence of (-9, -5)
           [(7,10),(23,39),(11,13),(-3,10),(14,20),(45,45)]): 
           #4  presence of (-3, 10)

    li.sort()
    print 'sorted li    %s'%li
    print '\n'.join('  (%d,%d)   %r' % (a,b,range(a,b)) 
                     for a,b in li)
    print 'list(yi(li)) %s\n' % list(yi(li))

Result

sorted li    [(7, 10), (11, 13), (11, 15), (14, 20),
              (23, 39), (45, 46)]
  (7,10)   [7, 8, 9]
  (11,13)   [11, 12]
  (11,15)   [11, 12, 13, 14]
  (14,20)   [14, 15, 16, 17, 18, 19]
  (23,39)   [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 
             35, 36, 37, 38]
  (45,46)   [45]
list(yi(li)) [(7, 20), (23, 39), (45, 46)]

sorted li    [(7, 10), (11, 13), (11, 15), (14, 20), 
              (23, 39), (45, 45), (45, 46)]
  (7,10)   [7, 8, 9]
  (11,13)   [11, 12]
  (11,15)   [11, 12, 13, 14]
  (14,20)   [14, 15, 16, 17, 18, 19]
  (23,39)   [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
             35, 36, 37, 38]
  (45,45)   []
  (45,46)   [45]
list(yi(li)) [(7, 20), (23, 39), (45, 46)]

sorted li    [(7, 10), (11, 13), (11, 15), (14, 20), 
              (23, 39), (45, 45)]
  (7,10)   [7, 8, 9]
  (11,13)   [11, 12]
  (11,15)   [11, 12, 13, 14]
  (14,20)   [14, 15, 16, 17, 18, 19]
  (23,39)   [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
             35, 36, 37, 38]
  (45,45)   []
list(yi(li)) [(7, 20), (23, 39), (45, 45)]

sorted li    [(7, 10), (11, 6), (11, 13), (14, 20), 
              (23, 39), (45, 46)]
  (7,10)   [7, 8, 9]
  (11,6)   []
  (11,13)   [11, 12]
  (14,20)   [14, 15, 16, 17, 18, 19]
  (23,39)   [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 
             35, 36, 37, 38]
  (45,46)   [45]
list(yi(li)) [(7, 20), (23, 39), (45, 46)]

sorted li    [(-1, -5), (7, 10), (11, 13), (14, 20), 
              (23, 39), (45, 45)]
  (-1,-5)   []
  (7,10)   [7, 8, 9]
  (11,13)   [11, 12]
  (14,20)   [14, 15, 16, 17, 18, 19]
  (23,39)   [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
             35, 36, 37, 38]
  (45,45)   []
list(yi(li)) [(7, 20), (23, 39), (45, 45)]

sorted li    [(-9, -5), (7, 10), (11, 13), (14, 20), 
              (23, 39), (45, 45)]
  (-9,-5)   [-9, -8, -7, -6]
  (7,10)   [7, 8, 9]
  (11,13)   [11, 12]
  (14,20)   [14, 15, 16, 17, 18, 19]
  (23,39)   [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
             35, 36, 37, 38]
  (45,45)   []
list(yi(li)) [(-9, -5), (7, 20), (23, 39), (45, 45)]

sorted li    [(-3, 10), (7, 10), (11, 13), (14, 20), 
              (23, 39), (45, 45)]
  (-3,10)   [-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
  (7,10)   [7, 8, 9]
  (11,13)   [11, 12]
  (14,20)   [14, 15, 16, 17, 18, 19]
  (23,39)   [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 
             35, 36, 37, 38]
  (45,45)   []
list(yi(li)) [(-3, 20), (23, 39), (45, 45)]
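
The code above is Python 2 (xrange, gen.next(), the print statement). A possible Python 3 port of the same enumeration idea is sketched below. The name merge_by_enumeration and the max_step parameter are assumptions made for this sketch; it also deviates from yi in three labelled ways: it sorts its input itself, it tolerates an input with no integers at all, and it never lets the running end move backwards.

```python
def merge_by_enumeration(ranges, max_step=2):
    """Yield merged (start, end) pairs by walking every integer of every range.

    max_step=2 mirrors the `x > p+2` test of yi(), which still bridges a gap
    of one missing integer; max_step=1 merges only overlapping or directly
    adjacent ranges. Both names are hypothetical.
    """
    gen = (x for a, b in sorted(ranges) for x in range(a, b + 1))
    try:
        start = prev = next(gen)
    except StopIteration:
        return  # every range was empty (or there were none)
    for x in gen:
        if x > prev + max_step:
            yield (start, prev)
            start = prev = x
        else:
            # Deviation from yi(): keep the furthest end seen so far, so a
            # range contained in an earlier one cannot shrink the result.
            prev = max(prev, x)
    yield (start, prev)


print(list(merge_by_enumeration(
    [(7, 10), (23, 39), (11, 13), (11, 15), (14, 20), (45, 46)])))
# [(7, 20), (23, 39), (45, 46)]
```

Because it enumerates every integer, the running time grows with the width of the ranges rather than with their number, so this is mainly of interest for small integer ranges.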

Answer 3 (score: 0)

The following function works for:

a = [(7,10),(11,13),(11,15),(14,20),(23,39)]

a = [(2,4),(1,6)]

def range_overlap_adjust(list_ranges):
    overlap_corrected = []
    for start, stop in sorted(list_ranges):
        if overlap_corrected and start - 1 <= overlap_corrected[-1][1] and stop >= overlap_corrected[-1][1]:
            # Overlaps or touches the previous range and reaches past it: widen it.
            overlap_corrected[-1] = min(overlap_corrected[-1][0], start), stop
        elif overlap_corrected and start <= overlap_corrected[-1][1] and stop <= overlap_corrected[-1][1]:
            # Fully contained in the previous range: skip it, but keep going
            # (a break here would silently drop all remaining ranges).
            continue
        else:
            overlap_corrected.append((start, stop))
    return overlap_corrected

Test

list_ranges = [(7, 10), (11, 13), (11, 15), (14, 20), (23, 39)]


print(range_overlap_adjust(list_ranges))

gives:

[(7, 20), (23, 39)]

Answer 4 (score: 0)

Here is a one-liner using functools.reduce (assuming that (x, 10) and (11, y) overlap):

reduce(
    lambda acc, el: acc[:-1:] + [(min(*acc[-1], *el), max(*acc[-1], *el))]
        if acc[-1][1] >= el[0] - 1
        else acc + [el],
    ranges[1::],
    ranges[0:1]
)

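This starts with the first range and uses reduce to walk through the rest. It compares the last element of the accumulator (acc[-1]) with the next range (el). If they overlap, it replaces the last element with the minimum and maximum of the two ranges (acc[:-1:] + [(min, max)]). If they do not overlap, it simply appends the new range to the end of the list (acc + [el]). Note that only the last accumulated range is ever compared, so ranges must already be sorted by start.

Example: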

from functools import reduce

example_ranges = [(7, 10), (11, 13), (11, 15), (14, 20), (23, 39)]

def combine_overlaps(ranges):
    return reduce(
        lambda acc, el: acc[:-1:] + [(min(*acc[-1], *el), max(*acc[-1], *el))]
            if acc[-1][1] >= el[0] - 1
            else acc + [el],
        ranges[1::],
        ranges[0:1],
    )

print(combine_overlaps(example_ranges))

Output:

[(7, 20), (23, 39)]
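
Since the fold only ever looks at the last accumulated range, unsorted input can give wrong results. One way around that, shown here as a sketch rather than as part of the answer, is to sort before reducing. The name combine_overlaps_sorted and the helper step are hypothetical.

```python
from functools import reduce


def combine_overlaps_sorted(ranges):
    """Sort the ranges by start, then fold them into merged ranges."""
    ordered = sorted(ranges)

    def step(acc, el):
        last = acc[-1]
        if last[1] >= el[0] - 1:
            # Overlapping or adjacent: widen the last accumulated range.
            return acc[:-1] + [(last[0], max(last[1], el[1]))]
        return acc + [el]

    # ordered[:1] is the initial accumulator; it is [] for empty input.
    return reduce(step, ordered[1:], ordered[:1])


print(combine_overlaps_sorted([(23, 39), (14, 20), (7, 10), (11, 15), (11, 13)]))
# [(7, 20), (23, 39)]
```

After sorting, last[0] is always the smaller start, so only the end needs a max.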