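python - Unique set of ranges, merging as needed

Time: 2015-07-21 13:22:44

Tags: python range

Is there a data structure that maintains a unique set of ranges, merging contiguous or overlapping ranges as they are added? I need to keep track of which ranges have been processed, but this can happen in arbitrary order. E.g.: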

range_set = RangeSet() # doesn't exist that I know of, this is what I need help with

def process_data(start, end):
    global range_set
    range_set.add_range(start, end)
    # ...

process_data(0, 10)
process_data(20, 30)
process_data(5, 15)
process_data(50, 60)

print(range_set.missing_ranges())
# [[16,19], [31, 49]]

print(range_set.ranges())
# [[0,15], [20,30], [50, 60]]

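Note that overlapping or contiguous ranges are merged together. What is the best way to do this? I looked at using the bisect module, but how to use it for this was not very clear.

6 answers:

Answer 0: (score: 1)

Another approach is based on sympy.sets: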

>>> import sympy as sym
>>> a = sym.Interval(1, 2, left_open=False, right_open=False)
>>> b = sym.Interval(3, 4, left_open=False, right_open=False)
>>> domain = sym.Interval(0, 10, left_open=False, right_open=False)
>>> missing = domain - a - b
>>> missing
[0, 1) U (2, 3) U (4, 10]
>>> 2 in missing
False
>>> missing.complement(domain)
[1, 2] U [3, 4]

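Answer 1: (score: 0)

You can get some similar functionality with Python's built-in set data structure, assuming that only integer values are valid for start and end.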

>>> whole_domain = set(range(12))
>>> A = set(range(0,1))
>>> B = set(range(4,9))
>>> C = set(range(3,6))  # overlaps B, so 4 and 5 are processed twice
>>> done = A | B | C
>>> print done
set([0, 3, 4, 5, 6, 7, 8])
>>> missing = whole_domain - done
>>> print missing
set([1, 2, 9, 10, 11])

This still lacks many "range" features, but it may be enough.

A simple query for whether a given range has been processed might look like this:

>>> isprocessed = [foo in done for foo in set(range(2,6))]
>>> print isprocessed
[False, True, True, True]
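
The set-based approach stores individual integers, so the range structure has to be rebuilt when it is needed. A minimal sketch of that step, assuming integer values and inclusive (start, end) pairs; the helper name to_ranges is hypothetical and not part of the answer:

```python
def to_ranges(values):
    """Collapse a set of integers into sorted, inclusive (start, end) pairs."""
    ranges = []
    for v in sorted(values):
        if ranges and v == ranges[-1][1] + 1:
            # v extends the current run by one.
            ranges[-1] = (ranges[-1][0], v)
        else:
            # v starts a new run.
            ranges.append((v, v))
    return ranges


# Same data as the session above.
done = set(range(0, 1)) | set(range(4, 9)) | set(range(3, 6))
missing = set(range(12)) - done
print(to_ranges(done))     # [(0, 0), (3, 8)]
print(to_ranges(missing))  # [(1, 2), (9, 11)]
```

Memory use grows with the number of integers covered rather than the number of ranges, so this is only reasonable for small domains.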

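Answer 2: (score: 0)

I've only tested it lightly, but it sounds like you're looking for something like this. You'll need to add the methods to get the ranges and missing ranges yourself, but it should be very straightforward, because RangeSet.ranges is a list of Range objects maintained in sorted order. For a more pleasant interface you could, for example, write a convenience method that converts it to a list of 2-tuples.

Edit: I've just modified it to use less-than-or-equal comparisons for merging. Note, however, that this won't merge "adjacent" entries (e.g. it will not merge (1, 5) and (6, 10)). To do that, you only need to modify the condition in Range.check_merge().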

import bisect


class Range(object):

    # Reduces memory usage, overkill unless you're using a lot of these.
    __slots__ = ["start", "end"]

    def __init__(self, start, end):
        """Initialise this range."""
        self.start = start
        self.end = end

    def __lt__(self, other):
        """Sort ranges by their initial item (bisect only needs <)."""
        # __cmp__ and cmp() no longer exist in Python 3.
        return self.start < other.start

    def check_merge(self, other):
        """Merge in specified range and return True iff it overlaps."""
        if other.start <= self.end and other.end >= self.start:
            self.start = min(other.start, self.start)
            self.end = max(other.end, self.end)
            return True
        return False


class RangeSet(object):

    def __init__(self):
        self.ranges = []

    def add_range(self, start, end):
        """Merge or insert the specified range as appropriate."""
        new_range = Range(start, end)
        offset = bisect.bisect_left(self.ranges, new_range)
        # Check if we can merge backwards.
        if offset > 0 and self.ranges[offset - 1].check_merge(new_range):
            new_range = self.ranges[offset - 1]
            offset -= 1
        else:
            self.ranges.insert(offset, new_range)
        # Scan for forward merges.
        check_offset = offset + 1
        while (check_offset < len(self.ranges) and
                new_range.check_merge(self.ranges[check_offset])):
            check_offset += 1
        # Remove any entries that we've just merged.
        if check_offset - offset > 1:
            self.ranges[offset+1:check_offset] = []
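
The missing accessor methods can be sketched as follows. This is a self-contained variant rather than an extension of the classes above: it assumes integer endpoints and inclusive ranges as in the question, merges adjacent ranges as well as overlapping ones, and keeps two parallel sorted lists instead of Range objects. The class name IntRangeSet and its attributes are hypothetical.

```python
import bisect


class IntRangeSet:
    """Sorted, disjoint, inclusive integer ranges (illustrative sketch)."""

    def __init__(self):
        self._starts = []  # sorted range starts
        self._ends = []    # matching inclusive ends, also sorted

    def add_range(self, start, end):
        # Index of the first stored range whose end reaches start - 1,
        # i.e. the first one that overlaps or touches the new range.
        lo = bisect.bisect_left(self._ends, start - 1)
        # One past the last stored range whose start is <= end + 1.
        hi = bisect.bisect_right(self._starts, end + 1)
        if lo < hi:
            # Absorb every stored range in [lo, hi) into the new one.
            start = min(start, self._starts[lo])
            end = max(end, self._ends[hi - 1])
        # Replaces the absorbed ranges, or inserts at lo when lo == hi.
        self._starts[lo:hi] = [start]
        self._ends[lo:hi] = [end]

    def ranges(self):
        return list(zip(self._starts, self._ends))

    def missing_ranges(self):
        # Gaps between each stored range and the next one.
        return [(e + 1, s - 1)
                for e, s in zip(self._ends, self._starts[1:])]


rs = IntRangeSet()
for start, end in [(0, 10), (20, 30), (5, 15), (50, 60)]:
    rs.add_range(start, end)
print(rs.ranges())          # [(0, 15), (20, 30), (50, 60)]
print(rs.missing_ranges())  # [(16, 19), (31, 49)]
```

Because the stored ranges never overlap, both lists stay sorted, which is what makes the two bisect lookups valid.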

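Answer 3: (score: 0)

You've hit on a good solution in your example use case. Rather than trying to maintain the set of ranges that have been used, keep track of the ranges that have not been used. This makes the problem quite simple.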

class RangeSet:
    def __init__(self, min, max):
        self.__gaps = [(min, max)]
        self.min = min
        self.max = max

    def add(self, lo, hi):
        new_gaps = []
        for g in self.__gaps:
            for ng in (g[0],min(g[1],lo)),(max(g[0],hi),g[1]):
                if ng[1] > ng[0]: new_gaps.append(ng)
        self.__gaps = new_gaps

    def missing_ranges(self):
        return self.__gaps

    def ranges(self):
        i = iter([self.min] + [x for y in self.__gaps for x in y] + [self.max])
        return [(x,y) for x,y in zip(i,i) if y > x]

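The magic is in the add method, which checks each existing gap to see whether it is affected by the new range and adjusts the list of gaps accordingly.

Note that the tuples used for ranges here behave the same as Python's range objects, i.e. they include the start value and exclude the stop value. This class therefore does not behave exactly the way you described in the question, where your ranges appear to be inclusive of both ends.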
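
To make the gap-splitting step and the half-open convention concrete, here is a minimal standalone sketch of the same idea as a plain function, applied to the question's data. It assumes integer endpoints, so an inclusive range (a, b) becomes the half-open pair (a, b + 1); the function name remove_from_gaps is hypothetical.

```python
def remove_from_gaps(gaps, lo, hi):
    """Return the gaps that remain after marking [lo, hi) as used."""
    new_gaps = []
    for start, stop in gaps:
        left = (start, min(stop, lo))   # part of the gap before the new range
        right = (max(start, hi), stop)  # part of the gap after the new range
        for piece in (left, right):
            if piece[1] > piece[0]:     # keep only non-empty pieces
                new_gaps.append(piece)
    return new_gaps


gaps = [(0, 61)]  # whole domain 0..60 inclusive, written half-open
for a, b in [(0, 10), (20, 30), (5, 15), (50, 60)]:
    gaps = remove_from_gaps(gaps, a, b + 1)  # inclusive -> half-open

print(gaps)                             # [(16, 20), (31, 50)]
print([(s, e - 1) for s, e in gaps])    # [(16, 19), (31, 49)]
```

Converting back by subtracting one from each stop value gives the inclusive missing ranges shown in the question.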

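Answer 4: (score: 0)

Take a look at portion (https://pypi.org/project/portion/). I'm the maintainer of this library, and it supports disjunctions of continuous intervals out of the box. It automatically simplifies adjacent and overlapping intervals.

Consider the intervals provided in your example: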

>>> import portion as P
>>> i = P.closed(0, 10) | P.closed(20, 30) | P.closed(5, 15) | P.closed(50, 60)

>>> # get "used ranges"
>>> i
[0,15] | [20,30] | [50,60]

>>> # get "missing ranges"
>>> i.enclosure - i
(15,20) | (30,50)

Answer 5: (score: 0)

Similar to DavidT's answer, and also based on sympy's sets, but using a list of arbitrary length and taking the union in a single operation:

import sympy

intervals = [[1,4], [6,10], [3,5], [7,8]] # pairs of left,right
print(intervals)

symintervals = [sympy.Interval(i[0],i[1], left_open=False, right_open=False) for i in intervals]
print(symintervals)

merged = sympy.Union(*symintervals) # one operation; adding to a union one by one is much slower for a large number of intervals
print(merged)

for i in merged.args: # assumes that the "merged" result is a union, not a single interval
    print(i.left, i.right) # getting bounds of merged intervals
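
Where sympy is not available, merging a whole list at once can also be sketched in plain Python with a sort followed by a single sweep. This is a different technique from the sympy call above, not what sympy does internally; it assumes closed intervals given as [left, right] pairs, and the function name merge_intervals is hypothetical.

```python
def merge_intervals(intervals):
    """Merge closed [left, right] intervals that overlap or share an endpoint."""
    merged = []
    for left, right in sorted(intervals):
        if merged and left <= merged[-1][1]:
            # Overlaps the last merged interval: extend it if needed.
            merged[-1][1] = max(merged[-1][1], right)
        else:
            # Starts after the last merged interval ends: new interval.
            merged.append([left, right])
    return merged


intervals = [[1, 4], [6, 10], [3, 5], [7, 8]]  # pairs of left, right
print(merge_intervals(intervals))  # [[1, 5], [6, 10]]
```

Sorting first means each interval only has to be compared with the most recently merged one.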