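Is there a data structure that maintains a set of unique ranges, merging contiguous or overlapping ranges as they are added? I need to keep track of which ranges have been processed, but this may happen in arbitrary order. E.g.: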
range_set = RangeSet() # doesn't exist that I know of, this is what I need help with
def process_data(start, end):
    global range_set
    range_set.add_range(start, end)
    # ...
process_data(0, 10)
process_data(20, 30)
process_data(5, 15)
process_data(50, 60)
print(range_set.missing_ranges())
# [[16,19], [31, 49]]
print(range_set.ranges())
# [[0,15], [20,30], [50, 60]]
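Notice that overlapping or contiguous ranges get merged together. What is the best way to do this? I looked at using the bisect module, but how to apply it here was not very clear.
Answer 0 (score: 1)
Another approach is based on sympy.sets.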
>>> import sympy as sym
>>> a = sym.Interval(1, 2, left_open=False, right_open=False)
>>> b = sym.Interval(3, 4, left_open=False, right_open=False)
>>> domain = sym.Interval(0, 10, left_open=False, right_open=False)
>>> missing = domain - a - b
>>> missing
[0, 1) U (2, 3) U (4, 10]
>>> 2 in missing
False
>>> missing.complement(domain)
[1, 2] U [3, 4]
Answer 1 (score: 0)
You can get some similar functionality with Python's built-in set data structure, assuming that only integer values are valid for start and end.
>>> whole_domain = set(range(12))
>>> A = set(range(0,1))
>>> B = set(range(4,9))
>>> C = set(range(3,6)) # overlaps B, so range(4,6) is processed twice
>>> done = A | B | C
>>> print(sorted(done))
[0, 3, 4, 5, 6, 7, 8]
>>> missing = whole_domain - done
>>> print(sorted(missing))
[1, 2, 9, 10, 11]
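This still lacks many "range" features, but it might be sufficient.
A simple query for whether a given range has been processed could look like this: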
>>> isprocessed = [foo in done for foo in range(2, 6)]
>>> print(isprocessed)
[False, True, True, True]
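If range-style output is needed, the processed integers can be collapsed back into inclusive runs. The following is a minimal sketch, assuming integer values only; the helper name `to_ranges` is hypothetical and not part of the answer above.

```python
def to_ranges(values):
    """Collapse an iterable of integers into sorted inclusive [start, end] runs."""
    runs = []
    for v in sorted(set(values)):
        if runs and v == runs[-1][1] + 1:
            runs[-1][1] = v          # v extends the current run
        else:
            runs.append([v, v])      # v starts a new run
    return runs


# Same data as in the session above.
done = set(range(0, 1)) | set(range(4, 9)) | set(range(3, 6))
missing = set(range(12)) - done

print(to_ranges(done))     # [[0, 0], [3, 8]]
print(to_ranges(missing))  # [[1, 2], [9, 11]]
```

Memory use still grows with the number of integers covered rather than the number of ranges, so this is probably only reasonable for small domains.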
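Answer 2 (score: 0)
I have only lightly tested it, but it sounds like you are looking for something like this. You will need to add the methods for getting the ranges and the missing ranges yourself, but that should be straightforward, because RangeSet.ranges is a list of Range objects maintained in sorted order. For a more pleasant interface you could, for example, write a convenience method that converts it to a list of 2-tuples.
EDIT: I have just modified it to use less-than-or-equal comparisons for merging. Note, however, that this will not merge "adjacent" entries (e.g. it will not merge (1, 5) and (6, 10)). To do that, you would only need to modify the condition in Range.check_merge().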
import bisect

class Range(object):
    # Reduces memory usage, overkill unless you're using a lot of these.
    __slots__ = ["start", "end"]

    def __init__(self, start, end):
        """Initialise this range."""
        self.start = start
        self.end = end

    def __lt__(self, other):
        """Sort ranges by their initial item (bisect only needs __lt__)."""
        return self.start < other.start

    def check_merge(self, other):
        """Merge in specified range and return True iff it overlaps."""
        if other.start <= self.end and other.end >= self.start:
            self.start = min(other.start, self.start)
            self.end = max(other.end, self.end)
            return True
        return False

class RangeSet(object):
    def __init__(self):
        self.ranges = []

    def add_range(self, start, end):
        """Merge or insert the specified range as appropriate."""
        new_range = Range(start, end)
        offset = bisect.bisect_left(self.ranges, new_range)
        # Check if we can merge backwards.
        if offset > 0 and self.ranges[offset - 1].check_merge(new_range):
            new_range = self.ranges[offset - 1]
            offset -= 1
        else:
            self.ranges.insert(offset, new_range)
        # Scan for forward merges, testing each following entry in turn.
        check_offset = offset + 1
        while (check_offset < len(self.ranges) and
               new_range.check_merge(self.ranges[check_offset])):
            check_offset += 1
        # Remove any entries that we've just merged.
        if check_offset - offset > 1:
            self.ranges[offset+1:check_offset] = []
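For the inclusive integer ranges in the question, the adjacency merge and the two accessor methods mentioned above could be sketched roughly as follows. This is a standalone illustration, not the class from this answer: the name `IntRangeSet` is hypothetical, it stores two parallel sorted lists instead of Range objects, and it assumes integer bounds with start <= end.

```python
import bisect


class IntRangeSet:
    """Sorted, disjoint, inclusive integer ranges (hypothetical sketch)."""

    def __init__(self):
        self._starts = []  # sorted range starts
        self._ends = []    # matching inclusive ends, also sorted

    def add_range(self, start, end):
        # Index of the first stored range whose end reaches start - 1,
        # i.e. the first one that overlaps or is adjacent on the left.
        lo = bisect.bisect_left(self._ends, start - 1)
        # One past the last stored range whose start is <= end + 1.
        hi = bisect.bisect_right(self._starts, end + 1)
        if lo < hi:
            # Absorb every overlapping or adjacent range into the new one.
            start = min(start, self._starts[lo])
            end = max(end, self._ends[hi - 1])
        # Replaces the absorbed slice, or inserts when the slice is empty.
        self._starts[lo:hi] = [start]
        self._ends[lo:hi] = [end]

    def ranges(self):
        return [[s, e] for s, e in zip(self._starts, self._ends)]

    def missing_ranges(self):
        # Gaps between each stored range and the next one.
        return [[e + 1, s - 1]
                for e, s in zip(self._ends, self._starts[1:])]


range_set = IntRangeSet()
for start, end in [(0, 10), (20, 30), (5, 15), (50, 60)]:
    range_set.add_range(start, end)

print(range_set.missing_ranges())  # [[16, 19], [31, 49]]
print(range_set.ranges())          # [[0, 15], [20, 30], [50, 60]]
```

Both lookups are binary searches, so finding the affected ranges is logarithmic; the slice assignment is still linear in the number of stored ranges in the worst case.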
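Answer 3 (score: 0)
Your example use case already points to a good solution. Rather than trying to maintain the set of ranges that have been used, keep track of the ranges that have not been used. This makes the problem quite simple.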
class RangeSet:
    def __init__(self, min, max):
        self.__gaps = [(min, max)]
        self.min = min
        self.max = max

    def add(self, lo, hi):
        new_gaps = []
        for g in self.__gaps:
            # Keep whatever remains of this gap on either side of (lo, hi).
            for ng in (g[0],min(g[1],lo)),(max(g[0],hi),g[1]):
                if ng[1] > ng[0]:
                    new_gaps.append(ng)
        self.__gaps = new_gaps

    def missing_ranges(self):
        return self.__gaps

    def ranges(self):
        # The used ranges are the stretches between consecutive gaps.
        i = iter([self.min] + [x for y in self.__gaps for x in y] + [self.max])
        return [(x,y) for x,y in zip(i,i) if y > x]
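The magic is in the add method, which checks each existing gap to see whether it is affected by the new range and adjusts the list of gaps accordingly.
Note that the tuples used for ranges here behave like Python's range objects, i.e. they include the start value and exclude the stop value. This class does not behave exactly as you described in your question, where the ranges appear to be inclusive of both ends.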
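To bridge the two conventions for integer data, the inclusive ranges from the question can be shifted to half-open form before the gaps are updated, and shifted back for display. The sketch below restates the gap update as a standalone function; the name `subtract` and the domain 0..60 are assumptions made for illustration.

```python
def subtract(gaps, lo, hi):
    """Remove the half-open range [lo, hi) from a list of half-open gaps."""
    new_gaps = []
    for g_lo, g_hi in gaps:
        # Whatever is left of this gap on each side of [lo, hi).
        for piece in ((g_lo, min(g_hi, lo)), (max(g_lo, hi), g_hi)):
            if piece[1] > piece[0]:
                new_gaps.append(piece)
    return new_gaps


gaps = [(0, 61)]  # half-open domain covering the integers 0..60 (assumed)
for start, end in [(0, 10), (20, 30), (5, 15), (50, 60)]:
    gaps = subtract(gaps, start, end + 1)  # inclusive end -> exclusive stop

# Convert back to the inclusive form used in the question.
print([[lo, hi - 1] for lo, hi in gaps])  # [[16, 19], [31, 49]]
```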
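Answer 4 (score: 0)
Take a look at portion (https://pypi.org/project/portion/). I am the maintainer of this library, and it supports disjunctions of continuous intervals out of the box. It automatically simplifies adjacent and overlapping intervals.
Consider the intervals provided in your example: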
>>> import portion as P
>>> i = P.closed(0, 10) | P.closed(20, 30) | P.closed(5, 15) | P.closed(50, 60)
>>> # get "used ranges"
>>> i
[0,15] | [20,30] | [50,60]
>>> # get "missing ranges"
>>> i.enclosure - i
(15,20) | (30,50)
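Answer 5 (score: 0)
Similar to DavidT's answer, and also based on sympy's sets, but taking a list of arbitrary length and performing the addition (union) in a single operation: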
import sympy
intervals = [[1,4], [6,10], [3,5], [7,8]] # pairs of left,right
print(intervals)
symintervals = [sympy.Interval(i[0],i[1], left_open=False, right_open=False) for i in intervals]
print(symintervals)
merged = sympy.Union(*symintervals) # one operation; adding to a union one by one is much slower for a large number of intervals
print(merged)
for i in merged.args: # assumes that the "merged" result is a union, not a single interval
    print(i.left, i.right) # getting bounds of merged intervals