I have a set of ranges that looks like this:
[(0, 100), (150, 220), (500, 1000)]
Then I add a range, say (250, 400), and the list looks like this:
[(0, 100), (150, 220), (250, 400), (500, 1000)]
Then I try to add the range (399, 450), and that should raise an error because it overlaps (250, 400).
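When I add a new range, I need to search the list to make sure the new range does not overlap an existing one. No range in the list will ever overlap another range in the list.
To this end, I would like a data structure that cheaply maintains its elements in sorted order and quickly lets me find the element before or after a given element.
Is there a better way to solve this problem? Is there a data structure like that available in Python? I know the bisect module exists, and that is probably what I will use. But I was hoping there was something better.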
Edit: I solved this using the bisect module. Here is a link to the code. It is a bit long, so I won't post it here:
Answer 0 (score: 13)
It looks like you want something like bisect's insort_right / insort_left. The bisect module works with lists and tuples.
import bisect
l = [(0, 100), (150, 300), (500, 1000)]
bisect.insort_right(l, (250, 400))
print(l)  # [(0, 100), (150, 300), (250, 400), (500, 1000)]
bisect.insort_right(l, (399, 450))
print(l)  # [(0, 100), (150, 300), (250, 400), (399, 450), (500, 1000)]
You can write your own overlaps function and use it to check before calling insort.
I think your numbers are wrong, since (250, 400) overlaps (150, 300).
overlaps() can be written like this:
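def overlaps(inlist, inrange):
    # Two ranges overlap when each one starts before the other ends;
    # ranges that merely touch are not counted as overlapping.
    for lo, hi in inlist:
        if lo < inrange[1] and inrange[0] < hi:
            return True
    return False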
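Putting the check and the insert together, a minimal sketch of the check-then-insert flow might look like the following. The helper name add_range and the choice to raise ValueError are assumptions for illustration (the question only says the insert should fail), and ranges that merely touch are treated as non-overlapping.

```python
import bisect

def overlaps(ranges, new):
    """Return True if `new` overlaps any (start, end) tuple in `ranges`."""
    new_start, new_end = new
    return any(start < new_end and new_start < end for start, end in ranges)

def add_range(ranges, new):
    """Insert `new` into the sorted list `ranges`, or raise ValueError on overlap.

    add_range is a hypothetical helper name, not part of the bisect module.
    """
    if overlaps(ranges, new):
        raise ValueError("range %r overlaps an existing range" % (new,))
    bisect.insort_right(ranges, new)

ranges = [(0, 100), (150, 220), (500, 1000)]
add_range(ranges, (250, 400))
print(ranges)  # [(0, 100), (150, 220), (250, 400), (500, 1000)]
try:
    add_range(ranges, (399, 450))
except ValueError as exc:
    print(exc)
```

The overlap check here is a linear scan, so each insert costs O(n) regardless of how the insertion point is found.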
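Answer 1 (score: 10)
Use SortedDict from SortedCollection.
A SortedDict provides the same methods as a dict. Additionally, a SortedDict efficiently maintains its keys in sorted order. Consequently, the keys method returns the keys in sorted order, the popitem method removes the item with the highest key, and so on.
I've used it - it works. Unfortunately I don't have time right now to do a proper performance comparison, but subjectively it seems faster than the bisect module.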
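Answer 2 (score: 4)
Cheap searching and cheap insertion tend to be at odds. You could use a linked list as the data structure. Searching for the insertion point of a new element is then O(n), and the subsequent insertion of the new element at the correct location is O(1).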
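But you're probably better off just using a plain Python list. Random access takes constant time, so a binary search finds your spot in O(log n). Inserting at the correct position to keep the list sorted is O(n) in theory, because the elements after it have to be shifted, but in a dynamic array that shift is cheap in practice for lists of moderate size, and the underlying array is only reallocated occasionally as the list grows.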
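The trade-off described above can be sketched side by side. This is only an illustration: Node, linked_insert and to_list are hypothetical helpers written for this example, while bisect.insort is the standard-library call for the plain-list approach.

```python
import bisect

class Node:
    """One cell of a singly linked list (hypothetical helper for this sketch)."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

def linked_insert(head, value):
    """Insert `value` into a sorted linked list and return the (possibly new) head."""
    if head is None or value < head.value:
        return Node(value, head)
    node = head
    # O(n) walk: stop at the node whose successor is larger than value
    while node.next is not None and node.next.value <= value:
        node = node.next
    # O(1) splice once the position is known
    node.next = Node(value, node.next)
    return head

def to_list(head):
    """Collect the linked list's values into a Python list for printing."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out

head = None
ranges = []
for r in [(500, 1000), (0, 100), (250, 400), (150, 220)]:
    head = linked_insert(head, r)
    bisect.insort(ranges, r)  # binary search, then list.insert

print(to_list(head))
print(ranges)
```

Both structures end up holding the same sorted sequence; the difference is only in where the cost of each insert is paid.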
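Regarding checking for range overlaps, I happen to have had the same problem in the past. Here is the code I use. I originally found it in a blog post, linked from an SO answer, but that site no longer appears to exist. I actually use datetimes in my ranges, but it will work equally well with your numeric values.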
def dt_windows_intersect(dt1start, dt1end, dt2start, dt2end):
    '''Returns true if two ranges intersect. Note that if two
    ranges are adjacent, they do not intersect.
    Code based on:
    http://beautifulisbetterthanugly.com/posts/2009/oct/7/datetime-intersection-python/
    http://stackoverflow.com/questions/143552/comparing-date-ranges
    '''
    if dt2end <= dt1start or dt2start >= dt1end:
        return False
    return dt1start <= dt2end and dt1end >= dt2start
Here are the unit tests to show that it works:
from datetime import datetime
from nose.tools import eq_, assert_equal, raises
class test_dt_windows_intersect():
    """
    test_dt_windows_intersect
    Code based on:
    http://beautifulisbetterthanugly.com/posts/2009/oct/7/datetime-intersection-python/
    http://stackoverflow.com/questions/143552/comparing-date-ranges
                    |-------------------|           compare to this one
    1                    |---------|                contained within
    2               |----------|                    contained within, equal start
    3                       |-----------|           contained within, equal end
    4               |-------------------|           contained within, equal start+end
    5        |------------|                         overlaps start but not end
    6                             |-----------|     overlaps end but not start
    7          |------------------------|           overlaps start, but equal end
    8               |-----------------------|       overlaps end, but equal start
    9          |------------------------------|     overlaps entire range
    10  |---|                                       not overlap, less than
    11      |-------|                               not overlap, end equal
    12                                      |---|   not overlap, bigger than
    13                                  |---|       not overlap, start equal
    """
    def test_contained_within(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,6,30), datetime(2009,10,1,6,40),
        )
    def test_contained_within_equal_start(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,6,0), datetime(2009,10,1,6,30),
        )
    def test_contained_within_equal_end(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,6,30), datetime(2009,10,1,7,0),
        )
    def test_contained_within_equal_start_and_end(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
        )
    def test_overlaps_start_but_not_end(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,5,30), datetime(2009,10,1,6,30),
        )
    def test_overlaps_end_but_not_start(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,6,30), datetime(2009,10,1,7,30),
        )
    def test_overlaps_start_equal_end(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,5,30), datetime(2009,10,1,7,0),
        )
    def test_equal_start_overlaps_end(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,30),
        )
    def test_overlaps_entire_range(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,5,0), datetime(2009,10,1,8,0),
        )
    def test_not_overlap_less_than(self):
        assert not dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,5,0), datetime(2009,10,1,5,30),
        )
    def test_not_overlap_end_equal(self):
        assert not dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,5,0), datetime(2009,10,1,6,0),
        )
    def test_not_overlap_greater_than(self):
        assert not dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,7,30), datetime(2009,10,1,8,0),
        )
    def test_not_overlap_start_equal(self):
        assert not dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,7,0), datetime(2009,10,1,8,0),
        )
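Since the answer says the same comparison works for plain numbers, here is a small sketch using values from the question. The name ranges_intersect is hypothetical; it is a condensed form of the same check, repeated here so the snippet runs on its own.

```python
from datetime import datetime

def ranges_intersect(start1, end1, start2, end2):
    """Condensed form of the intersection check; adjacent ranges do not intersect."""
    return start1 < end2 and start2 < end1

# Integer ranges from the question
print(ranges_intersect(250, 400, 399, 450))  # True
print(ranges_intersect(250, 400, 400, 450))  # False

# The same function applied to datetimes
print(ranges_intersect(
    datetime(2009, 10, 1, 6, 0), datetime(2009, 10, 1, 7, 0),
    datetime(2009, 10, 1, 6, 30), datetime(2009, 10, 1, 7, 30),
))  # True
```

The function relies only on the < operator, so it should work for any pair of endpoint types that can be ordered against each other.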
Answer 3 (score: 1)
Maybe the bisect module would be better than the following simple function?
li = [(0, 100), (150, 220), (250, 400), (500, 1000)]
def verified_insertion(x, L):
    u, v = x
    if v < L[0][0]:
        return [x] + L
    elif u > L[-1][1]:  # compare against the end of the last range
        return L + [x]
    else:
        for i, (a, b) in enumerate(L[0:-1]):
            # x fits only in the gap after (a, b) and before the next range
            if b < u and v < L[i+1][0]:
                return L[0:i+1] + [x] + L[i+1:]
    return L
lo = verified_insertion((-10, -2), li)
lu = verified_insertion((102, 140), li)
le = verified_insertion((222, 230), li)
lee = verified_insertion((234, 236), le)  # <== le
la = verified_insertion((408, 450), li)
ly = verified_insertion((2000, 3000), li)
for w in (lo, lu, le, lee, la, ly):
    print(li)
    print(w)
    print()
The function returns a new list without modifying the list passed as an argument.
Result:
[(0, 100), (150, 220), (250, 400), (500, 1000)]
[(-10, -2), (0, 100), (150, 220), (250, 400), (500, 1000)]
[(0, 100), (150, 220), (250, 400), (500, 1000)]
[(0, 100), (102, 140), (150, 220), (250, 400), (500, 1000)]
[(0, 100), (150, 220), (250, 400), (500, 1000)]
[(0, 100), (150, 220), (222, 230), (250, 400), (500, 1000)]
[(0, 100), (150, 220), (250, 400), (500, 1000)]
[(0, 100), (150, 220), (222, 230), (234, 236), (250, 400), (500, 1000)]
[(0, 100), (150, 220), (250, 400), (500, 1000)]
[(0, 100), (150, 220), (250, 400), (408, 450), (500, 1000)]
[(0, 100), (150, 220), (250, 400), (500, 1000)]
[(0, 100), (150, 220), (250, 400), (500, 1000), (2000, 3000)]
Answer 4 (score: 0)
To answer your question:
Is there a data structure like that available in Python?
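No, there is not. But you can easily build one yourself, using a list as the underlying structure and code from the bisect module to keep the list in order and check for overlaps.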
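class RangeList(list):
    """Maintain ordered list of non-overlapping ranges"""
    def add(self, new_range):
        """Add a range if no overlap else reject it"""
        # Binary search for the insertion point, as in bisect.bisect_right
        lo = 0
        hi = len(self)
        while lo < hi:
            mid = (lo + hi) // 2
            if new_range < self[mid]:
                hi = mid
            else:
                lo = mid + 1
        # Only the neighbors on either side of the insertion point can overlap
        if ((lo > 0 and overlaps(new_range, self[lo - 1])) or
                (lo < len(self) and overlaps(new_range, self[lo]))):
            print("range overlap, not added")
        else:
            self.insert(lo, new_range)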
I leave the overlaps function as an exercise.
(This code is untested and may need some tweaking.)
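One possible way to fill in the exercise is sketched below. It is not the answerer's code: the two-argument overlaps shown here is an assumption about what was intended, bisect.bisect_right stands in for the hand-written binary search, and add returns a boolean instead of printing so the result is easy to check.

```python
import bisect

def overlaps(a, b):
    """Return True if ranges a and b, given as (start, end) tuples, overlap.

    One possible answer to the exercise; touching ranges do not overlap.
    """
    return a[0] < b[1] and b[0] < a[1]

class RangeList(list):
    """Maintain an ordered list of non-overlapping ranges."""

    def add(self, new_range):
        """Insert new_range and return True, or return False on overlap."""
        i = bisect.bisect_right(self, new_range)
        # Only the neighbours of the insertion point need checking
        if i > 0 and overlaps(new_range, self[i - 1]):
            return False
        if i < len(self) and overlaps(new_range, self[i]):
            return False
        self.insert(i, new_range)
        return True

ranges = RangeList([(0, 100), (150, 220), (500, 1000)])
print(ranges.add((250, 400)))  # True
print(ranges.add((399, 450)))  # False
print(ranges)
```

Checking only the two neighbours is enough because the list is kept sorted and free of overlaps, so any range further away ends before the left neighbour starts or starts after the right neighbour ends.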