Is there a standard Python data structure that keeps things in sorted order?

Date: 2011-04-03 04:45:20

Tags: python data-structures

I have a set of ranges that might look like this:

[(0, 100), (150, 220), (500, 1000)]

Then I add a range, say (250, 400), and the list looks like this:

[(0, 100), (150, 220), (250, 400), (500, 1000)]

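Then I try to add the range (399, 450), which should raise an error because it overlaps (250, 400).

When I add a new range, I need to search to make sure the new range does not overlap any existing range, so that no range in the list ever overlaps another range in the list.

For this, I'd like a data structure that cheaply keeps its elements in sorted order and quickly lets me find the element immediately before or after a given element.

Is there a better way to solve this problem? Is a data structure like that available in Python? I know the bisect module exists, and that's probably what I'll use, but I was hoping for something better.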
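As a rough sketch of the bisect approach mentioned above, assuming the ranges are stored as (start, end) tuples in a plain sorted list, the hypothetical helper neighbors() below finds the ranges immediately before and after a candidate range in O(log n):

```python
import bisect


def neighbors(ranges, new_range):
    """Return (before, after): the ranges immediately before and after the
    position where new_range would be inserted into the sorted list `ranges`.
    Either element is None if there is no such neighbor."""
    # bisect_left finds the insertion index with a binary search;
    # tuples compare by start first, then by end.
    i = bisect.bisect_left(ranges, new_range)
    before = ranges[i - 1] if i > 0 else None
    after = ranges[i] if i < len(ranges) else None
    return before, after


ranges = [(0, 100), (150, 220), (250, 400), (500, 1000)]
print(neighbors(ranges, (399, 450)))  # ((250, 400), (500, 1000))
```

Because the list never contains overlapping ranges, these two neighbors are the only ones a new range could possibly overlap.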

Edit: I solved this using the bisect module. Here is a link to the code. It's a bit long, so I won't post it here:

Implementation of byte range list

5 answers:

Answer 0 (score: 13)

It looks like you want something like bisect's insort_right / insort_left. The bisect module works with lists and tuples.

import bisect

l = [(0, 100), (150, 300), (500, 1000)]
bisect.insort_right(l, (250, 400))
print(l)  # [(0, 100), (150, 300), (250, 400), (500, 1000)]
bisect.insort_right(l, (399, 450))
print(l)  # [(0, 100), (150, 300), (250, 400), (399, 450), (500, 1000)]

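You could write your own overlaps function and use it to check before calling insort.

I think you got your numbers wrong: (250, 400) overlaps (150, 300). overlaps() could be written like this:

def overlaps(inlist, inrange):
    """Return True if inrange overlaps any range already in inlist."""
    start, end = inrange
    for lo, hi in inlist:
        # Two ranges overlap when each one starts before the other ends.
        if lo < end and start < hi:
            return True
    return False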
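A minimal sketch of wiring the check and insort together, so that an overlapping range raises an error as the question asks. The name add_range and the choice of ValueError are illustrative assumptions, and ranges are treated as half-open, so touching ranges such as (100, 150) and (150, 220) are allowed:

```python
import bisect


def overlaps(inlist, inrange):
    """Return True if inrange overlaps any range in inlist (half-open ranges)."""
    start, end = inrange
    return any(lo < end and start < hi for lo, hi in inlist)


def add_range(inlist, inrange):
    """Insert inrange into the sorted list inlist, refusing overlaps.
    add_range is a hypothetical helper, not part of bisect."""
    if overlaps(inlist, inrange):
        raise ValueError("%r overlaps an existing range" % (inrange,))
    bisect.insort_right(inlist, inrange)


l = [(0, 100), (150, 220), (500, 1000)]
add_range(l, (250, 400))
print(l)  # [(0, 100), (150, 220), (250, 400), (500, 1000)]
try:
    add_range(l, (399, 450))
except ValueError as exc:
    print(exc)  # (399, 450) overlaps an existing range
```

The overlap scan here is still O(n); only the insertion point search inside insort is a binary search.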

Answer 1 (score: 10)

Use SortedDict from the SortedCollection:

  

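A SortedDict provides the same methods as a dict. Additionally, a SortedDict efficiently maintains its keys in sorted order. Consequently, the keys method will return the keys in sorted order, the popitem method will remove the item with the highest key, and so on.

I've used it, and it works. Unfortunately I don't have time right now for a proper performance comparison, but subjectively it seems faster than the bisect module.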

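Answer 2 (score: 4)

Cheap search and cheap insertion tend to be at odds. You could use a linked list as your data structure. Then searching for the new element's insertion point is O(n), and inserting the new element at that point is O(1).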
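A minimal sketch of that trade-off, assuming a hand-written singly linked list (Python has no built-in one; Node, linked_insert and to_list are hypothetical names). It only keeps the order and does no overlap checking:

```python
class Node:
    """One node of a singly linked list holding a (start, end) range."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node


def linked_insert(head, value):
    """Insert value into the sorted list starting at head and return the
    (possibly new) head. Finding the spot is an O(n) walk; splicing the
    new node in is O(1)."""
    if head is None or value < head.value:
        return Node(value, head)
    node = head
    # Walk forward while the next node still sorts at or before value.
    while node.next_node is not None and node.next_node.value <= value:
        node = node.next_node
    node.next_node = Node(value, node.next_node)
    return head


def to_list(head):
    """Collect the linked list's values into a Python list."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next_node
    return out


head = None
for r in [(500, 1000), (0, 100), (250, 400), (150, 220)]:
    head = linked_insert(head, r)
print(to_list(head))  # [(0, 100), (150, 220), (250, 400), (500, 1000)]
```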

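But you're probably better off just using a plain Python list. Random access takes constant time, so a binary search finds your position in O(log n). Inserting at that position to keep the list sorted is more expensive in theory, O(n), because every later element has to move up one slot, but in CPython's dynamic array that move is a cheap copy of pointers, and you only occasionally pay the bigger cost of reallocating the underlying array.

As for checking date ranges for overlap, I happened to run into the same problem in the past. Here's the code I use. I originally found it in a blog post linked from an SO answer, but that site no longer seems to exist. I actually use datetimes in my ranges, but it works just as well with your numeric values.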

def dt_windows_intersect(dt1start, dt1end, dt2start, dt2end):
    '''Returns true if two ranges intersect. Note that if two
    ranges are adjacent, they do not intersect.

    Code based on:
    http://beautifulisbetterthanugly.com/posts/2009/oct/7/datetime-intersection-python/
    http://stackoverflow.com/questions/143552/comparing-date-ranges  
    '''

    if dt2end <= dt1start or dt2start >= dt1end:
        return False

    return  dt1start <= dt2end and dt1end >= dt2start
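If the first check fails, then dt2end > dt1start and dt2start < dt1end already hold, so the final return is always True and the whole test reduces to a single pair of comparisons. A small sketch confirming that equivalence exhaustively on small integers (intersects is a hypothetical name; the original function is copied unchanged):

```python
from itertools import product


def dt_windows_intersect(dt1start, dt1end, dt2start, dt2end):
    # Same logic as the function above.
    if dt2end <= dt1start or dt2start >= dt1end:
        return False
    return dt1start <= dt2end and dt1end >= dt2start


def intersects(start1, end1, start2, end2):
    """Equivalent single-expression form: each range starts before the other ends."""
    return start1 < end2 and start2 < end1


# Compare both versions on every pair of valid ranges (start < end) over 0..5.
points = range(6)
ranges = [(s, e) for s, e in product(points, repeat=2) if s < e]
assert all(
    dt_windows_intersect(a, b, c, d) == intersects(a, b, c, d)
    for (a, b), (c, d) in product(ranges, repeat=2)
)
print(intersects(250, 400, 399, 450))  # True
print(intersects(250, 400, 400, 450))  # False (adjacent ranges)
```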

Here are the unit tests that show it works:

from datetime import datetime

class test_dt_windows_intersect():
    """
    test_dt_windows_intersect
    Code based on: 
    http://beautifulisbetterthanugly.com/posts/2009/oct/7/datetime-intersection-python/
    http://stackoverflow.com/questions/143552/comparing-date-ranges  

               |-------------------|         compare to this one
    1               |---------|              contained within
    2          |----------|                  contained within, equal start
    3                  |-----------|         contained within, equal end
    4          |-------------------|         contained within, equal start+end
    5     |------------|                     overlaps start but not end
    6                      |-----------|     overlaps end but not start
    7     |------------------------|         overlaps start, but equal end
    8          |-----------------------|     overlaps end, but equal start
    9     |------------------------------|   overlaps entire range

    10 |---|                                 not overlap, less than
    11 |-------|                             not overlap, end equal
    12                              |---|    not overlap, bigger than
    13                             |---|     not overlap, start equal
    """


    def test_contained_within(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,6,30),   datetime(2009,10,1,6,40),
        )

    def test_contained_within_equal_start(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,6,0),    datetime(2009,10,1,6,30),
        )

    def test_contained_within_equal_end(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,6,30),   datetime(2009,10,1,7,0),
        )

    def test_contained_within_equal_start_and_end(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
        )

    def test_overlaps_start_but_not_end(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,5,30),   datetime(2009,10,1,6,30),
        )

    def test_overlaps_end_but_not_start(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,6,30),   datetime(2009,10,1,7,30),
        )

    def test_overlaps_start_equal_end(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,5,30),   datetime(2009,10,1,7,0),
        )

    def test_equal_start_overlaps_end(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,30),
        )

    def test_overlaps_entire_range(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,5,0),    datetime(2009,10,1,8,0),
        )

    def test_not_overlap_less_than(self):
        assert not dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,5,0),    datetime(2009,10,1,5,30),
        )

    def test_not_overlap_end_equal(self):
        assert not dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,5,0),    datetime(2009,10,1,6,0),
        )

    def test_not_overlap_greater_than(self):
        assert not dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,7,30),    datetime(2009,10,1,8,0),
        )

    def test_not_overlap_start_equal(self):
        assert not dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,7,0),    datetime(2009,10,1,8,0),
        )
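The same 13 cases can also be written as a data table that runs with plain assert statements and no test framework. This sketch assumes minutes past midnight in place of datetime objects, purely for brevity, so the reference window 6:00 to 7:00 becomes (360, 420):

```python
def dt_windows_intersect(dt1start, dt1end, dt2start, dt2end):
    # Same logic as the function above.
    if dt2end <= dt1start or dt2start >= dt1end:
        return False
    return dt1start <= dt2end and dt1end >= dt2start


REF = (360, 420)  # 6:00-7:00 as minutes past midnight

# (case from the diagram, other window, expected result)
CASES = [
    ("contained within", (390, 400), True),
    ("contained within, equal start", (360, 390), True),
    ("contained within, equal end", (390, 420), True),
    ("contained within, equal start+end", (360, 420), True),
    ("overlaps start but not end", (330, 390), True),
    ("overlaps end but not start", (390, 450), True),
    ("overlaps start, but equal end", (330, 420), True),
    ("overlaps end, but equal start", (360, 450), True),
    ("overlaps entire range", (300, 480), True),
    ("not overlap, less than", (300, 330), False),
    ("not overlap, end equal", (300, 360), False),
    ("not overlap, bigger than", (450, 480), False),
    ("not overlap, start equal", (420, 480), False),
]

for name, (start, end), expected in CASES:
    assert dt_windows_intersect(REF[0], REF[1], start, end) == expected, name
print("all %d cases pass" % len(CASES))  # all 13 cases pass
```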

Answer 3 (score: 1)

Perhaps the bisect module would be better than the following simple function?

li = [(0, 100), (150, 220), (250, 400), (500, 1000)]


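def verified_insertion(x, L):
    """Return a list with range x inserted in order, or L unchanged
    if x overlaps an existing range."""
    u, v = x
    if not L:
        return [x]
    if v < L[0][0]:            # entirely before the first range
        return [x] + L
    elif u > L[-1][1]:         # entirely after the end of the last range
        return L + [x]
    else:
        for i, (a, b) in enumerate(L[0:-1]):
            # x must start after L[i] ends and end before L[i+1] starts
            if b < u and v < L[i + 1][0]:
                return L[0:i+1] + [x] + L[i+1:]
    return L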
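Answering the question above with a sketch: a bisect-based variant replaces the linear scan with an O(log n) search and, like verified_insertion, never modifies the list passed in. The name verified_insertion_bisect is hypothetical, and touching endpoints are treated as overlaps to match the strict comparisons used in verified_insertion:

```python
import bisect


def verified_insertion_bisect(x, L):
    """Return a new sorted list with range x inserted, or L itself if x
    overlaps a neighbor. L is assumed sorted and non-overlapping."""
    u, v = x
    i = bisect.bisect_left(L, x)  # binary search for the insertion point
    # Only the ranges just before and just after position i can overlap x.
    if i > 0 and L[i - 1][1] >= u:
        return L
    if i < len(L) and L[i][0] <= v:
        return L
    return L[:i] + [x] + L[i:]


li = [(0, 100), (150, 220), (250, 400), (500, 1000)]
print(verified_insertion_bisect((408, 450), li))
# [(0, 100), (150, 220), (250, 400), (408, 450), (500, 1000)]
```

Building the new list with slicing is still O(n), so the gain is only in the search step.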


lo = verified_insertion((-10,-2),li)

lu = verified_insertion((102,140),li)

le = verified_insertion((222,230),li)

lee = verified_insertion((234,236),le) # <== le

la = verified_insertion((408,450),li)

ly = verified_insertion((2000,3000),li)

for w in (lo, lu, le, lee, la, ly):
    print(li)
    print(w)
    print()

The function returns a list without modifying the list passed in as an argument.

Output:

[(0, 100), (150, 220), (250, 400), (500, 1000)] 
[(-10, -2), (0, 100), (150, 220), (250, 400), (500, 1000)] 

[(0, 100), (150, 220), (250, 400), (500, 1000)] 
[(0, 100), (102, 140), (150, 220), (250, 400), (500, 1000)] 

[(0, 100), (150, 220), (250, 400), (500, 1000)] 
[(0, 100), (150, 220), (222, 230), (250, 400), (500, 1000)] 

[(0, 100), (150, 220), (250, 400), (500, 1000)] 
[(0, 100), (150, 220), (222, 230), (234, 236), (250, 400), (500, 1000)] 

[(0, 100), (150, 220), (250, 400), (500, 1000)] 
[(0, 100), (150, 220), (250, 400), (408, 450), (500, 1000)] 

[(0, 100), (150, 220), (250, 400), (500, 1000)] 
[(0, 100), (150, 220), (250, 400), (500, 1000), (2000, 3000)] 

Answer 4 (score: 0)

To answer your question:

Is there a data structure like that available in Python?

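No, there isn't. But you can easily build one, using a list as the underlying structure and code from the bisect module to keep the list in order and check for overlaps.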

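class RangeList(list):
    """Maintain an ordered list of non-overlapping ranges."""

    def add(self, new_range):
        """Add new_range if it overlaps nothing, else reject it."""
        # Binary search for the insertion point (same loop as bisect.bisect_right).
        lo, hi = 0, len(self)
        while lo < hi:
            mid = (lo + hi) // 2
            if new_range < self[mid]:
                hi = mid
            else:
                lo = mid + 1
        # In a sorted, non-overlapping list only the neighbors on either
        # side of the insertion point can overlap the new range.
        if (lo > 0 and overlaps(new_range, self[lo - 1])) or \
           (lo < len(self) and overlaps(new_range, self[lo])):
            print("range overlap, not added")
        else:
            self.insert(lo, new_range)

I'll leave the overlaps function as an exercise. (This code is untested and may need some tweaking.)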
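One possible overlaps helper for the class above, as a sketch that assumes half-open (start, end) ranges, so ranges that merely touch, such as (100, 150) and (150, 220), are not treated as overlapping:

```python
def overlaps(a, b):
    """Return True if ranges a and b share any point.

    Assumes half-open (start, end) ranges. If your ranges include their
    end points, use <= instead of < in both comparisons."""
    return a[0] < b[1] and b[0] < a[1]


print(overlaps((399, 450), (250, 400)))  # True
print(overlaps((400, 450), (250, 400)))  # False
```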