Split a list of dates by another list of dates

Time: 2009-09-02 16:34:17

Tags: python performance

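I have a number of nodes in a network. Each node sends a status message once an hour to signal that it is alive. So I have a list of nodes and the time each one was last seen alive. I want to plot the number of alive nodes over time.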

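The list of nodes is sorted by the time each was last seen alive, but I can't figure out a good way to count how many nodes were alive at each date.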

from datetime import datetime, timedelta

seen = [ n.last_seen for n in c.nodes ] # a list of datetimes
seen.sort()
start = seen[0]
end = seen[-1]

diff = end - start
num_points = 100

step = diff / num_points

num = len( c.nodes )

dates = [ start + i * step for i in range( num_points ) ]

What I want is basically

alive = [ len([ s for s in seen if s > date]) for date in dates ]

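But that's not very efficient. The solution should exploit the fact that the seen list is sorted, rather than looping over the whole list for every date.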

3 Answers:

Answer 0: (score: 2)

This generator walks through the list only once:

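def get_alive(seen, dates):
    c = len(seen)                      # entries not yet ruled out
    for date in dates:
        # seen[-c:] would wrap around to the whole list once c reaches 0
        for s in seen[len(seen) - c:]:
            if s >= date:      # replaced your > with >= as it seems to make more sense
                break
            c -= 1
        yield c                # yielded for every date, including when c is 0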

Answer 1: (score: 1)

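The Python bisect module will find the right index for you, and from that index you can work out the number of items before and after it.

If I understand correctly, that would be O(dates) * O(log(seen)).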
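A minimal sketch of the bisect idea, assuming `seen` is already sorted. The function name `alive_counts`, the `inclusive` flag, and the integer sample data are made up for illustration; datetimes work the same way because they compare like numbers.

```python
from bisect import bisect_left, bisect_right

def alive_counts(seen, dates, inclusive=False):
    """For each date, count entries of sorted `seen` that are > date
    (or >= date when inclusive=True). Hypothetical helper."""
    # bisect_right skips entries equal to the date, bisect_left keeps them
    find = bisect_left if inclusive else bisect_right
    n = len(seen)
    # everything from the found index onwards is still alive at that date
    return [n - find(seen, date) for date in dates]

# made-up sample data; integers stand in for datetimes
seen = [-1, 10, 12, 15, 20, 75]
dates = [5, 15, 25, 50, 100]
print(alive_counts(seen, dates))                  # [5, 2, 1, 1, 0]
print(alive_counts(seen, dates, inclusive=True))  # [5, 3, 1, 1, 0]
```

Each lookup is a binary search, so the total cost is one O(log(len(seen))) search per date, and `dates` does not need to be sorted for this variant.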


Edit 1

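It should be possible to do this in a single pass, as SilentGhost demonstrates. However, itertools.groupby works fine on sorted data, so it should be able to do something here, perhaps like this (this is probably more than O(n), but it could be improved):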

import itertools

# numbers are easier to make up now
seen = [-1, 10, 12, 15, 20, 75]
dates = [5, 15, 25, 50, 100]

def finddate(s, dates):
    """Find the first date in @dates larger than s"""
    for date in dates:
        if s < date:
            break
    return date


for date, group in itertools.groupby(seen, key=lambda s: finddate(s, dates)):
    print(date, list(group))
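One possible way to turn those groups into the alive counts the question asks for, offered as a sketch rather than the answerer's own method. It swaps the linear `finddate` scan for `bisect_right` on `dates`, and the names `alive_from_groups` and `bucket` are hypothetical. It assumes both lists are sorted and counts entries that are `>=` each date.

```python
import itertools
from bisect import bisect_right

def alive_from_groups(seen, dates):
    """Count entries of sorted `seen` that are >= each date, via groupby buckets."""
    def bucket(s):
        # index of the first date strictly larger than s (len(dates) if none)
        return bisect_right(dates, s)

    # sizes[i] = number of entries whose first larger date is dates[i]
    sizes = [0] * (len(dates) + 1)
    for idx, group in itertools.groupby(seen, key=bucket):
        sizes[idx] = sum(1 for _ in group)

    alive = []
    remaining = len(seen)
    for i in range(len(dates)):
        remaining -= sizes[i]   # entries in bucket i are < dates[i]
        alive.append(remaining)
    return alive

# same made-up numbers as above
seen = [-1, 10, 12, 15, 20, 75]
dates = [5, 15, 25, 50, 100]
print(alive_from_groups(seen, dates))  # [5, 3, 1, 1, 0]
```

Because `seen` is sorted, the bucket index never decreases, so `groupby` produces each bucket at most once. The cost is roughly O(len(seen) * log(len(dates))) for the key lookups plus one linear pass over the buckets.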

Answer 2: (score: 1)

I took SilentGhost's generator solution a step further by using explicit iterators. This is the linear-time solution I came up with.

def splitter( items, breaks ):
    """ assuming `items` and `breaks` are sorted """
    c = len( items )

    items = iter(items)
    item = next(items)      # built-in next() works in Python 2.6+ and 3
    breaks = iter(breaks)
    breaker = next(breaks)

    while True:
        if breaker > item:
            for it in items:
                c -= 1
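                if it >= breaker:
                    item = it
                    # no yield here: the outer else branch reports c for this
                    # breaker on the next pass, so yielding now would duplicate it
                    break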
            else:  # no item left that is >= the current breaker
                yield 0 # 0 items left for the current breaker
                # and 0 items left for all other breaks, since they are > the current
                for _ in breaks:
                    yield 0 
                break # and done
        else:
            yield c
            for br in breaks:
                if br > item:
                    breaker = br
                    break
                yield c
            else:
                # there is no break > any item in the list
                break
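A small harness for sanity-checking any of the linear-time variants against the brute-force expression from the question, as a sketch only. All names here (`brute_force`, `linear_counts`, `agrees`) are hypothetical, the random integer data stands in for datetimes, and the comparison uses `>=` as in the answers above.

```python
import random

def brute_force(seen, dates):
    """Quadratic reference: count entries >= each date."""
    return [sum(1 for s in seen if s >= d) for d in dates]

def linear_counts(seen, dates):
    """Single pass over both sorted lists (illustrative helper, not from the answers)."""
    i, n = 0, len(seen)
    out = []
    for d in dates:
        # dates are ascending, so the index only ever moves forward
        while i < n and seen[i] < d:
            i += 1
        out.append(n - i)
    return out

def agrees(candidate, trials=200, seed=0):
    """Return True if candidate matches brute_force on random sorted inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        # both lists are non-empty so iterator-based candidates can be plugged in
        seen = sorted(rng.randint(0, 50) for _ in range(rng.randint(1, 20)))
        dates = sorted(rng.randint(0, 50) for _ in range(rng.randint(1, 10)))
        if list(candidate(seen, dates)) != brute_force(seen, dates):
            return False
    return True

print(agrees(linear_counts))
```

Passing `splitter` or `get_alive` in place of `linear_counts` runs the same comparison on those generators.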