How do you group timestamps into chunks of a given duration?

Time: 2012-06-04 02:27:46

Tags: python grouping

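Say you have a sorted list of n timestamps (Python datetime objects). How do you generate a list of tuples of the form (t, count), where t is a datetime object and count is the number of elements in the list that are at most x minutes from t?

For example, given these times (strings for brevity; in reality they are datetime objects):

timestamps = ["13:00", "13:01", "13:03", "13:04", "13:05", "13:06", "13:09"]

if x is two minutes, this should yield

[("13:00", 2), ("13:03", 3), ("13:06", 1), ("13:09", 1)]

What I am trying to do is build a coarser list of hits on a resource. The only data I have is the access time of each hit (at millisecond granularity), and I want to reduce it to a resolution of one minute, or ten minutes.

I would post my attempt, but I am ashamed of it...

Edit: this is what I have so far... testing whether it works...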

3 Answers:

Answer 0 (score: 3)

This should work:

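from datetime import timedelta

current = timestamps[0]  # start of the current chunk
count = 0
res = []
for t in timestamps:
    if t - current <= timedelta(minutes=2):
        count += 1  # still within two minutes of the chunk start
    else:
        res.append((current, count))  # close the finished chunk
        current = t
        count = 1
res.append((current, count))  # add the last tuple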

Following your example:

day = datetime(2012, 6, 4)  # arbitrary date; only the time of day matters here
timestamps = [day.replace(hour=13, minute=m) for m in (0, 1, 3, 4, 5, 6, 9)]

res = [(datetime(2012, 6, 4, 13, 0), 2), (datetime(2012, 6, 4, 13, 3), 3), (datetime(2012, 6, 4, 13, 6), 1), (datetime(2012, 6, 4, 13, 9), 1)]
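For reuse, the same loop can be wrapped in a function that takes the window as a parameter and tolerates an empty list. This is only a sketch: the name group_by_window and its window argument are made up for illustration, and the date in the sample data is arbitrary.

```python
from datetime import datetime, timedelta

def group_by_window(timestamps, window):
    """Group sorted datetimes into chunks that span at most `window`,
    measured from the first element of each chunk."""
    res = []
    current, count = None, 0
    for t in timestamps:
        if current is not None and t - current <= window:
            count += 1  # t still belongs to the current chunk
        else:
            if current is not None:
                res.append((current, count))  # close the previous chunk
            current, count = t, 1  # t starts a new chunk
    if current is not None:
        res.append((current, count))  # close the last chunk
    return res

day = datetime(2012, 6, 4)  # arbitrary date, only the time of day matters
timestamps = [day.replace(hour=13, minute=m) for m in (0, 1, 3, 4, 5, 6, 9)]
for start, count in group_by_window(timestamps, timedelta(minutes=2)):
    print(start.strftime("%H:%M"), count)
```

Passing a timedelta rather than a number of minutes means the same function works for windows of seconds or hours.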

Answer 1 (score: 1)

Here is my version of a solution:

from datetime import datetime

# SAMPLE TIMESTAMP DATA
timestamps = []
timestamps.append(datetime.utcfromtimestamp(1338777480))
timestamps.append(datetime.utcfromtimestamp(1338777580))
timestamps.append(datetime.utcfromtimestamp(1338777610))
timestamps.append(datetime.utcfromtimestamp(1338777680))
timestamps.append(datetime.utcfromtimestamp(1338777780))
timestamps.append(datetime.utcfromtimestamp(1338777980))
timestamps.append(datetime.utcfromtimestamp(1338778180))
timestamps.append(datetime.utcfromtimestamp(1338778230))
timestamps.append(datetime.utcfromtimestamp(1338778480))

MIN_THRSH = 2  # Chunk width in whole minutes.

def chunk_time(timestamp_list):
    chunk_list = []
    current_chunk_idx = None
    for i, dt in enumerate(timestamp_list):
        # Elapsed whole minutes since the current chunk started (floored,
        # so a gap of 2 min 59 s still counts as 2).
        if (i == 0 or
                (dt - timestamp_list[current_chunk_idx]).total_seconds() // 60 > MIN_THRSH):
            chunk_list.append([dt.strftime('%H:%M'), 1])
            current_chunk_idx = i
        else:
            chunk_list[-1][1] += 1
    return chunk_list

if __name__ == "__main__":
    for t in timestamps:
        print(t.strftime('%H:%M'))
    print(chunk_time(timestamps))

Output:

02:38
02:39
02:40
02:41
02:43
02:46
02:49
02:50
02:54
[['02:38', 3], ['02:41', 2], ['02:46', 1], ['02:49', 2], ['02:54', 1]]

Answer 2 (score: 0)

If you only need the counts, you can use a histogram on the Unix timestamps, for example numpy.histogram.
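A rough sketch of the histogram idea, assuming NumPy is installed. The helper name histogram_counts is made up for illustration, and it works on second offsets from the first timestamp rather than on raw Unix timestamps. Note that the bins form a fixed grid starting at the first timestamp, so the counts differ from the chunking in the question, where each new chunk starts at the first timestamp outside the previous one.

```python
from datetime import datetime, timedelta
import numpy as np

def histogram_counts(timestamps, minutes):
    """Count timestamps in fixed-width bins of `minutes`, starting at the
    first timestamp. Returns (bin_start, count) for non-empty bins."""
    if not timestamps:
        return []
    width = minutes * 60  # bin width in seconds
    first = timestamps[0]
    # Offsets in seconds from the first timestamp
    offsets = np.array([(t - first).total_seconds() for t in timestamps])
    # Enough bins to cover the largest offset
    n_bins = int(offsets.max() // width) + 1
    edges = np.arange(n_bins + 1) * width
    counts, _ = np.histogram(offsets, bins=edges)
    return [(first + timedelta(seconds=float(e)), int(c))
            for e, c in zip(edges[:-1], counts) if c]

sample = [datetime(2012, 6, 4, 13, m) for m in (0, 1, 3, 4, 5, 6, 9)]
for start, count in histogram_counts(sample, 2):
    print(start.strftime("%H:%M"), count)
```

With the question's sample data and two-minute bins, the bins start at 13:00, 13:02, 13:04, 13:06 and 13:08.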