How do I use Pandas to count occurrences in evenly spaced intervals with a set start/end point?

Time: 2018-03-20 15:14:48

Tags: python pandas time-series

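I am trying to build a timeline of events over the past day from a list of datetimes. Each datetime counts as one occurrence of the event. The occurrences need to be grouped by hour, and the timeline must include the hours with zero occurrences.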

Sample data

import datetime

items = [
    datetime.datetime(2018, 3, 19, 16, 51, 48),
    datetime.datetime(2018, 3, 19, 17, 25, 19),
    datetime.datetime(2018, 3, 20, 6, 33, 35),
    datetime.datetime(2018, 3, 19, 23, 21, 35),
    datetime.datetime(2018, 3, 19, 15, 8, 41),
    datetime.datetime(2018, 3, 19, 21, 44, 16),
    datetime.datetime(2018, 3, 19, 18, 21, 28),
    datetime.datetime(2018, 3, 20, 7, 20, 22),
    datetime.datetime(2018, 3, 20, 11, 15, 43)
]

Right now I have it working, but this is not the right way to do it. Any suggestions?

Current solution

import pandas as pd

def _generate_timeseries(items, start_ts, end_ts):
    # add start/end times to the data
    items.insert(0, start_ts)
    items.append(end_ts)

    # value each datetime as one occurrence
    data = [1 for x in range(len(items))]
    timeseries = pd.Series(data, index=items)
    hourly_data = timeseries.resample('H').sum()

    timeline = hourly_data.tolist()
    return [{'mentions': x} for x in timeline[1:-1]]

Example result

timeline =[
    {'mentions': 4}, {'mentions': 2}, {'mentions': 1}, {'mentions': 0}, {'mentions': 3}, {'mentions': 2}, {'mentions': 2}, {'mentions': 1}, {'mentions': 1}, {'mentions': 0}, {'mentions': 1}, {'mentions': 0}, {'mentions': 14}, {'mentions': 1}, {'mentions': 4}, {'mentions': 2}, {'mentions': 3}, {'mentions': 2}, {'mentions': 1}, {'mentions': 2}, {'mentions': 6}, {'mentions': 2}, {'mentions': 2}
]

1 Answer:

Answer 0 (score: 1)

IIUC, you can use this:

df = pd.DataFrame({'Event':items})
df.groupby(pd.Grouper(key='Event',freq='H'))['Event'].count()

Output:

Event
2018-03-19 15:00:00    1
2018-03-19 16:00:00    1
2018-03-19 17:00:00    1
2018-03-19 18:00:00    1
2018-03-19 19:00:00    0
2018-03-19 20:00:00    0
2018-03-19 21:00:00    1
2018-03-19 22:00:00    0
2018-03-19 23:00:00    1
2018-03-20 00:00:00    0
2018-03-20 01:00:00    0
2018-03-20 02:00:00    0
2018-03-20 03:00:00    0
2018-03-20 04:00:00    0
2018-03-20 05:00:00    0
2018-03-20 06:00:00    1
2018-03-20 07:00:00    1
2018-03-20 08:00:00    0
2018-03-20 09:00:00    0
2018-03-20 10:00:00    0
2018-03-20 11:00:00    1
Freq: H, Name: Event, dtype: int64

Edit to cover the full days:

# Parentheses are required so the chained calls can span several lines
(df.groupby(pd.Grouper(key='Event', freq='H'))['Event'].count()
   .reindex(pd.date_range(df.Event.dt.floor('D').min(),
                          df.Event.dt.ceil('D').max(),
                          freq='H'))
   .fillna(0))

Output:

2018-03-19 00:00:00    0.0
2018-03-19 01:00:00    0.0
2018-03-19 02:00:00    0.0
2018-03-19 03:00:00    0.0
2018-03-19 04:00:00    0.0
2018-03-19 05:00:00    0.0
2018-03-19 06:00:00    0.0
2018-03-19 07:00:00    0.0
2018-03-19 08:00:00    0.0
2018-03-19 09:00:00    0.0
2018-03-19 10:00:00    0.0
2018-03-19 11:00:00    0.0
2018-03-19 12:00:00    0.0
2018-03-19 13:00:00    0.0
2018-03-19 14:00:00    0.0
2018-03-19 15:00:00    1.0
2018-03-19 16:00:00    1.0
2018-03-19 17:00:00    1.0
2018-03-19 18:00:00    1.0
2018-03-19 19:00:00    0.0
2018-03-19 20:00:00    0.0
2018-03-19 21:00:00    1.0
2018-03-19 22:00:00    0.0
2018-03-19 23:00:00    1.0
2018-03-20 00:00:00    0.0
2018-03-20 01:00:00    0.0
2018-03-20 02:00:00    0.0
2018-03-20 03:00:00    0.0
2018-03-20 04:00:00    0.0
2018-03-20 05:00:00    0.0
2018-03-20 06:00:00    1.0
2018-03-20 07:00:00    1.0
2018-03-20 08:00:00    0.0
2018-03-20 09:00:00    0.0
2018-03-20 10:00:00    0.0
2018-03-20 11:00:00    1.0
2018-03-20 12:00:00    0.0
2018-03-20 13:00:00    0.0
2018-03-20 14:00:00    0.0
2018-03-20 15:00:00    0.0
2018-03-20 16:00:00    0.0
2018-03-20 17:00:00    0.0
2018-03-20 18:00:00    0.0
2018-03-20 19:00:00    0.0
2018-03-20 20:00:00    0.0
2018-03-20 21:00:00    0.0
2018-03-20 22:00:00    0.0
2018-03-20 23:00:00    0.0
2018-03-21 00:00:00    0.0
Freq: H, Name: Event, dtype: float64
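
The question asks for a timeline with a set start and end rather than whole calendar days, so the same reindexing idea might be applied to an explicit window. The following is only a sketch under some assumptions: the function name `hourly_timeline` and the noon-to-noon window are made up for illustration, events outside the window are simply dropped, and the lowercase `'h'` alias is used because recent pandas releases prefer it over `'H'`. Passing `fill_value=0` to `reindex` keeps the counts as integers instead of the floats that `fillna(0)` produces.

```python
import datetime
import pandas as pd

def hourly_timeline(items, start_ts, end_ts):
    """Count events per hour from start_ts to end_ts, keeping empty hours as 0.

    Hypothetical helper for illustration; it does not modify `items`.
    """
    events = pd.Series(pd.to_datetime(list(items)))
    # Floor each event to the start of its hour and count events per hour.
    counts = events.dt.floor('h').value_counts()
    # Every hour in the requested window, both endpoints included.
    hours = pd.date_range(pd.Timestamp(start_ts).floor('h'),
                          pd.Timestamp(end_ts).floor('h'),
                          freq='h')
    # Hours without events get 0; events outside the window are dropped.
    full = counts.reindex(hours, fill_value=0)
    return [{'mentions': int(n)} for n in full]

items = [
    datetime.datetime(2018, 3, 19, 16, 51, 48),
    datetime.datetime(2018, 3, 19, 17, 25, 19),
    datetime.datetime(2018, 3, 20, 6, 33, 35),
    datetime.datetime(2018, 3, 19, 23, 21, 35),
    datetime.datetime(2018, 3, 19, 15, 8, 41),
    datetime.datetime(2018, 3, 19, 21, 44, 16),
    datetime.datetime(2018, 3, 19, 18, 21, 28),
    datetime.datetime(2018, 3, 20, 7, 20, 22),
    datetime.datetime(2018, 3, 20, 11, 15, 43),
]

# Assumed window: noon on the 19th through noon on the 20th.
start = datetime.datetime(2018, 3, 19, 12, 0, 0)
end = datetime.datetime(2018, 3, 20, 12, 0, 0)
timeline = hourly_timeline(items, start, end)
print(len(timeline))  # 25 hourly buckets, both endpoints included
```

Unlike the sentinel approach in the question, this leaves the caller's list untouched and does not discard the first and last hourly buckets.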