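I'm trying to build a timeline of events for the past day from a list of dates. Each date counts as one occurrence of an event. Occurrences need to be grouped by hour, and the timeline must include the hours with zero occurrences.
Sample data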
import datetime

items = [
datetime.datetime(2018, 3, 19, 16, 51, 48),
datetime.datetime(2018, 3, 19, 17, 25, 19),
datetime.datetime(2018, 3, 20, 6, 33, 35),
datetime.datetime(2018, 3, 19, 23, 21, 35),
datetime.datetime(2018, 3, 19, 15, 8, 41),
datetime.datetime(2018, 3, 19, 21, 44, 16),
datetime.datetime(2018, 3, 19, 18, 21, 28),
datetime.datetime(2018, 3, 20, 7, 20, 22),
datetime.datetime(2018, 3, 20, 11, 15, 43)
]
Right now I have it working, but this doesn't seem like the right way to do it. Any suggestions?
Current solution
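import pandas as pd

def _generate_timeseries(items, start_ts, end_ts):
    # pad the data with the start/end times so resample() spans the whole window
    # (build a new list rather than mutating the caller's list with insert/append)
    items = [start_ts] + list(items) + [end_ts]
    # value each datetime as one occurrence
    data = [1] * len(items)
    timeseries = pd.Series(data, index=items)
    # sum the occurrences per hour; hours with no events become 0
    hourly_data = timeseries.resample('H').sum()
    timeline = hourly_data.tolist()
    # drop the first and last hours, which contain the padding timestamps
    return [{'mentions': x} for x in timeline[1:-1]]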
Example result
timeline = [
{'mentions': 4}, {'mentions': 2}, {'mentions': 1}, {'mentions': 0}, {'mentions': 3}, {'mentions': 2}, {'mentions': 2}, {'mentions': 1}, {'mentions': 1}, {'mentions': 0}, {'mentions': 1}, {'mentions': 0}, {'mentions': 14}, {'mentions': 1}, {'mentions': 4}, {'mentions': 2}, {'mentions': 3}, {'mentions': 2}, {'mentions': 1}, {'mentions': 2}, {'mentions': 6}, {'mentions': 2}, {'mentions': 2}
]
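A minimal sketch of another way to get the same kind of result, assuming the timeline should be the 24 hours ending with the hour that contains `end`: instead of padding the data with sentinel timestamps and slicing them off again, floor each event to its hour, count, and reindex against every hour of an explicit window. The helper name `hourly_mentions` and its `end`/`hours` parameters are hypothetical, not part of the question.

```python
import datetime

import pandas as pd


def hourly_mentions(items, end, hours=24):
    """Hypothetical helper: count events per hour for the `hours` hours ending
    with the hour that contains `end`; hours without events get 0."""
    last_hour = pd.Timestamp(end).floor("h")
    # One label per hour: last_hour - (hours - 1) h, ..., last_hour
    window = pd.date_range(end=last_hour, periods=hours, freq="h")
    # Floor every event to the start of its hour, then count events per hour
    counts = pd.Series(pd.DatetimeIndex(items).floor("h")).value_counts()
    # Align to the window: empty hours become 0, events outside the window are dropped
    counts = counts.reindex(window, fill_value=0)
    return [{"mentions": int(n)} for n in counts]


items = [
    datetime.datetime(2018, 3, 19, 16, 51, 48),
    datetime.datetime(2018, 3, 19, 17, 25, 19),
    datetime.datetime(2018, 3, 20, 6, 33, 35),
    datetime.datetime(2018, 3, 19, 23, 21, 35),
    datetime.datetime(2018, 3, 19, 15, 8, 41),
    datetime.datetime(2018, 3, 19, 21, 44, 16),
    datetime.datetime(2018, 3, 19, 18, 21, 28),
    datetime.datetime(2018, 3, 20, 7, 20, 22),
    datetime.datetime(2018, 3, 20, 11, 15, 43),
]
timeline = hourly_mentions(items, datetime.datetime(2018, 3, 20, 11, 15, 43))
```

With the sample data this gives 24 entries, one per hour from 2018-03-19 12:00 through 2018-03-20 11:00. Because the window is fixed by `end` rather than by the data, the list length stays the same even on days with few or no events.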
Answer (score: 1)
IIUC (if I understand correctly), you can use this:
df = pd.DataFrame({'Event':items})
df.groupby(pd.Grouper(key='Event',freq='H'))['Event'].count()
Output:
Event
2018-03-19 15:00:00 1
2018-03-19 16:00:00 1
2018-03-19 17:00:00 1
2018-03-19 18:00:00 1
2018-03-19 19:00:00 0
2018-03-19 20:00:00 0
2018-03-19 21:00:00 1
2018-03-19 22:00:00 0
2018-03-19 23:00:00 1
2018-03-20 00:00:00 0
2018-03-20 01:00:00 0
2018-03-20 02:00:00 0
2018-03-20 03:00:00 0
2018-03-20 04:00:00 0
2018-03-20 05:00:00 0
2018-03-20 06:00:00 1
2018-03-20 07:00:00 1
2018-03-20 08:00:00 0
2018-03-20 09:00:00 0
2018-03-20 10:00:00 0
2018-03-20 11:00:00 1
Freq: H, Name: Event, dtype: int64
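For comparison, a sketch of the same hourly count written with `resample` instead of `groupby`. Because every timestamp is one event, `size()` counts the events in each hourly bin and, like `pd.Grouper(freq=...)`, it yields 0 for empty hours between the first and last event (but not outside that span). The final list comprehension converts the counts to the question's `{'mentions': n}` format.

```python
import datetime

import pandas as pd

items = [
    datetime.datetime(2018, 3, 19, 16, 51, 48),
    datetime.datetime(2018, 3, 19, 17, 25, 19),
    datetime.datetime(2018, 3, 20, 6, 33, 35),
    datetime.datetime(2018, 3, 19, 23, 21, 35),
    datetime.datetime(2018, 3, 19, 15, 8, 41),
    datetime.datetime(2018, 3, 19, 21, 44, 16),
    datetime.datetime(2018, 3, 19, 18, 21, 28),
    datetime.datetime(2018, 3, 20, 7, 20, 22),
    datetime.datetime(2018, 3, 20, 11, 15, 43),
]

# One row per event, indexed (and sorted) by its timestamp
events = pd.Series(1, index=pd.DatetimeIndex(items)).sort_index()

# Count events per hour; empty hours between the first and last event get 0
hourly = events.resample("h").size()

# Same structure as the question's return value
timeline = [{"mentions": int(n)} for n in hourly]
```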
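To cover whole days, including zeros for the hours before the first event and after the last one, reindex the counts against a full hourly range:
(df.groupby(pd.Grouper(key='Event', freq='H'))['Event'].count()
   .reindex(pd.date_range(df.Event.dt.floor('D').min(),
                          df.Event.dt.ceil('D').max(),
                          freq='H'))
   .fillna(0))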
Output:
2018-03-19 00:00:00 0.0
2018-03-19 01:00:00 0.0
2018-03-19 02:00:00 0.0
2018-03-19 03:00:00 0.0
2018-03-19 04:00:00 0.0
2018-03-19 05:00:00 0.0
2018-03-19 06:00:00 0.0
2018-03-19 07:00:00 0.0
2018-03-19 08:00:00 0.0
2018-03-19 09:00:00 0.0
2018-03-19 10:00:00 0.0
2018-03-19 11:00:00 0.0
2018-03-19 12:00:00 0.0
2018-03-19 13:00:00 0.0
2018-03-19 14:00:00 0.0
2018-03-19 15:00:00 1.0
2018-03-19 16:00:00 1.0
2018-03-19 17:00:00 1.0
2018-03-19 18:00:00 1.0
2018-03-19 19:00:00 0.0
2018-03-19 20:00:00 0.0
2018-03-19 21:00:00 1.0
2018-03-19 22:00:00 0.0
2018-03-19 23:00:00 1.0
2018-03-20 00:00:00 0.0
2018-03-20 01:00:00 0.0
2018-03-20 02:00:00 0.0
2018-03-20 03:00:00 0.0
2018-03-20 04:00:00 0.0
2018-03-20 05:00:00 0.0
2018-03-20 06:00:00 1.0
2018-03-20 07:00:00 1.0
2018-03-20 08:00:00 0.0
2018-03-20 09:00:00 0.0
2018-03-20 10:00:00 0.0
2018-03-20 11:00:00 1.0
2018-03-20 12:00:00 0.0
2018-03-20 13:00:00 0.0
2018-03-20 14:00:00 0.0
2018-03-20 15:00:00 0.0
2018-03-20 16:00:00 0.0
2018-03-20 17:00:00 0.0
2018-03-20 18:00:00 0.0
2018-03-20 19:00:00 0.0
2018-03-20 20:00:00 0.0
2018-03-20 21:00:00 0.0
2018-03-20 22:00:00 0.0
2018-03-20 23:00:00 0.0
2018-03-21 00:00:00 0.0
Freq: H, Name: Event, dtype: float64
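The `reindex` step leaves NaN for hours missing from the grouped counts, which is why the result above is `float64` after `fillna(0)`; and because `pd.date_range` includes its end point, the range also gains a bin at `2018-03-21 00:00:00`. A sketch that addresses both, assuming the goal is exactly the whole days that contain events: pass `fill_value=0` to `reindex` so the counts stay integers, and build the range with `periods=` (24 per day) instead of an inclusive end.

```python
import datetime

import pandas as pd

items = [
    datetime.datetime(2018, 3, 19, 16, 51, 48),
    datetime.datetime(2018, 3, 19, 17, 25, 19),
    datetime.datetime(2018, 3, 20, 6, 33, 35),
    datetime.datetime(2018, 3, 19, 23, 21, 35),
    datetime.datetime(2018, 3, 19, 15, 8, 41),
    datetime.datetime(2018, 3, 19, 21, 44, 16),
    datetime.datetime(2018, 3, 19, 18, 21, 28),
    datetime.datetime(2018, 3, 20, 7, 20, 22),
    datetime.datetime(2018, 3, 20, 11, 15, 43),
]

df = pd.DataFrame({"Event": items})
counts = df.groupby(pd.Grouper(key="Event", freq="h"))["Event"].count()

# Midnight of the first day with events, and how many calendar days the events span
days = df["Event"].dt.floor("D")
start = days.min()
n_days = (days.max() - start).days + 1

# Exactly 24 hourly labels per day; no extra bin at the following midnight
full_range = pd.date_range(start, periods=24 * n_days, freq="h")

# fill_value=0 fills the missing hours directly, so the counts stay int64
timeline = counts.reindex(full_range, fill_value=0)
```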