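I am combining several different datasets into one aggregate so that I can analyze it in 15-minute intervals.
The data frame I currently have looks like this: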
                       id                       user_id  sentiment  magnitude  \
2020-10-04 14:06:00  10.0  cPL1Fg7BqRXvSFKeU1mJT7KCCTq2       -0.1        0.1
2020-10-04 14:06:05  11.0  cPL1Fg7BqRXvSFKeU1mJT7KCCTq2       -0.8        0.8
2020-10-05 12:28:58  12.0  cPL1Fg7BqRXvSFKeU1mJT7KCCTq2       -0.2        0.2
2020-10-05 12:29:16  13.0  cPL1Fg7BqRXvSFKeU1mJT7KCCTq2       -0.2        0.2
2020-10-05 12:29:31  14.0  cPL1Fg7BqRXvSFKeU1mJT7KCCTq2        0.2        0.2

                     angry  disgusted  fearful  happy  neutral  sad  \
2020-10-04 14:06:00    NaN        NaN      NaN    NaN      NaN  NaN
2020-10-04 14:06:05    NaN        NaN      NaN    NaN      NaN  NaN
2020-10-05 12:28:58    NaN        NaN      NaN    NaN      NaN  NaN
2020-10-05 12:29:16    NaN        NaN      NaN    NaN      NaN  NaN
2020-10-05 12:29:31    NaN        NaN      NaN    NaN      NaN  NaN

                     surprised  heartRate  steps
2020-10-04 14:06:00        NaN        NaN    NaN
2020-10-04 14:06:05        NaN        NaN    NaN
2020-10-05 12:28:58        NaN        NaN    NaN
2020-10-05 12:29:16        NaN        NaN    NaN
2020-10-05 12:29:31        NaN        NaN    NaN
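I want to aggregate the data frame into 15-minute intervals.
I think groupby is the best approach, but I am having trouble getting it to work well :/
Thanks in advance.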
Answer 0 (score: 1)
There are two options: resample or pd.Grouper (which tends to perform well).
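As a rough illustration of the first option, the sketch below applies resample to the same five rows used in the sample data further down. The frame is rebuilt by hand here, and passing on="date" is an assumption that the timestamps live in a column named date rather than in the index.

```python
import pandas as pd

# Rebuild the five sample rows by hand (values copied from the sample data).
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-10-04 14:06:00",
        "2020-10-04 14:06:05",
        "2020-10-05 12:28:58",
        "2020-10-05 12:29:16",
        "2020-10-05 12:29:31",
    ]),
    "id": [10.0, 11.0, 12.0, 13.0, 14.0],
})

# resample needs datetime-like values to bin on; on="date" points it at the
# 'date' column instead of the index.
resampled = df.resample("15min", on="date").sum().reset_index()

print(resampled.head())
```

If the timestamps are already the index, as in the question, the on= argument can be left out.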
Here is a pd.Grouper example that sums the column values over 15-minute intervals.
Code
import pandas as pd
df.groupby(pd.Grouper(key='date', freq='15min')).sum().reset_index()
Sample input data
                 date    id
0 2020-10-04 14:06:00  10.0
1 2020-10-04 14:06:05  11.0
2 2020-10-05 12:28:58  12.0
3 2020-10-05 12:29:16  13.0
4 2020-10-05 12:29:31  14.0
Output (first five rows)
                 date    id
0 2020-10-04 14:00:00  21.0
1 2020-10-04 14:15:00   0.0
2 2020-10-04 14:30:00   0.0
3 2020-10-04 14:45:00   0.0
4 2020-10-04 15:00:00   0.0
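The question's frame keeps the timestamps in the index rather than in a date column, and it mixes columns where a plain sum is not always meaningful (a sentiment score, for instance). A minimal sketch of that case follows. It assumes a DatetimeIndex, and the choice of count for id and mean for sentiment and magnitude is only a hypothetical illustration, not something the original post specifies. The result column name n_rows is likewise made up for this example.

```python
import pandas as pd

# Timestamps and values copied from the question's frame; only three of the
# columns are kept to keep the sketch short.
index = pd.to_datetime([
    "2020-10-04 14:06:00",
    "2020-10-04 14:06:05",
    "2020-10-05 12:28:58",
    "2020-10-05 12:29:16",
    "2020-10-05 12:29:31",
])
df = pd.DataFrame(
    {
        "id": [10.0, 11.0, 12.0, 13.0, 14.0],
        "sentiment": [-0.1, -0.8, -0.2, -0.2, 0.2],
        "magnitude": [0.1, 0.8, 0.2, 0.2, 0.2],
    },
    index=index,
)

# Without key=, pd.Grouper bins on the index.
summary = df.groupby(pd.Grouper(freq="15min")).agg(
    n_rows=("id", "count"),           # how many records fell into the bin
    sentiment=("sentiment", "mean"),  # average score per bin (assumed choice)
    magnitude=("magnitude", "mean"),
)

# Drop the empty 15-minute bins between the two bursts of activity.
summary = summary[summary["n_rows"] > 0]

print(summary)
```

Filtering on the row count avoids the long run of empty intervals that appears in the summed output above, at the cost of no longer having one row per 15-minute slot.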