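Aggregating a time series DataFrame into 15-minute intervals

Time: 2021-04-25 10:53:06

Tags: python pandas

I am combining several different datasets to build an aggregate that I can analyze in 15-minute intervals.

The DataFrame I currently have looks like this: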

<bound method NDFrame.to_clipboard of                        id                       user_id  sentiment  magnitude  \
2020-10-04 14:06:00  10.0  cPL1Fg7BqRXvSFKeU1mJT7KCCTq2       -0.1        0.1   
2020-10-04 14:06:05  11.0  cPL1Fg7BqRXvSFKeU1mJT7KCCTq2       -0.8        0.8   
2020-10-05 12:28:58  12.0  cPL1Fg7BqRXvSFKeU1mJT7KCCTq2       -0.2        0.2   
2020-10-05 12:29:16  13.0  cPL1Fg7BqRXvSFKeU1mJT7KCCTq2       -0.2        0.2   
2020-10-05 12:29:31  14.0  cPL1Fg7BqRXvSFKeU1mJT7KCCTq2        0.2        0.2   

                     angry  disgusted  fearful  happy  neutral  sad  \
2020-10-04 14:06:00    NaN        NaN      NaN    NaN      NaN  NaN   
2020-10-04 14:06:05    NaN        NaN      NaN    NaN      NaN  NaN   
2020-10-05 12:28:58    NaN        NaN      NaN    NaN      NaN  NaN   
2020-10-05 12:29:16    NaN        NaN      NaN    NaN      NaN  NaN   
2020-10-05 12:29:31    NaN        NaN      NaN    NaN      NaN  NaN   

                     surprised  heartRate  steps  
2020-10-04 14:06:00        NaN        NaN    NaN  
2020-10-04 14:06:05        NaN        NaN    NaN  
2020-10-05 12:28:58        NaN        NaN    NaN  
2020-10-05 12:29:16        NaN        NaN    NaN  
2020-10-05 12:29:31        NaN        NaN    NaN  >

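I would like to aggregate the DataFrame into 15-minute intervals.

I think groupby is the best approach? But I'm finding it hard to get it to work well :/

Thanks in advance,

1 Answer:

Answer 0: (score: 1)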

There are two options: we can use resample, or groupby with pd.Grouper (which performs well).
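As a rough sketch of how the two options relate, the snippet below builds a small frame shaped like the sample input used in this answer (the column names date and id come from that sample; the variable names by_resample and by_grouper are illustrative assumptions) and compares the 15-minute sums produced by each approach.

```python
import pandas as pd

# Small frame mirroring the answer's sample input (values copied from it).
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-10-04 14:06:00",
        "2020-10-04 14:06:05",
        "2020-10-05 12:28:58",
        "2020-10-05 12:29:16",
        "2020-10-05 12:29:31",
    ]),
    "id": [10.0, 11.0, 12.0, 13.0, 14.0],
})

# Option 1: resample, binning on the 'date' column via `on=`.
by_resample = df.resample("15min", on="date").sum()

# Option 2: groupby with pd.Grouper on the same column and frequency.
by_grouper = df.groupby(pd.Grouper(key="date", freq="15min")).sum()

# Both results are indexed by the start of each 15-minute bin;
# compare labels and values to see whether the two approaches agree.
same_labels = list(by_resample.index) == list(by_grouper.index)
same_values = bool((by_resample["id"].to_numpy() == by_grouper["id"].to_numpy()).all())
print(same_labels, same_values)
```

In practice the choice is mostly about ergonomics: resample reads naturally for a single time axis, while pd.Grouper can be combined with other grouping keys inside one groupby call.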

Here is a pd.Grouper example that sums the column values over 15-minute intervals.

Code

# Assumes pandas is imported as pd and df has a datetime64 column named 'date'.
df.groupby(pd.Grouper(key='date', freq='15min')).sum().reset_index()

Sample input data

    date                 id
0   2020-10-04 14:06:00 10.0
1   2020-10-04 14:06:05 11.0
2   2020-10-05 12:28:58 12.0
3   2020-10-05 12:29:16 13.0
4   2020-10-05 12:29:31 14.0

Output (first five rows; empty 15-minute bins sum to 0.0)

    date           id
0   2020-10-04 14:00:00 21.0
1   2020-10-04 14:15:00 0.0
2   2020-10-04 14:30:00 0.0
3   2020-10-04 14:45:00 0.0
4   2020-10-04 15:00:00 0.0
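
The question's frame keeps the timestamps in the index rather than in a date column, and it mixes numeric readings with a string user_id column. A minimal sketch for that layout follows. It assumes a DatetimeIndex (named date here purely for illustration) and assumes, since the question does not say, that sentiment and magnitude should be averaged while steps are summed; only a few of the question's columns are reproduced.

```python
import numpy as np
import pandas as pd

# Illustrative frame shaped like the question's data: timestamps in the index.
# The index name "date" is an assumption added so reset_index() yields a named column.
idx = pd.DatetimeIndex(
    [
        "2020-10-04 14:06:00",
        "2020-10-04 14:06:05",
        "2020-10-05 12:28:58",
        "2020-10-05 12:29:16",
        "2020-10-05 12:29:31",
    ],
    name="date",
)
df = pd.DataFrame(
    {
        "user_id": ["cPL1Fg7BqRXvSFKeU1mJT7KCCTq2"] * 5,
        "sentiment": [-0.1, -0.8, -0.2, -0.2, 0.2],
        "magnitude": [0.1, 0.8, 0.2, 0.2, 0.2],
        "steps": [np.nan] * 5,
    },
    index=idx,
)

# Group per user and per 15-minute bin; with no `key`, pd.Grouper bins on the index.
agg = (
    df.groupby(["user_id", pd.Grouper(freq="15min")])
    .agg(
        sentiment=("sentiment", "mean"),  # assumed choice: average per bin
        magnitude=("magnitude", "mean"),  # assumed choice: average per bin
        steps=("steps", "sum"),           # assumed choice: total per bin
        n_obs=("sentiment", "count"),     # non-null sentiment rows in the bin
    )
    .reset_index()
)
print(agg)
```

Because the grouping also includes user_id, only bins that actually contain rows appear in the result, unlike the single-key output above, which lists every empty 15-minute bin with a sum of 0.0.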