Expanding sum grouped by date

Asked: 2019-03-19 09:13:58

Tags: python-3.x pandas pandas-groupby

I have a dataframe and am trying to compute an expanding sum of its values, grouped by date.

Specifically, my data looks like this:

creationDateTime    OK  Fail    
2017-01-06 21:30:00 4   0
2017-01-06 21:35:00 4   0
2017-01-06 21:36:00 4   0

2017-01-07 21:48:00 3   1
2017-01-07 21:53:00 4   0

2017-01-08 21:22:00 3   1
2017-01-08 21:27:00 3   1

2017-01-09 21:49:00 3   1

I am trying to get something like the following:

creationDateTime    OK  Fail  RollingOK  RollingFail
2017-01-06 21:30:00 4   0     4          0
2017-01-06 21:35:00 4   0     8          0
2017-01-06 21:36:00 4   0     12         0

2017-01-07 21:48:00 3   1     3          1
2017-01-07 21:53:00 4   0     7          1

2017-01-08 21:22:00 3   1     3          1
2017-01-08 21:27:00 3   1     6          2

2017-01-09 21:49:00 3   1     3          1

I have figured out how to compute a running (expanding) sum of the values with:

data_aggregated['RollingOK'] = data_aggregated['OK'].expanding(0).sum()       
data_aggregated['RollingFail'] = data_aggregated['Fail'].expanding(0).sum()

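But I am not sure how to change this so that the running sum is grouped by day, since the code above sums across all rows and does not restart for each day.

Any help is much appreciated.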

2 Answers:

Answer 0: (score: 2)

You can use the following (if creationDateTime is a regular column with datetime dtype rather than the index):

df['RollingOK']=df.groupby(df.creationDateTime.dt.date)['OK'].cumsum()
df['RollingFail']=df.groupby(df.creationDateTime.dt.date)['Fail'].cumsum()
print(df)

     creationDateTime  OK  Fail  RollingOK  RollingFail
0 2017-01-06 21:30:00   4     0          4            0
1 2017-01-06 21:35:00   4     0          8            0
2 2017-01-06 21:36:00   4     0         12            0
3 2017-01-07 21:48:00   3     1          3            1
4 2017-01-07 21:53:00   4     0          7            1
5 2017-01-08 21:22:00   3     1          3            1
6 2017-01-08 21:27:00   3     1          6            2
7 2017-01-09 21:49:00   3     1          3            1
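
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The rows are copied from the question; the assumption that the timestamps start out as strings (for example after reading a CSV) and the helper name `day` are mine, not part of the original post.

```python
import pandas as pd

# Sample rows copied from the question. Starting from strings is an
# assumption about how the data might arrive (e.g. from a CSV file).
df = pd.DataFrame({
    "creationDateTime": [
        "2017-01-06 21:30:00", "2017-01-06 21:35:00", "2017-01-06 21:36:00",
        "2017-01-07 21:48:00", "2017-01-07 21:53:00",
        "2017-01-08 21:22:00", "2017-01-08 21:27:00",
        "2017-01-09 21:49:00",
    ],
    "OK":   [4, 4, 4, 3, 4, 3, 3, 3],
    "Fail": [0, 0, 0, 1, 0, 1, 1, 1],
})

# The .dt accessor only works on datetime-like columns, so convert first.
df["creationDateTime"] = pd.to_datetime(df["creationDateTime"])

# Group key: the calendar date of each timestamp (hypothetical helper name).
day = df["creationDateTime"].dt.date

# cumsum restarts at the first row of every day.
df["RollingOK"] = df.groupby(day)["OK"].cumsum()
df["RollingFail"] = df.groupby(day)["Fail"].cumsum()
print(df)
```

If the column is left as strings, `.dt.date` raises an error, which is why the `pd.to_datetime` step comes first.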

Answer 1: (score: 2)

Use DataFrameGroupBy.cumsum with the specified columns after groupby:

#if DatetimeIndex
idx = data_aggregated.index.date
#if column
#idx = data_aggregated['creationDateTime'].dt.date
#select the columns with a list; recent pandas versions reject the tuple form ['OK','Fail']
data_aggregated[['RollingOK','RollingFail']] = (data_aggregated.groupby(idx)[['OK','Fail']]
                                                               .cumsum())
print (data_aggregated)
                     OK  Fail  RollingOK  RollingFail
creationDateTime                                     
2017-01-06 21:30:00   4     0          4            0
2017-01-06 21:35:00   4     0          8            0
2017-01-06 21:36:00   4     0         12            0
2017-01-07 21:48:00   3     1          3            1
2017-01-07 21:53:00   4     0          7            1
2017-01-08 21:22:00   3     1          3            1
2017-01-08 21:27:00   3     1          6            2
2017-01-09 21:49:00   3     1          3            1

You can also process all columns:

data_aggregated = (data_aggregated.join(data_aggregated.groupby(idx)
                                                       .cumsum()
                                                       .add_prefix('Rolling')))
print (data_aggregated)
                     OK  Fail  RollingOK  RollingFail
creationDateTime                                     
2017-01-06 21:30:00   4     0          4            0
2017-01-06 21:35:00   4     0          8            0
2017-01-06 21:36:00   4     0         12            0
2017-01-07 21:48:00   3     1          3            1
2017-01-07 21:53:00   4     0          7            1
2017-01-08 21:22:00   3     1          3            1
2017-01-08 21:27:00   3     1          6            2
2017-01-09 21:49:00   3     1          3            1
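
A runnable sketch of the all-columns variant, assuming a frame with a DatetimeIndex named creationDateTime and only the original OK and Fail columns. The names `rolling` and `result` are introduced here for illustration only.

```python
import pandas as pd

# Rebuild the question's data with the timestamps as a DatetimeIndex.
stamps = pd.to_datetime([
    "2017-01-06 21:30:00", "2017-01-06 21:35:00", "2017-01-06 21:36:00",
    "2017-01-07 21:48:00", "2017-01-07 21:53:00",
    "2017-01-08 21:22:00", "2017-01-08 21:27:00",
    "2017-01-09 21:49:00",
])
data_aggregated = pd.DataFrame(
    {"OK": [4, 4, 4, 3, 4, 3, 3, 3], "Fail": [0, 0, 0, 1, 0, 1, 1, 1]},
    index=pd.DatetimeIndex(stamps, name="creationDateTime"),
)

# Array of datetime.date objects, one per row, used as the group key.
idx = data_aggregated.index.date

# Per-day running sum of every column, with the new columns prefixed.
rolling = data_aggregated.groupby(idx).cumsum().add_prefix("Rolling")

# join aligns on the index, so this assumes the timestamps are unique.
result = data_aggregated.join(rolling)
print(result)
```

Because every column is summed, this is best run on a frame that does not already contain the Rolling columns; otherwise they would be accumulated again and come back as RollingRollingOK and RollingRollingFail.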

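Your solution should be changed to:

#select the columns with a list; recent pandas versions reject the tuple form ['OK','Fail']
data_aggregated[['RollingOK','RollingFail']] = (data_aggregated.groupby(idx)[['OK','Fail']]
                                                           .expanding(0)
                                                           .sum()
                                                           .reset_index(level=0, drop=True))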
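
As a sketch of how the expanding variant behaves, the snippet below rebuilds the question's data (again assuming a DatetimeIndex with unique timestamps) and assigns the two result columns one at a time so that alignment on the index is explicit. The name `expanded` is introduced here for illustration.

```python
import pandas as pd

stamps = pd.to_datetime([
    "2017-01-06 21:30:00", "2017-01-06 21:35:00", "2017-01-06 21:36:00",
    "2017-01-07 21:48:00", "2017-01-07 21:53:00",
    "2017-01-08 21:22:00", "2017-01-08 21:27:00",
    "2017-01-09 21:49:00",
])
data_aggregated = pd.DataFrame(
    {"OK": [4, 4, 4, 3, 4, 3, 3, 3], "Fail": [0, 0, 0, 1, 0, 1, 1, 1]},
    index=pd.DatetimeIndex(stamps, name="creationDateTime"),
)
idx = data_aggregated.index.date

# groupby(...).expanding() returns a frame indexed by (group key, original index).
# Dropping level 0 removes the date key and leaves the original timestamps.
expanded = (data_aggregated.groupby(idx)[["OK", "Fail"]]
            .expanding(min_periods=0)
            .sum()
            .reset_index(level=0, drop=True))

# Assigning a Series aligns on the index, so row order does not matter here.
data_aggregated["RollingOK"] = expanded["OK"]
data_aggregated["RollingFail"] = expanded["Fail"]
print(data_aggregated)
```

The expanding sum comes back as floats (4.0, 8.0, 12.0, ...), whereas cumsum keeps the integer dtype, so for a plain running total cumsum is usually the simpler choice.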