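I have a DataFrame and am trying to compute an expanding (running) sum of its values, grouped by date.
Specifically, my data looks like this: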
creationDateTime OK Fail
2017-01-06 21:30:00 4 0
2017-01-06 21:35:00 4 0
2017-01-06 21:36:00 4 0
2017-01-07 21:48:00 3 1
2017-01-07 21:53:00 4 0
2017-01-08 21:22:00 3 1
2017-01-08 21:27:00 3 1
2017-01-09 21:49:00 3 1
I am trying to get something like the following, where the running totals restart on each new day:
creationDateTime OK Fail RollingOK RollingFail
2017-01-06 21:30:00 4 0 4 0
2017-01-06 21:35:00 4 0 8 0
2017-01-06 21:36:00 4 0 12 0
2017-01-07 21:48:00 3 1 3 1
2017-01-07 21:53:00 4 0 7 1
2017-01-08 21:22:00 3 1 3 1
2017-01-08 21:27:00 3 1 6 2
2017-01-09 21:49:00 3 1 3 1
I have worked out how to compute a running sum of the values with the following:
data_aggregated['RollingOK'] = data_aggregated['OK'].expanding(0).sum()
data_aggregated['RollingFail'] = data_aggregated['Fail'].expanding(0).sum()
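However, I am not sure how to change this so that the running sum is grouped by day. The code above accumulates over all rows and never restarts at a day boundary.
Any help is much appreciated.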
Answer 0 (score: 2)
You can use the following (if the first column, creationDateTime, is a datetime column):
df['RollingOK']=df.groupby(df.creationDateTime.dt.date)['OK'].cumsum()
df['RollingFail']=df.groupby(df.creationDateTime.dt.date)['Fail'].cumsum()
print(df)
creationDateTime OK Fail RollingOK RollingFail
0 2017-01-06 21:30:00 4 0 4 0
1 2017-01-06 21:35:00 4 0 8 0
2 2017-01-06 21:36:00 4 0 12 0
3 2017-01-07 21:48:00 3 1 3 1
4 2017-01-07 21:53:00 4 0 7 1
5 2017-01-08 21:22:00 3 1 3 1
6 2017-01-08 21:27:00 3 1 6 2
7 2017-01-09 21:49:00 3 1 3 1
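For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand and assumes creationDateTime arrives as strings, so it is converted with pd.to_datetime first (that conversion step is my assumption, not part of the answer above). The helper name `day` is only for illustration.

```python
import pandas as pd

# Sample data copied from the question; timestamps start out as plain strings.
df = pd.DataFrame({
    "creationDateTime": [
        "2017-01-06 21:30:00", "2017-01-06 21:35:00", "2017-01-06 21:36:00",
        "2017-01-07 21:48:00", "2017-01-07 21:53:00",
        "2017-01-08 21:22:00", "2017-01-08 21:27:00",
        "2017-01-09 21:49:00",
    ],
    "OK": [4, 4, 4, 3, 4, 3, 3, 3],
    "Fail": [0, 0, 0, 1, 0, 1, 1, 1],
})

# Assumption: the column is not yet datetime64, so convert it before using .dt
df["creationDateTime"] = pd.to_datetime(df["creationDateTime"])

# Group key: the calendar date of each timestamp (hypothetical helper name)
day = df["creationDateTime"].dt.date

# Cumulative sum that restarts at the first row of every day
df["RollingOK"] = df.groupby(day)["OK"].cumsum()
df["RollingFail"] = df.groupby(day)["Fail"].cumsum()

print(df)
```

If the .dt accessor raises an AttributeError on your data, the column is most likely still of object dtype and needs the conversion shown above.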
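Answer 1 (score: 2)
Use DataFrameGroupBy.cumsum with the columns selected after groupby:
# If creationDateTime is the DatetimeIndex
idx = data_aggregated.index.date
# If creationDateTime is a column
#idx = data_aggregated['creationDateTime'].dt.date
# Select several columns with a list (double brackets); recent pandas versions reject ['OK','Fail']
data_aggregated[['RollingOK','RollingFail']] = (data_aggregated.groupby(idx)[['OK','Fail']]
.cumsum())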
print (data_aggregated)
OK Fail RollingOK RollingFail
creationDateTime
2017-01-06 21:30:00 4 0 4 0
2017-01-06 21:35:00 4 0 8 0
2017-01-06 21:36:00 4 0 12 0
2017-01-07 21:48:00 3 1 3 1
2017-01-07 21:53:00 4 0 7 1
2017-01-08 21:22:00 3 1 3 1
2017-01-08 21:27:00 3 1 6 2
2017-01-09 21:49:00 3 1 3 1
You can also process all columns at once:
data_aggregated = (data_aggregated.join(data_aggregated.groupby(idx)
.cumsum()
.add_prefix('Rolling')))
print (data_aggregated)
OK Fail RollingOK RollingFail
creationDateTime
2017-01-06 21:30:00 4 0 4 0
2017-01-06 21:35:00 4 0 8 0
2017-01-06 21:36:00 4 0 12 0
2017-01-07 21:48:00 3 1 3 1
2017-01-07 21:53:00 4 0 7 1
2017-01-08 21:22:00 3 1 3 1
2017-01-08 21:27:00 3 1 6 2
2017-01-09 21:49:00 3 1 3 1
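Your own solution should be changed to the following (reset_index drops the group level that groupby adds to the index, so the result aligns with the original rows):
data_aggregated[['RollingOK','RollingFail']] = (data_aggregated.groupby(idx)[['OK','Fail']]
.expanding(0)
.sum()
.reset_index(level=0, drop=True))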
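As a rough check of the expanding variant, the sketch below rebuilds the question's data with creationDateTime as a DatetimeIndex and applies the same groupby / expanding / reset_index chain. Two things here are my assumptions rather than part of the answer: the data is typed in by hand, and min_periods=1 is passed by keyword instead of the positional 0 (for a plain sum over this data both should give the same totals). Note that expanding().sum() returns floats, whereas cumsum keeps integers.

```python
import pandas as pd

# Sample data from the question, indexed by timestamp (DatetimeIndex case)
data_aggregated = pd.DataFrame(
    {"OK": [4, 4, 4, 3, 4, 3, 3, 3], "Fail": [0, 0, 0, 1, 0, 1, 1, 1]},
    index=pd.to_datetime([
        "2017-01-06 21:30:00", "2017-01-06 21:35:00", "2017-01-06 21:36:00",
        "2017-01-07 21:48:00", "2017-01-07 21:53:00",
        "2017-01-08 21:22:00", "2017-01-08 21:27:00",
        "2017-01-09 21:49:00",
    ]),
)
data_aggregated.index.name = "creationDateTime"

# One calendar date per row, used as the group key
idx = data_aggregated.index.date

# Per-day expanding sum; the result carries an extra index level for the
# group key, which reset_index(level=0, drop=True) removes again.
expanded = (data_aggregated.groupby(idx)[["OK", "Fail"]]
            .expanding(min_periods=1)
            .sum()
            .reset_index(level=0, drop=True))

# Assignment aligns on the (unique) timestamp index
data_aggregated["RollingOK"] = expanded["OK"]
data_aggregated["RollingFail"] = expanded["Fail"]

print(data_aggregated)
```

This relies on the timestamps being unique. With duplicate timestamps the index alignment in the last step would be ambiguous, and the cumsum version is the safer choice.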