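I am trying to combine a groupby, a shift, and a rolling window in pandas. I have searched for a solution without any luck. I have a workaround, but it is not a good one, especially because I will need a rolling standard deviation in the future. Can someone help me and suggest a better approach?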
Input data:
import numpy as np
import pandas as pd

df = pd.DataFrame({'month': [201912, 202001, 202001, 202002, 202002, 202003, 202003, 202004],
                   'target': [0, 1, 0, 1, 1, 0, 0, 1]
                   },
                  index = [14, 15, 16, 17, 18, 19, 20, 21])
Expected output for a 2-month rolling mean with a 1-month shift:
df = pd.DataFrame({'month': [201912, 202001, 202002, 202003, 202004],
'roll_2m': [np.nan, np.nan, 0.33, 0.75, 0.5]
},
index = [1, 2, 3, 4, 5])
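My poor workaround for this problem is:
# Assumed missing step: aggregate per month first, so that
# target['sum'] and target['count'] exist as columns.
df = df.groupby('month').agg(['sum', 'count'])
# Shift by one month, then add up the two previous months.
rolling_count = df.shift(1).target['count'].rolling(2).sum()
rolling_sum = df.shift(1).target['sum'].rolling(2).sum()
rolling_mean = rolling_sum/rolling_count
df['roll_2m'] = rolling_mean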
Answer 0 (score: 2)
I would start with a groupby().agg():
(df.groupby('month').target.agg(['sum','count'])
.rolling(2)
.sum().shift()
.assign(roll_2m=lambda x: x['sum']/x['count'])
)
Output:
        sum  count   roll_2m
month                       
201912  NaN    NaN       NaN
202001  NaN    NaN       NaN
202002  1.0    3.0  0.333333
202003  3.0    4.0  0.750000
202004  2.0    4.0  0.500000
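Since the question mentions needing a rolling standard deviation later, the same idea might be extended by also aggregating a per-month sum of squares, so that the variance over the pooled observations in the window can be recovered from sums alone. This is only a sketch, not part of the answer above: the column names `s`, `ss`, `n`, `roll_2m_mean`, and `roll_2m_std` are made up for illustration, and it assumes the sample standard deviation (`ddof=1`) over all rows in the two preceding months is what is wanted.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"month": [201912, 202001, 202001, 202002, 202002, 202003, 202003, 202004],
     "target": [0, 1, 0, 1, 1, 0, 0, 1]},
    index=[14, 15, 16, 17, 18, 19, 20, 21],
)

# Per-month sufficient statistics: sum (s), sum of squares (ss), count (n).
# These column names are hypothetical, chosen only for this sketch.
monthly = (
    df.assign(sq=df["target"] ** 2)
      .groupby("month")
      .agg(s=("target", "sum"), ss=("sq", "sum"), n=("target", "count"))
)

# Pool two consecutive months, then shift so each row sees only earlier months.
win = monthly.rolling(2).sum().shift()

mean = win["s"] / win["n"]
# Sample variance (ddof=1) from pooled sums: (ss - n * mean^2) / (n - 1).
var = (win["ss"] - win["n"] * mean ** 2) / (win["n"] - 1)

result = pd.DataFrame({
    "roll_2m_mean": mean,
    # clip guards against tiny negative values caused by rounding
    "roll_2m_std": np.sqrt(var.clip(lower=0)),
})
print(result)
```

The sum-of-squares formula can lose precision when values are large relative to their spread, so for such data it may be safer to compute the standard deviation directly on the rows that fall inside each window.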