I am trying to compute a rolling average grouped by several columns. Here is what my dataset looks like:
category, sub_category,value
fruit, apple, 10
fruit, apple, 2
fruit, apple, 5
fruit, apple, 1
fruit, banana, 3
fruit, orange, 5
fruit, orange, 5
fruit, orange, 3
fruit, orange, 8
Expected output:
category, sub_category,value, rolling_average
fruit, apple, 10, 10
fruit, apple, 2, 6
fruit, apple, 5, 5.66
fruit, apple, 1, 2.66
fruit, banana, 3, 3
fruit, orange, 5, 5
fruit, orange, 5, 5
fruit, orange, 3, 4.33
fruit, orange, 8, 5.33
I am able to compute a rolling average without any grouping, but I am not sure how to do it per group within the same DataFrame.
Answer 0 (score: 2)
I believe you need Expanding.mean per group (a cumulative mean over all rows seen so far in the group):
df['expanding_average'] = (df.groupby(['category', 'sub_category'])['value']
                             .expanding()
                             .mean()
                             .reset_index(level=[0,1], drop=True))
print(df)
  category sub_category  value  expanding_average
0    fruit        apple     10          10.000000
1    fruit        apple      2           6.000000
2    fruit        apple      5           5.666667
3    fruit        apple      1           4.500000
4    fruit       banana      3           3.000000
5    fruit       orange      5           5.000000
6    fruit       orange      5           5.000000
7    fruit       orange      3           4.333333
8    fruit       orange      8           5.250000
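To make the expanding mean concrete, here is a minimal self-contained sketch that rebuilds the sample data from the question and cross-checks the result against a running sum divided by a running count within each group. The names `keys`, `grouped`, `manual`, and `expanding` are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "category": ["fruit"] * 9,
    "sub_category": ["apple"] * 4 + ["banana"] + ["orange"] * 4,
    "value": [10, 2, 5, 1, 3, 5, 5, 3, 8],
})

keys = ["category", "sub_category"]  # illustrative name
grouped = df.groupby(keys)["value"]

# Expanding mean by hand: running sum / running row count, per group.
manual = grouped.cumsum() / (grouped.cumcount() + 1)

# Same quantity via expanding(); dropping the two group levels
# leaves the original row index so the results can be compared.
expanding = grouped.expanding().mean().reset_index(level=[0, 1], drop=True)

df["expanding_average"] = expanding
print(df)
```

Because every earlier row in the group stays in the average, the fourth apple row and the fourth orange row differ from the expected output in the question, which only looks at the last three rows.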
Solution for a rolling average with a window of N=3, which matches the expected output:
df['rolling_average'] = (df.groupby(['category', 'sub_category'])['value']
                           .rolling(3, min_periods=1)
                           .mean()
                           .reset_index(level=[0,1], drop=True))
print(df)
  category sub_category  value  rolling_average
0    fruit        apple     10        10.000000
1    fruit        apple      2         6.000000
2    fruit        apple      5         5.666667
3    fruit        apple      1         2.666667
4    fruit       banana      3         3.000000
5    fruit       orange      5         5.000000
6    fruit       orange      5         5.000000
7    fruit       orange      3         4.333333
8    fruit       orange      8         5.333333
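As a possible alternative, the same rolling average can be written with `transform`, which returns a result aligned to the original row index, so the `reset_index` step is not needed. This is a sketch under the assumption that a window of 3 with `min_periods=1` is what is wanted; the DataFrame is rebuilt from the question's sample data so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "category": ["fruit"] * 9,
    "sub_category": ["apple"] * 4 + ["banana"] + ["orange"] * 4,
    "value": [10, 2, 5, 1, 3, 5, 5, 3, 8],
})

# transform applies the function to each group's values and stitches the
# pieces back together on the original index, one result per input row.
df["rolling_average"] = (
    df.groupby(["category", "sub_category"])["value"]
      .transform(lambda s: s.rolling(3, min_periods=1).mean())
)
print(df)
```

With `min_periods=1` the first rows of each group average over however many values are available so far, which is why the single banana row simply gets its own value.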