I am trying to find the average of the last 5 days by date and product. Here is what my data frame looks like:
import pandas as pd

df=pd.DataFrame({
'day':['day_1','day_2','day_3','day_4','day_5','day_2','day_3','day_4','day_5','day_6','day_1'],
'product':['prod_a','prod_a','prod_a','prod_a','prod_a','prod_b','prod_b','prod_b','prod_b','prod_b','prod_b'],
'sale':[10,15,4,17,12,1,50,70,30,70,10]
})
To find the average of the last 5 days per product per day, I did the following:
df_average = df.groupby(['day', 'product']).tail(5).groupby(['day', 'product']).mean()
Doing the above only returns the actual value for that product on that day, rather than the average over the last 5 days.
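A likely explanation, sketched here on the sample data: every (day, product) pair occurs only once, so each group holds a single row. `tail(5)` then keeps every row, and the per-group mean is just the row's own `sale`. The names `group_sizes`, `tail_rows` and `per_group_mean` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'day': ['day_1', 'day_2', 'day_3', 'day_4', 'day_5',
            'day_2', 'day_3', 'day_4', 'day_5', 'day_6', 'day_1'],
    'product': ['prod_a'] * 5 + ['prod_b'] * 6,
    'sale': [10, 15, 4, 17, 12, 1, 50, 70, 30, 70, 10],
})

# Size of each (day, product) group; in this sample every group has one row
group_sizes = df.groupby(['day', 'product']).size()

# tail(5) on single-row groups drops nothing
tail_rows = df.groupby(['day', 'product']).tail(5)

# The mean of a single-row group equals that row's own sale value
per_group_mean = tail_rows.groupby(['day', 'product'])['sale'].mean()

print(group_sizes.max())
print(len(tail_rows) == len(df))
```

Grouping by `product` alone, and letting the window run across days, avoids this.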
Expected output:
day, product, sale, last_5_average
day_1, prod_a , 10, 11.6
day_2, prod_a , 15, 12
day_3, prod_a , 4, 11
day_4, prod_a , 17, 14.5
day_5, prod_a , 12, 12
day_1, prod_b , 1, 44.2
day_2, prod_b , 50, 54
day_3, prod_b , 70, 55
day_4, prod_b , 30, 50
day_5, prod_b , 70, 60
day_6, prod_c , 50, 50
Answer 0 (score: 2)
I hope this helps!
import pandas as pd

# original data frame
df=pd.DataFrame({
'day':['day_1','day_2','day_3','day_4','day_5','day_2','day_3','day_4','day_5','day_6','day_1'],
'product':['prod_a','prod_a','prod_a','prod_a','prod_a','prod_b','prod_b','prod_b','prod_b','prod_b','prod_b'],
'sale':[10,15,4,17,12,1,50,70,30,70,10]
})
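# sort by product and day so each product's rows are in day order
df=df.sort_values(by=['product','day'])
# discard the old index left over from sorting
df=df.reset_index(drop=True)
# rolling mean of the past 5 records within each product group;
# the first 4 rows of each product are NaN because rolling(5) needs 5 observations
df['rolling_mean_sale']=df.groupby('product')['sale'].rolling(5).mean().reset_index(level=0, drop=True)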
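If the NaN values at the start of each product are not wanted, one possible variation is to pass `min_periods=1`, so that shorter windows are averaged over whatever rows are available. The sketch below also adds a forward-looking column, since the prod_a rows of the expected output in the question match an average over the current and following days rather than the preceding ones. The column name `next_5_average` is an assumption made for illustration. Sorting on the `day` strings only works here because the day numbers are single digits.

```python
import pandas as pd

df = pd.DataFrame({
    'day': ['day_1', 'day_2', 'day_3', 'day_4', 'day_5',
            'day_2', 'day_3', 'day_4', 'day_5', 'day_6', 'day_1'],
    'product': ['prod_a'] * 5 + ['prod_b'] * 6,
    'sale': [10, 15, 4, 17, 12, 1, 50, 70, 30, 70, 10],
})

# Order rows by product, then day (lexicographic sort; fine for day_1..day_9)
df = df.sort_values(['product', 'day']).reset_index(drop=True)

# Trailing window: current row plus up to 4 earlier rows of the same product
df['last_5_average'] = df.groupby('product')['sale'].transform(
    lambda s: s.rolling(5, min_periods=1).mean()
)

# Forward window: current row plus up to 4 later rows of the same product,
# computed by reversing each group, rolling, and reversing back
df['next_5_average'] = df.groupby('product')['sale'].transform(
    lambda s: s[::-1].rolling(5, min_periods=1).mean()[::-1]
)

print(df)
```

Using `transform` keeps the result aligned with the original rows, so no index juggling is needed before assigning the new columns.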