Pandas: average of actual values over the last 5 days

Time: 2020-06-29 03:50:28

Tags: pandas pandas-groupby

I am trying to find the average over the last 5 days, by date and product. Here is what my data frame looks like:

import pandas as pd

df = pd.DataFrame({
    'day': ['day_1','day_2','day_3','day_4','day_5','day_2','day_3','day_4','day_5','day_6','day_1'],
    'product': ['prod_a','prod_a','prod_a','prod_a','prod_a','prod_b','prod_b','prod_b','prod_b','prod_b','prod_b'],
    'sale': [10,15,4,17,12,1,50,70,30,70,10]
})

To find the last-5-day average per product per day, I did the following:

df_average = df.groupby(['day', 'product']).tail(5).groupby(['day', 'product']).mean()

Doing this only returns that product's actual value for each day; it does not average over the last 5 days.
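A quick check of why the attempt returns the raw values: in the sample data every (day, product) pair occurs exactly once, so each group holds a single row, and both `tail(5)` and `mean()` simply hand that row back. The sketch below reuses the question's data frame; `group_sizes` and `attempt` are illustrative names only.

```python
import pandas as pd

df = pd.DataFrame({
    'day': ['day_1','day_2','day_3','day_4','day_5','day_2','day_3','day_4','day_5','day_6','day_1'],
    'product': ['prod_a','prod_a','prod_a','prod_a','prod_a','prod_b','prod_b','prod_b','prod_b','prod_b','prod_b'],
    'sale': [10,15,4,17,12,1,50,70,30,70,10]
})

# Number of rows in each (day, product) group: all groups have size 1 here
group_sizes = df.groupby(['day', 'product']).size()

# The attempt from the question: the mean of a one-row group is that row's value
attempt = df.groupby(['day', 'product']).tail(5).groupby(['day', 'product']).mean()

print(group_sizes)
print(attempt)
```

Grouping by `product` alone, and rolling over the days inside each product, is what lets a window span more than one row.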

Expected output:

day, product, sale, last_5_average
day_1, prod_a , 10, 11.6
day_2, prod_a , 15, 12
day_3, prod_a , 4, 11
day_4, prod_a , 17, 14.5
day_5, prod_a , 12, 12
day_1, prod_b , 1, 44.2
day_2, prod_b , 50, 54
day_3, prod_b , 70, 55
day_4, prod_b , 30, 50
day_5, prod_b , 70, 60
day_6, prod_c , 50, 50
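One observation on the table above: the `prod_a` figures (11.6, 12, 11, 14.5, 12) are what you get by averaging each row with the rows that follow it in the same product, up to 5 rows, rather than the rows before it. The `prod_b` and `prod_c` rows do not line up with the sample data, so the sketch below only reproduces the `prod_a` column. It is one possible reading of the expected output, not a confirmed requirement; `forward_mean` and `forward_5_average` are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({
    'day': ['day_1','day_2','day_3','day_4','day_5','day_2','day_3','day_4','day_5','day_6','day_1'],
    'product': ['prod_a','prod_a','prod_a','prod_a','prod_a','prod_b','prod_b','prod_b','prod_b','prod_b','prod_b'],
    'sale': [10,15,4,17,12,1,50,70,30,70,10]
})
df = df.sort_values(['product', 'day']).reset_index(drop=True)

def forward_mean(s, window=5):
    # Reverse the series, take an ordinary trailing rolling mean, then reverse
    # back: the result averages each row with the rows that follow it.
    return s[::-1].rolling(window, min_periods=1).mean()[::-1]

# Apply per product so a window never crosses from one product into another
df['forward_5_average'] = df.groupby('product')['sale'].transform(forward_mean)

print(df[df['product'] == 'prod_a'])
```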

1 answer:

Answer 0 (score: 2)

I hope this helps!

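# original data frame
import pandas as pd

df = pd.DataFrame({
    'day': ['day_1','day_2','day_3','day_4','day_5','day_2','day_3','day_4','day_5','day_6','day_1'],
    'product': ['prod_a','prod_a','prod_a','prod_a','prod_a','prod_b','prod_b','prod_b','prod_b','prod_b','prod_b'],
    'sale': [10,15,4,17,12,1,50,70,30,70,10]
})

# sort by product and day ('day' is a string, so the order is lexicographic)
df = df.sort_values(by=['product', 'day'])
# drop the old index so rows are numbered 0..n-1 in sorted order
df = df.reset_index(drop=True)

# rolling mean of the past 5 records within each product group;
# dropping the 'product' index level keeps the row index, so the result aligns with df.
# The first 4 rows of each product are NaN because the 5-row window is not yet full.
df['rolling_mean_sale'] = df.groupby('product')['sale'].rolling(5).mean().reset_index(level=0, drop=True)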
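With `rolling(5)` and its default `min_periods`, the first four rows of every product come out as NaN. If averaging over however many earlier rows exist is acceptable (an assumption, the question does not say), `min_periods=1` fills them in. The sketch also sorts on the numeric part of the day label, assuming labels always look like `day_<n>`, so that a label such as `day_10` would sort after `day_9`; `day_num` is a hypothetical helper column.

```python
import pandas as pd

df = pd.DataFrame({
    'day': ['day_1','day_2','day_3','day_4','day_5','day_2','day_3','day_4','day_5','day_6','day_1'],
    'product': ['prod_a','prod_a','prod_a','prod_a','prod_a','prod_b','prod_b','prod_b','prod_b','prod_b','prod_b'],
    'sale': [10,15,4,17,12,1,50,70,30,70,10]
})

# Hypothetical helper column: the integer at the end of 'day_<n>'
df['day_num'] = df['day'].str.extract(r'(\d+)$', expand=False).astype(int)
df = df.sort_values(['product', 'day_num']).reset_index(drop=True)

# Trailing mean over up to 5 rows per product; min_periods=1 allows partial windows
df['rolling_mean_sale'] = (
    df.groupby('product')['sale']
      .transform(lambda s: s.rolling(5, min_periods=1).mean())
)

print(df)
```

`transform` returns a result indexed like the input, so no index juggling is needed before assigning the new column.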