How do I calculate a rolling average for each product?

Time: 2019-04-12 08:37:00

Tags: python pandas pandas-groupby

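I have the first three columns below in a pandas DataFrame. I want to calculate the 3-day moving average for each product, as shown in the fourth column.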

Data

print (df)
       Date     Product  Demand  mov Avg
0  1-Jan-19  Product-01       3      NaN
1  2-Jan-19  Product-01       4      NaN
2  3-Jan-19  Product-01       5      4.0
3  4-Jan-19  Product-01       6      5.0
4  5-Jan-19  Product-01       7      6.0
5  3-Jan-19  Product-02       2      NaN
6  4-Jan-19  Product-02       3      NaN
7  5-Jan-19  Product-02       4      3.0
8  6-Jan-19  Product-02       5      4.0
9  7-Jan-19  Product-02       8      5.7

I tried using groupby with a rolling mean, but it does not seem to work.

df['mov_avg'] =df.set_index('Date').groupby('Product').rolling('Demand',window=7).mean().reset_index(drop=True)

1 Answer:

Answer 0: (score: 1)

First convert the Date column to datetimes:

df['Date'] = pd.to_datetime(df['Date'], format='%d-%b-%y')
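
As a small illustration of what that format string does, here is a minimal sketch on a few standalone strings in the same layout as the question's Date column. The names raw and parsed are hypothetical and used only for this example; month-name parsing with %b assumes an English locale.

```python
import pandas as pd

# Sample strings in the same layout as the question's Date column.
raw = pd.Series(['1-Jan-19', '5-Jan-19', '7-Jan-19'])

# %d = day of month, %b = abbreviated month name, %y = two-digit year.
parsed = pd.to_datetime(raw, format='%d-%b-%y')

# Show the parsed values in ISO form.
print(parsed.dt.strftime('%Y-%m-%d').tolist())
```

Once the column holds real datetimes, sorting by Date orders the rows chronologically rather than alphabetically.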

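Your solution should be changed to select the Demand column within each product group and use rolling(3). The integer window counts rows rather than calendar days, so this assumes one row per product per consecutive day, as in the sample data:

# Sort by product, then date, so the grouped result lines up
# row for row with df when it is assigned back by position
df = df.sort_values(['Product','Date']).reset_index(drop=True)

df['mov_avg'] = (df.set_index('Date')
                   .groupby('Product')['Demand']
                   .rolling(3)
                   .mean()
                   .reset_index(drop=True))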
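
Assigning the grouped result back by position only works when the DataFrame rows are in the same order as the grouped output. As a hedged alternative (a swapped-in technique, not the method used in this answer), GroupBy.transform returns a result on the original index, so the assignment aligns by label. The sketch below uses a shortened, deliberately interleaved copy of the sample data:

```python
import pandas as pd

# Shortened sample with the two products interleaved on purpose.
df = pd.DataFrame({
    'Date': pd.to_datetime(
        ['1-Jan-19', '3-Jan-19', '2-Jan-19', '4-Jan-19', '3-Jan-19', '5-Jan-19'],
        format='%d-%b-%y'),
    'Product': ['Product-01', 'Product-02', 'Product-01',
                'Product-02', 'Product-01', 'Product-02'],
    'Demand': [3, 2, 4, 3, 5, 4],
})

# Rolling works on row order, so dates must be ascending within each product.
df = df.sort_values(['Product', 'Date'])

# transform keeps the original index labels, so no reset_index is needed.
df['mov_avg'] = (df.groupby('Product')['Demand']
                   .transform(lambda s: s.rolling(3).mean()))

print(df)
```

The first two rows of each product stay NaN because a window of 3 needs three observations by default.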

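Another, better solution is to use DataFrame.join, which matches the result to the original rows on Product and Date instead of relying on row order (the rows still need to be in date order within each product for the rolling window itself):

s = df.set_index('Date').groupby('Product')['Demand'].rolling(3).mean()
df = df.join(s.rename('mov_avg'), on=['Product','Date'])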

print (df)
        Date     Product  Demand  mov Avg   mov_avg
0 2019-01-01  Product-01       3      NaN       NaN
1 2019-01-02  Product-01       4      NaN       NaN
2 2019-01-03  Product-01       5      4.0  4.000000
3 2019-01-04  Product-01       6      5.0  5.000000
4 2019-01-05  Product-01       7      6.0  6.000000
5 2019-01-03  Product-02       2      NaN       NaN
6 2019-01-04  Product-02       3      NaN       NaN
7 2019-01-05  Product-02       4      3.0  3.000000
8 2019-01-06  Product-02       5      4.0  4.000000
9 2019-01-07  Product-02       8      5.7  5.666667
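
To reproduce the result end to end, the following self-contained sketch rebuilds the question's data and applies the join-based approach. It assumes a row-count window of 3 (rolling(3)) and one row per product per consecutive day; if days can be missing, a row-count window would no longer correspond to three calendar days.

```python
import pandas as pd

# Rebuild the first three columns from the question.
df = pd.DataFrame({
    'Date': ['1-Jan-19', '2-Jan-19', '3-Jan-19', '4-Jan-19', '5-Jan-19',
             '3-Jan-19', '4-Jan-19', '5-Jan-19', '6-Jan-19', '7-Jan-19'],
    'Product': ['Product-01'] * 5 + ['Product-02'] * 5,
    'Demand': [3, 4, 5, 6, 7, 2, 3, 4, 5, 8],
})
df['Date'] = pd.to_datetime(df['Date'], format='%d-%b-%y')

# Rolling mean per product; the result is indexed by (Product, Date).
s = (df.sort_values(['Product', 'Date'])
       .set_index('Date')
       .groupby('Product')['Demand']
       .rolling(3)
       .mean())

# Attach the result by matching on the Product and Date columns.
df = df.join(s.rename('mov_avg'), on=['Product', 'Date'])

print(df.round(2))
```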