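I have the first three columns below in a pandas DataFrame. I want to calculate a 3-day moving average for each product, as shown in the fourth column.
Data: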
print (df)
Date Product Demand mov Avg
0 1-Jan-19 Product-01 3 NaN
1 2-Jan-19 Product-01 4 NaN
2 3-Jan-19 Product-01 5 4.0
3 4-Jan-19 Product-01 6 5.0
4 5-Jan-19 Product-01 7 6.0
5 3-Jan-19 Product-02 2 NaN
6 4-Jan-19 Product-02 3 NaN
7 5-Jan-19 Product-02 4 3.0
8 6-Jan-19 Product-02 5 4.0
9 7-Jan-19 Product-02 8 5.7
I tried using groupby with a rolling mean, but it doesn't seem to work:
df['mov_avg'] =df.set_index('Date').groupby('Product').rolling('Demand',window=7).mean().reset_index(drop=True)
Answer 0 (score: 1)
First, convert the Date column to datetime:
df['Date'] = pd.to_datetime(df['Date'], format='%d-%b-%y')
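As a quick check of the format string, here is a minimal sketch on a few made-up date strings (the sample values are assumptions, chosen to match the question's style). `%d` is the day, `%b` the abbreviated English month name, and `%y` the two-digit year:

```python
import pandas as pd

# Hypothetical sample strings in the same style as the question's Date column
dates = pd.Series(['1-Jan-19', '2-Jan-19', '7-Jan-19'])

# Parse with an explicit format instead of relying on inference
parsed = pd.to_datetime(dates, format='%d-%b-%y')

print(parsed.dt.strftime('%Y-%m-%d').tolist())
```

Passing an explicit format avoids ambiguity between day-first and month-first parsing.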
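Your solution should select the Demand column and use rolling(3). Older pandas versions also accepted rolling(3, freq='d'), but the freq argument has been removed from rolling in current versions:
# sort by Product, then Date, so the rows line up with the grouped result
df = df.sort_values(['Product','Date']).reset_index(drop=True)
df['mov_avg'] = (df.set_index('Date')
                   .groupby('Product')['Demand']
                   .rolling(3)
                   .mean()
                   .reset_index(drop=True))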
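The approach above relies on the rows of df being in the same order as the grouped result. One possible variant that avoids this is groupby(...).transform, which returns values carrying the original index labels, so the assignment aligns by label. This is a sketch on a small made-up frame, not the answer's own method; the data and the shuffle are assumptions for illustration:

```python
import pandas as pd

# Small hypothetical frame in the same shape as the question's data
df = pd.DataFrame({
    'Date': pd.to_datetime(['2019-01-01', '2019-01-02', '2019-01-03',
                            '2019-01-03', '2019-01-04', '2019-01-05']),
    'Product': ['Product-01'] * 3 + ['Product-02'] * 3,
    'Demand': [3, 4, 5, 2, 3, 4],
})

# Shuffle the rows to show that the result does not depend on row order
df = df.sample(frac=1, random_state=0)

# Sort by date so each group's window runs in time order; transform keeps
# the original index labels, so the assignment aligns row by row
df['mov_avg'] = (df.sort_values('Date')
                   .groupby('Product')['Demand']
                   .transform(lambda s: s.rolling(3).mean()))

print(df.sort_index())
```

Because the assignment aligns on index labels, no reset_index or prior sort of df itself is needed.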
Another, better solution is to use DataFrame.join, which matches on Product and Date and so does not depend on row order:
s = df.set_index('Date').groupby('Product')['Demand'].rolling(3).mean()
df = df.join(s.rename('mov_avg'), on=['Product','Date'])
print (df)
Date Product Demand mov Avg mov_avg
0 2019-01-01 Product-01 3 NaN NaN
1 2019-01-02 Product-01 4 NaN NaN
2 2019-01-03 Product-01 5 4.0 4.000000
3 2019-01-04 Product-01 6 5.0 5.000000
4 2019-01-05 Product-01 7 6.0 6.000000
5 2019-01-03 Product-02 2 NaN NaN
6 2019-01-04 Product-02 3 NaN NaN
7 2019-01-05 Product-02 4 3.0 3.000000
8 2019-01-06 Product-02 5 4.0 4.000000
9 2019-01-07 Product-02 8 5.7 5.666667
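Note that rolling(3) counts the last three rows, which equals three days only when a product has one row per consecutive day, as in the sample data. If dates can be missing, a time-based window such as rolling('3D', min_periods=3) may be closer to what is wanted. The sketch below compares the two on a made-up frame with a gap; the data and the column names avg_3_rows and avg_3_days are assumptions for illustration:

```python
import pandas as pd

# Hypothetical data: one product with a gap between 3 Jan and 6 Jan
df = pd.DataFrame({
    'Date': pd.to_datetime(['2019-01-01', '2019-01-02',
                            '2019-01-03', '2019-01-06']),
    'Product': ['Product-01'] * 4,
    'Demand': [3, 4, 5, 9],
})

g = df.set_index('Date').groupby('Product')['Demand']

# Row-based window: always the last 3 rows, regardless of the dates
by_rows = g.rolling(3).mean().rename('avg_3_rows')

# Time-based window: the last 3 calendar days; min_periods=3 yields NaN
# unless all 3 days are present
by_days = g.rolling('3D', min_periods=3).mean().rename('avg_3_days')

out = (df.join(by_rows, on=['Product', 'Date'])
         .join(by_days, on=['Product', 'Date']))
print(out)
```

On the last row the row-based window averages 4, 5 and 9, while the time-based window sees only the single observation on 6 January and returns NaN.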