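First, I want to group by the name, group, and place columns.
Then, I want to get the average of y over each pair of adjacent months.
Finally, I want to add that average to the original dataframe.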
The original dataframe:
import pandas as pd
df = pd.DataFrame({"name":["Amy", "Amy", "Amy", "Bob", "Bob", "Bob", "Bob", "Bob", "Bob"],
"group":[1, 1, 1, 1, 1, 1, 2, 2, 2],
"place":['a', 'a', "a", 'b', 'b', 'b', 'b', 'b', 'b' ],
"yearmonth": ["2019-01", "2019-02", "2019-03", "2019-01", "2019-02", "2019-03", "2019-01", "2019-02", "2019-03"],
"y":[1, 2, 3, 1, 2, 0, 2, 0, 0]
})
print(df)
Dataframe:
name group place yearmonth y
0 Amy 1 a 2019-01 1
1 Amy 1 a 2019-02 2
2 Amy 1 a 2019-03 3
3 Bob 1 b 2019-01 1
4 Bob 1 b 2019-02 2
5 Bob 1 b 2019-03 0
6 Bob 2 b 2019-01 2
7 Bob 2 b 2019-02 0
8 Bob 2 b 2019-03 0
Expected Result:
name group place yearmonth y average_2months
0 Amy 1 a 2019-01 1 nan
1 Amy 1 a 2019-02 2 1.5
2 Amy 1 a 2019-03 3 2.5
3 Bob 1 b 2019-01 1 nan
4 Bob 1 b 2019-02 2 1.5
5 Bob 1 b 2019-03 0 1.0
6 Bob 2 b 2019-01 2 nan
7 Bob 2 b 2019-02 0 1.0
8 Bob 2 b 2019-03 0 0.0
What I tried:
I now know how to get the average of two adjacent months. However, I don't know how to add it to the original dataframe.
tmp = df.groupby(['name', 'group', 'place'])['y'].rolling(2).mean()
print(tmp)
tmp:
name group place
Amy 1 a 0 NaN
1 1.5
2 2.5
Bob 1 b 3 NaN
4 1.5
5 1.0
2 b 6 NaN
7 1.0
8 0.0
Name: y, dtype: float64
Answer 0 (score: 1)
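The fourth level of the index of tmp is your original index, so drop the first three levels (the group keys) and assign the result back; pandas then aligns the values on the original row labels:
df['new'] = tmp.reset_index(level=[0, 1, 2], drop=True)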
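For reference, here is a minimal end-to-end sketch of the approach above, using the sample data from the question. The column name average_2months comes from the expected result; the keys variable and the average_2months_alt column are only illustrative names. It assumes the rows are already sorted by yearmonth within each group and that no months are missing, because rolling(2) averages adjacent rows, not adjacent calendar months.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Amy", "Amy", "Amy", "Bob", "Bob", "Bob", "Bob", "Bob", "Bob"],
    "group": [1, 1, 1, 1, 1, 1, 2, 2, 2],
    "place": ["a", "a", "a", "b", "b", "b", "b", "b", "b"],
    "yearmonth": ["2019-01", "2019-02", "2019-03"] * 3,
    "y": [1, 2, 3, 1, 2, 0, 2, 0, 0],
})

keys = ["name", "group", "place"]  # illustrative helper name

# Rolling mean over 2 adjacent rows within each group.
# The result has a 4-level index: name, group, place, original row label.
tmp = df.groupby(keys)["y"].rolling(2).mean()

# Drop the three group-key levels so only the original row labels remain,
# then assign; pandas aligns the values on the index.
df["average_2months"] = tmp.reset_index(level=[0, 1, 2], drop=True)

# Possible alternative: transform keeps the original index,
# so no index manipulation is needed.
df["average_2months_alt"] = df.groupby(keys)["y"].transform(
    lambda s: s.rolling(2).mean()
)

print(df)
```

If the data might not be ordered, sorting with df.sort_values(keys + ["yearmonth"]) before the groupby would be a reasonable precaution.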