I have a DataFrame like the one shown below:
ID Sector Usage Price
1 A R 20
2 A C 100
3 A R 40
4 A R 1
5 A C 200
6 A C 1
7 A C 1
8 A R 1
1 B R 40
2 B C 200
3 B R 60
4 B R 1
5 B C 400
6 B C 1
7 B C 1
8 B R 1
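From the above, I want to replace each Price = 1 with the mean price of its Sector and Usage combination, computed from the prices other than 1.
Expected output: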
ID Sector Usage Price
1 A R 20
2 A C 100
3 A R 40
4 A R 30
5 A C 200
6 A C 150
7 A C 150
8 A R 30
1 B R 40
2 B C 200
3 B R 60
4 B R 50
5 B C 400
6 B C 300
7 B C 300
8 B R 50
For example, in row 4, Sector = A and Usage = R with Price = 1, so the price must be replaced by the mean for Sector = A and Usage = R, i.e. (20 + 40) / 2 = 30.
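To make the question reproducible, here is a minimal sketch that rebuilds the sample data and checks the arithmetic above. The variable name `group_means` is only illustrative; filtering out the 1s before grouping is one straightforward way to get the per-group means, not necessarily the approach an answer would use.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8] * 2,
    "Sector": ["A"] * 8 + ["B"] * 8,
    "Usage": ["R", "C", "R", "R", "C", "C", "C", "R"] * 2,
    "Price": [20, 100, 40, 1, 200, 1, 1, 1,
              40, 200, 60, 1, 400, 1, 1, 1],
})

# Mean price per (Sector, Usage) group, ignoring the placeholder value 1.
group_means = (
    df[df["Price"] != 1]
    .groupby(["Sector", "Usage"])["Price"]
    .mean()
)
print(group_means)
```

For Sector = A and Usage = R this gives 30, matching the hand calculation (20 + 40) / 2.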
Answer 0: (score: 1)
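The idea is to first replace 1 with missing values using Series.mask, then use the per-group means from groupby(...).transform('mean') as the replacement values: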
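import numpy as np
m = df['Price'] == 1  # True for rows whose Price is the placeholder 1
# per-group mean with the 1s masked out, broadcast back to every row
s = df.assign(Price=df['Price'].mask(m)).groupby(['Sector','Usage'])['Price'].transform('mean')
df['Price'] = np.where(m, s, df['Price']).astype(int)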
Alternatively:
s = df['Price'].mask(df['Price'] == 1)
mean = df.assign(Price=s).groupby(['Sector','Usage'])['Price'].transform('mean')
df['Price'] = s.fillna(mean).astype(int)
print(df)
ID Sector Usage Price
0 1 A R 20
1 2 A C 100
2 3 A R 40
3 4 A R 30
4 5 A C 200
5 6 A C 150
6 7 A C 150
7 8 A R 30
8 1 B R 40
9 2 B C 200
10 3 B R 60
11 4 B R 50
12 5 B C 400
13 6 B C 300
14 7 B C 300
15 8 B R 50
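One caveat the sample data does not exercise: if every price in a group equals 1, that group's mean is NaN and `astype(int)` raises an error. The sketch below uses hypothetical data and assumes you want to leave such rows at their original value of 1; that fallback policy is an assumption, not part of the answer above.

```python
import pandas as pd

# Hypothetical data: group (B, R) contains only the placeholder value 1.
df = pd.DataFrame({
    "Sector": ["A", "A", "A", "B", "B"],
    "Usage": ["R", "R", "R", "R", "R"],
    "Price": [20, 40, 1, 1, 1],
})

# Turn the placeholder 1 into NaN so it is ignored by the mean.
s = df["Price"].mask(df["Price"] == 1)
mean = s.groupby([df["Sector"], df["Usage"]]).transform("mean")

# Fill from the group mean first, then fall back to the original price
# (an assumed policy) where the whole group was missing.
df["Price"] = s.fillna(mean).fillna(df["Price"]).astype(int)
print(df)
```

Note also that `astype(int)` truncates a non-integer mean, so rounding first may be preferable if the group means are not whole numbers.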