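I'm trying to replace the zero values in the "gps_height" column. Each zero should become the mean of its "ward" group. However, when I run this code I get an error: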
df.groupby('ward')['gps_height'].transform(lambda x: df.gps_height.mean() if x == 0 else x)
Thanks!
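A likely cause of the error: inside `transform`, `x` is the whole `gps_height` Series of one ward, not a single value. So `if x == 0` asks Python for the truth value of a boolean Series, and pandas raises a `ValueError` for that. The lambda also uses `df.gps_height.mean()`, which is the overall mean rather than the ward's mean. A minimal repair of the posted one-liner is sketched below. It assumes zeros should be left out of the ward mean, and it reuses the sample frame from the answer in place of the asker's data.

```python
import pandas as pd

# Sample data borrowed from the answer below (assumed to resemble the real frame)
df = pd.DataFrame([[1, 10], [1, 0], [1, 24], [2, 15], [2, 0], [3, 23]],
                  columns=['ward', 'gps_height'])

# x is one ward's gps_height Series, so work element-wise with replace()
# instead of `if x == 0`. The ward mean is taken over non-zero heights only.
df['gps_height'] = (
    df.groupby('ward')['gps_height']
      .transform(lambda x: x.replace(0, x[x != 0].mean()))
)
print(df)
```

If a ward contains only zeros, its mean is NaN, so those rows end up NaN rather than 0.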
Answer (score: 0)
This could be improved further, but it works:
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 10], [1, 0], [1, 24], [2, 15], [2, 0], [3, 23]], columns=['ward', 'gps_height'])
df['gps_height'] = df['gps_height'].replace(0, np.nan)  # replace 0 with NaN so zeros are excluded from the average
df2 = df.groupby('ward', as_index=False).mean()  # mean gps_height per ward
df = df.merge(df2, on='ward', how='outer')  # merge both frames; the overlapping column becomes gps_height_x / gps_height_y
df.loc[df['gps_height_x'].isna(), 'gps_height_x'] = df['gps_height_y']  # fill the NaNs with the ward average
df = df[['ward', 'gps_height_x']]  # keep only the ward and filled height columns
df.columns = ['ward', 'gps_height']  # rename the columns back
df
Result:
   ward  gps_height
0     1        10.0
1     1        17.0
2     1        24.0
3     2        15.0
4     2        15.0
5     3        23.0
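The same result can probably be reached without the merge and the column renaming. The sketch below keeps the answer's idea of turning zeros into NaN, but computes the per-row ward mean with `groupby(...).transform('mean')`, which returns a Series aligned with the original index, and then fills only the missing values. It assumes the frame has no pre-existing NaNs that should stay missing, since `fillna` would fill those too.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 10], [1, 0], [1, 24], [2, 15], [2, 0], [3, 23]],
                  columns=['ward', 'gps_height'])

# Treat 0 as missing so it does not drag the ward mean down
heights = df['gps_height'].replace(0, np.nan)

# Per-row ward mean (NaNs ignored), aligned with df's index, so no merge is needed
ward_mean = heights.groupby(df['ward']).transform('mean')

# Keep existing heights and fill the former zeros with their ward mean
df['gps_height'] = heights.fillna(ward_mean)
print(df)
```

Because `transform` preserves the original row order and index, the column names and row positions stay as they were, which avoids the `_x`/`_y` suffix cleanup.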