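Replace zero values with the mean from another column's groups

Time: 2018-03-12 19:37:35

Tags: python-3.x replace na

I am trying to replace the zero values in the "gps_height" column. Each zero should become the mean of its "ward" group. However, when I run this code it raises an error.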

df.groupby('ward')['gps_height'].transform(lambda x: df.gps_height.mean() if x == 0 else x)

Thanks!

1 answer:

Answer 0: (score: 0)

This could be improved further, but it works.

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 10], [1, 0], [1, 24], [2, 15], [2, 0], [3, 23]], columns=['ward', 'gps_height'])
df['gps_height'] = df['gps_height'].replace(0, np.nan)  # replace 0 with NaN so zeros are excluded from the mean
df2 = df.groupby(['ward'], as_index=False).mean()  # mean gps_height per ward
df = df.merge(df2, on='ward', how='outer')  # merge; the overlapping column gets _x / _y suffixes
df.loc[pd.isnull(df.gps_height_x), 'gps_height_x'] = df.gps_height_y  # fill NaN with the ward mean
df = df[['ward', 'gps_height_x']]  # keep only the first two columns
df.columns = ['ward', 'gps_height']  # rename columns
df

Result:
    ward    gps_height
0   1   10.0
1   1   17.0
2   1   24.0
3   2   15.0
4   2   15.0
5   3   23.0
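
As one possible improvement, the merge can probably be avoided. The lambda in the question receives a whole group (a Series), so "x == 0" is itself a Series and cannot be used in a plain "if", which is the likely cause of the error. A minimal sketch, assuming the same sample data as above and assuming that zeros should be excluded from the mean (the names "heights" and "ward_mean" are only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 10], [1, 0], [1, 24], [2, 15], [2, 0], [3, 23]],
    columns=["ward", "gps_height"],
)

# Treat zeros as missing so they do not pull the group mean down.
heights = df["gps_height"].replace(0, np.nan)

# transform("mean") returns one value per row: the mean of that row's ward,
# computed with NaN skipped and aligned to the original index.
ward_mean = heights.groupby(df["ward"]).transform("mean")

# Fill only the missing entries with their ward mean.
df["gps_height"] = heights.fillna(ward_mean)
print(df)
```

For this sample data the values should match the result shown above. Note that a ward whose heights are all zero has nothing to average, so its rows would stay NaN.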