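I have defined a simple function that should replace the missing values in each numeric column with the mean of that column's non-missing values. The function is syntactically correct and computes the correct means. However, the missing values are not replaced.
Here is the code snippet: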
def fillmissing_with_mean(df1):
    df2 = df1._get_numeric_data()
    for i in range(len(df2.columns)):
        df2[df2.iloc[:,i].isnull()].iloc[:,i]=df2.iloc[:,i].mean()
    return df2

fillmissing_with_mean(df)
The DataFrame passed in looks like this:
 age gender        job   name  height
 NaN      F    student  alice   165.0
26.0   None    student   john   180.0
 NaN      M    student   eric   175.0
58.0   None    manager   paul     NaN
33.0      M   engineer  julie   171.0
34.0      F  scientist  peter     NaN
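Answer 0 (score: 1)
You don't need to select the numeric columns yourself: taking the mean only produces values for the numeric columns, and fillna accepts a pd.Series, which it matches to the columns by label. (In pandas 2.0 and later, mean() no longer drops non-numeric columns by default, so pass numeric_only=True when the frame also holds string columns.)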
df.fillna(df.mean())
Out[1398]:
     age gender        job   name  height
0  37.75      F    student  alice  165.00
1  26.00   None    student   john  180.00
2  37.75      M    student   eric  175.00
3  58.00   None    manager   paul  172.75
4  33.00      M   engineer  julie  171.00
5  34.00      F  scientist  peter  172.75
More information:
df.mean()
Out[1399]:
age 37.75
height 172.75
dtype: float64
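The approach above can be reproduced end to end with the sketch below. It rebuilds the question's DataFrame by hand and assumes a recent pandas version, which is why numeric_only=True is passed explicitly; the names col_means and filled are illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

# The sample data from the question, rebuilt by hand.
df = pd.DataFrame({
    "age": [np.nan, 26.0, np.nan, 58.0, 33.0, 34.0],
    "gender": ["F", None, "M", None, "M", "F"],
    "job": ["student", "student", "student", "manager", "engineer", "scientist"],
    "name": ["alice", "john", "eric", "paul", "julie", "peter"],
    "height": [165.0, 180.0, 175.0, np.nan, 171.0, np.nan],
})

# Per-column means of the numeric columns only (NaN is skipped by default).
col_means = df.mean(numeric_only=True)

# fillna matches the Series index against column labels, so only
# "age" and "height" are filled; "gender" keeps its missing values.
filled = df.fillna(col_means)
print(filled)
```

Note that fillna returns a new DataFrame here, so df itself is left unchanged unless the result is assigned back.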
Answer 1 (score: 0)
This may be what you need. skipna=True is the default, but I have included it explicitly here so you can see what it does.
for col in ['age', 'height']:
    df[col] = df[col].fillna(df[col].mean(skipna=True))

#      age gender        job   name  height
# 0  37.75      F    student  alice  165.00
# 1  26.00   None    student   john  180.00
# 2  37.75      M    student   eric  175.00
# 3  58.00   None    manager   paul  172.75
# 4  33.00      M   engineer  julie  171.00
# 5  34.00      F  scientist  peter  172.75
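As for why the function in the question leaves the NaNs in place: df2[mask] with a boolean mask returns a new DataFrame, so the following .iloc[...] = ... writes into that temporary object and the result is discarded. A minimal sketch of one possible fix is shown below; it keeps the question's function name, but the body, the use of select_dtypes, and the small sample frame are assumptions made for illustration, not the asker's code.

```python
import numpy as np
import pandas as pd

def fillmissing_with_mean(df1):
    """Return a copy of df1 with NaNs in numeric columns replaced by the column mean."""
    out = df1.copy()  # work on a copy so the caller's frame is untouched
    numeric_cols = out.select_dtypes(include="number").columns
    for col in numeric_cols:
        mask = out[col].isnull()
        # A single .loc call selects rows and column at once, so the
        # assignment lands in `out` rather than in a temporary copy.
        out.loc[mask, col] = out[col].mean()
    return out

# Hypothetical sample frame, used only to exercise the function.
sample = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", None, "z"]})
result = fillmissing_with_mean(sample)
print(result)
```

Unlike the original, this version returns every column rather than only the numeric ones; whether that is preferable depends on what the caller expects.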