How do I impute missing values with pandas?

Time: 2018-12-28 18:35:40

Tags: python data-analysis

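I am trying to replace each missing value with the mean of the other values in the same column; however, my code has no effect. Does anyone know what I might be doing wrong? Thanks!

My code: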

from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
imputer = imputer.fit(x[:, 1:3])
x[:, 1:3] = imputer.transform(x[:, 1:3])
print(dataset)

Output:

   Country   Age   Salary Purchased
0   France  44.0  72000.0        No
1    Spain  27.0  48000.0       Yes
2  Germany  30.0  54000.0        No
3    Spain  38.0  61000.0        No
4  Germany  40.0      NaN       Yes
5   France  35.0  58000.0       Yes
6    Spain   NaN  52000.0        No
7   France  48.0  79000.0       Yes
8  Germany  50.0  83000.0        No
9   France  37.0  67000.0       Yes

2 answers:

Answer 0 (score: 2)

You can do the following, assuming df is your dataset:

from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values ='NaN', strategy = 'mean', axis = 0)

df[['Age','Salary']]=imputer.fit_transform(df[['Age','Salary']])

print(df)

   Country        Age        Salary Purchased
0   France  44.000000  72000.000000        No
1    Spain  27.000000  48000.000000       Yes
2  Germany  30.000000  54000.000000        No
3    Spain  38.000000  61000.000000        No
4  Germany  40.000000  63777.777778       Yes
5   France  35.000000  58000.000000       Yes
6    Spain  38.777778  52000.000000        No
7   France  48.000000  79000.000000       Yes
8  Germany  50.000000  83000.000000        No
9   France  37.000000  67000.000000       Yes
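
On newer scikit-learn releases, where sklearn.preprocessing.Imputer is no longer available, the same idea can probably be written with sklearn.impute.SimpleImputer. The following is only a sketch, not part of the original answer: the DataFrame is rebuilt by hand from the table in the question, and SimpleImputer always works column by column, so there is no axis argument.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Data rebuilt by hand from the table in the question
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})

# Replace NaN in each numeric column with that column's mean
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
df[["Age", "Salary"]] = imputer.fit_transform(df[["Age", "Salary"]])

print(df)
```

The key point is the same as in the code above: the result of fit_transform is assigned back into the DataFrame columns, so printing df shows the filled values.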

Answer 1 (score: 0)

You are assigning the Imputer object to the variable imputer:

imputer = Imputer(missing_values ='NaN', strategy = 'mean', axis = 0)

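Then you call fit() on the Imputer object, followed by transform().

After that you print the dataset variable, and I'm not sure where that comes from. Did you mean to print the Imputer object, or the result of one of those calls?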
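
One possible explanation, sketched here under an assumption: the question never shows how x was created, but if it was extracted with something like dataset.iloc[:, :-1].values, then x is a separate NumPy array and filling it leaves dataset unchanged. The example below rebuilds the data from the question by hand, shows that effect, and then imputes directly on the DataFrame with pandas fillna.

```python
import numpy as np
import pandas as pd

# Data rebuilt by hand from the table in the question
dataset = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})

# ASSUMPTION: this line is not in the question; it is a guess at how x was made.
# For a mixed-dtype DataFrame, .values builds a new object array.
x = dataset.iloc[:, :-1].values

# Overwriting the missing Age in x does not change the DataFrame
x[6, 1] = 0.0
nan_left_in_dataset = int(dataset["Age"].isna().sum())
print(nan_left_in_dataset)

# Impute on the DataFrame itself: fill each column's NaN with that column's mean
num_cols = ["Age", "Salary"]
dataset[num_cols] = dataset[num_cols].fillna(dataset[num_cols].mean())
print(dataset)
```

If that assumption holds, either assign the transformed values back into dataset or print x instead of dataset.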