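I am trying to impute missing values with the mean of the other values in the same column, but my code has no effect. Does anyone know what I might be doing wrong? Thanks!
My code: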
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
imputer = imputer.fit(x[:, 1:3])
x[:, 1:3] = imputer.transform(x[:, 1:3])
print(dataset)
Output:
Country Age Salary Purchased
0 France 44.0 72000.0 No
1 Spain 27.0 48000.0 Yes
2 Germany 30.0 54000.0 No
3 Spain 38.0 61000.0 No
4 Germany 40.0 NaN Yes
5 France 35.0 58000.0 Yes
6 Spain NaN 52000.0 No
7 France 48.0 79000.0 Yes
8 Germany 50.0 83000.0 No
9 France 37.0 67000.0 Yes
Answer 0 (score: 2)
You can do the following, assuming df is your dataset:
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values ='NaN', strategy = 'mean', axis = 0)
df[['Age','Salary']]=imputer.fit_transform(df[['Age','Salary']])
print(df)
Country Age Salary Purchased
0 France 44.000000 72000.000000 No
1 Spain 27.000000 48000.000000 Yes
2 Germany 30.000000 54000.000000 No
3 Spain 38.000000 61000.000000 No
4 Germany 40.000000 63777.777778 Yes
5 France 35.000000 58000.000000 Yes
6 Spain 38.777778 52000.000000 No
7 France 48.000000 79000.000000 Yes
8 Germany 50.000000 83000.000000 No
9 France 37.000000 67000.000000 Yes
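Note that Imputer was deprecated in scikit-learn 0.20 and removed in 0.22, so the import above fails on recent versions. Below is a minimal sketch of the same idea using sklearn.impute.SimpleImputer, assuming a recent scikit-learn and pandas. The DataFrame is rebuilt by hand from the output shown in the question, so the construction code is an assumption rather than the asker's actual loading code.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Assumption: the data from the question, rebuilt by hand for illustration.
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})

# SimpleImputer always works column-wise, so there is no axis argument,
# and the missing-value marker is np.nan rather than the string 'NaN'.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# fit_transform returns a NumPy array; assign it back to the same columns.
df[["Age", "Salary"]] = imputer.fit_transform(df[["Age", "Salary"]])

print(df)
```

The key step is the same as in the answer above: the result is written back into the object that is printed afterwards.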
Answer 1 (score: 0)
You are assigning an Imputer object to the variable imputer:
imputer = Imputer(missing_values ='NaN', strategy = 'mean', axis = 0)
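You then call fit() on the Imputer object, followed by transform().
Finally, you print the dataset variable, and I am not sure where that comes from. Did you mean to print the Imputer object, or the result of one of those calls?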
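To illustrate why printing dataset would still show NaN, here is a small sketch. It assumes x was created with something like dataset.iloc[:, :-1].values, which the question does not show, so that line is a guess. It also uses SimpleImputer from sklearn.impute, since Imputer is no longer available in recent scikit-learn releases. For a DataFrame with mixed column types, .values builds a new object array, so filling x leaves dataset untouched.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Assumption: the data from the question, rebuilt by hand for illustration.
dataset = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})

# Assumption: x is taken from dataset like this (not shown in the question).
# With mixed dtypes, .values returns a separate object array, not a view.
x = dataset.iloc[:, :-1].values

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
x[:, 1:3] = imputer.fit_transform(x[:, 1:3])

print(x)        # the Age and Salary columns of x are now filled
print(dataset)  # dataset is unchanged and still contains NaN

# One way to carry the result back into the DataFrame:
dataset[["Age", "Salary"]] = x[:, 1:3].astype(float)
print(dataset)
```

Under that assumption, either printing x or writing the imputed columns back into dataset before printing would show the filled values.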