Scikit-learn: Imputer works, SimpleImputer and IterativeImputer do not

Time: 2019-07-02 17:07:52

Tags: python scikit-learn missing-data

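The dataset I am working with has many missing values, and I thought I could fill them in with a K-nearest-neighbors approach. The simplest way to do that seemed to be IterativeImputer from sklearn.impute, so I used this code: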

import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # must come before importing IterativeImputer
from sklearn.impute import IterativeImputer
from sklearn.neighbors import KNeighborsRegressor

# input_file, class_col and cols are defined elsewhere
opened_file = pd.read_csv(input_file, sep=",", header=0, na_values="NaN", dtype=str)
# Drop rows whose class label is missing (a `== np.nan` comparison never matches, so use dropna)
opened_file.dropna(subset=[class_col], inplace=True)
input_estimator = IterativeImputer(random_state=42, estimator=KNeighborsRegressor(n_neighbors=1))
usable_data = opened_file[cols]
usable_data = input_estimator.fit_transform(usable_data)

However, this raised the error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
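The message names three separate conditions: NaN, infinity, and values too large for float64. Since the point of an imputer is to handle NaN, it may be worth checking whether the columns also contain infinities. The snippet below is a minimal diagnostic sketch, not a confirmed diagnosis of this dataset; the frame `df` and its column names are made up to stand in for data read with `dtype=str`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a frame read with dtype=str; column names are invented.
df = pd.DataFrame({
    "a": ["1.0", "inf", np.nan, "4.0"],
    "b": ["2.5", "3.5", "-inf", np.nan],
})

# Convert the text columns to floats; unparseable entries become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Count NaN and infinite entries per column separately.
nan_counts = numeric.isna().sum()
inf_counts = numeric.isin([np.inf, -np.inf]).sum()

print("NaN per column:", nan_counts.to_dict())
print("inf per column:", inf_counts.to_dict())
```

If the infinity counts are nonzero on the real data, that would be one candidate explanation for the error.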

The same happens with SimpleImputer. However, when I run the (deprecated) Imputer from sklearn.preprocessing, the code works fine:

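import numpy as np
import pandas as pd
from sklearn.preprocessing import Imputer  # deprecated class

# input_file, class_col and cols are defined elsewhere
opened_file = pd.read_csv(input_file, sep=",", header=0, na_values="NaN", dtype=str)
# Drop rows whose class label is missing (a `== np.nan` comparison never matches, so use dropna)
opened_file.dropna(subset=[class_col], inplace=True)
usable_data = opened_file[cols]
usable_data = Imputer().fit_transform(usable_data)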

This produces the output:

[[  0.          26.           4.         ...  48.923       72.615
  100.        ]
 [  0.          26.           4.         ...  48.923       72.615
  100.        ]
 [  0.          26.           4.         ...  48.923       72.615
  100.        ]
 ...
 [  1.          10.           3.         ...  49.63712147  73.50532432
   99.12231621]
 [  1.          10.           3.         ...  49.63712147  73.50532432
   99.12231621]
 [  0.979414    23.16310899   3.95972961 ...  49.63712147  73.50532432
   99.12231621]]

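All operations are performed on pandas DataFrames. I could keep using Imputer, but I would like to use K-nearest neighbors to fill in the missing values.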
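One possible way forward, sketched under assumptions: convert the string columns to floats and turn any infinities into NaN before imputing. My understanding, which I have not checked against the scikit-learn version used here, is that the old Imputer skipped the finiteness check on its input, while SimpleImputer and IterativeImputer accept NaN but reject infinities. The frame `df` below is invented for illustration; only the imputer configuration is taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical data read as strings, with one NaN per column and one "inf".
df = pd.DataFrame({
    "x": ["1.0", "2.0", np.nan, "4.0", "5.0", "inf"],
    "y": ["2.0", np.nan, "6.0", "8.0", "10.0", "12.0"],
})

# Convert to floats, then treat infinities as missing values.
numeric = df.apply(pd.to_numeric, errors="coerce")
numeric = numeric.replace([np.inf, -np.inf], np.nan)

# Same imputer configuration as in the question.
imputer = IterativeImputer(
    random_state=42,
    estimator=KNeighborsRegressor(n_neighbors=1),
)
filled = imputer.fit_transform(numeric)

print(filled)
```

The imputer may emit a convergence warning with this estimator. scikit-learn 0.22 and later also provide `sklearn.impute.KNNImputer`, which imputes directly from nearest neighbors and might be a simpler fit for this goal.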

0 answers:

No answers.