Imputing the test set with fancyimpute

Time: 2018-11-15 14:57:05

Tags: python missing-data imputation fancyimpute

The Python package fancyimpute provides several methods for imputing missing values. Its documentation gives the following example:

from fancyimpute import IterativeImputer

# X is the complete data matrix
# X_incomplete has the same values as X except a subset has been replaced with NaN

# Model each feature with missing values as a function of other features, and
# use that estimate for imputation.
X_filled_ii = IterativeImputer().fit_transform(X_incomplete)
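To run the snippet above you need an X_incomplete. The following is a minimal sketch, assuming NumPy and a random synthetic X, of building X_incomplete from X as the comments describe. The shape, the roughly 10% missing rate, and the seed are arbitrary choices for illustration.

```python
import numpy as np

# Synthetic complete data matrix (hypothetical: 100 samples, 5 features)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Hide roughly 10% of the entries by replacing them with NaN
missing_mask = rng.random(X.shape) < 0.1
X_incomplete = X.copy()
X_incomplete[missing_mask] = np.nan

# Every entry outside missing_mask still has the same value as in X
print(int(np.isnan(X_incomplete).sum()), "entries set to NaN")
```

Because X is kept, you can compare an imputed matrix against it on the masked entries to see how well the imputation recovered the hidden values.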

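This works fine when the imputation method is applied to the whole dataset X. But what if I need a training/test split? Once

X_train_filled = IterativeImputer().fit_transform(X_train_incomplete)

has been called, how do I impute the test set and create X_test_filled? The test set needs to be imputed using information from the training set. I would guess that IterativeImputer() should return an object that can then be applied to X_test_incomplete. Is that possible?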

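Note that imputing the entire dataset and then splitting it into training and test sets is incorrect, because information from the test set would then leak into the training data.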
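The leakage concern is easiest to see with the simplest possible imputer. The sketch below is a hypothetical hand-written column-mean imputer. It is not part of fancyimpute and assumes only NumPy. fit() computes its statistics from the training rows only, and transform() reuses them for any later data, so test-set values never influence what gets filled in.

```python
import numpy as np


class MeanImputer:
    """Hypothetical column-mean imputer illustrating the fit/transform split."""

    def fit(self, X):
        # Learn one fill value per column from the training data only, ignoring NaNs
        self.means_ = np.nanmean(X, axis=0)
        return self

    def transform(self, X):
        # Fill NaNs with the means learned in fit(); the input array is not modified
        X_filled = np.asarray(X, dtype=float).copy()
        rows, cols = np.where(np.isnan(X_filled))
        X_filled[rows, cols] = self.means_[cols]
        return X_filled

    def fit_transform(self, X):
        return self.fit(X).transform(X)


X_train_incomplete = np.array([[1.0, 2.0],
                               [3.0, np.nan],
                               [np.nan, 6.0]])
X_test_incomplete = np.array([[np.nan, 100.0],
                              [50.0, np.nan]])

imputer = MeanImputer()
X_train_filled = imputer.fit_transform(X_train_incomplete)
# Only transform() is called on the test set, so its values do not affect the means
X_test_filled = imputer.transform(X_test_incomplete)
print(X_test_filled)
```

The test NaNs are filled with 2.0 and 4.0, the training column means. The test values 100.0 and 50.0 play no part. Imputing the concatenated data instead would have pulled both means toward those test values.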

1 answer:

Answer 0 (score: 1)

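The package appears to mirror the scikit-learn API. After looking at the source code, it does seem to have a transform method, so keep a reference to the imputer you fit on the training set: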

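from fancyimpute import IterativeImputer

my_imputer = IterativeImputer()
# fit on the training set and fill its missing values
X_train_filled = my_imputer.fit_transform(X_train_incomplete)

# now transform the test set with what was learned from the training set
X_test_filled = my_imputer.transform(X_test_incomplete)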

The imputer will apply the imputation it learned from the training set to the test set.
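Here is an end-to-end sketch that can be run with current libraries. It swaps in scikit-learn's IterativeImputer, which exposes the same fit/transform pattern, in place of fancyimpute. In scikit-learn this class is still marked experimental and must be enabled with the enable_iterative_imputer import. The synthetic data, its shape, the missing rate, and the split size are arbitrary assumptions for illustration.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.model_selection import train_test_split

# Synthetic data (hypothetical): feature 3 is predictable from features 0 and 1
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 3] = X[:, 0] + X[:, 1]
X_incomplete = X.copy()
X_incomplete[rng.random(X.shape) < 0.1] = np.nan

# Split first, so the test rows never reach fit()
X_train_incomplete, X_test_incomplete = train_test_split(
    X_incomplete, test_size=0.25, random_state=0
)

my_imputer = IterativeImputer(random_state=0)
X_train_filled = my_imputer.fit_transform(X_train_incomplete)
# Reuse the per-feature models learned on the training rows to fill the test rows
X_test_filled = my_imputer.transform(X_test_incomplete)
```

Observed test values are passed through unchanged. Only the NaN entries are replaced, using estimators fitted on X_train_incomplete.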