Question

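I have a survey dataset in which several columns contain NAs, so I decided to use the "missForest" package to perform multiple imputation on the missing values. The imputation itself ran without problems, but when I checked the data afterwards I noticed that many of the imputed values are numbers with decimals, in columns that were factors before.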

I assume missForest requires the columns to be numeric (it needs a data.matrix of x) in order to perform the imputation.

The NRMSE is quite good, and the means of the columns with imputed values are similar to the means of the same columns with NAs.

I plan to use the imputed dataset for a multilevel linear regression, and I would convert the factor columns to numeric for that anyway.

Will these imputed values with decimals cause a problem?

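library(missForest)
# Convert the data frame to a numeric matrix (factors become integer codes)
finalmatrix <- data.matrix(final)
set.seed(666)
# parallelize = "forests" needs a registered parallel backend (e.g. via doParallel)
impforest <- missForest(finalmatrix, variablewise = TRUE, parallelize = "forests")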

Answer 1

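I don't know your data or the rest of your code, but missForest is definitely able to handle mixed-type data (and it does not convert it automatically).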
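The decimals most likely come from the data.matrix() call in the question rather than from missForest itself. Here is a minimal base-R sketch of what that call does to a factor column. The data frame `final` and its columns `education` and `income` are made up for illustration; your real data will look different.

```r
# Hypothetical survey data: one factor column and one numeric column.
final <- data.frame(
  education = factor(c("low", "high", NA, "mid", "high")),
  income    = c(2.1, 3.5, 2.8, NA, 4.0)
)

# data.matrix() replaces each factor by its internal integer codes,
# so the result is a purely numeric matrix.
finalmatrix <- data.matrix(final)
is.numeric(finalmatrix)

# The former factor is now just a column of numbers (NAs are kept).
finalmatrix[, "education"]

# The factor information only survives in the original data frame.
sapply(final, class)
```

Once the factor is stored as plain numbers, an imputation method has no way of knowing it was categorical, so it treats the column as continuous and can return values that fall between two category codes.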

Here is the example from the missForest manual:

## Nonparametric missing value imputation on mixed-type data:
## Take a look at iris: it definitely has a variable that is a factor
library(missForest)
data(iris)
summary(iris)

## The data contains four continuous and one categorical variable.
## Artificially produce missing values using the 'prodNA' function:
set.seed(81)
iris.mis <- prodNA(iris, noNA = 0.2)
summary(iris.mis)

## Impute missing values providing the complete matrix for
## illustration. Use 'verbose' to see what happens between iterations:
iris.imp <- missForest(iris.mis, xtrue = iris, verbose = TRUE)


## Here are the final results
iris.imp

## As can be seen here, it still has the factor column
str(iris.imp$ximp)
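
Applied to your case, this suggests passing the data frame directly instead of wrapping it in data.matrix(). The following is only a sketch under assumptions: `final` here is a simulated stand-in for your survey data with invented column names, and I left out `parallelize = "forests"` so that the example runs without a registered parallel backend.

```r
library(missForest)

# Hypothetical stand-in for the survey data: two factors, two numeric columns.
set.seed(666)
final <- data.frame(
  education = factor(sample(c("low", "mid", "high"), 150, replace = TRUE)),
  region    = factor(sample(c("north", "south"), 150, replace = TRUE)),
  income    = rnorm(150, mean = 3, sd = 1),
  age       = round(runif(150, min = 18, max = 80))
)

# Knock out 10% of the entries at random, as in the manual example.
final.mis <- prodNA(final, noNA = 0.1)

# Pass the data frame itself, NOT data.matrix(final.mis),
# so the factor columns are imputed as categories.
impforest <- missForest(final.mis, variablewise = TRUE)

# The factor columns are still factors, with no NAs left.
str(impforest$ximp)

# With variablewise = TRUE there is one error estimate per column.
impforest$OOBerror
```

If you need numeric codes for the regression afterwards, you can convert the imputed factor columns at that point, when every value is already a valid category.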
