Question

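I am running multiple imputation with mice, and I was surprised to find that original values in variables without NAs get altered and distorted.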

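See below for a reproducible example. I use mtcars (base R) and inject random NAs into two of its columns, disp and hp, flagging the positions of those NAs. I then impute the missing values and compare the result with the original values. Finally, I plot the results as scatterplots: original versus imputed values. I would expect the original values to agree with the imputed values in the columns without NAs, since no imputation should take place there. That is not the case. The code and plots are below: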

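library(data.table)
library(ggplot2)
library(ggthemes)  # provides theme_economist()
library(mice)

set.seed(42)  # arbitrary seed, so the random NA positions are repeatable

data(mtcars)
setDT(mtcars)
dim(mtcars)
# 32 11

# Keep an untouched copy to compare against later
mtcars_original <- copy(mtcars)

# Intended: set hp and disp to NA in 7 randomly sampled rows each
mtcars[as.numeric(sample(row.names(mtcars), 7)), ]$hp <- NA
mtcars[as.numeric(sample(row.names(mtcars), 7)), ]$disp <- NA

# Flag the positions of the NAs (1 = missing, 0 = observed)
mtcars[, ":="(hp_NA = ifelse(is.na(hp), 1, 0) , disp_NA = ifelse(is.na(disp), 1, 0))]

# Impute with the mice defaults and take the first completed data set
mtcars_imputed <- complete(mice(mtcars))

# Attach the original values for comparison
mtcars_imputed$disp_original <- mtcars_original$disp
mtcars_imputed$hp_original <- mtcars_original$hp

# Original vs. imputed disp, colored by whether the value was missing
ggplot(mtcars_imputed, aes(x = disp_original, y= disp, color = as.factor(disp_NA))) +
  geom_point(size = 2) + ggtitle("Match between original and imputed values \n disp") +
  geom_smooth(method = "lm", color = "red", alpha = 0.3, size = 2) + theme_economist()

# Original vs. imputed hp, colored by whether the value was missing
ggplot(mtcars_imputed, aes(x = hp_original, y= hp, color = as.factor(hp_NA))) +
  geom_point(size = 2) + ggtitle("Match between original and imputed values \n hp") +
  geom_smooth(method = "lm", color = "red", alpha = 0.3, size = 2) + theme_economist()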
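
One check that might help narrow this down is to confirm, before any imputation runs, that the masking step changed only the intended cells. The following is a minimal sketch rather than a diagnosis. It uses a plain data.frame instead of a data.table, and the names `original`, `masked`, `idx_hp` and `idx_disp` are hypothetical. The row indices are drawn once and stored, so the result does not depend on how often a `sample()` expression is evaluated.

```r
# Sketch: verify that masking only touched the intended cells.
# All object names here are illustrative, not taken from the code above.
set.seed(1)  # arbitrary seed

original <- datasets::mtcars
masked   <- datasets::mtcars

# Draw the row indices once and keep them, so they can be reused and inspected.
idx_hp   <- sample(nrow(masked), 7)
idx_disp <- sample(nrow(masked), 7)

masked$hp[idx_hp]     <- NA
masked$disp[idx_disp] <- NA

# Every cell that is still observed should be identical to the original.
observed <- !is.na(masked)  # logical matrix, TRUE where a value is present
stopifnot(all(as.matrix(masked)[observed] == as.matrix(original)[observed]))

# Each masked column should contain exactly 7 NAs, at the stored positions.
stopifnot(sum(is.na(masked$hp)) == 7, sum(is.na(masked$disp)) == 7)
stopifnot(all(is.na(masked$hp[idx_hp])), all(is.na(masked$disp[idx_disp])))
```

Running the same observed-cell comparison on the masked data.table built above, and then again on the output of `complete(mice(...))`, should show whether any distortion appears before or after the call to `mice()`.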

Any suggestions would be greatly appreciated.

Imputation with mice alters original values in variables without NAs - R

0 answers: