Question

Is there a way to replace the missing values in a vector by randomly sampling from the rest of the data?

For example:

age<-c(4.2,5.6,NA,8.4,9.8,NA,10.4,15.3)

age[is.na(age)]<-sample(age,length(age[is.na(age)]),replace=TRUE)  ## trying to replace NA values with a random value from age.

I don't understand why this doesn't work. Ideally, I would like each NA value to be replaced with a different value.

Answer 1

age[is.na(age)] <- sample(age[!is.na(age)], sum(is.na(age)), replace = FALSE)

As @Ananda Mahto suggested, count the missing values with:

sum(is.na(age))
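
The attempt in the question fails because sample(age, ...) draws from the whole vector, NAs included, so an NA can be picked as the replacement. Here is a minimal sketch of the fix on the question's data. The names missing and observed and the seed are chosen here purely for illustration:

```r
age <- c(4.2, 5.6, NA, 8.4, 9.8, NA, 10.4, 15.3)

missing  <- is.na(age)      # logical mask marking the NA positions
observed <- age[!missing]   # pool of non-missing values to draw from

set.seed(42)  # arbitrary seed, only so the draw is reproducible

# Sampling without replacement picks distinct elements of the pool, so the
# two NAs here get different values (all observed values are distinct).
# It needs at least as many observed values as missing ones.
age[missing] <- sample(observed, sum(missing), replace = FALSE)

age
```

One caveat: if only a single non-missing value is left, sample() treats that number as the upper end of 1:n rather than as a pool of one, so that case would need separate handling.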

Answer 2

If you do not want random sampling, you can replace the NAs with the mean, the median, or a model fit:

library(e1071)
?impute
impute(as.matrix(age),what="mean") # replaces with mean 8.95

or

library(randomForest)
?na.roughfix
na.roughfix(age) # replaces with median 9.1
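
If you would rather not load a package for this, the same mean and median replacements can be sketched in base R. This is an alternative to the functions above, not what they do internally, and the names age_mean and age_median are made up for the example:

```r
age <- c(4.2, 5.6, NA, 8.4, 9.8, NA, 10.4, 15.3)

# Mean imputation: na.rm = TRUE makes mean() ignore the NAs.
age_mean <- age
age_mean[is.na(age_mean)] <- mean(age, na.rm = TRUE)

# Median imputation, same pattern.
age_median <- age
age_median[is.na(age_median)] <- median(age, na.rm = TRUE)

age_mean
age_median
```

Both versions fill every NA with the same single value, so unlike random sampling they shrink the spread of the data.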

If age is a predictor variable and you have a response, you can impute it using a random forest:

library(randomForest)
?rfImpute
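
A rough sketch of how that could look, using the formula interface of rfImpute. The data frame below is simulated and entirely hypothetical (weight as the response, age and height as predictors), since the question only has a single vector; the call is guarded so the snippet still runs if randomForest is not installed:

```r
set.seed(1)  # arbitrary seed for reproducibility

# Hypothetical data: a response (weight) and two related predictors.
n      <- 60
height <- rnorm(n, mean = 120, sd = 15)
age    <- height / 12 + rnorm(n)
weight <- 2 * age + 0.1 * height + rnorm(n)

dat <- data.frame(weight, age, height)
dat$age[c(3, 6)] <- NA   # knock out two ages to impute

if (requireNamespace("randomForest", quietly = TRUE)) {
  # The response must be complete; NAs in the predictors are filled in
  # using proximities from the fitted forest.
  imputed <- randomForest::rfImpute(weight ~ ., data = dat)
  print(imputed$age[c(3, 6)])   # the imputed ages
}
```

The returned data frame has the response in its first column, followed by the completed predictors.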
