Question

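I have a data set with missing values, and I can use different methods to impute them. Now I want to estimate the accuracy of an imputation method. Since I don't know the true values behind the entries that are already missing, I want to mask some known values in the original (already incomplete) data and then run my usual imputation method. Once imputation is done, I can compare the imputed values with the true values to estimate imputation accuracy. So my question is: say I want to set 100 elements of the data set to NA; how do I choose 100 elements that are not already missing and assign NA to them? And how do I keep track of those elements for further analysis?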

Example

library(BLR)
library(missForest)
data(wheat)
X2 <- prodNA(X, 0.1)        ## "original" data with 10% missing values
X3 <- missForest(X2)$ximp   ## imputed data set

Answer 1

This method ensures that exactly N points are picked, with no duplicates.

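## Assuming 'DF' is your data.frame (convert a data.table with
## as.data.frame() first; the matrix indexing below relies on data.frame behaviour)

# The number of values to set to NA
N <- 10

# Every (row, col) position in DF, one position per row of `inds`
inds <- as.matrix(expand.grid(seq_len(nrow(DF)), seq_len(ncol(DF))))

# Drop any indices where DF is already NA
inds <- inds[!is.na(DF[inds]), , drop = FALSE]

# Sample N distinct positions at random (sample() draws without replacement)
selected <- inds[sample(nrow(inds), N), , drop = FALSE]

# Note that `selected` is a matrix of (row, col) indices; keep it to track the masked cells
DF[selected] <- NA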
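
To keep track of the masked elements for later analysis, one option is to store their true values before overwriting them and to reuse the `selected` index matrix afterwards. The following is only a minimal sketch under stated assumptions: the toy `DF`, the names `truth`, `DF_masked`, `DF_imputed` and `rmse`, and the column-mean imputation are all placeholders introduced here for illustration, not part of the original answer. Any real imputation method (for example the one used in the question) would take the place of the column-mean step.

```r
set.seed(42)  # assumption: fixed seed so the mask is reproducible

# Toy numeric data with a few values already missing (stand-in for the real data)
DF <- as.data.frame(matrix(rnorm(200), nrow = 20, ncol = 10))
DF[cbind(c(1, 5, 9), c(2, 4, 6))] <- NA

N <- 10  # number of observed values to mask

# All (row, col) positions, restricted to those that are currently observed
inds <- as.matrix(expand.grid(row = seq_len(nrow(DF)), col = seq_len(ncol(DF))))
inds <- inds[!is.na(as.matrix(DF)[inds]), , drop = FALSE]

# Pick N distinct positions
selected <- inds[sample(nrow(inds), N), , drop = FALSE]

# Record the true values BEFORE masking, in the same order as the rows of `selected`
truth <- as.matrix(DF)[selected]

# Mask a copy so the original data stay untouched
DF_masked <- DF
DF_masked[selected] <- NA

# Placeholder imputation (hypothetical stand-in): fill each column with its mean
DF_imputed <- DF_masked
for (j in seq_along(DF_imputed)) {
  miss <- is.na(DF_imputed[[j]])
  DF_imputed[[j]][miss] <- mean(DF_imputed[[j]], na.rm = TRUE)
}

# Compare imputed and true values at the masked positions only
imputed <- as.matrix(DF_imputed)[selected]
rmse <- sqrt(mean((imputed - truth)^2))
```

Because `truth` and `imputed` are both extracted with the same `selected` matrix, they line up element by element, so any error measure (RMSE here, as one possible choice) can be computed directly from the two vectors.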
