Question

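I have a data set that contains some imputed values. According to a predefined edit rule, some of these imputed values are implausible. For that reason, I want to adjust the implausible imputed values, but the adjustment should be as small as possible.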

Here is a simplified example:

# Seed
set.seed(111)

# Example data
data <- data.frame(x1 = round(rnorm(200, 5, 5), 0),
                   x2 = factor(round(runif(200, 1, 3), 0)),
                   x3 = round(rnorm(200, 2, 10), 0),
                   x4 = factor(round(runif(200, 0, 5), 0)))
data[data$x1 > 5 & data$x2 == 1, ]$x3 <- 4
data[data$x1 > 5 & data$x2 == 1, ]$x4 <- 5

# Missings
data$x1[sample(1:nrow(data), 25)] <- NA
data$x2[sample(1:nrow(data), 50)] <- NA
data$x3[sample(1:nrow(data), 40)] <- NA
data$x4[sample(1:nrow(data), 35)] <- NA

# Imputation
library("mice")
imp <- mice(data, m = 1)

# Imputed data (the single completed data set, keeping the original column names)
data_imp <- complete(imp, 1)

# So far everything works well. 
# However, there is a predefined edit rule, which should not be violated.

# Edit Rule: 
# If x1 > 5 and x2 == 1 
# Then x3 > 3 and x4 > 4

# Because of the imputation, some of the observations have implausible values.
implausible <- data_imp[data_imp$x1 > 5 & data_imp$x2 == 1 & 
                          (data_imp$x3 <= 3 | (data_imp$x4 != 4 & data_imp$x4 != 5)), ]
implausible

# Example 1)
# In row 26, x1 has a value > 5 and x2 equals 1.
# For that reason, x3 would have to be larger than 3 (here x3 is -17).
# As the original data show, x2 has been imputed in row 26.
data[rownames(implausible), ]
# Hence, x2 would have to be adjusted so that it randomly gets a different category.

# Example 2)
# Row 182 also contains implausible values.
# Three of the variables have been imputed in this row.
# Therefore, all/some of the imputed cells would have to be adjusted,
# but the adjustment should be as small as possible.
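
The bookkeeping behind the two examples can be sketched in base R: a logical mask of the cells that were missing before imputation shows which cells may be adjusted in each violating row. This is only a minimal sketch on a made-up four-row data frame. The names `orig`, `filled`, `imputed_mask`, `violates` and `adjustable` are hypothetical, and the rule is read literally as stated (x4 > 4).

```r
# Toy stand-in for the original data (NA = missing) ...
orig <- data.frame(x1 = c(8, 9, 2, NA),
                   x2 = factor(c(NA, 1, 1, 1), levels = 1:3),
                   x3 = c(-17, 4, NA, NA),
                   x4 = factor(c(5, 5, 0, NA), levels = 0:5))

# ... and a hand-filled "imputed" version of it (values chosen for illustration).
filled <- orig
filled$x1[4] <- 7
filled$x2[1] <- "1"
filled$x3[3:4] <- c(1, 2)
filled$x4[4] <- "3"

# TRUE where a cell was missing before imputation, i.e. where it may be adjusted.
imputed_mask <- is.na(orig)

# Edit rule: if x1 > 5 and x2 == 1, then x3 > 3 and x4 > 4.
x4_num <- as.numeric(as.character(filled$x4))
violates <- filled$x1 > 5 & filled$x2 == 1 & !(filled$x3 > 3 & x4_num > 4)

# For every violating row, list the variables that were imputed.
adjustable <- lapply(which(violates),
                     function(i) names(filled)[imputed_mask[i, ]])
names(adjustable) <- which(violates)
adjustable
```

In this toy data, row 1 mirrors Example 1 (only x2 was imputed) and row 4 mirrors Example 2 (three imputed cells). With real `mice` output, `is.na(data)` on the data before imputation would play the role of `imputed_mask`.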

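I have done some research and found some relevant papers/books that describe optimization algorithms for this:

Pannekoek & Zhang (2011): https://www.researchgate.net/publication/269410841_Partial_donor_Imputation_with_Adjustments

de Waal, Pannekoek & Scholtus (2011): Handbook of Statistical Data Editing and Imputation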

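However, I am struggling to implement these algorithms in R. Is there a package available that helps with these calculations? I would highly appreciate some help with my code or some hints about the topic!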
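
One naive way to make "as small as possible" concrete is an exhaustive search over the imputed cells of a single record: change as few cells as possible and, among those options, move the values as little as possible. The sketch below is a brute-force illustration under strong assumptions, not the method of Pannekoek & Zhang (2011) or of the handbook. It assumes all four variables are coded as plain numbers, and `satisfies_rule`, `min_adjust` and `candidates` are hypothetical names.

```r
# Edit rule on a data frame with numeric columns x1..x4 (rule read literally).
satisfies_rule <- function(d) {
  !(d$x1 > 5 & d$x2 == 1) | (d$x3 > 3 & d$x4 > 4)
}

# Hypothetical helper: smallest adjustment of the imputed cells of one record.
#   record:     one-row data frame holding the imputed values
#   adjustable: names of the cells that were imputed and may be changed
#   candidates: named list with the allowed values of every variable
min_adjust <- function(record, adjustable, candidates) {
  # All combinations of candidate values for the adjustable cells.
  grid <- expand.grid(candidates[adjustable])
  trial <- record[rep(1, nrow(grid)), , drop = FALSE]
  trial[adjustable] <- grid
  ok <- satisfies_rule(trial)
  if (!any(ok)) return(NULL)  # no feasible adjustment among the candidates

  # Cost: number of changed cells first, then the total absolute change.
  old <- record[rep(1, nrow(grid)), adjustable, drop = FALSE]
  delta <- abs(as.matrix(grid) - as.matrix(old))
  n_changed <- rowSums(delta > 0)
  total_change <- rowSums(delta)

  # Feasible rows sort first, then by the two cost components.
  best <- order(!ok, n_changed, total_change)[1]
  out <- trial[best, , drop = FALSE]
  rownames(out) <- rownames(record)
  out
}

# Assumed value ranges, chosen only for this illustration.
candidates <- list(x1 = -5:15, x2 = 1:3, x3 = -10:15, x4 = 0:5)

# A record like Example 1: only x2 was imputed.
r1 <- data.frame(x1 = 8, x2 = 1, x3 = -17, x4 = 5)
fix1 <- min_adjust(r1, "x2", candidates)

# A record like Example 2: x1, x3 and x4 were imputed.
r2 <- data.frame(x1 = 7, x2 = 1, x3 = 2, x4 = 3)
fix2 <- min_adjust(r2, c("x1", "x3", "x4"), candidates)

fix1
fix2
```

For a categorical variable such as x2, the absolute difference between category codes is an arbitrary distance. A random draw among the feasible categories, as described in Example 1, would be an equally valid choice. The search also grows exponentially with the number of adjustable cells, so it only suits small rule sets like this one.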

Adjustment of implausible imputed values in an optimized way
