Marginal effects with survey weights and multiple imputation

Asked: 2018-01-29 16:57:22

Tags: r survey imputation

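I am working with survey data that uses probability weights and multiple imputation. After fitting a logit model on the imputed data sets with the survey weights, I would like to obtain marginal effects. I cannot figure out how to do this in R. Stata has the package mimrgns, which makes it easy. There is also an article (pdf) with supplementary material (pdf) that gives some direction, but I cannot seem to apply it to my situation.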

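In the example below, please assume that I have imputed "income" in three data sets (i.e., df1, df2, and df3). I want to predict "gender" from employment status (i.e., working) and "income".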

Here is a reproducible example.

library(tibble)
library(survey)
library(mitools)
library(ggeffects)

# Data set 1
# Note that I am excluding the "income" variable from the "df"s and creating  
# it separately so that it varies between the data sets. This simulates the 
# variation with multiple imputation. Since I am using the same seed
# (i.e., 123), all the other variables will be the same, the only one that 
# will vary will be "income."

set.seed(123)

df1 <- tibble(id      = seq(1, 100, by = 1),
              gender  = as.factor(rbinom(n = 100, size = 1, prob = 0.50)),
              working = as.factor(rbinom(n = 100, size = 1, prob = 0.40)),
              pweight = sample(50:500, 100,  replace   = TRUE))


# Create random income variable.

set.seed(456)

income <- tibble(income = sample(0:100000, 100))

# Bind it to df1

df1 <- cbind(df1, income)


# Data set 2

set.seed(123)

df2 <- tibble(id      = seq(1, 100, by = 1),
              gender  = as.factor(rbinom(n = 100, size = 1, prob = 0.50)),
              working = as.factor(rbinom(n = 100, size = 1, prob = 0.40)),
              pweight = sample(50:500, 100,  replace   = TRUE))

set.seed(789)

income <- tibble(income = sample(0:100000, 100))

df2 <- cbind(df2, income)


# Data set 3

set.seed(123)

df3 <- tibble(id      = seq(1, 100, by = 1),
              gender  = as.factor(rbinom(n = 100, size = 1, prob = 0.50)),
              working = as.factor(rbinom(n = 100, size = 1, prob = 0.40)),
              pweight = sample(50:500, 100,  replace   = TRUE))

set.seed(101)

income <- tibble(income = sample(0:100000, 100))

df3 <- cbind(df3, income)


# Apply weights via svydesign

imputation <- svydesign(id      = ~id,
                        weights = ~pweight,
                        data    = imputationList(list(df1, 
                                                      df2, 
                                                      df3)))


# Logit model with weights and imputations

logitImp <- with(imputation, svyglm(gender ~ working + income,
                             family = binomial()))


# Combine results across MI datasets

summary(MIcombine(logitImp))

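Normally I would use library(ggeffects) to get the marginal effects, but when I try it with the imputed data I get the following error: Error in class(model) <- "lmerMod" : attempt to set an attribute on NULL. Here is an example of how it works without imputation, using "df1" as the data set.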

# Create new svydesign variable

noImp <- svydesign(id      = ~id,
                   weights = ~pweight, 
                   data    = df1)


# Run model

logit <- svyglm(gender ~ working + income,
                family = binomial,
                design = noImp,
                data   = df1)


# Get marginal effects at the mean

ggpredict(logit, terms = "working")

Any ideas on how to do this with multiple imputation?
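One possible direction, offered as a minimal sketch and not as the way mimrgns or ggeffects work internally: compute the effect of interest separately in each completed data set, then pool the estimates with Rubin's rules. The helper names make_df and effect_one are hypothetical, and the data are simulated the same way as in the example above. The sketch makes several assumptions: it uses quasibinomial() in place of binomial() only to silence the non-integer-count warnings that weights trigger, it holds income at its weighted mean within each data set, it treats that mean as fixed when computing the delta-method variance, and it pools on the probability scale.

```r
library(survey)

# Three hypothetical "imputed" data sets that differ only in income,
# mirroring the simulated setup in the question.
make_df <- function(income_seed) {
  set.seed(123)
  d <- data.frame(id      = 1:100,
                  gender  = factor(rbinom(100, 1, 0.50)),
                  working = factor(rbinom(100, 1, 0.40)),
                  pweight = sample(50:500, 100, replace = TRUE))
  set.seed(income_seed)
  d$income <- sample(0:100000, 100)
  d
}
dfs <- lapply(c(456, 789, 101), make_df)

# For one completed data set: difference in Pr(gender = 1) between
# working = 1 and working = 0, with income held at its weighted mean,
# plus a delta-method variance based on the design-based vcov.
effect_one <- function(d) {
  des <- svydesign(ids = ~id, weights = ~pweight, data = d)
  fit <- svyglm(gender ~ working + income, design = des,
                family = quasibinomial())
  b <- coef(fit)   # order: (Intercept), working1, income
  V <- vcov(fit)
  m_inc <- weighted.mean(d$income, d$pweight)  # assumption: treated as fixed
  x0 <- c(1, 0, m_inc)
  x1 <- c(1, 1, m_inc)
  p0 <- plogis(sum(x0 * b))
  p1 <- plogis(sum(x1 * b))
  g  <- p1 * (1 - p1) * x1 - p0 * (1 - p0) * x0  # gradient of p1 - p0 w.r.t. b
  c(est = p1 - p0, var = drop(t(g) %*% V %*% g))
}

per_imp <- sapply(dfs, effect_one)  # 2 x m matrix with rows "est" and "var"

# Rubin's rules: within-imputation variance plus inflated
# between-imputation variance.
m     <- ncol(per_imp)
q_bar <- mean(per_imp["est", ])
w_bar <- mean(per_imp["var", ])
b_var <- var(per_imp["est", ])
t_var <- w_bar + (1 + 1 / m) * b_var

pooled <- c(estimate = q_bar, se = sqrt(t_var))
print(pooled)
```

The same pattern should extend to other quantities (for example, predicted probabilities at each level of working) by returning a vector of estimates and a covariance matrix from effect_one and pooling them element by element. With only three imputations the between-imputation variance is estimated very roughly, so the pooled standard error is best read as indicative.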

0 answers:

No answers