Confidence intervals for adjusted predicted probabilities in R (trying to replicate Stata's margins command)

Date: 2019-05-19 11:18:00

Tags: r stata prediction confidence-interval marginal-effects

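I am trying to replicate in R an example of the Stata command margins from Chapter 7 (page 329) of this document. In this example, I believe the author uses "marginal standardization" (Method 1 in this article: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4052139/) to estimate adjusted predicted probabilities.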
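For readers unfamiliar with the term, here is a minimal sketch of marginal standardization in base R. It uses simulated data, not the NHANES file; the variable names, sample size, and coefficients are assumptions made up for illustration. The idea is to set the exposure to one level for every observation, keep the other covariates as observed, predict, and average.

```r
# Minimal sketch of marginal standardization on SIMULATED data.
# All names and coefficients below are illustrative assumptions.
set.seed(1)
n <- 2000
d <- data.frame(
  female = rbinom(n, 1, 0.5),
  black  = rbinom(n, 1, 0.1),
  age    = round(runif(n, 20, 74))
)
lp <- -6 + 0.7 * d$black + 0.2 * d$female + 0.06 * d$age
d$diabetes <- rbinom(n, 1, plogis(lp))

fit <- glm(diabetes ~ black + female + age, family = binomial, data = d)

# Standardized probability for one level of `black`:
# overwrite black for every observation, leave female and age
# as observed, predict on the response scale, then average.
std_prob <- function(level) {
  nd <- d
  nd$black <- level
  mean(predict(fit, newdata = nd, type = "response"))
}

p0 <- std_prob(0)
p1 <- std_prob(1)
c(black0 = p0, black1 = p1, risk_difference = p1 - p0)
```

The averaging step only matters when some model covariates are left at their observed values, as female and age are here.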

The data can be downloaded here.

The Stata code used in the example is shown below:

quietly logit diabetes i.black i.female age i.female#c.age, nolog
quietly margins female#black, at(age=(20 30 40 50 60 70))
marginsplot, noci

These are the Stata results:

Adjusted predictions                              Number of obs   =      10349
Model VCE    : OIM

Expression   : Pr(diabetes), predict()

1._at        : age             =          20

2._at        : age             =          30

3._at        : age             =          40

4._at        : age             =          50

5._at        : age             =          60

6._at        : age             =          70

----------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
_at#female#black |
          1 0 0  |   .0053939   .0009007     5.99   0.000     .0036285    .0071593
          1 0 1  |   .0110213   .0021029     5.24   0.000     .0068998    .0151429
          1 1 0  |   .0062929   .0010313     6.10   0.000     .0042716    .0083142
          1 1 1  |    .012846   .0024139     5.32   0.000     .0081148    .0175773
          2 0 0  |   .0097292   .0013039     7.46   0.000     .0071736    .0122849
          2 0 1  |   .0197896   .0032341     6.12   0.000     .0134508    .0261283
          2 1 0  |   .0113424   .0014749     7.69   0.000     .0084517    .0142331
          2 1 1  |   .0230321   .0036826     6.25   0.000     .0158144    .0302499
          3 0 0  |   .0174876   .0018209     9.60   0.000     .0139187    .0210566
          3 0 1  |   .0352846   .0049516     7.13   0.000     .0255796    .0449896
          3 1 0  |   .0203609    .002013    10.11   0.000     .0164154    .0243063
          3 1 1  |     .04096   .0055719     7.35   0.000     .0300392    .0518808
          4 0 0  |   .0312377   .0025201    12.40   0.000     .0262985     .036177
          4 0 1  |    .062143   .0076684     8.10   0.000     .0471133    .0771727
          4 1 0  |   .0362867    .002677    13.55   0.000     .0310398    .0415336
          4 1 1  |   .0718169   .0084956     8.45   0.000     .0551659    .0884679
          5 0 0  |   .0551919    .003841    14.37   0.000     .0476636    .0627202
          5 0 1  |   .1071746   .0121984     8.79   0.000     .0832662     .131083
          5 1 0  |   .0638573   .0039306    16.25   0.000     .0561535    .0715611
          5 1 1  |   .1229396   .0132736     9.26   0.000     .0969238    .1489554
          6 0 0  |   .0957004   .0070862    13.51   0.000     .0818118     .109589
          6 0 1  |    .178623   .0196724     9.08   0.000     .1400658    .2171802
          6 1 0  |   .1099855   .0073083    15.05   0.000     .0956615    .1243095
          6 1 1  |    .202514   .0209663     9.66   0.000     .1614207    .2436072
----------------------------------------------------------------------------------

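I obtained similar (though not identical) adjusted predicted probabilities with the prediction package. However, the CRAN version of the package does not provide confidence intervals for these estimates, so I had to install the package from GitHub.

This is the R code I used: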

library(foreign)
kk <- read.dta("http://www.stata-press.com/data/r12/nhanes2.dta")

# Install remotes only if it is missing, then get prediction from GitHub
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("leeper/prediction")


library(prediction)

kk$female.f <- factor(kk$female, levels=c(0,1), labels=c("no", "yes"))
kk$black.f <- factor(kk$black, levels=c(0,1), labels=c("no", "yes"))

model1 <- glm(diabetes ~ black.f + female.f + age + female.f:age, family="binomial", data=kk)

kk2 <- prediction(model1, at = list(female.f=c("no","yes"), black.f=c("no","yes"), age=c(20,30,40,50,60,70)))

kk3 <- summary(kk2)
plot(value ~ `at(age)`, data=subset(kk3,`at(female.f)`=="yes" & `at(black.f)`=="yes"), type="l")

These are the results I obtained:

    kk3 <- summary(kk2)
Average predictions for 10349 observations:
at(female.f) at(black.f) at(age) Prediction       SE      z         p    lower    upper
       no          no      20   0.003308 0.000901  3.672 2.408e-04 0.001542 0.005074
      yes          no      20   0.008568 0.001648  5.198 2.018e-07 0.005337 0.011799
       no         yes      20   0.006731 0.001946  3.459 5.431e-04 0.002917 0.010545
      yes         yes      20   0.017338 0.003661  4.736 2.181e-06 0.010162 0.024513
       no          no      30   0.006735 0.001427  4.721 2.348e-06 0.003939 0.009531
      yes          no      30   0.014274 0.002126  6.713 1.909e-11 0.010106 0.018441
       no         yes      30   0.013655 0.003180  4.294 1.758e-05 0.007421 0.019888
      yes         yes      30   0.028713 0.004984  5.761 8.346e-09 0.018945 0.038481
       no          no      40   0.013663 0.002098  6.512 7.401e-11 0.009551 0.017775
      yes          no      40   0.023687 0.002569  9.220 2.979e-20 0.018652 0.028723
       no         yes      40   0.027503 0.004984  5.518 3.422e-08 0.017735 0.037271
      yes         yes      40   0.047194 0.006672  7.073 1.516e-12 0.034116 0.060272
       no          no      50   0.027520 0.002799  9.832 8.164e-23 0.022034 0.033006
      yes          no      50   0.039063 0.002948 13.252 4.407e-40 0.033286 0.044841
       no         yes      50   0.054618 0.007594  7.192 6.363e-13 0.039734 0.069502
      yes         yes      50   0.076631 0.009068  8.451 2.897e-17 0.058858 0.094404
       no          no      60   0.054652 0.003860 14.160 1.614e-45 0.047087 0.062216
      yes          no      60   0.063768 0.003921 16.263 1.801e-59 0.056083 0.071453
       no         yes      60   0.105565 0.012120  8.710 3.047e-18 0.081809 0.129320
      yes         yes      60   0.122077 0.013192  9.254 2.164e-20 0.096222 0.147933
       no          no      70   0.105626 0.008537 12.373 3.663e-35 0.088894 0.122358
      yes          no      70   0.102432 0.007571 13.529 1.057e-41 0.087592 0.117272
       no         yes      70   0.194268 0.021728  8.941 3.865e-19 0.151681 0.236855
      yes         yes      70   0.188960 0.020675  9.140 6.273e-20 0.148438 0.229482

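As you can see, prediction does not return the same predicted probabilities and 95% confidence intervals as Stata.

Do you know why the R estimates differ?

2 Answers:

Answer 0: (score: 0)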
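One thing worth checking when comparing the two tools: the `at` list above fixes every covariate in the model (female.f, black.f and age), so each "average" prediction is simply the prediction at that covariate combination. Below is a minimal sketch of that equivalence in base R on simulated data; the data frame and its coefficients are assumptions for illustration, not the NHANES data.

```r
# Sketch on SIMULATED data: when all model covariates are fixed,
# averaging over observations equals a direct prediction on a grid.
set.seed(7)
n <- 3000
d <- data.frame(
  female = rbinom(n, 1, 0.5),
  black  = rbinom(n, 1, 0.1),
  age    = round(runif(n, 20, 74))
)
d$diabetes <- rbinom(n, 1, plogis(-6 + 0.7 * d$black + 0.2 * d$female + 0.06 * d$age))

fit <- glm(diabetes ~ black + female + age + female:age,
           family = binomial, data = d)

# Direct prediction at each covariate combination
grid <- expand.grid(female = 0:1, black = 0:1, age = seq(20, 70, 10))
grid$direct <- unname(predict(fit, newdata = grid, type = "response"))

# Same quantity computed the "average over observations" way
grid$averaged <- vapply(seq_len(nrow(grid)), function(i) {
  nd <- d
  nd$female <- grid$female[i]
  nd$black  <- grid$black[i]
  nd$age    <- grid$age[i]
  mean(predict(fit, newdata = nd, type = "response"))
}, numeric(1))

# Largest discrepancy between the two columns
max(abs(grid$direct - grid$averaged))
```

If two programs disagree on such predictions, the fitted models themselves (terms, coding of factors, estimation sample) are the first thing to compare.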

Maybe you will have better luck with the margins package. The trick is to use summary:

kk <- foreign::read.dta("http://www.stata-press.com/data/r12/nhanes2.dta")

model1 <- glm(diabetes ~ black + female*age, family="binomial", data=kk)

library(margins)
res <- summary(margins(model1, variables="female", 
                       at=list(age=(2:7)*10, black=0:1, female=0:1)))
head(res)
# factor     age  black female    AME     SE      z      p  lower  upper
# female 20.0000 0.0000 0.0000 0.0032 0.0006 4.9238 0.0000 0.0019 0.0044
# female 20.0000 0.0000 1.0000 0.0081 0.0039 2.0699 0.0385 0.0004 0.0158
# female 20.0000 1.0000 0.0000 0.0064 0.0014 4.4950 0.0000 0.0036 0.0092
# female 20.0000 1.0000 1.0000 0.0163 0.0079 2.0536 0.0400 0.0007 0.0319
# female 30.0000 0.0000 0.0000 0.0051 0.0011 4.7605 0.0000 0.0030 0.0072
# female 30.0000 0.0000 1.0000 0.0107 0.0047 2.2650 0.0235 0.0014 0.0199

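For the desired plot with AMEs, we can probably use margins::cplot, whose documentation (?cplot) says:

  Note that when what = "prediction", the plots show predictions at the mean or mode values of the data, whereas when what = "effect", average marginal effects (i.e., at observed values) are shown.

Accordingly,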

cplot(model1, x="age", what="effect", 
      data=kk[kk[["female"]] == 0 & kk[["black"]] == 0, ], 
      draw=TRUE, xvals=seq(20, 70, 10))

should give you the desired result.


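Answer 1: (score: 0)

There was an error in my Stata code, and I was not using "marginal standardization" (Method 1 in this article: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4052139/) to estimate the adjusted predicted probabilities.

The correct Stata code is: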

quietly logit diabetes i.black i.female age i.female#c.age, nolog
margins female#black, at(age=(20 30 40 50 60 70)) post

These are the results:

Adjusted predictions                            Number of obs     =     10,349
Model VCE    : OIM

Expression   : Pr(diabetes), predict()

1._at        : age             =          20

2._at        : age             =          30

3._at        : age             =          40

4._at        : age             =          50

5._at        : age             =          60

6._at        : age             =          70

----------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
_at#female#black |
          1 0 0  |   .0033083    .000901     3.67   0.000     .0015424    .0050743
          1 0 1  |   .0067309   .0019462     3.46   0.001     .0029165    .0105453
          1 1 0  |   .0085682   .0016485     5.20   0.000     .0053373    .0117992
          1 1 1  |   .0173377   .0036609     4.74   0.000     .0101624    .0245129
          2 0 0  |   .0067352   .0014267     4.72   0.000      .003939    .0095315
          2 0 1  |   .0136545   .0031803     4.29   0.000     .0074213    .0198877
          2 1 0  |   .0142736   .0021263     6.71   0.000     .0101061    .0184411
          2 1 1  |   .0287132   .0049838     5.76   0.000     .0189452    .0384813
          3 0 0  |   .0136633   .0020981     6.51   0.000     .0095511    .0177754
          3 0 1  |   .0275028   .0049839     5.52   0.000     .0177345    .0372711
          3 1 0  |   .0236872   .0025692     9.22   0.000     .0186517    .0287227
          3 1 1  |   .0471941   .0066724     7.07   0.000     .0341164    .0602717
          4 0 0  |   .0275202   .0027989     9.83   0.000     .0220343     .033006
          4 0 1  |    .054618   .0075938     7.19   0.000     .0397344    .0695016
          4 1 0  |   .0390632   .0029478    13.25   0.000     .0332856    .0448407
          4 1 1  |   .0766313   .0090681     8.45   0.000     .0588582    .0944044
          5 0 0  |   .0546516   .0038595    14.16   0.000     .0470871    .0622162
          5 0 1  |   .1055647   .0121204     8.71   0.000     .0818092    .1293202
          5 1 0  |   .0637682    .003921    16.26   0.000     .0560831    .0714532
          5 1 1  |   .1220774    .013192     9.25   0.000     .0962216    .1479332
          6 0 0  |   .1056261   .0085369    12.37   0.000     .0888941    .1223581
          6 0 1  |    .194268   .0217283     8.94   0.000     .1516813    .2368548
          6 1 0  |   .1024321   .0075714    13.53   0.000     .0875924    .1172718
          6 1 1  |   .1889598    .020675     9.14   0.000     .1484376    .2294821
----------------------------------------------------------------------------------

Now the results obtained with R (via the prediction package) are the same as those obtained with Stata.
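The Stata header labels the standard errors "Delta-method". As a rough illustration of what that involves for a logit model, here is a minimal base-R sketch on simulated data. The helper delta_prediction is hypothetical (it is not part of prediction or margins), and the data are made up. It is checked against predict(..., se.fit = TRUE) for a single covariate combination.

```r
# Sketch on SIMULATED data: delta-method SE of a predicted probability.
set.seed(42)
n <- 5000
d <- data.frame(
  female = rbinom(n, 1, 0.5),
  black  = rbinom(n, 1, 0.1),
  age    = round(runif(n, 20, 74))
)
d$diabetes <- rbinom(n, 1, plogis(-6 + 0.7 * d$black + 0.2 * d$female + 0.06 * d$age))

fit <- glm(diabetes ~ black + female + age + female:age,
           family = binomial, data = d)

# Hypothetical helper: average predicted probability over `newdata`
# with a delta-method standard error and a Wald 95% interval.
delta_prediction <- function(fit, newdata) {
  X <- model.matrix(delete.response(terms(fit)), newdata)
  p <- plogis(drop(X %*% coef(fit)))
  # Gradient of the mean prediction with respect to the coefficients:
  # d p_i / d beta = p_i * (1 - p_i) * x_i, averaged over rows
  J <- colMeans(p * (1 - p) * X)
  se <- sqrt(drop(t(J) %*% vcov(fit) %*% J))
  est <- mean(p)
  z <- qnorm(0.975)
  c(estimate = est, se = se, lower = est - z * se, upper = est + z * se)
}

cell <- data.frame(female = 1, black = 1, age = 50)
by_hand <- delta_prediction(fit, cell)
builtin <- predict(fit, newdata = cell, type = "response", se.fit = TRUE)

by_hand
c(builtin_fit = unname(builtin$fit), builtin_se = unname(builtin$se.fit))
```

The interval here is symmetric on the probability scale, which matches the layout of the tables above; an interval built on the logit scale and back-transformed would differ.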