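I am trying to replicate in R the example of the Stata command "margins" from Chapter 7 (page 329) of this document. In this example, I believe the author estimates adjusted predicted probabilities using "marginal standardization" (method 1 in this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4052139/).
The data can be downloaded here.
Below is the Stata code used in the example: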
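For reference, marginal standardization, as I understand method 1 of the paper, means setting the variable of interest to a given value for every observation, computing each person's predicted probability, and averaging those predictions. This differs from predicting for one "average" person with the other covariates held at their means. The following is a minimal base-R sketch of both on simulated data; the data frame d, the variables y, female and age, and the coefficients used to simulate them are hypothetical and are not taken from the NHANES example.

```r
# Hypothetical simulated data (not the NHANES file)
set.seed(1)
n <- 2000
d <- data.frame(
  female = rbinom(n, 1, 0.5),
  age    = runif(n, 20, 75)
)
d$y <- rbinom(n, 1, plogis(-6 + 0.3 * d$female + 0.06 * d$age))

fit <- glm(y ~ female + age, family = binomial, data = d)

# Marginal standardization: set female = k for every row,
# predict each person's probability, then average over the sample
std_prob <- function(k) {
  cf <- d
  cf$female <- k
  mean(predict(fit, newdata = cf, type = "response"))
}
std <- sapply(c(0, 1), std_prob)

# Contrast: prediction for a single person with age held at its mean
at_mean <- unname(predict(fit,
                          newdata = data.frame(female = c(0, 1), age = mean(d$age)),
                          type = "response"))

print(rbind(standardized = std, at_mean = at_mean))
```

Because the logit link is nonlinear, the two rows generally differ. They coincide only when every covariate in the model is fixed, which is the situation with the female#black, at(age=...) grid in this question.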
quietly logit diabetes i.black i.female age i.female#c.age, nolog
quietly margins female#black, at(age=(20 30 40 50 60 70))
marginsplot, noci
These are the Stata results:
Adjusted predictions Number of obs = 10349
Model VCE : OIM
Expression : Pr(diabetes), predict()
1._at : age = 20
2._at : age = 30
3._at : age = 40
4._at : age = 50
5._at : age = 60
6._at : age = 70
----------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-----------------+----------------------------------------------------------------
_at#female#black |
1 0 0 | .0053939 .0009007 5.99 0.000 .0036285 .0071593
1 0 1 | .0110213 .0021029 5.24 0.000 .0068998 .0151429
1 1 0 | .0062929 .0010313 6.10 0.000 .0042716 .0083142
1 1 1 | .012846 .0024139 5.32 0.000 .0081148 .0175773
2 0 0 | .0097292 .0013039 7.46 0.000 .0071736 .0122849
2 0 1 | .0197896 .0032341 6.12 0.000 .0134508 .0261283
2 1 0 | .0113424 .0014749 7.69 0.000 .0084517 .0142331
2 1 1 | .0230321 .0036826 6.25 0.000 .0158144 .0302499
3 0 0 | .0174876 .0018209 9.60 0.000 .0139187 .0210566
3 0 1 | .0352846 .0049516 7.13 0.000 .0255796 .0449896
3 1 0 | .0203609 .002013 10.11 0.000 .0164154 .0243063
3 1 1 | .04096 .0055719 7.35 0.000 .0300392 .0518808
4 0 0 | .0312377 .0025201 12.40 0.000 .0262985 .036177
4 0 1 | .062143 .0076684 8.10 0.000 .0471133 .0771727
4 1 0 | .0362867 .002677 13.55 0.000 .0310398 .0415336
4 1 1 | .0718169 .0084956 8.45 0.000 .0551659 .0884679
5 0 0 | .0551919 .003841 14.37 0.000 .0476636 .0627202
5 0 1 | .1071746 .0121984 8.79 0.000 .0832662 .131083
5 1 0 | .0638573 .0039306 16.25 0.000 .0561535 .0715611
5 1 1 | .1229396 .0132736 9.26 0.000 .0969238 .1489554
6 0 0 | .0957004 .0070862 13.51 0.000 .0818118 .109589
6 0 1 | .178623 .0196724 9.08 0.000 .1400658 .2171802
6 1 0 | .1099855 .0073083 15.05 0.000 .0956615 .1243095
6 1 1 | .202514 .0209663 9.66 0.000 .1614207 .2436072
----------------------------------------------------------------------------------
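Using the "prediction" package I obtained similar (but not identical?) adjusted predicted probabilities. However, the CRAN version of the package does not provide confidence intervals for these estimates, so I had to install the package from GitHub.
This is the R code I used: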
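library(foreign)
# Load the NHANES II example data used in the Stata manual
kk <- read.dta("http://www.stata-press.com/data/r12/nhanes2.dta")
# Install remotes only if it is missing (calling library(remotes) first fails when it is not installed)
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
# The GitHub version of prediction reports standard errors and confidence intervals
remotes::install_github("leeper/prediction")
library(prediction)
# Recode the 0/1 indicators as factors
kk$female.f <- factor(kk$female, levels=c(0,1), labels=c("no", "yes"))
kk$black.f <- factor(kk$black, levels=c(0,1), labels=c("no", "yes"))
# Logit model with a female x age interaction, as in the Stata example
model1 <- glm(diabetes ~ black.f + female.f + age + female.f:age, family="binomial", data=kk)
# Average predictions at every combination of female, black and age
kk2 <- prediction(model1, at = list(female.f=c("no","yes"), black.f=c("no","yes"), age=c(20,30,40,50,60,70)))
kk3 <- summary(kk2)
# Predicted probability by age for black women
plot(value ~ `at(age)`, data=subset(kk3,`at(female.f)`=="yes" & `at(black.f)`=="yes"), type="l")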
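Because model1 contains only black.f, female.f and age, and the at list fixes all three, every one of the 10349 rows receives the same predicted probability within a cell, so each "average prediction" should equal an ordinary prediction for that covariate profile. The sketch below checks this on simulated data (the data are made up; only the variable names mirror the question). It uses base R's predict.glm with se.fit = TRUE, which returns delta-method standard errors on the probability scale, and compares it with an explicit average over counterfactual copies of the data.

```r
# Hypothetical simulated data with the same variable names as the question
set.seed(2)
n <- 3000
kk <- data.frame(
  female.f = factor(rbinom(n, 1, 0.5),  levels = 0:1, labels = c("no", "yes")),
  black.f  = factor(rbinom(n, 1, 0.15), levels = 0:1, labels = c("no", "yes")),
  age      = runif(n, 20, 75)
)
eta <- -7 + 0.3 * (kk$female.f == "yes") + 0.7 * (kk$black.f == "yes") + 0.07 * kk$age
kk$diabetes <- rbinom(n, 1, plogis(eta))

model1 <- glm(diabetes ~ black.f + female.f + age + female.f:age,
              family = binomial, data = kk)

# One row per combination of the 'at' values (character vectors become factors)
grid <- expand.grid(female.f = c("no", "yes"), black.f = c("no", "yes"),
                    age = seq(20, 70, 10))
p <- predict(model1, newdata = grid, type = "response", se.fit = TRUE)
grid$Prediction <- unname(p$fit)
grid$SE <- unname(p$se.fit)

# Marginal standardization for one cell: give every row that profile, then average
avg_pred <- function(fem, blk, a) {
  cf <- kk
  cf$female.f <- factor(rep(fem, nrow(cf)), levels = c("no", "yes"))
  cf$black.f  <- factor(rep(blk, nrow(cf)), levels = c("no", "yes"))
  cf$age      <- rep(a, nrow(cf))
  mean(predict(model1, newdata = cf, type = "response"))
}
grid$Averaged <- mapply(avg_pred, as.character(grid$female.f),
                        as.character(grid$black.f), grid$age,
                        USE.NAMES = FALSE)
print(head(grid))
```

The Prediction and Averaged columns agree up to floating-point error, which is why, for this particular model, an averaged prediction and a prediction at a fixed profile give the same answer.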
These are the results I obtained:
kk3 <- summary(kk2)
Average predictions for 10349 observations:
at(female.f) at(black.f) at(age) Prediction SE z p lower upper
no no 20 0.003308 0.000901 3.672 2.408e-04 0.001542 0.005074
yes no 20 0.008568 0.001648 5.198 2.018e-07 0.005337 0.011799
no yes 20 0.006731 0.001946 3.459 5.431e-04 0.002917 0.010545
yes yes 20 0.017338 0.003661 4.736 2.181e-06 0.010162 0.024513
no no 30 0.006735 0.001427 4.721 2.348e-06 0.003939 0.009531
yes no 30 0.014274 0.002126 6.713 1.909e-11 0.010106 0.018441
no yes 30 0.013655 0.003180 4.294 1.758e-05 0.007421 0.019888
yes yes 30 0.028713 0.004984 5.761 8.346e-09 0.018945 0.038481
no no 40 0.013663 0.002098 6.512 7.401e-11 0.009551 0.017775
yes no 40 0.023687 0.002569 9.220 2.979e-20 0.018652 0.028723
no yes 40 0.027503 0.004984 5.518 3.422e-08 0.017735 0.037271
yes yes 40 0.047194 0.006672 7.073 1.516e-12 0.034116 0.060272
no no 50 0.027520 0.002799 9.832 8.164e-23 0.022034 0.033006
yes no 50 0.039063 0.002948 13.252 4.407e-40 0.033286 0.044841
no yes 50 0.054618 0.007594 7.192 6.363e-13 0.039734 0.069502
yes yes 50 0.076631 0.009068 8.451 2.897e-17 0.058858 0.094404
no no 60 0.054652 0.003860 14.160 1.614e-45 0.047087 0.062216
yes no 60 0.063768 0.003921 16.263 1.801e-59 0.056083 0.071453
no yes 60 0.105565 0.012120 8.710 3.047e-18 0.081809 0.129320
yes yes 60 0.122077 0.013192 9.254 2.164e-20 0.096222 0.147933
no no 70 0.105626 0.008537 12.373 3.663e-35 0.088894 0.122358
yes no 70 0.102432 0.007571 13.529 1.057e-41 0.087592 0.117272
no yes 70 0.194268 0.021728 8.941 3.865e-19 0.151681 0.236855
yes yes 70 0.188960 0.020675 9.140 6.273e-20 0.148438 0.229482
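As you can see, the "prediction" command does not give the same predicted probabilities and 95% confidence intervals as Stata.
Do you know why the R estimates differ?
Answer 0 (score: 0)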
Perhaps you will have more luck with the margins package. The trick is to use summary().
kk <- foreign::read.dta("http://www.stata-press.com/data/r12/nhanes2.dta")
# female and black stay numeric 0/1 here
model1 <- glm(diabetes ~ black + female*age, family="binomial", data=kk)
library(margins)
# Average marginal effect of female at each combination of age, black and female
res <- summary(margins(model1, variables="female",
                       at=list(age=(2:7)*10, black=0:1, female=0:1)))
head(res)
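Note that the AME column in this output is the average marginal effect of female (the derivative of the predicted probability with respect to female, since female is numeric here), not an adjusted predicted probability, so it is not directly comparable to the Margin column of the Stata table. As a rough illustration of what such a number is, the sketch below approximates it with a central finite difference and checks it against the analytic logit derivative. The simulated data are hypothetical, and the margins package's internal numerical method may differ in detail.

```r
# Hypothetical simulated data with numeric 0/1 indicators, as in the answer
set.seed(3)
n <- 3000
kk <- data.frame(black  = rbinom(n, 1, 0.15),
                 female = rbinom(n, 1, 0.5),
                 age    = runif(n, 20, 75))
eta <- -7 + 0.7 * kk$black + 0.3 * kk$female + 0.07 * kk$age
kk$diabetes <- rbinom(n, 1, plogis(eta))
model1 <- glm(diabetes ~ black + female * age, family = binomial, data = kk)

# Average marginal effect of female by central finite difference:
# shift female by +/- h in every row, take the slope of the predictions, average
ame_female <- function(data, h = 1e-6) {
  up <- data; up$female <- up$female + h
  dn <- data; dn$female <- dn$female - h
  mean((predict(model1, newdata = up, type = "response") -
        predict(model1, newdata = dn, type = "response")) / (2 * h))
}

# Evaluate at one 'at' profile: age 20, black 0, female 0
prof <- kk
prof$age <- 20; prof$black <- 0; prof$female <- 0
ame_num <- ame_female(prof)

# Analytic check: dP/dfemale = p(1 - p) * (b_female + b_female:age * age)
b <- coef(model1)
p0 <- plogis(b["(Intercept)"] + b["age"] * 20)
ame_exact <- unname(p0 * (1 - p0) * (b["female"] + b["female:age"] * 20))
print(c(numeric = ame_num, analytic = ame_exact))
```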
# factor age black female AME SE z p lower upper
# female 20.0000 0.0000 0.0000 0.0032 0.0006 4.9238 0.0000 0.0019 0.0044
# female 20.0000 0.0000 1.0000 0.0081 0.0039 2.0699 0.0385 0.0004 0.0158
# female 20.0000 1.0000 0.0000 0.0064 0.0014 4.4950 0.0000 0.0036 0.0092
# female 20.0000 1.0000 1.0000 0.0163 0.0079 2.0536 0.0400 0.0007 0.0319
# female 30.0000 0.0000 0.0000 0.0051 0.0011 4.7605 0.0000 0.0030 0.0072
# female 30.0000 0.0000 1.0000 0.0107 0.0047 2.2650 0.0235 0.0014 0.0199
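For the desired plot with AMEs, we can probably use margins::cplot; its documentation (?cplot) says:
"Note that when what = "prediction", the plots show predictions holding the values of the data at their mean or mode, whereas when what = "effect", average marginal effects (i.e., at observed values) are shown."
Accordingly,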
cplot(model1, x="age", what="effect",
data=kk[kk[["female"]] == 0 & kk[["black"]] == 0, ],
draw=TRUE, xvals=seq(20, 70, 10))
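should give you the desired result:
Answer 1 (score: 0)
There was an error in my Stata code: I was not using "marginal standardization" to estimate the adjusted predicted probabilities (method 1 in this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4052139/).
The correct Stata code is: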
quietly logit diabetes i.black i.female age i.female#c.age, nolog
margins female#black, at(age=(20 30 40 50 60 70)) post
These are the results:
Adjusted predictions Number of obs = 10,349
Model VCE : OIM
Expression : Pr(diabetes), predict()
1._at : age = 20
2._at : age = 30
3._at : age = 40
4._at : age = 50
5._at : age = 60
6._at : age = 70
----------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-----------------+----------------------------------------------------------------
_at#female#black |
1 0 0 | .0033083 .000901 3.67 0.000 .0015424 .0050743
1 0 1 | .0067309 .0019462 3.46 0.001 .0029165 .0105453
1 1 0 | .0085682 .0016485 5.20 0.000 .0053373 .0117992
1 1 1 | .0173377 .0036609 4.74 0.000 .0101624 .0245129
2 0 0 | .0067352 .0014267 4.72 0.000 .003939 .0095315
2 0 1 | .0136545 .0031803 4.29 0.000 .0074213 .0198877
2 1 0 | .0142736 .0021263 6.71 0.000 .0101061 .0184411
2 1 1 | .0287132 .0049838 5.76 0.000 .0189452 .0384813
3 0 0 | .0136633 .0020981 6.51 0.000 .0095511 .0177754
3 0 1 | .0275028 .0049839 5.52 0.000 .0177345 .0372711
3 1 0 | .0236872 .0025692 9.22 0.000 .0186517 .0287227
3 1 1 | .0471941 .0066724 7.07 0.000 .0341164 .0602717
4 0 0 | .0275202 .0027989 9.83 0.000 .0220343 .033006
4 0 1 | .054618 .0075938 7.19 0.000 .0397344 .0695016
4 1 0 | .0390632 .0029478 13.25 0.000 .0332856 .0448407
4 1 1 | .0766313 .0090681 8.45 0.000 .0588582 .0944044
5 0 0 | .0546516 .0038595 14.16 0.000 .0470871 .0622162
5 0 1 | .1055647 .0121204 8.71 0.000 .0818092 .1293202
5 1 0 | .0637682 .003921 16.26 0.000 .0560831 .0714532
5 1 1 | .1220774 .013192 9.25 0.000 .0962216 .1479332
6 0 0 | .1056261 .0085369 12.37 0.000 .0888941 .1223581
6 0 1 | .194268 .0217283 8.94 0.000 .1516813 .2368548
6 1 0 | .1024321 .0075714 13.53 0.000 .0875924 .1172718
6 1 1 | .1889598 .020675 9.14 0.000 .1484376 .2294821
----------------------------------------------------------------------------------
Now the results obtained in R (with the "prediction" package) are the same as those obtained in Stata.
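Both Stata tables report delta-method standard errors, with z = Margin/SE and a 95% interval of Margin ± 1.96 × SE (for example, .0033083 - 1.96 × .000901 ≈ .0015424 in the first row of the corrected table). A minimal base-R sketch of that computation for one covariate profile, on hypothetical simulated data, cross-checked against predict.glm:

```r
# Hypothetical simulated data (not the NHANES file)
set.seed(4)
n <- 3000
d <- data.frame(female = rbinom(n, 1, 0.5),
                black  = rbinom(n, 1, 0.15),
                age    = runif(n, 20, 75))
d$diabetes <- rbinom(n, 1, plogis(-7 + 0.3 * d$female + 0.7 * d$black + 0.07 * d$age))
fit <- glm(diabetes ~ black + female * age, family = binomial, data = d)

# Covariate profile: age 20, female 0, black 0
nd <- data.frame(black = 0, female = 0, age = 20)
X <- model.matrix(delete.response(terms(fit)), nd)

# Predicted probability for the profile
p <- plogis(as.numeric(X %*% coef(fit)))

# Delta method: the gradient of p with respect to beta is p(1 - p) * x
g <- p * (1 - p) * X
se <- sqrt(as.numeric(g %*% vcov(fit) %*% t(g)))
z <- p / se
ci <- p + c(-1, 1) * qnorm(0.975) * se

# Cross-check against predict.glm's response-scale standard error
chk <- predict(fit, newdata = nd, type = "response", se.fit = TRUE)
print(c(Margin = p, SE = se, z = z, lower = ci[1], upper = ci[2]))
```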