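I ran coarsened exact matching (CEM) with the MatchIt package as a preprocessing step, and I want to use the matched data in further analysis. When looking at summary statistics for the matched data, I noticed that the means I extract from the matched dataset differ from those in the MatchIt summary output. For example, using the lalonde dataset: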
library(MatchIt)
library(doBy)
data(lalonde)
m.out <- matchit(treat ~ age + educ + black + hispan + married + nodegree + re74 + re75, data = lalonde, method = "cem")
summary(m.out) #Means from MatchIt summary output:
Summary of balance for matched data:
Means Treated Means Control
age 21.5441 21.1781
educ 10.2941 10.3827
black 0.8676 0.8676
hispan 0.0588 0.0588
married 0.0441 0.0441
nodegree 0.6176 0.6176
re74 456.1345 622.8740
re75 350.6728 520.7135
m.dat <- match.data(m.out)
ExtractedMeans <- summaryBy(age + educ + black + hispan + married + nodegree + re74 + re75 ~ treat, data = m.dat, FUN = function(x) { c(Mean = mean(x)) })
ExtractedMeans #Means extracted manually from matched data:
treat 1 0
age.Mean 21.544 19.628
educ.Mean 10.294 9.7179
black.Mean 0.8676 0.60256
hispan.Mean 0.0588 0.10256
married.Mean 0.0441 0.07692
nodegree.Mean 0.6176 0.75641
re74.Mean 456.13 609.61
re75.Mean 350.67 464.22
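The control-group means extracted manually from the matched data do not agree with the MatchIt summary output. Does anyone know what is going on here? I posted this question to the MatchIt mailing list on gmane last week but got no reply. Thanks for your help.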
Answer 0 (score: 2)
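The summaryBy function from doBy does not use the weights. match.data() returns a weights column, and the MatchIt summary reports weighted means, whereas your manual calculation treats every matched unit equally. If you multiply each variable by the weights before averaging, you get the same means that the package shows. The shortcut works here because the control weights average 1 within the matched sample; weighted.mean() gives the same result without relying on that. For example, taking your code: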
> tapply(m.dat$age, m.dat$treat, mean)
0 1
19.62821 21.54412
> tapply(m.dat$age*m.dat$weights, m.dat$treat, mean)
0 1
21.17811 21.54412
So the weighted means are the same as the MatchIt results.
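If you would rather not depend on how the weights are scaled, a weighted mean per group is probably the safer way to write this. Here is a minimal sketch on a small made-up data frame (m.toy and its values are hypothetical, not taken from lalonde); it only assumes a layout like the one match.data() returns, with a treat column and a weights column:

```r
# Hypothetical toy data standing in for match.data() output.
# The control weights (1.5, 1.0, 0.5) sum to 3, the number of controls,
# so they average 1 within the control group.
m.toy <- data.frame(
  treat   = c(1, 1, 0, 0, 0),
  age     = c(20, 24, 18, 22, 30),
  weights = c(1, 1, 1.5, 1.0, 0.5)
)

# Unweighted group means: what a plain mean() per group computes.
unweighted_means <- tapply(m.toy$age, m.toy$treat, mean)

# Weighted group means using base R's weighted.mean().
weighted_means <- sapply(
  split(m.toy, m.toy$treat),
  function(d) weighted.mean(d$age, d$weights)
)

# The "multiply by weights, then average" shortcut from the answer.
# It agrees with weighted.mean() only because the weights average 1
# within each group.
product_means <- tapply(m.toy$age * m.toy$weights, m.toy$treat, mean)

print(unweighted_means)
print(weighted_means)
print(product_means)
```

On the real matched data the same idea would be applied to m.dat, for example weighted.mean(m.dat$age[m.dat$treat == 0], m.dat$weights[m.dat$treat == 0]) for the control-group mean of age.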