Question

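Please consider the following:

I am using the MatchIt package in R to match data. I have fewer control units than treated units, so I use the option replace = TRUE. According to the manual, the weights tell us how often each control was matched.

From the manual:

"To match with replacement, use replace = TRUE. After matching with replacement, the weights can be used to reflect the frequency with which each control unit was matched."

However, I don't understand why the weights can be fractional, or how they reflect that frequency.

For example, I added replace = TRUE to the example from the manual (see page 18):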

library("dplyr")
library("MatchIt")

m.out1 <- matchit(treat ~ re74 + re75 + age + educ, data = lalonde,
 method = "nearest", distance = "logit", replace = T)

tail(match.data(m.out1), 15)
#>         treat age educ black hispan married nodegree re74 re75      re78
#> PSID388     0  19   11     1      0       0        1    0    0 16485.520
#> PSID390     0  48   13     0      0       1        0    0    0     0.000
#> PSID392     0  17   10     1      0       0        1    0    0     0.000
#> PSID393     0  38   12     0      0       1        0    0    0 18756.780
#> PSID396     0  48   14     0      0       1        0    0    0  7236.427
#> PSID398     0  17    8     1      0       0        1    0    0  4520.366
#> PSID400     0  37    8     1      0       0        1    0    0   648.722
#> PSID401     0  17   10     1      0       0        1    0    0  1053.619
#> PSID407     0  23   12     0      0       0        0    0    0  3902.676
#> PSID409     0  17   10     0      0       0        1    0    0 14942.770
#> PSID411     0  18   10     1      0       0        1    0    0  5306.516
#> PSID413     0  17   10     0      0       1        1    0    0  3859.822
#> PSID419     0  51    4     1      0       0        1    0    0     0.000
#> PSID423     0  27   10     1      0       0        1    0    0  7543.794
#> PSID425     0  18   11     0      0       0        1    0    0 10150.500
#>          distance weights
#> PSID388 0.4067545     0.6
#> PSID390 0.4042321     1.2
#> PSID392 0.3974677     0.6
#> PSID393 0.4016920     4.2
#> PSID396 0.4152715     0.6
#> PSID398 0.3758217     1.8
#> PSID400 0.3595084     0.6
#> PSID401 0.3974677     1.2
#> PSID407 0.4144044     1.8
#> PSID409 0.3974677     0.6
#> PSID411 0.3966277     1.2
#> PSID413 0.3974677     1.2
#> PSID419 0.3080590     0.6
#> PSID423 0.3890954     1.2
#> PSID425 0.4076015     1.2

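For the control "PSID393", the weight is 4.2. So I assumed that this control was matched 4 or 5 times (after rounding).

However, we can also look at match.matrix, which lists the control matched to each treated unit. Filtering for "PSID393", we find that this control was actually matched 7 times: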

m.out1$match.matrix %>% data.frame() %>% filter(X1 == "PSID393")


#>        X1
#> 1 PSID393
#> 2 PSID393
#> 3 PSID393
#> 4 PSID393
#> 5 PSID393
#> 6 PSID393
#> 7 PSID393

Created on 2019-05-06 by the reprex package (v0.2.1)

How should these two outputs be interpreted?

Answer 1

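The weights are scaled so that they sum to the number of uniquely matched observations in the control group. Using the example data, note that within each group the sum of the weights equals the number of observations and the mean weight is 1. Also, the weight of the most frequently used control is 7 times the weight of the least frequently used one: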

match.data(m.out1) %>%
  group_by(treat) %>% 
  summarise(min.weight=min(weights),
            max.weight=max(weights),
            mean.weight=mean(weights),
            sum.weights=sum(weights),
            n=n(),
            max.match.ratio=max.weight/min.weight)

  treat min.weight max.weight mean.weight sum.weights     n max.match.ratio
1     0      0.605       4.24           1         112   112               7
2     1      1           1              1         185   185               1

To see the distribution of the weights, we can do this:

match.data(m.out1) %>% 
  group_by(treat, weights) %>% 
  tally %>% 
  group_by(treat) %>% 
  mutate(weight.ratio = weights/min(weights))
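
Note that weights/min(weights) equals the number of matches only when the least-used control was matched exactly once. As a cross-check, the counts can also be taken straight from a match matrix. The following is a minimal base-R sketch on toy data, not MatchIt output: the unit IDs (T1 to T5, C1 to C3) are hypothetical, and the scaling step assumes 1:1 matching with replacement.

```r
# Toy match matrix: one row per treated unit, the single column holds the
# name of the control matched to it (hypothetical IDs, not MatchIt output).
match_matrix <- matrix(
  c("C1", "C2", "C1", "C3", "C1"),
  ncol = 1,
  dimnames = list(paste0("T", 1:5), NULL)
)

# How many times each control was used as a match.
match_counts <- table(match_matrix[, 1])

# Assumed scaling (1:1 matching with replacement): counts are rescaled so
# that the control weights sum to the number of distinct matched controls.
weights <- as.numeric(match_counts) * length(match_counts) / sum(match_counts)
names(weights) <- names(match_counts)

# Ratio to the smallest weight; this recovers the counts here because the
# least-used controls were matched exactly once.
weight_ratio <- weights / min(weights)

print(match_counts)
print(weights)
print(weight_ratio)
```

With a real matchit object, the same table() call on the first column of match.matrix gives the match counts directly, without going through the weights.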

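There is an FAQ at the end of the MatchIt vignette. Item 5.3, "How Exactly are the Weights Created?", notes that the weights for the control group are scaled to sum to the number of uniquely matched control units.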
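
The scaling rule can be checked against the numbers in the summary table above. This is a rough sketch of the arithmetic only; it assumes 1:1 matching with replacement in which every one of the 185 treated units received a match, and the names scale_factor and weight_for are illustrative, not part of MatchIt.

```r
n_treated <- 185          # matched treated units (n for treat == 1)
n_unique_controls <- 112  # distinct controls used (n for treat == 0)

# Assumed rule: a control's raw weight is the number of times it was
# matched; raw weights are then rescaled to sum to n_unique_controls.
scale_factor <- n_unique_controls / n_treated

weight_for <- function(times_matched) times_matched * scale_factor

w_once  <- weight_for(1)  # control used once
w_seven <- weight_for(7)  # control used seven times, e.g. PSID393

print(round(w_once, 3))
print(round(w_seven, 2))

# Going the other way: recover the match count from a weight.
print(w_seven / scale_factor)
```

Under these assumptions a control matched once gets a weight of about 0.605 and one matched seven times gets about 4.24, which agrees with min.weight and max.weight in the table, and with the 7 matches found for PSID393 in match.matrix.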
