Question

I am following the examples in the Matching package, in particular the GenMatch example (link to package description, p. 11).

We have the following code:

library(Matching)
data(lalonde)
attach(lalonde)

lalonde$ID <- 1:length(lalonde$age)

X = cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74)

BalanceMat <- cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74,
                    I(re74*re75))

genout <- GenMatch(Tr=treat, X=X, BalanceMatrix=BalanceMat, estimand="ATE", M=1,
                   pop.size=16, max.generations=10, wait.generations=1)

Y=re78/1000

mout <- Match(Y=Y, Tr=treat, X=X, Weight.matrix=genout)
summary(mout)

The summary shows that all 185 treat==1 cases have been matched.

Then we check:

summary(mout$weights)

This tells us that some treat==1 cases have been matched with treat==0 cases more than once.

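I would like to create a data.frame that contains each duplicated treat==1 case only once, but keeps all of the matched treat==0 cases.

So essentially, its length would be 185 + length(mout$index.control).

Then I would like to introduce a variable $PairID which, for each treat==1 case, is repeated once for each of its treat==0 cases.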

The data.frame should look like this:

[Image: the desired data.frame]

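So above we see that cases 1-3 each return only one pair, but case 6 returns 2 pairs. This can be seen with: mout$weights[mout$index.treated]

My idea was to first remove the duplicated $index.treated cases: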
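To make the target layout concrete, here is a minimal sketch in base R. The index vectors are made-up stand-ins for mout$index.treated and mout$index.control (the control IDs are hypothetical, not values from the lalonde match); only the pattern "cases 1-3 have one pair, case 6 has two" is taken from the description above.

```r
# Hypothetical matched-pair indices, standing in for mout$index.treated and
# mout$index.control. The values are invented for illustration only.
index_treated <- c(1, 2, 3, 6, 6)
index_control <- c(200, 310, 245, 401, 277)

# One PairID per distinct treated case, numbered in order of first appearance.
treated_ids <- unique(index_treated)
pair_id <- match(index_treated, treated_ids)

# Each treated case appears once; every matched control keeps its own row.
treated_rows <- data.frame(ID = treated_ids,
                           treat = 1,
                           PairID = seq_along(treated_ids))
control_rows <- data.frame(ID = index_control,
                           treat = 0,
                           PairID = pair_id)

# Stack the two parts and put each treated case ahead of its controls.
pairs <- rbind(treated_rows, control_rows)
pairs <- pairs[order(pairs$PairID, -pairs$treat), ]
rownames(pairs) <- NULL
print(pairs)
```

The result has length(unique(index_treated)) + length(index_control) rows, which mirrors the 185 + length(mout$index.control) figure given above.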

treat <- lalonde[mout$index.treated,]

library(dplyr)

DATA_clean <- treat %>%
  group_by(ID) %>%
  filter(!n() > 1)

But this removes all of the duplicated cases. I want to keep one of each!

Answer 1

If you just want to keep the first row of treat for each value of ID, you can use slice:

DATA_clean <- treat %>%
  group_by(ID) %>%
  slice(1)

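Your original code doesn't work because n() returns the total number of rows for each value of ID, so filter(!n() > 1) keeps only the IDs that occur exactly once and drops every row of any ID that occurs more than once. If every ID has more than one row, all of the data gets filtered out. slice, on the other hand, returns only the rows at the specified index within each group. If you want a random row instead, you can replace slice(1) with sample_n(1) (per @Frank's suggestion).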
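The difference can be checked on a small toy table. This is only a sketch: treat_rows and its values are made up as a stand-in for lalonde[mout$index.treated, ], and the base R line at the end is an alternative not mentioned in the original answer.

```r
library(dplyr)

# Toy stand-in for lalonde[mout$index.treated, ]: ID 6 appears twice.
# The column values are invented for illustration.
treat_rows <- data.frame(ID  = c(1, 2, 3, 6, 6),
                         age = c(37, 22, 30, 33, 33))

# The filter from the question: keeps only groups with exactly one row,
# so both rows of ID 6 are dropped.
dropped <- treat_rows %>%
  group_by(ID) %>%
  filter(!n() > 1) %>%
  ungroup()

# slice(1): keeps the first row of every ID, duplicated or not.
kept <- treat_rows %>%
  group_by(ID) %>%
  slice(1) %>%
  ungroup()

# Base R alternative: keep the first occurrence of each ID.
kept_base <- treat_rows[!duplicated(treat_rows$ID), ]

print(dropped$ID)
print(kept$ID)
```

Here dropped no longer contains ID 6 at all, while kept and kept_base each hold one row per ID.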

Subsetting data from the Matching package
