How to do group matching in R?

Time: 2017-04-20 12:55:23

Tags: r dataframe match grouping

Suppose I have the data.frame below, where treat == 1 means that the id received treatment and prob is the calculated probability that treat == 1.

set.seed(1)
df <- data.frame(id = 1:10, treat = sample(0:1, 10, replace = TRUE))
df$prob <- ifelse(df$treat, rnorm(10, .8, .1), rnorm(10, .4, .4))
df
   id treat      prob
1   1     0 0.3820266
2   2     0 0.3935239
3   3     1 0.8738325
4   4     1 0.8575781
5   5     0 0.6375605
6   6     1 0.9511781
7   7     1 0.8389843
8   8     1 0.7378759
9   9     1 0.5785300
10 10     0 0.6479303
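
Note that sample() changed its default algorithm in R 3.6.0, so set.seed(1) may not reproduce the values printed above on a current installation. As a minimal sketch, the same data can be typed in directly from the printed output (so prob is only as precise as the seven decimals shown):

```r
# Rebuild the example data from the printed output above
# (prob is rounded to 7 decimals, as printed).
df <- data.frame(
  id    = 1:10,
  treat = c(0, 0, 1, 1, 0, 1, 1, 1, 1, 0),
  prob  = c(0.3820266, 0.3935239, 0.8738325, 0.8575781, 0.6375605,
            0.9511781, 0.8389843, 0.7378759, 0.5785300, 0.6479303)
)
table(df$treat)  # 4 controls (treat == 0) and 6 treated (treat == 1)
```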

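To minimize selection bias, I now want to create pseudo treatment and control groups based on the values of treat and prob:

  • If any id with treat == 1 is within a fixed distance in prob of any id with treat == 0 (the answers below use 0.1), I want the value of group to be "treated".

  • If any id with treat == 0 is within the same distance in prob of any id with treat == 1, I want the value of group to be "control".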

The result I am hoping for is a new group column filled in according to these two rules.

How can I do this? In the example above, matching is done with replacement, but a solution without replacement would also be welcome.
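
One possible reading of "without replacement" is 1:1 matching, where each control can be paired with at most one treated id. The following is only a greedy sketch under that assumption: the helper name match_greedy is hypothetical, the tolerance of 0.1 is taken from the answers below, and the data are typed in from the printed output. Greedy matching depends on row order and is not guaranteed to find the largest possible set of pairs.

```r
df <- data.frame(
  id    = 1:10,
  treat = c(0, 0, 1, 1, 0, 1, 1, 1, 1, 0),
  prob  = c(0.3820266, 0.3935239, 0.8738325, 0.8575781, 0.6375605,
            0.9511781, 0.8389843, 0.7378759, 0.5785300, 0.6479303)
)

# Hypothetical helper: greedy 1:1 matching without replacement.
match_greedy <- function(d, tol = 0.1) {
  d$group <- NA_character_
  free <- which(d$treat == 0)            # row indices of unused controls
  for (i in which(d$treat == 1)) {
    if (length(free) == 0) break         # no controls left to match
    gap <- abs(d$prob[free] - d$prob[i])
    j <- which.min(gap)                  # closest unused control
    if (gap[j] <= tol) {
      d$group[i] <- "treated"
      d$group[free[j]] <- "control"
      free <- free[-j]                   # each control is used only once
    }
  }
  d
}

res <- match_greedy(df)
res
```

On the example data this labels ids 8 and 9 as treated and ids 5 and 10 as control, which happens to coincide with the result of matching with replacement.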

4 Answers:

Answer 0: (score: 4)

You can try

foo <- function(x, tol = 0.1){
  # prob values of each group; every row is compared with all values
  # of the opposite group, not only with its minimum and maximum
  ctrl_prob  <- x$prob[x$treat == 0]
  treat_prob <- x$prob[x$treat == 1]
  tmp <- sapply(seq_len(nrow(x)), function(y){
    if(x$treat[y] == 1){
      if(any(abs(x$prob[y] - ctrl_prob) <= tol)) "treated" else NA_character_
    }else{
      if(any(abs(x$prob[y] - treat_prob) <= tol)) "control" else NA_character_
    }})
  cbind(x, group = tmp)
}

foo(df)
   id treat      prob   group
1   1     0 0.3820266    <NA>
2   2     0 0.3935239    <NA>
3   3     1 0.8738325    <NA>
4   4     1 0.8575781    <NA>
5   5     0 0.6375605 control
6   6     1 0.9511781    <NA>
7   7     1 0.8389843    <NA>
8   8     1 0.7378759 treated
9   9     1 0.5785300 treated
10 10     0 0.6479303 control

Answer 1: (score: 2)

I think this problem is a good fit for cut in base R. Here is how to do it in a vectorised way:

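# r: logical vector selecting one group; !r selects the other group
f <- function(r) {
      # bin the selected probs, using the other group's prob -/+ 0.1 as breaks
      x <- cut(df[r,]$prob, breaks = c(df[!r,]$prob-0.1, df[!r,]$prob+0.1))
      # ids whose prob fell inside one of the bins
      df[r,][!is.na(x),]$id
}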

ones <- df$treat==1
df$group <- NA

df[df$id %in% f(ones),]$group <- "treated"
df[df$id %in% f(!ones),]$group <- "control"

> df
#    id treat      prob   group
# 1   1     0 0.3820266    <NA>
# 2   2     0 0.3935239    <NA>
# 3   3     1 0.8738325    <NA>
# 4   4     1 0.8575781    <NA>
# 5   5     0 0.6375605 control
# 6   6     1 0.9511781    <NA>
# 7   7     1 0.8389843    <NA>
# 8   8     1 0.7378759 treated
# 9   9     1 0.5785300 treated
# 10 10     0 0.6479303 control
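
One caveat with this approach: cut() sorts the combined breaks before binning, so a value is flagged whenever it lies between the smallest lower bound and the largest upper bound, even if it sits in a gap between two of the intended intervals (cut() also stops with an error if two breaks coincide). The toy values below are made up purely to illustrate this, and outer() is swapped in as an alternative pairwise check rather than being part of the answer's method.

```r
# Made-up toy values: one treated prob sitting between two distant controls
ctrl  <- c(0.2, 0.8)
treat <- 0.5

# cut()-based check as in the answer: the breaks get sorted to
# roughly 0.1, 0.3, 0.7, 0.9, so 0.5 lands in the bin (0.3, 0.7]
x <- cut(treat, breaks = c(ctrl - 0.1, ctrl + 0.1))
flag_cut <- !is.na(x)
flag_cut    # TRUE, although 0.5 is 0.3 away from both controls

# Pairwise alternative: absolute difference for every treated/control pair
d <- abs(outer(treat, ctrl, "-"))
flag_pair <- rowSums(d <= 0.1) > 0
flag_pair   # FALSE
```

On the example data from the question both checks agree, because no prob value happens to fall into such a gap.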

Answer 2: (score: 1)

Perhaps not the most elegant, but it seems to work for me:

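library(dplyr)
# grouping by id leaves one row per group, so any() is evaluated per id
df %>% group_by(id, treat) %>%
  mutate(group2 = ifelse(treat == 1,
                         ifelse(any(abs(prob - df$prob[df$treat == 0]) < 0.1), "treated", NA_character_),
                         ifelse(any(abs(prob - df$prob[df$treat == 1]) < 0.1), "control", NA_character_)))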

Answer 3: (score: 1)

Is this what you want?

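# Base R: for each treated row, check whether its prob lies strictly
# within 0.1 of any control prob

apply(df[df$treat == 1, ], 1, function(x){
  ifelse(any(df[df$treat == 0, 'prob'] - .1 < x['prob'] & x['prob'] < df[df$treat == 0, 'prob'] + .1), 'treated', NA)
})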

You can reverse the $treat clause to reflect the control group and append the variable to your df.
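
Following that suggestion, here is a sketch that runs the same check in both directions and writes the result into a new group column. The helper name label_side is hypothetical, the data are typed in from the printed output in the question, and vapply over the prob vector is used in place of apply so that rows are not coerced to a common type if the data.frame later gains a character column.

```r
df <- data.frame(
  id    = 1:10,
  treat = c(0, 0, 1, 1, 0, 1, 1, 1, 1, 0),
  prob  = c(0.3820266, 0.3935239, 0.8738325, 0.8575781, 0.6375605,
            0.9511781, 0.8389843, 0.7378759, 0.5785300, 0.6479303)
)

# Hypothetical helper: label rows with treat == side whose prob lies
# strictly within tol of any prob value on the other side.
label_side <- function(d, side, label, tol = 0.1) {
  other <- d$prob[d$treat != side]
  hit <- vapply(d$prob[d$treat == side],
                function(p) any(other - tol < p & p < other + tol),
                logical(1))
  ifelse(hit, label, NA_character_)
}

df$group <- NA_character_
df$group[df$treat == 1] <- label_side(df, 1, "treated")
df$group[df$treat == 0] <- label_side(df, 0, "control")
df
```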