Question

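I want to assign half of each subgroup to the treatment condition and half to the control condition. When a subgroup has an odd number of records, the last one can be assigned arbitrarily.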

I'm trying to do this within dplyr groups and am struggling to handle the odd/even case. I tried:

set.seed(1)
library(dplyr)
mtcars %>%
  group_by(cyl) %>%
  mutate(group = case_when(
    n() %% 2 == 0 ~  sample(rep(c("treatment", "control"), n() / 2)),
    TRUE ~ sample(rep(c("treatment", "control"), ceiling(n() / 2)))[-1]
  ))

But I get the error:

Error: TRUE ~ sample(rep(c("treatment", "control"), ceiling(n()/2)))[-1] must be length 10 or one, not 11
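The two lengths in the message can be reproduced outside dplyr. For the 11-row cyl == 4 group, n() / 2 is 5.5, and rep() truncates a non-integer times argument toward zero, so the first branch yields 10 values, while the second yields 12 - 1 = 11. case_when() evaluates every right-hand side on the whole group, not just on the rows where its condition is TRUE, so the mismatch raises an error even though only one branch would be used. A minimal base-R sketch, with a plain n standing in for n():

```r
# n stands in for n() in the 11-row cyl == 4 group of mtcars
n <- 11

# First branch: times = 5.5 is truncated to 5, giving 10 values
even_branch <- sample(rep(c("treatment", "control"), n / 2))

# Second branch: ceiling(5.5) = 6 gives 12 values; [-1] drops one, leaving 11
odd_branch <- sample(rep(c("treatment", "control"), ceiling(n / 2)))[-1]

length(even_branch)  # 10
length(odd_branch)   # 11
```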

I'm also open to using purrr if that makes it simpler.

Answer 1

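library(dplyr)

mtcars %>% 
  group_by(cyl) %>% 
  # ceiling(n()/2) copies of each label give at least n() values; keep a random n() of them
  mutate(group = sample(rep(c("treatment", "control"), ceiling(n()/2)), n()))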

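For a group with an even number of rows, n = 2k, it shuffles k "treatment" and k "control" values.
For an odd n = 2k + 1, it samples 2k + 1 values from k + 1 "treatment" and k + 1 "control" values, so one label is dropped at random and the extra record goes to either arm. I believe this is what you need.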
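The claim can be checked in base R across many group sizes. assign_half() below is a hypothetical helper name for Answer 1's expression with n() replaced by an explicit n; the sketch confirms that even groups split exactly, odd groups are off by exactly one, and the extra record for an odd group lands in either arm over repeated draws.

```r
# Hypothetical helper: Answer 1's expression with n() replaced by n
assign_half <- function(n) {
  sample(rep(c("treatment", "control"), ceiling(n / 2)), n)
}

set.seed(1)
# For each group size 1..50, record |#treatment - #control| for one draw
imbalance <- sapply(1:50, function(n) {
  g <- assign_half(n)
  abs(sum(g == "treatment") - sum(g == "control"))
})

all(imbalance[c(FALSE, TRUE)] == 0)  # TRUE: even sizes split exactly
all(imbalance[c(TRUE, FALSE)] == 1)  # TRUE: odd sizes differ by one

# Which arm receives the extra record of an 11-row group, over 200 draws
extra_arm <- replicate(200, {
  g <- assign_half(11)
  if (sum(g == "treatment") > sum(g == "control")) "treatment" else "control"
})
table(extra_arm)
```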

This can of course be generalized to any number of arms:

mtcars %>% 
  group_by(cyl) %>% 
  mutate(group = sample(rep(c("A", "B", "C"), ceiling(n()/3)), n())) %>% 
  count(cyl, group)
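With three or more arms, the rep() pool can contain more than one surplus label, and the discarded surplus labels may all come from the same arm. For n = 10 and three arms, 12 labels are built and 2 are dropped, so a 2/4/4 split is possible. The base-R sketch below measures this; assign_ceiling() mirrors the answer's expression, while assign_rep_len() is a swapped-in alternative (not from the answer) that shuffles the arm order, cycles it to exactly n labels with rep_len(), and then shuffles those labels, which keeps every arm within one record of the others. Both helper names are hypothetical.

```r
# Answer 1's expression for k arms, with n() replaced by n
assign_ceiling <- function(n, arms) {
  sample(rep(arms, ceiling(n / length(arms))), n)
}

# Swapped-in alternative: random arm order, cycled to n labels, then shuffled
assign_rep_len <- function(n, arms) {
  sample(rep_len(sample(arms), n))
}

# Difference between the largest and smallest arm in one assignment
spread <- function(g, arms) {
  counts <- table(factor(g, levels = arms))
  max(counts) - min(counts)
}

arms <- c("A", "B", "C")
set.seed(1)
spread_ceiling <- replicate(1000, spread(assign_ceiling(10, arms), arms))
spread_rep_len <- replicate(1000, spread(assign_rep_len(10, arms), arms))

table(spread_ceiling)  # a mix of spreads 1 and 2
table(spread_rep_len)  # always 1
```

Inside the grouped pipeline the alternative would read mutate(group = sample(rep_len(sample(c("A", "B", "C")), n()))).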

Answer 2

I believe this is what the question asks for.

mtcars %>%
  group_by(cyl) %>%
  mutate(i = row_number() %in% sample(row_number(), n() %/% 2),
         group = ifelse(i, "treatment", "control")) %>%
  select(-i)
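Here, sample(row_number(), n() %/% 2) picks floor(n / 2) row positions for treatment, and every other row becomes control, so in an odd-sized group the extra record always goes to control (as the counts below show). A base-R sketch of the same logic, where assign_floor_half() is a hypothetical helper and seq_len(n) plays the role of row_number():

```r
# Hypothetical helper mirroring Answer 2's mutate() call, with n() replaced by n
assign_floor_half <- function(n) {
  idx <- seq_len(n)                          # stands in for row_number()
  is_treat <- idx %in% sample(idx, n %/% 2)  # floor(n / 2) rows go to treatment
  ifelse(is_treat, "treatment", "control")
}

set.seed(1)
g <- assign_floor_half(11)
table(g)  # control 6, treatment 5

# For every odd size, control ends up with exactly one more record
odd_ok <- sapply(seq(1, 49, by = 2), function(n) {
  h <- assign_floor_half(n)
  sum(h == "control") - sum(h == "treatment") == 1
})
all(odd_ok)  # TRUE
```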

Check the result by counting the values of group with count().

library(dplyr)

set.seed(1)

mtcars %>%
  group_by(cyl) %>%
  mutate(i = row_number() %in% sample(row_number(), n() %/% 2),
         group = ifelse(i, "treatment", "control")) %>%
  select(-i) %>%
  count(cyl, group)
# A tibble: 6 x 3
# Groups:   cyl [3]
#    cyl group         n
#  <dbl> <chr>     <int>
#1     4 control       6
#2     4 treatment     5
#3     6 control       4
#4     6 treatment     3
#5     8 control       7
#6     8 treatment     7

Random assignment within subgroups of different sizes
