Calculating weights for a multi-site RCT

Asked: 2014-09-01 16:09:27

Tags: r plyr weighted-average

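Suppose a multi-site RCT has two groups, experimental and control. A weight needs to be calculated for each time period and site. I fix the weight of each experimental subject at 1 and then calculate the control weights so that, within each time period and site, the control weights sum to the same total as the experimental weights. The following code generates a hypothetical data set: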

set.seed(13458) # Set random seed
ID <- c(1:20)  # Generate 20 unique subject IDs 
timePeriod <- c(rep(1, 3), rep(2, 6), rep(3, 11)) # Generate time periods
site <- c(rep("A", 3), rep("B", 6), rep("C", 11)) # Generate sites
group <- sample(c("exp", "cont"), 20, replace = TRUE) # Random assignment
outcome <- sample(c(rep(0,75), 0:100), 20, replace = TRUE) # Generate outcomes 
DF <- data.frame(ID, timePeriod, site, group, outcome) # Create a data frame

Output:

head(DF, 10)
    ID timePeriod site group outcome
1   1          1    A  cont      12
2   2          1    A  cont      37
3   3          1    A   exp      59
4   4          2    B   exp       0
5   5          2    B  cont       0
6   6          2    B   exp       0
7   7          2    B  cont       0
8   8          2    B   exp      22
9   9          2    B   exp      34
10 10          3    C  cont      26

Here is a rather clumsy strategy for calculating the weights of experimental and control subjects for each time period and site.

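library(plyr)

# Count subjects in each time period / site / group cell
a <- ddply(DF, c("timePeriod", "site", "group"), function(x){
     data.frame(N = nrow(x))
     })

# Within each stratum the "cont" row comes before the "exp" row, so for a
# "cont" row i the matching experimental count sits on row i + 1.
# This relies on every stratum containing both groups.
a$weight <- rep(NA, nrow(a))
for(i in 1:nrow(a))
     {
     n <- a$N[i+1]  # experimental count (only meaningful on "cont" rows)
     d <- a$N[i]    # count for the current row
     a$weight[i] <- ifelse(a$group[i] == "cont", n/d,
                         ifelse(a$group[i] == "exp", 1, NA))
     }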

> print(a)
  timePeriod site group N weight
1          1    A  cont 2    0.5
2          1    A   exp 1    1.0
3          2    B  cont 2    2.0
4          2    B   exp 4    1.0
5          3    C  cont 5    1.2
6          3    C   exp 6    1.0
> 
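As a quick arithmetic check of the table above, the total weight in each cell is N times weight, and the control total should match the experimental total in every stratum. The sketch below re-enters the printed values by hand (it does not recompute them from DF); the column name totalWeight is introduced here for illustration only.

```r
# Values copied by hand from the printed table
a <- data.frame(
  timePeriod = c(1, 1, 2, 2, 3, 3),
  site       = c("A", "A", "B", "B", "C", "C"),
  group      = rep(c("cont", "exp"), 3),
  N          = c(2, 1, 2, 4, 5, 6),
  weight     = c(0.5, 1, 2, 1, 1.2, 1)
)

# Total weight contributed by each cell (illustrative column name)
a$totalWeight <- a$N * a$weight

# One row per site, one column per group
totals <- tapply(a$totalWeight, list(a$site, a$group), sum)
print(totals)
```

The control and experimental totals agree within each site: 1 for A, 4 for B, and 6 for C.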

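How can I use these weights to calculate adjusted outcomes (that is, multiply each subject's outcome by the weight for its time period, site, and group)? I would like to add the calculated weights and the adjusted outcomes to the original data frame. (Below is an example of the adjusted outcomes.)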

ID 1:  12 * .5 = 6
ID 2:  37 * .5 = 18.5
ID 3:  59 * 1 = 59 
ID 4:  0 * 1 = 0
…
ID 8:  22 * 1 = 22
…
ID 10: 26 * 1.2 = 31.2

1 Answer:

Answer 0 (score: 1)

This can be done with dplyr:

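library(dplyr)
# Count subjects in each time period / site / group cell
tmp <- DF %>% group_by(timePeriod, site, group) %>% mutate(N = n())
# With "cont" sorted before "exp", first(N) is the control count and
# last(N) the experimental count (assumes both groups in every stratum)
tmp %>% ungroup() %>%
  arrange(group) %>%
  group_by(timePeriod, site) %>%
  mutate(weight = ifelse(group == "cont", last(N) / first(N), 1),
         adjusted = outcome * weight)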
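
For comparison, here is a minimal base-R sketch of the same idea that does not depend on row order. It is one possible alternative, not the answerer's method. It rebuilds DF exactly as in the question, and the helper vectors nExp and nCont are names introduced here for illustration. Note that with a current R version the random draws may differ from the output printed in the question.

```r
# Rebuild the hypothetical data set from the question
set.seed(13458)
ID <- 1:20
timePeriod <- c(rep(1, 3), rep(2, 6), rep(3, 11))
site <- c(rep("A", 3), rep("B", 6), rep("C", 11))
group <- sample(c("exp", "cont"), 20, replace = TRUE)
outcome <- sample(c(rep(0, 75), 0:100), 20, replace = TRUE)
DF <- data.frame(ID, timePeriod, site, group, outcome)

# Per-row counts of experimental and control subjects in the row's
# timePeriod/site stratum (nExp and nCont are illustrative names)
nExp  <- ave(as.numeric(DF$group == "exp"),  DF$timePeriod, DF$site, FUN = sum)
nCont <- ave(as.numeric(DF$group == "cont"), DF$timePeriod, DF$site, FUN = sum)

# Experimental subjects get weight 1; control subjects get N_exp / N_cont
DF$weight   <- ifelse(DF$group == "exp", 1, nExp / nCont)
DF$adjusted <- DF$outcome * DF$weight
```

Because ave() returns one value per row, the weights line up with DF directly and no merge step is needed. A stratum with no experimental subjects would give its controls a weight of 0, which may or may not be what you want.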