Change from baseline with repeated IDs

Time: 2015-07-24 20:42:17

Tags: r

For example,

set.seed(1)
df1 <- data.frame(ID = c(rep(c(rep(1, 3), rep(2, 3)), 2), rep(c(rep(3, 3), rep(4, 3)), 2)),
                  Day = rep(c(1, 2, 3), 8))
df2 <- data.frame(measure = c(rep("mean", 6), rep("median", 6), rep("mean", 6), rep("median", 6)),
                  val = sample(1:24, 24))

data <- cbind(df1, df2)

> data

    ID Day measure val
1   1   1    mean   7
2   1   2    mean   9
3   1   3    mean  13
4   2   1    mean  20
5   2   2    mean   5
6   2   3    mean  18
7   1   1  median  19
8   1   2  median  12
9   1   3  median  11
10  2   1  median   1
11  2   2  median   3
12  2   3  median  14
13  3   1    mean  23
14  3   2    mean  21
15  3   3    mean   8
16  4   1    mean  16
17  4   2    mean   6
18  4   3    mean  24
19  3   1  median  22
20  3   2  median   4
21  3   3  median  17
22  4   1  median  15
23  4   2  median   2
24  4   3  median  10

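I want to create another variable that measures the change from Day 1 for each measure within each ID, so that the result looks like this: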

    ID Day measure val change
1   1   1    mean   7    0
2   1   2    mean   9    2
3   1   3    mean  13    6
4   2   1    mean  20    0
5   2   2    mean   5  -15
6   2   3    mean  18   -2
7   1   1  median  19    0
8   1   2  median  12   -7
9   1   3  median  11   -8
10  2   1  median   1    0
11  2   2  median   3    2
12  2   3  median  14   13
13  3   1    mean  23    0
14  3   2    mean  21   -2
15  3   3    mean   8   -15
16  4   1    mean  16    0
17  4   2    mean   6   -10
18  4   3    mean  24    8
19  3   1  median  22    0
20  3   2  median   4   -18
21  3   3  median  17   -5
22  4   1  median  15    0
23  4   2  median   2   -13
24  4   3  median  10   -5

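I have been trying to adapt the code from "Calculating change from baseline with data in long format", but my dataset has repeated measures.

1 answer:

Answer 0 (score: 3)

We can use data.table to create the 'change' column. Convert the 'data.frame' to a 'data.table' (setDT(data)) and, grouped by 'ID' and 'measure', subtract from 'val' the 'val' that corresponds to 'Day' 1 to create 'change'.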

library(data.table)
setDT(data)[, change:= val-val[Day==1L], by = .(ID, measure)]
data
#    ID Day measure val change
# 1:  1   1    mean   7      0
# 2:  1   2    mean   9      2
# 3:  1   3    mean  13      6
# 4:  2   1    mean  20      0
# 5:  2   2    mean   5    -15
# 6:  2   3    mean  18     -2
# 7:  1   1  median  19      0
# 8:  1   2  median  12     -7
# 9:  1   3  median  11     -8
#10:  2   1  median   1      0
#11:  2   2  median   3      2
#12:  2   3  median  14     13
#13:  3   1    mean  23      0
#14:  3   2    mean  21     -2
#15:  3   3    mean   8    -15
#16:  4   1    mean  16      0
#17:  4   2    mean   6    -10
#18:  4   3    mean  24      8
#19:  3   1  median  22      0
#20:  3   2  median   4    -18
#21:  3   3  median  17     -5
#22:  4   1  median  15      0
#23:  4   2  median   2    -13
#24:  4   3  median  10     -5

A similar option using dplyr is

library(dplyr)
data %>% 
   group_by(ID, measure) %>%
   mutate(change = val- val[Day==1L])
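
Both grouped versions index val[Day == 1L], which works when every ID/measure group has exactly one Day 1 row. As a minimal sketch of guarding against a missing baseline, match(1, Day) returns NA for a group without Day 1, so the change becomes NA instead of a zero-length result. The example below uses base R and a hypothetical toy data frame (not the question's data); the same val[match(1L, Day)] expression should also fit inside the data.table and dplyr calls.

```r
# Hypothetical toy data: ID 2 has no Day 1 row
toy <- data.frame(
  ID  = c(1, 1, 1, 2, 2),
  Day = c(1, 2, 3, 2, 3),
  val = c(7, 9, 13, 5, 18)
)

# Split by ID, subtract the Day 1 value inside each group, then
# put the pieces back in the original row order.
# match(1, g$Day) is NA when the group has no Day 1, so the whole
# group's change is NA rather than an error or a silent misalignment.
toy$change <- unsplit(
  lapply(split(toy, toy$ID), function(g) g$val - g$val[match(1, g$Day)]),
  toy$ID
)

print(toy)
```
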

If the 'Day' column is ordered within each group, a base R option is ave

 data$change <- with(data, val-ave(val, ID, measure, FUN=function(x) head(x,1)))
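
The ave approach takes the first value of each group as the baseline, so it is only correct when Day 1 is the first row of every ID/measure group. A minimal sketch of sorting before applying it, assuming a toy data frame built from the first 12 rows of the question with the rows deliberately shuffled (the names toy, shuffled, and sorted are illustrative only):

```r
# First 12 rows of the question's data, typed in by hand
toy <- data.frame(
  ID      = c(1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2),
  Day     = rep(1:3, 4),
  measure = rep(c("mean", "median"), each = 6),
  val     = c(7, 9, 13, 20, 5, 18, 19, 12, 11, 1, 3, 14)
)

# Fixed permutation (no random numbers) to mimic unsorted input
shuffled <- toy[c(3, 1, 2, 6, 5, 4, 9, 8, 7, 12, 10, 11), ]

# Sort so that Day 1 is the first row within each ID/measure group
sorted <- shuffled[order(shuffled$ID, shuffled$measure, shuffled$Day), ]

# Baseline = first value per group, which is Day 1 after sorting
sorted$change <- with(sorted,
                      val - ave(val, ID, measure, FUN = function(x) x[1]))

print(sorted)
```

Note that the sorted result is ordered by ID, then measure, then Day, which differs from the row order in the question.
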

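Or another base R option without grouping, if the rows are ordered so that Day 1 comes first in each group (the val*i>0 filter also assumes that every Day 1 value is positive)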

 data$change <- with(data, {i <- Day==1L; val-(val*i)[val*i>0][cumsum(i)] })
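
The one-liner above selects the Day 1 values with val*i > 0, so it would not pick up a baseline of zero or below. A possible variant, sketched here on hypothetical toy data with a zero baseline, indexes the baselines directly with the logical vector. It still assumes the rows are ordered with Day 1 first in each group.

```r
# Hypothetical toy data: the second group starts from a baseline of 0
toy <- data.frame(
  ID  = rep(c(1, 2), each = 3),
  Day = rep(1:3, 2),
  val = c(7, 9, 13, 0, 3, 14)
)

i <- toy$Day == 1L                 # TRUE on each group's Day 1 row
# toy$val[i] holds one baseline per group, in row order;
# cumsum(i) is the running group number, so it repeats each
# baseline across the rows of its group.
baseline <- toy$val[i][cumsum(i)]
toy$change <- toy$val - baseline

print(toy)
```
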