例如,
> set.seed(1)
df1 <- data.frame(ID = c(rep(c(rep(1,3), rep(2,3)),2),rep(c(rep(3,3), rep(4,3)),2)),
Day=rep(c(1,2,3),8))
df2 <- data.frame(measure = c(rep("mean",6),rep("median",6),rep("mean",6),rep("median",6)),
val=sample(1:24,24))
data <- cbind(df1,df2)
> data
ID Day measure val
1 1 1 mean 7
2 1 2 mean 9
3 1 3 mean 13
4 2 1 mean 20
5 2 2 mean 5
6 2 3 mean 18
7 1 1 median 19
8 1 2 median 12
9 1 3 median 11
10 2 1 median 1
11 2 2 median 3
12 2 3 median 14
13 3 1 mean 23
14 3 2 mean 21
15 3 3 mean 8
16 4 1 mean 16
17 4 2 mean 6
18 4 3 mean 24
19 3 1 median 22
20 3 2 median 4
21 3 3 median 17
22 4 1 median 15
23 4 2 median 2
24 4 3 median 10
我想创建另一个变量,用于衡量每个ID中每个度量的第1天的变化,以便
ID Day measure val change
1 1 1 mean 7 0
2 1 2 mean 9 2
3 1 3 mean 13 6
4 2 1 mean 20 0
5 2 2 mean 5 -15
6 2 3 mean 18 -2
7 1 1 median 19 0
8 1 2 median 12 -7
9 1 3 median 11 -8
10 2 1 median 1 0
11 2 2 median 3 2
12 2 3 median 14 13
13 3 1 mean 23 0
14 3 2 mean 21 -2
15 3 3 mean 8 -15
16 4 1 mean 16 0
17 4 2 mean 6 -10
18 4 3 mean 24 8
19 3 1 median 22 0
20 3 2 median 4 -18
21 3 3 median 17 -5
22 4 1 median 15 0
23 4 2 median 2 -13
24 4 3 median 10 -5
我一直在尝试修改Calculating change from baseline with data in long format中的代码,但我的数据集中有重复的措施。
答案 0 :(得分:3)
我们可以使用data.table
来创建“更改”列。将'data.frame'转换为'data.table'(setDT(data)
),按'ID','measure'分组,我们计算'val'和'day'之间的差异,对应'Day'1创造'变化'。
library(data.table)
setDT(data)[, change:= val-val[Day==1L], by = .(ID, measure)]
data
# ID Day measure val change
# 1: 1 1 mean 7 0
# 2: 1 2 mean 9 2
# 3: 1 3 mean 13 6
# 4: 2 1 mean 20 0
# 5: 2 2 mean 5 -15
# 6: 2 3 mean 18 -2
# 7: 1 1 median 19 0
# 8: 1 2 median 12 -7
# 9: 1 3 median 11 -8
#10: 2 1 median 1 0
#11: 2 2 median 3 2
#12: 2 3 median 14 13
#13: 3 1 mean 23 0
#14: 3 2 mean 21 -2
#15: 3 3 mean 8 -15
#16: 4 1 mean 16 0
#17: 4 2 mean 6 -10
#18: 4 3 mean 24 8
#19: 3 1 median 22 0
#20: 3 2 median 4 -18
#21: 3 3 median 17 -5
#22: 4 1 median 15 0
#23: 4 2 median 2 -13
#24: 4 3 median 10 -5
使用dplyr
的类似选项是
library(dplyr)
data %>%
group_by(ID, measure) %>%
mutate(change = val- val[Day==1L])
如果订购了“日期”列,则base R
选项ave
data$change <- with(data, val-ave(val, ID, measure, FUN=function(x) head(x,1)))
或其他base R
选项,如果列是有序的,则不进行分组
data$change <- with(data, {i <- Day==1L; val-(val*i)[val*i>0][cumsum(i)] })