Here is a small reproducible example of my data:
> mydata <- structure(list(subject = c(1, 1, 1, 2, 2, 2), time = c(0, 1, 2, 0, 1, 2), measure = c(10, 12, 8, 7, 0, 0)), .Names = c("subject", "time", "measure"), row.names = c(NA, -6L), class = "data.frame")
> mydata
subject time measure
1 0 10
1 1 12
1 2 8
2 0 7
2 1 0
2 2 0
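I want to generate a new variable, "change from baseline", where the baseline is each subject's measure at time 0. That is, I want: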
subject time measure change
1 0 10 0
1 1 12 2
1 2 8 -2
2 0 7 0
2 1 0 -7
2 2 0 -7
Is there a simple way to do this, other than looping over all records programmatically or reshaping to wide format first?
Answer 0 (score: 4)
There are many possibilities. My favorites:
library(plyr)
ddply(mydata,.(subject),transform,change=measure-measure[1])
subject time measure change
1 1 0 10 0
2 1 1 12 2
3 1 2 8 -2
4 2 0 7 0
5 2 1 0 -7
6 2 2 0 -7
library(data.table)
myDT <- as.data.table(mydata)
myDT[,change:=measure-measure[1],by=subject]
print(myDT)
subject time measure change
1: 1 0 10 0
2: 1 1 12 2
3: 1 2 8 -2
4: 2 0 7 0
5: 2 1 0 -7
6: 2 2 0 -7
If your dataset is large, data.table is the preferred choice.
Answer 1 (score: 3)
How about:
mydata$change <- do.call("c", with(mydata, lapply(split(measure, subject), function(x) x - x[1])))
Or you could use the ave function:
with(mydata, ave(measure, subject, FUN=function(x) x - x[1]))
# [1] 0 2 -2 0 -7 -7
or
within(mydata, change <- ave(measure, subject, FUN=function(x) x - x[1]))
# subject time measure change
# 1 1 0 10 0
# 2 1 1 12 2
# 3 1 2 8 -2
# 4 2 0 7 0
# 5 2 1 0 -7
# 6 2 2 0 -7
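One practical difference between the two approaches in this answer is row order. The sketch below uses a shuffled copy of the question's data (the `shuffled` and `by_group` names are made up for illustration) to suggest why ave is the safer choice when rows are not sorted by subject: ave returns its result in the original row order, while the split/lapply version returns values grouped by subject.

```r
# Same values as in the question, with rows deliberately interleaved
# (an assumption made only for this illustration)
shuffled <- data.frame(
  subject = c(2, 1, 2, 1, 2, 1),
  time    = c(0, 0, 1, 1, 2, 2),
  measure = c(7, 10, 0, 12, 0, 8)
)

# ave() keeps the original row order, so the result lines up with the rows
shuffled$change <- ave(shuffled$measure, shuffled$subject,
                       FUN = function(x) x - x[1])

# split() groups by subject, so the concatenated result is ordered by
# subject rather than by the original rows
by_group <- unlist(lapply(split(shuffled$measure, shuffled$subject),
                          function(x) x - x[1]), use.names = FALSE)
```

Assigning `by_group` straight into `shuffled$change` would attach values to the wrong rows here, so the split-based version is best kept for data already sorted by subject.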
Answer 2 (score: 1)
You could use tapply:
mydata$change <- as.vector(unlist(tapply(mydata$measure, mydata$subject, FUN = function(x) x - x[1])))
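The answers in this thread take the first row within each subject as the baseline, which works for the example because time 0 comes first. If that ordering cannot be guaranteed, one possible variation is to look up the baseline explicitly from the row where time is 0. This is a minimal base R sketch, assuming each subject has exactly one time 0 row; the `baseline` helper data frame is a name introduced here for illustration.

```r
mydata <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  time    = c(0, 1, 2, 0, 1, 2),
  measure = c(10, 12, 8, 7, 0, 0)
)

# One baseline row per subject: the row where time == 0
# (assumes exactly one such row per subject)
baseline <- mydata[mydata$time == 0, c("subject", "measure")]

# Look up each row's baseline by subject, then subtract it
mydata$change <- mydata$measure -
  baseline$measure[match(mydata$subject, baseline$subject)]
```

A subject with no time 0 row would get NA in change, since match returns NA when it finds no baseline for that subject.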