Here is a small reproducible example of my data:
> mydata <- structure(list(subject = c(1, 1, 1, 2, 2, 2), time = c(0, 1, 2, 0, 1, 2), measure = c(10, 12, 8, 7, 0, 0)), .Names = c("subject", "time", "measure"), row.names = c(NA, -6L), class = "data.frame")
> mydata
subject time measure
1 0 10
1 1 12
1 2 8
2 0 7
2 1 0
2 2 0
I want to create a new variable that holds the mean of measure for each subject, like this:
subject time measure mn_measure
1 0 10 10
1 1 12 10
1 2 8 10
2 0 7 2.333
2 1 0 2.333
2 2 0 2.333
Is there a simple way to do this, other than looping over all the records programmatically or reshaping to wide format first?
Answer 0 (score: 14)
Use the base R function ave(), which, despite its confusing name, can compute a variety of statistics, including the mean:
within(mydata, mean<-ave(measure, subject, FUN=mean))
subject time measure mean
1 1 0 10 10.000000
2 1 1 12 10.000000
3 1 2 8 10.000000
4 2 0 7 2.333333
5 2 1 0 2.333333
6 2 2 0 2.333333
Note that I use within() only to shorten the code. Here is the equivalent without within():
mydata$mean <- ave(mydata$measure, mydata$subject, FUN=mean)
mydata
subject time measure mean
1 1 0 10 10.000000
2 1 1 12 10.000000
3 1 2 8 10.000000
4 2 0 7 2.333333
5 2 1 0 2.333333
6 2 2 0 2.333333
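As a rough sketch of the point that ave() is not limited to means, the same pattern can attach other per-subject statistics. The column names md_measure, max_measure and mn_measure_na are made up for illustration, and the na.rm wrapper is an assumption about how missing values might be handled (the example data contains none):

```r
# Rebuild the example data from the question
mydata <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  time    = c(0, 1, 2, 0, 1, 2),
  measure = c(10, 12, 8, 7, 0, 0)
)

# ave() returns a vector as long as its input, with each group's
# statistic repeated for every row of that group, so the result can
# be assigned directly as a new column.
mydata$md_measure  <- ave(mydata$measure, mydata$subject, FUN = median)
mydata$max_measure <- ave(mydata$measure, mydata$subject, FUN = max)

# Hypothetical variant: pass an anonymous function when FUN needs
# extra arguments, e.g. to ignore missing values in the mean.
mydata$mn_measure_na <- ave(
  mydata$measure, mydata$subject,
  FUN = function(x) mean(x, na.rm = TRUE)
)

mydata
```

Because ave() keeps the original row order, no merge or re-sort step is needed afterwards.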
Answer 1 (score: 9)
Or use the data.table package:
require(data.table)
dt <- data.table(mydata, key = "subject")
dt[, mn_measure := mean(measure), by = subject]
# subject time measure mn_measure
# 1: 1 0 10 10.000000
# 2: 1 1 12 10.000000
# 3: 1 2 8 10.000000
# 4: 2 0 7 2.333333
# 5: 2 1 0 2.333333
# 6: 2 2 0 2.333333
Answer 2 (score: 6)
You can use ddply from the plyr package:
library(plyr)
res = ddply(mydata, .(subject), mutate, mn_measure = mean(measure))
res
subject time measure mn_measure
1 1 0 10 10.000000
2 1 1 12 10.000000
3 1 2 8 10.000000
4 2 0 7 2.333333
5 2 1 0 2.333333
6 2 2 0 2.333333