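I have a data set containing a series of IDs and activities, plus a series of observation columns for each ID/activity combination. I want to take the mean of each observation column, but since there are hundreds of observations, I'm not sure how to proceed.
Example data: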
id,activity,obs1,obs2,obs3
1,1,325,6432,5432
1,2,321,214,2143
1,3,3652,123,123
2,1,5321,123,643
2,2,4312,4321,432
2,3,522,123,321
1,1,532,765,8976
1,2,142,865,5445
1,3,643,654,53
2,1,756,765,7865
2,2,876,654,976
2,3,6754,765,987
What I've tried so far:
library(dplyr)
example <- read.table("clipboard",sep=",",header=T)
group <- group_by(example,id,activity)
summarize(group, mobs1=mean(obs1), mobs2=mean(obs2), mobs3=mean(obs3))
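This gives me output in the right form, but how can I get mobsN=mean(obsN) for every column without writing it out hundreds of times inside summarize()? I feel like an apply function belongs here, but I'm not sure which one...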
Answer 0 (score: 3)
This should give you the desired result:
library(dplyr)
means.wide <- example %>%
group_by(id,activity) %>%
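# summarise_each(funs(mean)) is deprecated in current dplyr; across() replaces it (dplyr >= 1.0.0)
summarise(across(everything(), mean))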
You can also convert example to long format and then calculate the means:
library(dplyr)
library(tidyr)
means.long <- example %>%
gather(obs, val, -c(id,activity)) %>%
group_by(id,activity,obs) %>%
summarise(mean_val=mean(val))
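In recent tidyr releases gather() is superseded by pivot_longer(). The following is a minimal sketch of the same long-format calculation, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0; the sample data is inlined with read.csv(text = ...) only so the snippet runs without the clipboard:

```r
library(dplyr)
library(tidyr)

# Sample data from the question, inlined so the snippet is self-contained
example <- read.csv(text = "id,activity,obs1,obs2,obs3
1,1,325,6432,5432
1,2,321,214,2143
1,3,3652,123,123
2,1,5321,123,643
2,2,4312,4321,432
2,3,522,123,321
1,1,532,765,8976
1,2,142,865,5445
1,3,643,654,53
2,1,756,765,7865
2,2,876,654,976
2,3,6754,765,987")

# Reshape every column except id and activity into obs/val pairs,
# then average val within each id/activity/obs group
means.long <- example %>%
  pivot_longer(-c(id, activity), names_to = "obs", values_to = "val") %>%
  group_by(id, activity, obs) %>%
  summarise(mean_val = mean(val), .groups = "drop")
```

The result has one row per id/activity/obs combination (18 rows for the sample data), which is usually the more convenient shape for plotting.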
You can also do this with the data.table package:
# comparable to the wide dplyr version
library(data.table)
setDT(example)[, lapply(.SD, mean), by=list(id,activity)]
# comparable to the long dplyr version
library(data.table)
melt(setDT(example),id.vars=c("id","activity"))[, mean(value), by=list(id,activity,variable)]
And don't forget good old base R:
aggregate(. ~ id + activity, example, FUN = mean)
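To sanity-check the base R version without the clipboard, here is a small self-contained sketch. The read.csv(text = ...) call and the name means.base are only for illustration and are not part of the original answer:

```r
# Sample data from the question, inlined so the snippet is self-contained
example <- read.csv(text = "id,activity,obs1,obs2,obs3
1,1,325,6432,5432
1,2,321,214,2143
1,3,3652,123,123
2,1,5321,123,643
2,2,4312,4321,432
2,3,522,123,321
1,1,532,765,8976
1,2,142,865,5445
1,3,643,654,53
2,1,756,765,7865
2,2,876,654,976
2,3,6754,765,987")

# The "." on the left-hand side stands for every column not named on the right,
# so this scales to hundreds of obs columns without listing them
means.base <- aggregate(. ~ id + activity, data = example, FUN = mean)

# Spot check: id 1, activity 1 has obs1 values 325 and 532, so the mean is 428.5
subset(means.base, id == 1 & activity == 1)
```

Note that the formula interface of aggregate() drops rows containing NA by default (na.action = na.omit), which matters if the real data has missing observations.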