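I have a data frame that I want to summarize by two variables, applying the function mean to each measurement. Here is the head of the data frame: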
Subject Activity meassureA meassureB meassureC meassureD
1 1 running 0.2820216 -0.037696218 -0.13489730 -0.3282802
2 1 running 0.2558408 -0.064550029 -0.09518634 -0.2292069
3 1 walking 0.2548672 0.003814723 -0.12365809 -0.2751579
4 2 running 0.3433705 -0.014446221 -0.16737697 -0.2299235
Now, I would like to get a result like this:
Subject Activity meassureA meassureB meassureC meassureD
1 1 running mean(S1,A1) mean(S1,A1) mean(S1,A1) mean(S1,A1)
2 1 walking mean(S1,A2) mean(S1,A2) mean(S1,A2) mean(S1,A2)
3 2 running mean(S2,A1) mean(S2,A1) mean(S2,A1) mean(S2,A1)
4 2 walking mean(S2,A2) mean(S2,A2) mean(S2,A2) mean(S2,A2)
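where the value of meassureA is the mean of all values for subject 1 (S1) performing activity 1 (A1), and likewise for the other columns and groups.
I was thinking of using aggregate(), but so far I haven't been able to apply what I've learned to this problem. Any help is much appreciated.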
Answer 0 (score: 1)
As David mentioned in the comments, you can do this:
aggregate(. ~ Subject + Activity, df, mean)
or, using data.table:
data.table::setDT(df)[, lapply(.SD, mean), by = .(Subject, Activity)]
or, using dplyr:
library(dplyr)
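# summarise_each(funs(mean)) is deprecated in current dplyr; across() (dplyr >= 1.0.0) replaces it
df %>% group_by(Subject, Activity) %>% summarise(across(everything(), mean), .groups = "drop")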
which gives:
# Subject Activity meassureA meassureB meassureC meassureD
#1 1 running 0.2689312 -0.051123123 -0.1150418 -0.2787436
#2 1 walking 0.2548672 0.003814723 -0.1236581 -0.2751579
#3 2 running 0.3433705 -0.014446221 -0.1673770 -0.2299235
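For anyone who wants to reproduce this, here is a minimal sketch of the aggregate() approach. It assumes df contains only the four rows shown in the question's head (the real data presumably has more), and the names res and the final sorting step are just for illustration.

```r
# Rebuild the four rows shown in the question (column names kept as posted).
df <- data.frame(
  Subject   = c(1, 1, 1, 2),
  Activity  = c("running", "running", "walking", "running"),
  meassureA = c(0.2820216, 0.2558408, 0.2548672, 0.3433705),
  meassureB = c(-0.037696218, -0.064550029, 0.003814723, -0.014446221),
  meassureC = c(-0.13489730, -0.09518634, -0.12365809, -0.16737697),
  meassureD = c(-0.3282802, -0.2292069, -0.2751579, -0.2299235)
)

# "." on the left-hand side stands for every column not used for grouping.
res <- aggregate(. ~ Subject + Activity, data = df, FUN = mean)

# Sort by Subject, then Activity, so the rows are easier to compare.
res <- res[order(res$Subject, res$Activity), ]
print(res)
```

Note that a Subject/Activity combination with no rows in the data (subject 2 walking, in this four-row sample) does not appear in the result, so the fourth row of the desired output only shows up once the full data set includes it.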