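Suppose I want to compute the mean (or some custom function) of column A, grouped separately by the distinct values of each of columns B to D. The data look like this:
Input: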
data <- data.frame(A = round(runif(20,min = 0,max = 10),0),
                   B = round(runif(20,min = 0,max = 1),0),
                   C = round(runif(20,min = 0,max = 1),0),
                   D = round(runif(20,min = 0,max = 1),0))
Output (because the data are random, your summary table will likely differ):
col value mean
B 0 5.92
B 1 4.71
C 0 6
C 1 5.17
D 0 4.89
D 1 6
I can do this for each column separately:
data %>% group_by(B) %>% summarise(mean(A))
So I put it in a for loop:
p <- data.frame(NULL)
for(i in c('B','C','D')){
  q <- data %>% group_by_(i) %>% summarise(col=i,mean = mean(A))
  p <- append(p,q)
}
But it does not work as expected. Any suggestions would be very helpful.
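A likely cause: `append()` on a data frame just concatenates columns into a plain list, so `p` never grows into a table of rows (and `group_by_()` is deprecated in current dplyr). Below is a minimal base R sketch of the same loop, assuming the goal is one row per (column, value) pair: each column's summary is collected in a list and the pieces are stacked with `rbind` at the end. `tapply()` stands in for `group_by()`/`summarise()`, `mean` can be replaced by any function returning a single number, and `set.seed(24)` is only there to make the run repeatable.

```r
# Same kind of random data as in the question; the seed only makes the run repeatable
set.seed(24)
data <- data.frame(A = round(runif(20, min = 0, max = 10), 0),
                   B = round(runif(20, min = 0, max = 1), 0),
                   C = round(runif(20, min = 0, max = 1), 0),
                   D = round(runif(20, min = 0, max = 1), 0))

pieces <- list()  # one small data frame per grouping column
for (i in c("B", "C", "D")) {
  # mean of A for each distinct value of column i; names(m) holds those values
  m <- tapply(data$A, data[[i]], mean)
  pieces[[i]] <- data.frame(col = i,
                            value = as.numeric(names(m)),
                            mean = as.vector(m),
                            stringsAsFactors = FALSE)
}

# Stack the per-column pieces row-wise into a single summary table
p <- do.call(rbind, pieces)
rownames(p) <- NULL
print(p)
```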
Answer 0 (score: 1)
One option is to gather the data into "long" format, group by the "key" and "val" columns, and take the mean of "A":
library(tidyverse)
gather(data, key, val, B:D) %>%
  group_by(key, val) %>%
  summarise(A = mean(A))
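The same long-format idea can be sketched without the tidyverse using base R's `stack()`, which stacks columns B to D into one `values` column plus an `ind` factor naming the source column (roughly the `val` and `key` columns that `gather()` produces). This is a swapped-in base R equivalent, not the answerer's code; note also that in current tidyr, `gather()` is superseded by `pivot_longer()`.

```r
set.seed(24)
data <- data.frame(A = round(runif(20, min = 0, max = 10), 0),
                   B = round(runif(20, min = 0, max = 1), 0),
                   C = round(runif(20, min = 0, max = 1), 0),
                   D = round(runif(20, min = 0, max = 1), 0))

# Long format: 60 rows, columns `values` (the 0/1 entry) and `ind` (B, C or D)
long <- stack(data[c("B", "C", "D")])
# stack() works column by column, so repeating A three times lines it up with each block
long$A <- rep(data$A, times = 3)

# Mean of A for every (column, value) combination
res <- aggregate(A ~ ind + values, data = long, FUN = mean)
print(res)
```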
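Or, in base R, unlist columns "B" to "D" into a single grouping column, pair it with a column of the correspondingly replicated column names, and aggregate "A" by both: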
aggregate(A ~ ., cbind(data['A'], cN = names(data)[-1][col(data[-1])],
                       group = unlist(data[-1])), mean)
Data (seeded so the result is reproducible):
set.seed(24)
data <- data.frame(A = round(runif(20,min = 0,max = 10),0),
                   B = round(runif(20,min = 0,max = 1),0),
                   C = round(runif(20,min = 0,max = 1),0),
                   D = round(runif(20,min = 0,max = 1),0))
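To see what the base R one-liner builds, here is the same construction on a hypothetical three-row data frame `df`: `col(df[-1])` gives the column index of every cell, which selects the matching column name, while `unlist(df[-1])` flattens the 0/1 values column by column, so the two vectors line up cell for cell and `A` is recycled to the same length.

```r
# Hypothetical toy data: A plus two 0/1 grouping columns
df <- data.frame(A = c(1, 2, 3),
                 B = c(0, 1, 0),
                 C = c(1, 1, 0))

cN <- names(df)[-1][col(df[-1])]    # "B" "B" "B" "C" "C" "C"
group <- unlist(df[-1])             # 0 1 0 1 1 0, flattened column by column

# cbind() with a data frame builds a data frame; A (3 rows) is recycled to 6 rows
long <- cbind(df["A"], cN = cN, group = group)

# Mean of A for each (column name, value) pair
res <- aggregate(A ~ ., long, mean)
print(res)
```

Working it by hand: for B, value 0 covers rows 1 and 3 (A = 1, 3, mean 2); for C, value 1 covers rows 1 and 2 (A = 1, 2, mean 1.5).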
Answer 1 (score: 1)
Another option, using base R and the reshape2 package:
data <- data.frame(A = round(runif(20,min = 0,max = 10),0),
                   B = round(runif(20,min = 0,max = 1),0),
                   C = round(runif(20,min = 0,max = 1),0),
                   D = round(runif(20,min = 0,max = 1),0))
library(reshape2)  # provides melt()
melt(t(apply(data[,-1],2,function(x) by(data[,1],x,mean))))
Var1 Var2 value
1 B 0 4.100000
2 C 0 3.727273
3 D 0 4.250000
4 B 1 4.800000
5 C 1 5.333333
6 D 1 4.583333
melt and t are used only to get the output into the desired shape.
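If the reshape2 dependency is unwanted, here is a base R sketch of the same idea: `sapply()` with `tapply()` builds the value-by-column matrix of means (the role `apply()` plus `by()` plays above), and `as.data.frame(as.table(...))` converts it to long format much as `melt()` does. The `value`/`col` dimnames and the `mean` response name are illustrative choices, not part of the original answer.

```r
set.seed(24)
data <- data.frame(A = round(runif(20, min = 0, max = 10), 0),
                   B = round(runif(20, min = 0, max = 1), 0),
                   C = round(runif(20, min = 0, max = 1), 0),
                   D = round(runif(20, min = 0, max = 1), 0))

# Rows: the distinct values (0, 1); columns: B, C, D; cells: mean of A.
# Assumes every column contains both values, so sapply() can return a matrix.
m <- sapply(data[-1], function(x) tapply(data$A, x, mean))
names(dimnames(m)) <- c("value", "col")

# Long format: one row per (value, col) pair, similar to what melt() produces
res <- as.data.frame(as.table(m), responseName = "mean")
print(res)
```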