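How to repeatedly summarise one column, grouping by each of several other columns in turn

Time: 2019-06-03 13:54:27

Tags: r for-loop dplyr apply summary

Suppose I want to compute the mean (or a custom function) of column A for each distinct value in columns B through D. The data look like this: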

input:
data <- data.frame(A = round(runif(20,min = 0,max = 10),0),
                   B = round(runif(20,min = 0,max = 1),0),
                   C = round(runif(20,min = 0,max = 1),0),
                   D = round(runif(20,min = 0,max = 1),0))

output (your random numbers will differ, so your summary table may not match):
col value mean    
B   0     5.92
B   1     4.71
C   0     6   
C   1     5.17
D   0     4.89
D   1     6

I can do this for each column separately:

data %>% group_by(B) %>% summarise(mean(A))

I put it in a for loop:

p <- data.frame(NULL)
for(i in c('B','C','D')){
  q <- data %>% group_by_(i) %>% summarise(col=i,mean = mean(A))
  p <- append(p,q)
}

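But it does not work as expected. Any suggestions would be very helpful.

2 answers:

Answer 0 (score: 1)

One option is to gather the data into "long" format, group by the 'key' and 'val' columns, and take the mean of 'A':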

library(tidyverse)
gather(data, key, val, B:D) %>%
     group_by(key, val) %>%
     summarise(A = mean(A))
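In current tidyr releases gather() is superseded by pivot_longer(). A minimal sketch of the same idea, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 (for the .groups argument); the output column names col, value and mean are chosen here to match the table in the question:

```r
library(dplyr)
library(tidyr)

set.seed(24)
data <- data.frame(A = round(runif(20, min = 0, max = 10), 0),
                   B = round(runif(20, min = 0, max = 1), 0),
                   C = round(runif(20, min = 0, max = 1), 0),
                   D = round(runif(20, min = 0, max = 1), 0))

# Stack B, C and D into two columns: the source column name and its value
result <- data %>%
  pivot_longer(B:D, names_to = "col", values_to = "value") %>%
  group_by(col, value) %>%
  summarise(mean = mean(A), .groups = "drop")

print(result)
```

To apply a custom function instead of the mean, replace mean(A) inside summarise().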

Or, in base R, unlist columns 'B' to 'D' into a single grouping vector, replicate the column names to match it, and aggregate 'A' by both:

aggregate(A ~ ., cbind(data['A'], cN = names(data)[-1][col(data[-1])], 
           group = unlist(data[-1])), mean)
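The one-liner above does several things at once. A more explicit sketch of the same base R approach, building the long table step by step; the names group_cols, long, col and value are introduced here for illustration only:

```r
set.seed(24)
data <- data.frame(A = round(runif(20, min = 0, max = 10), 0),
                   B = round(runif(20, min = 0, max = 1), 0),
                   C = round(runif(20, min = 0, max = 1), 0),
                   D = round(runif(20, min = 0, max = 1), 0))

group_cols <- names(data)[-1]  # "B", "C", "D"

# Long table: A is repeated once per grouping column
long <- data.frame(
  A     = rep(data$A, times = length(group_cols)),
  col   = rep(group_cols, each = nrow(data)),
  value = unlist(data[group_cols], use.names = FALSE)
)

# Mean of A for every (column name, value) pair
result_base <- aggregate(A ~ col + value, data = long, FUN = mean)
print(result_base)
```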

Data

set.seed(24)
data <- data.frame(A = round(runif(20,min = 0,max = 10),0),
                   B = round(runif(20,min = 0,max = 1),0),
                   C = round(runif(20,min = 0,max = 1),0),
                   D = round(runif(20,min = 0,max = 1),0))
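If a loop closer to the attempt in the question is preferred, the following sketch may help. The original loop has two problems: group_by_() is deprecated, and append() on data frames concatenates them as lists instead of stacking rows. This version assumes dplyr >= 1.0.0 (for .groups and .before) and collects the pieces in a list named pieces, a name introduced here for illustration:

```r
library(dplyr)

set.seed(24)
data <- data.frame(A = round(runif(20, min = 0, max = 10), 0),
                   B = round(runif(20, min = 0, max = 1), 0),
                   C = round(runif(20, min = 0, max = 1), 0),
                   D = round(runif(20, min = 0, max = 1), 0))

pieces <- list()
for (i in c("B", "C", "D")) {
  pieces[[i]] <- data %>%
    group_by(value = .data[[i]]) %>%            # group by the column named in i
    summarise(mean = mean(A), .groups = "drop") %>%
    mutate(col = i, .before = 1)                # record which column was used
}

# Stack the per-column summaries into one data frame
p <- bind_rows(pieces)
print(p)
```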

Answer 1 (score: 1)

Another option, using base R and the reshape package:

data <- data.frame(A = round(runif(20,min = 0,max = 10),0),
                   B = round(runif(20,min = 0,max = 1),0),
                   C = round(runif(20,min = 0,max = 1),0),
                   D = round(runif(20,min = 0,max = 1),0))

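# melt() must be loaded first; the Var1/Var2 column names in the output
# below suggest reshape2 rather than reshape
library(reshape2)

# For each of B, C, D: mean of A by value, then reshape the matrix to long form
melt(t(apply(data[,-1],2,function(x) by(data[,1],x,mean))))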

  Var1 Var2    value
1    B    0 4.100000
2    C    0 3.727273
3    D    0 4.250000
4    B    1 4.800000
5    C    1 5.333333
6    D    1 4.583333

The melt and t calls are only there to get the output into the desired shape.
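The one-liner relies on apply() returning a matrix, which only happens when every grouping column contains the same set of values. A base-only sketch that avoids that assumption and needs no reshaping package, using tapply() per column; the names pieces and result_tapply are introduced here for illustration:

```r
set.seed(24)
data <- data.frame(A = round(runif(20, min = 0, max = 10), 0),
                   B = round(runif(20, min = 0, max = 1), 0),
                   C = round(runif(20, min = 0, max = 1), 0),
                   D = round(runif(20, min = 0, max = 1), 0))

# One small data frame per grouping column
pieces <- lapply(names(data)[-1], function(col) {
  m <- tapply(data$A, data[[col]], mean)  # named vector: one mean per value
  data.frame(col = col, value = names(m), mean = as.vector(m))
})

# Stack them; value is stored as character because it comes from names(m)
result_tapply <- do.call(rbind, pieces)
print(result_tapply)
```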