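Group by a factor, then summarise a different variable

Time: 2019-06-30 23:50:50

Tags: r

I have data in this format, where samples belong to groups (A or B in this example) and each has a numeric quantity and a quality score (which is a factor).

I want to summarise qual_score for each group_name.

Example data: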

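group_name <- rep(c("A","B"),5)
qual_score <- c(rep("POOR",4),rep("FAIR",1),rep("GOOD",5))
quantity <- 5:14

# stringsAsFactors = TRUE keeps qual_score a factor on R >= 4.0.0,
# where the data.frame() default changed to FALSE
df <- data.frame(group_name, qual_score, quantity, stringsAsFactors = TRUE)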
> df
   group_name qual_score quantity
1           A       POOR        5
2           B       POOR        6
3           A       POOR        7
4           B       POOR        8
5           A       FAIR        9
6           B       GOOD       10
7           A       GOOD       11
8           B       GOOD       12
9           A       GOOD       13
10          B       GOOD       14

Desired output:

desired_output <- data.frame(c("2","2"),c("1","0"),c("2","3"))
colnames(desired_output) <- c("POOR", "FAIR", "GOOD")
rownames(desired_output) <- c("A", "B")
desired_output

  POOR FAIR GOOD
A    2    1    2
B    2    0    3

I can run summary() on qual_score for the whole data frame:

> summary(df$qual_score)
FAIR GOOD POOR 
   1    5    4 

And I can use group_by() to summarise mean(quantity) for each group:

> df %>%
+     group_by(group_name) %>%
+     summarise(mean(quantity))
# A tibble: 2 x 2
  group_name `mean(quantity)`
  <fct>                 <dbl>
1 A                         9
2 B                        10

But when I try to combine group_by() with summary(), I get warnings and the following output:

> df %>%
+     group_by(group_name) %>%
+     summary(qual_score)
 group_name qual_score    quantity    
 A:5        FAIR:1     Min.   : 5.00  
 B:5        GOOD:5     1st Qu.: 7.25  
            POOR:4     Median : 9.50  
                       Mean   : 9.50  
                       3rd Qu.:11.75  
                       Max.   :14.00  
Warning messages:
1: In if (length(ll) > maxsum) { :
  the condition has length > 1 and only the first element will be used
2: In if (length(ll) > maxsum) { :
  the condition has length > 1 and only the first element will be used

1 Answer:

Answer 0: (score: 2)

library(dplyr)

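# Drop the numeric column, then cross-tabulate the two remaining columns.
# group_by() is not strictly required here: table() ignores the grouping.
df %>% 
  group_by(group_name) %>% 
  select(-quantity) %>% 
  table()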

#>           qual_score
#> group_name FAIR GOOD POOR
#>          A    1    2    2
#>          B    0    3    2
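
For context, summary() is not group-aware, so piping a grouped data frame into it summarises every column of the whole frame. The extra `qual_score` argument is matched positionally to `maxsum`, which is likely what triggers the warnings. The sketch below is a base R variant of the cross-tabulation above. It assumes the preferred column order is POOR, FAIR, GOOD (as in the desired output) and sets the factor levels explicitly to get it. The names `counts` and `result` are just illustrative.

```r
# Example data from the question. The explicit level order is an assumption
# based on the column order shown in the desired output.
group_name <- rep(c("A", "B"), 5)
qual_score <- factor(
  c(rep("POOR", 4), rep("FAIR", 1), rep("GOOD", 5)),
  levels = c("POOR", "FAIR", "GOOD")
)
df <- data.frame(group_name, qual_score, quantity = 5:14)

# Cross-tabulate group against quality score; columns follow the level order.
counts <- table(df$group_name, df$qual_score)

# Convert the table to a plain data frame with the groups as row names.
result <- as.data.frame.matrix(counts)
result
#>   POOR FAIR GOOD
#> A    2    1    2
#> B    2    0    3
```

Because `table()` counts every factor level, the empty B/FAIR combination shows up as 0 without any extra fill step.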

If you want a solution entirely within the tidyverse:

library(dplyr)
library(tidyr)

df %>% 
  group_by(group_name, qual_score) %>%
  tally() %>%                        # count rows per group/score pair into column n
  spread(qual_score, n, fill = 0)    # one column per score; missing pairs become 0

#> # A tibble: 2 x 4
#> # Groups:   group_name [2]
#>   group_name  FAIR  GOOD  POOR
#>   <fct>      <dbl> <dbl> <dbl>
#> 1 A              1     2     2
#> 2 B              0     3     2
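
In tidyr 1.0.0 and later, `spread()` is superseded by `pivot_wider()`. The following is a minimal sketch of the same idea with the newer function, assuming dplyr and a tidyr version of at least 1.0.0 are installed. The name `wide_counts` is only illustrative.

```r
library(dplyr)
library(tidyr)

# Example data from the question.
df <- data.frame(
  group_name = rep(c("A", "B"), 5),
  qual_score = c(rep("POOR", 4), rep("FAIR", 1), rep("GOOD", 5)),
  quantity = 5:14
)

wide_counts <- df %>%
  # count() is shorthand for group_by() followed by tally()
  count(group_name, qual_score) %>%
  # one column per quality score; combinations that never occur are filled with 0
  pivot_wider(
    names_from = qual_score,
    values_from = n,
    values_fill = list(n = 0L)
  )

wide_counts
```

The named-list form of `values_fill` is used here because it should work across tidyr 1.x releases. Filling with an integer keeps the count columns as integers rather than doubles.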