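I have data in the following format: samples belong to groups (A or B in this example), and each sample has a numeric quantity and a quality score (which is a factor).
I want to summarise qual_score for each group_name, that is, count how many samples in each group fall into each quality level.
Example data: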
group_name <- rep(c("A","B"),5)
qual_score <- c(rep("POOR",4),rep("FAIR",1),rep("GOOD",5))
quantity <- 5:14
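# stringsAsFactors = TRUE keeps qual_score a factor on R >= 4.0, where the default changed to FALSE
df <- data.frame(group_name, qual_score, quantity, stringsAsFactors = TRUE)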
> df
   group_name qual_score quantity
1           A       POOR        5
2           B       POOR        6
3           A       POOR        7
4           B       POOR        8
5           A       FAIR        9
6           B       GOOD       10
7           A       GOOD       11
8           B       GOOD       12
9           A       GOOD       13
10          B       GOOD       14
Desired output:
desired_output <- data.frame(c("2","2"),c("1","0"),c("2","3"))
colnames(desired_output) <- c("POOR", "FAIR", "GOOD")
rownames(desired_output) <- c("A", "B")
desired_output
POOR FAIR GOOD
A 2 1 2
B 2 0 3
I can run summary() on qual_score for the whole data frame:
> summary(df$qual_score)
FAIR GOOD POOR
   1    5    4
And I can use group_by() to summarise the mean of quantity for each group:
> df %>%
+ group_by(group_name) %>%
+ summarise(mean(quantity))
# A tibble: 2 x 2
group_name `mean(quantity)`
<fct> <dbl>
1 A 9
2 B 10
But when I try to combine group_by() with summary(), I get warnings and the following output, which ignores the grouping:
> df %>%
+ group_by(group_name) %>%
+ summary(qual_score)
 group_name qual_score    quantity
 A:5        FAIR:1     Min.   : 5.00
 B:5        GOOD:5     1st Qu.: 7.25
            POOR:4     Median : 9.50
                       Mean   : 9.50
                       3rd Qu.:11.75
                       Max.   :14.00
Warning messages:
1: In if (length(ll) > maxsum) { :
the condition has length > 1 and only the first element will be used
2: In if (length(ll) > maxsum) { :
the condition has length > 1 and only the first element will be used
Answer 0 (score: 2)
library(dplyr)
df %>%
group_by(group_name) %>%
select(-quantity) %>%
table()
#> qual_score
#> group_name FAIR GOOD POOR
#> A 1 2 2
#> B 0 3 2
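The table above lists the columns alphabetically (FAIR, GOOD, POOR), while the desired output orders them POOR, FAIR, GOOD. A minimal base R sketch, assuming that order is the one wanted: set the factor levels explicitly before tabulating. The names `counts` and `wide` are only illustrative. Note also that summary() is a base R function that does not respect dplyr grouping, which is why piping a grouped data frame into it summarises the whole table.

```r
# Rebuild the example data from the question
group_name <- rep(c("A", "B"), 5)
qual_score <- c(rep("POOR", 4), rep("FAIR", 1), rep("GOOD", 5))
quantity <- 5:14
df <- data.frame(group_name, qual_score, quantity)

# Assumption: the desired column order is POOR, FAIR, GOOD
df$qual_score <- factor(df$qual_score, levels = c("POOR", "FAIR", "GOOD"))

# Cross-tabulate group against quality score; columns follow the factor levels
counts <- table(df$group_name, df$qual_score)
print(counts)

# Optional: convert the contingency table to a plain data frame
# with groups as row names and one column per quality level
wide <- as.data.frame.matrix(counts)
print(wide)
```

Because the levels are fixed up front, a level with no observations in a group (FAIR in group B here) still appears as a 0 count.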
If you want a solution that stays entirely within the tidyverse:
library(dplyr)
library(tidyr)
df %>%
group_by(group_name, qual_score) %>%
tally() %>%
spread(qual_score, n, fill=0)
#> # A tibble: 2 x 4
#> # Groups: group_name [2]
#> group_name FAIR GOOD POOR
#> <fct> <dbl> <dbl> <dbl>
#> 1 A 1 2 2
#> 2 B 0 3 2
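In current tidyr, spread() is superseded by pivot_wider(), and count() is shorthand for group_by() followed by tally(). A possible equivalent is sketched below, assuming tidyr 1.1.0 or later (where values_fill accepts a scalar); the name `wide_counts` is only illustrative.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question
df <- data.frame(
  group_name = rep(c("A", "B"), 5),
  qual_score = c(rep("POOR", 4), rep("FAIR", 1), rep("GOOD", 5)),
  quantity = 5:14
)

wide_counts <- df %>%
  # One row per (group_name, qual_score) combination, with the count in column n
  count(group_name, qual_score) %>%
  # One column per quality level; combinations that never occur become 0
  pivot_wider(names_from = qual_score, values_from = n, values_fill = 0L)

print(wide_counts)
```

Unlike the tally() version, count() returns an ungrouped tibble, so no ungroup() is needed before further processing.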