My data looks like this:
OnAntibiotic Gender
1 Yes Male
2 Yes Male
3 Yes Female
4 Yes Female
5 Yes Female
6 Yes Female
7 Yes Female
8 Yes Female
9 Yes Female
10 Yes Female
11 Yes Male
12 Yes Male
13 Yes Male
14 Yes Male
15 No Female
16 No Female
17 No Female
18 No Male
19 No Male
I want to create a two-by-two table with counts and percentages, like this:
Yes (%) No (%)
Male 6 (42.8%) 2 (40%)
Female 8 (57.1%) 3 (60%)
My code is:
library(dplyr)
df %>%
group_by(Gender, OnAntibiotic) %>%
summarise(count=n())%>%
mutate(freq= n/sum(n))
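Please help me. Thank you very much.
Answer 0: (score: 1)
This is not the table format you asked for, but it contains the same data. You may also want to look into the functions table and prop.table: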
df %>%
group_by(OnAntibiotic, Gender) %>%
summarise(count = n()) %>%
mutate(freq = sprintf("%d (%0.1f%%)", count, 100*count/sum(count))) %>%
select(-count)
# A tibble: 4 x 3
# Groups: OnAntibiotic [2]
OnAntibiotic Gender freq
<chr> <chr> <chr>
1 No Female 3 (60.0%)
2 No Male 2 (40.0%)
3 Yes Female 8 (57.1%)
4 Yes Male 6 (42.9%)
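It is worth noting that when you use summarise on a grouped data.frame, it peels off the most recent (last) grouping layer. So the mutate above sums the counts within each OnAntibiotic group, because the additional Gender grouping was dropped after the summarise call.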
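A minimal sketch of the grouping behaviour described above. It assumes the question's data are rebuilt by hand in a data frame named df (that construction is an illustration, not the asker's code). group_vars() reports which grouping variables are still active after each step.

```r
library(dplyr)

# Rebuild the question's data by hand (assumed layout: 14 "Yes" rows, 5 "No" rows)
df <- data.frame(
  OnAntibiotic = rep(c("Yes", "No"), times = c(14, 5)),
  Gender = c(rep("Male", 2), rep("Female", 8), rep("Male", 4),
             rep("Female", 3), rep("Male", 2)),
  stringsAsFactors = FALSE
)

grouped <- df %>% group_by(OnAntibiotic, Gender)
group_vars(grouped)   # both grouping variables are active here

counts <- grouped %>% summarise(count = n())
group_vars(counts)    # only OnAntibiotic remains; Gender was peeled off

# sum(count) is therefore computed within each OnAntibiotic group
shares <- counts %>% mutate(share = count / sum(count))
```

Because the shares are computed per OnAntibiotic group, they add up to 1 within "Yes" and within "No", which is what the requested column percentages need.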
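I did not use the spread verb that @A. Suliman used, because the resulting column names are not especially informative. If you are looking for a table for presentation purposes, though, I suppose you could go ahead and use it.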
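For the table and prop.table route mentioned in this answer, here is a rough base R sketch. The data frame df is rebuilt by hand from the question's listing, and the names counts, pct and result are placeholders chosen for this example, so treat it as one possible approach rather than a definitive one.

```r
# Rebuild the question's data by hand (assumed layout: 14 "Yes" rows, 5 "No" rows)
df <- data.frame(
  OnAntibiotic = rep(c("Yes", "No"), times = c(14, 5)),
  Gender = c(rep("Male", 2), rep("Female", 8), rep("Male", 4),
             rep("Female", 3), rep("Male", 2)),
  stringsAsFactors = FALSE
)

# Cross-tabulate: rows are Gender, columns are OnAntibiotic
counts <- table(Gender = df$Gender, OnAntibiotic = df$OnAntibiotic)

# Column-wise percentages (margin = 2), so each column sums to 100
pct <- prop.table(counts, margin = 2) * 100

# Combine counts and percentages into "n (p%)" strings, keeping the 2x2 shape
result <- matrix(
  sprintf("%d (%.1f%%)", as.vector(counts), as.vector(pct)),
  nrow = nrow(counts),
  dimnames = dimnames(counts)
)
print(result, quote = FALSE)
```

With the question's data, result["Male", "Yes"] is "6 (42.9%)" and result["Female", "No"] is "3 (60.0%)". The rows and columns come out in alphabetical order (Female, Male and No, Yes), so reorder them with result[c("Male", "Female"), c("Yes", "No")] if the layout from the question is needed.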
Answer 1: (score: 1)
library(dplyr)
library(tidyr) # spread() comes from tidyr, not dplyr
df %>% group_by(Gender,OnAntibiotic) %>% mutate(n=n()) %>%
group_by(OnAntibiotic) %>% distinct(OnAntibiotic,Gender,n)%>%
mutate(Per=n/sum(n), np=paste0(n," (",round(Per*100,2)," %)")) %>%
select(-n,-Per) %>% spread(OnAntibiotic,np)
# A tibble: 2 x 3
Gender No Yes
<fct> <chr> <chr>
1 Female 3 (60 %) 8 (57.14 %)
2 Male 2 (40 %) 6 (42.86 %)
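Answer 2: (score: 0)
You are on the right track, but the format you are after is not "tidy" data. Take a look at the tidyr package to reshape the result. Be careful to regroup the data!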
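library(tidyverse)
df %>%
group_by(Gender, OnAntibiotic) %>%
summarise(count = n()) %>%
group_by(OnAntibiotic) %>% # regroup so shares are computed within each OnAntibiotic level
mutate(freq = count/sum(count)) %>% # the summarised column is named count, not n
ungroup() %>%
gather(measure, val, count, freq) %>% # stack only the two numeric columns
spread(OnAntibiotic, val)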
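This produces something close to the result you want. If you want the transposed version, you will need to reach for the more general (but harder to use) reshape2 package.
Note: this answer targets numeric values rather than printable strings.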