In R, I have a data.frame like this:
df1 <- data.frame(
grade = rep(LETTERS[1:5], 4),
sex = c(rep("male", 5), rep("female", 5), rep("male", 4), rep("female", 6)),
class = c(rep(1, 10), rep(2, 10))
)
df1
grade sex class
1 A male 1
2 B male 1
3 C male 1
4 D male 1
5 E male 1
6 A female 1
7 B female 1
8 C female 1
9 D female 1
10 E female 1
11 A male 2
12 B male 2
13 C male 2
14 D male 2
15 E female 2
16 A female 2
17 B female 2
18 C female 2
19 D female 2
20 E female 2
I want to compute the percentage of each sex within each class and build another data.frame like this:
Class Male_percent Female_percentage
1 50% 50%
2 40% 60%
Could someone show me how to do this? This may have been asked before, but I don't know which keywords to search for. I apologize if this is a duplicate.
Answer 0 (score: 3)
You can try
prop.table(table(df1[3:2]),1)*100
# sex
#class female male
# 1 50 50
# 2 60 40
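The table above holds numeric row percentages rather than the labelled data.frame the question asks for. A minimal base R sketch of that last step follows; the column names are taken from the question, and the `pct` and `result` variable names are only illustrative.

```r
# Rebuild the question's example data so the block is self-contained.
df1 <- data.frame(
  grade = rep(LETTERS[1:5], 4),
  sex   = c(rep("male", 5), rep("female", 5), rep("male", 4), rep("female", 6)),
  class = c(rep(1, 10), rep(2, 10))
)

# Row-wise percentages: margin = 1 makes each class (row) sum to 100.
pct <- prop.table(table(df1$class, df1$sex), margin = 1) * 100

# Assemble the requested layout; round() guards against floating-point noise.
result <- data.frame(
  Class          = rownames(pct),
  Male_percent   = paste0(round(pct[, "male"]), "%"),
  Female_percent = paste0(round(pct[, "female"]), "%"),
  stringsAsFactors = FALSE
)
result
```

Note that `Class` ends up as a character column here because it comes from the table's row names; wrap it in `as.numeric()` if a numeric column is needed.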
Or with data.table
library(data.table)
setDT(df1)[, .N, by = .(class, sex)
][, .(Male_percent = paste0(100 * N[sex == 'male'] / sum(N), '%'),
Female_percent = paste0(100 * N[sex == 'female'] / sum(N), '%')),
by = class]
# class Male_percent Female_percent
#1: 1 50% 50%
#2: 2 40% 60%
Or with dplyr
library(dplyr)
df1 %>%
group_by(class) %>%
# '%.0f' is used instead of '%d' because sprintf('%d', x) errors when x is not a whole number
summarise(Male_Percent = sprintf('%.0f%%', 100*sum(sex=='male')/n()),
Female_Percent = sprintf('%.0f%%', 100*sum(sex=='female')/n()))
# class Male_Percent Female_Percent
#1 1 50% 50%
#2 2 40% 60%
Or with sqldf
library(sqldf)
res1 <- sqldf('select class,
100*sum(sex=="male")/count(sex) as m,
100*sum(sex=="female")/count(sex) as f,
"%" as p
from df1
group by class')
sqldf("select class,
m||p as Male_Percent,
f||p as Female_Percent
from res1")
# class Male_Percent Female_Percent
#1 1 50% 50%
#2 2 40% 60%
Based on @G.Grothendieck's comment, the sqldf version can be simplified to
sqldf("select class,
(100 * avg(sex = 'male')) || '%' as Male_Percent,
(100 * avg(sex = 'female')) || '%' as Female_Percent
from df1 group
by class")
# class Male_Percent Female_Percent
#1 1 50.0% 50.0%
#2 2 40.0% 60.0%
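The simplified query works because averaging a 0/1 comparison yields a proportion. The same idea can be sketched in base R, where the mean of a logical vector is the share of `TRUE` values; the names `male_share` and `female_share` are only illustrative.

```r
# Rebuild the question's example data so the block is self-contained.
df1 <- data.frame(
  grade = rep(LETTERS[1:5], 4),
  sex   = c(rep("male", 5), rep("female", 5), rep("male", 4), rep("female", 6)),
  class = c(rep(1, 10), rep(2, 10))
)

# mean() of a logical vector is the fraction of TRUE values,
# so applying it per class gives the proportion of each sex.
male_share   <- tapply(df1$sex == "male",   df1$class, mean)
female_share <- tapply(df1$sex == "female", df1$class, mean)

# Convert to whole-number percentages, one value per class.
round(100 * male_share)
round(100 * female_share)
```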
Answer 1 (score: 0)
With the data.table package, you can do the following
library(data.table)
# the data.frame from the question is named df1, not df
setDT(df1)[ , .(
Male_Percent = paste0(( nrow(.SD[sex == "male"]) / .N ) * 100 , "%") ,
Female_Percent = paste0(( nrow(.SD[sex == "female"]) / .N ) * 100 , "%")
) ,
by = class
]
Result
# class Male_Percent Female_Percent
# 1: 1 50% 50%
# 2: 2 40% 60%
Another dplyr solution
library(dplyr)
df1 %>%
group_by(sex , class) %>%
summarise(n = n()) %>%
group_by(class) %>%
summarise(
Male_Percent = paste0((n[sex == "male"] / sum(n)) * 100 , "%") ,
Female_Percent = paste0((n[sex == "female"] / sum(n) * 100) , "%")
)
# class Male_Percent Female_Percent
# 1 50% 50%
# 2 40% 60%
Answer 2 (score: 0)
Try crosstab and adorn_crosstab from the janitor package, which handle both tasks (cross-tabulating two variables, then formatting the result as percentages):
library(janitor)
df1 %>%
crosstab(class, sex) %>%
adorn_crosstab(show_n = FALSE, digits = 0)
class female male
1 1 50% 50%
2 2 60% 40%
If you want to keep the percentages as class numeric (e.g., for further calculations), use ns_to_percents() instead:
df1 %>%
crosstab(class, sex) %>%
ns_to_percents()
class female male
1 1 0.5 0.5
2 2 0.6 0.4
Disclaimer: I am the author of these functions.
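As far as I know, later janitor releases deprecated crosstab(), adorn_crosstab() and ns_to_percents() in favour of tabyl() and the adorn_* family. Assuming such a version is installed, an equivalent might look like the sketch below; the intermediate variable names are only illustrative, and nested calls are used so no pipe operator is required.

```r
library(janitor)

# Rebuild the question's example data so the block is self-contained.
df1 <- data.frame(
  grade = rep(LETTERS[1:5], 4),
  sex   = c(rep("male", 5), rep("female", 5), rep("male", 4), rep("female", 6)),
  class = c(rep(1, 10), rep(2, 10))
)

# Cross-tabulate class (rows) against sex (columns) as counts.
counts <- tabyl(df1, class, sex)

# Numeric row proportions, useful for further calculations.
pct_numeric <- adorn_percentages(counts, denominator = "row")

# Character columns formatted as whole-number percentages.
pct_labelled <- adorn_pct_formatting(pct_numeric, digits = 0)
pct_labelled
```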