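I have a dataset similar to the one below. My end goal is a table that shows the average salary for each gender, along with ratio variables such as women's average salary relative to men's.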
library(dplyr)
x <- data.frame(Department = c("Dep1", "Dep1","Dep2", "Dep2","Dep3"),
Gender = c("F", "M", "F", "M", "F"),
Salary = seq(10,14))
Department Gender Salary
1 Dep1 F 10
2 Dep1 M 11
3 Dep2 F 12
4 Dep2 M 13
5 Dep3 F 14
Step 1: First, I use summarise to compute the summary statistics I need.
Table <- x %>% group_by(Department, Gender) %>% summarise(Count = n(),
  AverageSalary = mean(Salary, na.rm = TRUE),
  MedianSalary = median(Salary, na.rm = TRUE))
Step 2: To compute the ratios and add them to `Table` as new columns, I use a tip I got from this forum a few days ago.
Table %>% group_by(Department) %>%
mutate(`AvgSalaryWomen/Men` = AverageSalary[Gender == "F"]/AverageSalary[Gender == "M"],
`MedianSalaryWomen/Men` = MedianSalary[Gender == "F"]/MedianSalary[Gender == "M"])
My problem is that Dep3 has no men, so I get the following error message:
Error in mutate_impl(.data, dots) :
Column `AvgSalaryWomen/Men` must be length 1 (the group size), not 0
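For context, here is a minimal sketch of where the error appears to come from, using only the Dep3 values from the data above. The variable names `avg_salary`, `gender`, `men`, and `ratio` are made up for illustration and are not part of the original code. Subsetting with a condition that matches nothing returns a zero-length vector, and dividing by it also gives a zero-length result, which cannot fill a group of one row.

```r
# Values for the Dep3 group only: one woman, no men (illustrative names)
avg_salary <- c(14)
gender <- c("F")

# No element satisfies gender == "M", so this is a zero-length vector
men <- avg_salary[gender == "M"]

# Dividing by a zero-length vector yields a zero-length vector as well
ratio <- avg_salary[gender == "F"] / men

length(men)    # number of matching men in this group
length(ratio)  # length of the value handed to mutate() for this group
```
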
What I am hoping for is something like this:
Department Gender Count AverageSalary MedianSalary AvgSalaryWomen.Men MedianSalaryWomen.Men
1 Dep1 F 1 10 10 0.9090909 0.9090909
2 Dep1 M 1 11 11 0.9090909 0.9090909
3 Dep2 F 1 12 12 0.9230769 0.9230769
4 Dep2 M 1 13 13 0.9230769 0.9230769
5 Dep3 F 1 14 14 NA NA
or
Department Gender Count AverageSalary MedianSalary AvgSalaryWomen.Men MedianSalaryWomen.Men
1 Dep1 F 1 10 10 0.9090909 0.9090909
2 Dep1 M 1 11 11 NA NA
3 Dep2 F 1 12 12 0.9230769 0.9230769
4 Dep2 M 1 13 13 NA NA
5 Dep3 F 1 14 14 NA NA
Is there a simple way to get either of these two results? I would guess that option 1 is the easier one. Thanks in advance!
Answer 0 (score: 1)
With ifelse, you can check whether both genders are present in a department before computing the ratio, and return NA if they are not. Like this:
Table %>% group_by(Department) %>%
mutate(`AvgSalaryWomen/Men` = ifelse(length(unique(Gender)) == 2,
AverageSalary[Gender == "F"]/AverageSalary[Gender == "M"], NA),
`MedianSalaryWomen/Men` = ifelse(length(unique(Gender)) == 2,
MedianSalary[Gender == "F"]/MedianSalary[Gender == "M"], NA))
# A tibble: 5 x 7
# Groups:   Department [3]
  Department Gender Count AverageSalary MedianSalary `AvgSalaryWomen/Men` `MedianSalaryWomen/Men`
  <fct>      <fct>  <int>         <dbl>        <int>                <dbl>                   <dbl>
1 Dep1       F          1          10.0           10                0.909                   0.909
2 Dep1       M          1          11.0           11                0.909                   0.909
3 Dep2       F          1          12.0           12                0.923                   0.923
4 Dep2       M          1          13.0           13                0.923                   0.923
5 Dep3       F          1          14.0           14               NA                      NA
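As a possible alternative, here is a hedged sketch that swaps the `ifelse` check for `[1]` indexing. It relies on base R behaviour: taking element 1 of a zero-length vector gives NA, so the ratio becomes NA when a gender is missing. It assumes each department has at most one summary row per gender, which holds after the `summarise` step above. The names `Result` and `Result2` are introduced here for illustration only. The second step sketches option 2 from the question by blanking the ratio on the men's rows.

```r
library(dplyr)

# Data and summary table as in the question
x <- data.frame(Department = c("Dep1", "Dep1", "Dep2", "Dep2", "Dep3"),
                Gender = c("F", "M", "F", "M", "F"),
                Salary = seq(10, 14))

Table <- x %>%
  group_by(Department, Gender) %>%
  summarise(Count = n(),
            AverageSalary = mean(Salary, na.rm = TRUE),
            MedianSalary = median(Salary, na.rm = TRUE)) %>%
  ungroup()

# Option 1: [1] turns a zero-length subset into NA, so departments
# without men get NA instead of raising an error
Result <- Table %>%
  group_by(Department) %>%
  mutate(`AvgSalaryWomen/Men` =
           AverageSalary[Gender == "F"][1] / AverageSalary[Gender == "M"][1],
         `MedianSalaryWomen/Men` =
           MedianSalary[Gender == "F"][1] / MedianSalary[Gender == "M"][1]) %>%
  ungroup()

# Option 2: keep the ratio on the women's rows only
Result2 <- Result %>%
  mutate(`AvgSalaryWomen/Men` =
           ifelse(Gender == "F", `AvgSalaryWomen/Men`, NA),
         `MedianSalaryWomen/Men` =
           ifelse(Gender == "F", `MedianSalaryWomen/Men`, NA))
```

Printing `Result` or `Result2` shows the two layouts. Note that the `ifelse` version in the answer checks for exactly two distinct genders, while this sketch only looks for "F" and "M" rows, so the two would behave differently if other gender codes appeared in the data.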