I am trying to find the highest ratio for a column with a given value. Suppose my data looks like this:
Job company
=========================
accountant Bank
accountant Insurance Co
Manager Bank
Manager Bank
accountant Insurance Co
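If I want to find the highest accountant-to-manager ratio for a specific company such as Bank, how do I use grouping?
I am trying something like this, but it does not work: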
MyData %>%
count( MyData$Job,MyData$company) %>%
group_by(MyData$Job) %>%
mutate(prop = MyData$Job[accountant] / sum(MyData$Job[accountant])) %>%
spread(key = company[bank], value = prop)
Answer 0 (score: 1)
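count() is a wrapper for group_by() + tally() + ungroup(). Its result is therefore ungrouped, so if you need a further grouped step, as your question suggests, you have to call group_by() again.
Also, inside these verbs you can refer to variable names directly, without the $ notation.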
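As a rough check of the wrapper claim above, here is a minimal sketch comparing the two forms. The data frame `jobs` is a hypothetical copy of the table in the question, not something from the original post, and the equivalence is only checked on this toy input.

```r
library(dplyr)

# Hypothetical toy data mirroring the table in the question
jobs <- data.frame(
  Job = c("accountant", "accountant", "Manager", "Manager", "accountant"),
  company = c("Bank", "Insurance Co", "Bank", "Bank", "Insurance Co")
)

# Short form: bare column names, no $ needed
via_count <- jobs %>%
  count(Job, company)

# Long form that count() wraps
via_tally <- jobs %>%
  group_by(Job, company) %>%
  tally() %>%
  ungroup()

# Compare the two results as plain data frames
all.equal(as.data.frame(via_count), as.data.frame(via_tally))
```

Both pipelines return one row per Job/company combination with the count in a column named n.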
Sample data:
set.seed(1)
mydata <- data.frame(
Job = sample(c("Acct", "Manager"), size = 50, replace = TRUE),
Company = sample(c("Bank", "Insurance"), size = 50, replace = TRUE)
)
> head(mydata)
Job Company
1 Acct Bank
2 Acct Insurance
3 Manager Bank
4 Manager Bank
5 Acct Bank
6 Manager Bank
Code:
count() counts the number of rows for each job within each company:
library(dplyr)
mydata %>%
count(Job, Company)
# A tibble: 4 x 3
Job Company n
<fctr> <fctr> <int>
1 Acct Bank 17
2 Acct Insurance 6
3 Manager Bank 12
4 Manager Insurance 15
spread() rearranges the data frame so that each job gets its own column. Each company stays in its own row:
library(tidyr)
mydata %>%
count(Job, Company) %>%
spread(Job, n)
# A tibble: 2 x 3
Company Acct Manager
* <fctr> <int> <int>
1 Bank 17 12
2 Insurance 6 15
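If you want the accountant-to-manager ratio, you can compute it directly and sort so that the company with the highest ratio comes first: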
mydata %>%
count(Job, Company) %>%
spread(Job, n) %>%
mutate(p = Acct / Manager) %>%
arrange(desc(p))
# A tibble: 2 x 4
Company Acct Manager p
<fctr> <int> <int> <dbl>
1 Bank 17 12 1.42
2 Insurance 6 15 0.400
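In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). Below is a sketch of the same pipeline in that style, assuming dplyr >= 1.0.0 for slice_max(). The name `ratios` and the `values_fill = 0` guard are additions for illustration, not part of the original answer, and the exact counts depend on your R version's random number generator, so no output is shown.

```r
library(dplyr)
library(tidyr)

# Same simulated data as in the answer above
set.seed(1)
mydata <- data.frame(
  Job = sample(c("Acct", "Manager"), size = 50, replace = TRUE),
  Company = sample(c("Bank", "Insurance"), size = 50, replace = TRUE)
)

ratios <- mydata %>%
  count(Job, Company) %>%
  # One column per job; fill missing Job/Company combinations with 0
  pivot_wider(names_from = Job, values_from = n, values_fill = 0) %>%
  mutate(p = Acct / Manager) %>%
  arrange(desc(p))

# Ratio for one specific company
ratios %>% filter(Company == "Bank")

# Company with the highest accountant-to-manager ratio
ratios %>% slice_max(p, n = 1)
```

Note that a company with no managers would get p = Inf, so you may want to filter out rows where Manager is 0 before ranking.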