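Calculating the highest ratio

Time: 2018-02-02 04:20:25

Tags: r probability

I am trying to find the highest ratio between values in one column for a given value of another column. Suppose my data looks like this: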

      Job          company
   =========================
    accountant     Bank
    accountant     Insurance Co
    Manager        Bank
    Manager        Bank
    accountant     Insurance Co

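If I want to find the accountant-to-manager ratio for a specific company such as Bank, and see which company has the highest ratio, how can I do that with grouping?

I am trying something like this, but it does not work: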

MyData %>%
  count( MyData$Job,MyData$company) %>%
  group_by(MyData$Job) %>%
  mutate(prop = MyData$Job[accountant] / sum(MyData$Job[accountant])) %>%
  spread(key = company[bank], value = prop)

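1 Answer:

Answer 0: (score: 1)

count() is a wrapper for group_by() + tally() + ungroup(), so its result is no longer grouped. If you need a grouped operation afterwards, as your question suggests, you have to call group_by() again.

Also, you can refer to the variable names directly here, without the $ notation.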
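As a minimal sketch of that equivalence, the snippet below counts the same data both ways. The data frame `jobs` is a hypothetical name and simply copies the five rows from the question; the column names are used bare, without `$`.

```r
library(dplyr)

# Toy data copied from the question (the name `jobs` is an assumption)
jobs <- data.frame(
  Job     = c("accountant", "accountant", "Manager", "Manager", "accountant"),
  company = c("Bank", "Insurance Co", "Bank", "Bank", "Insurance Co")
)

# Shortcut: one call, columns referenced by bare name
via_count <- jobs %>%
  count(Job, company)

# Long form: group, tally, then drop the grouping again
via_tally <- jobs %>%
  group_by(Job, company) %>%
  tally() %>%
  ungroup()

# Both approaches should produce the same counts per Job/company pair
identical(via_count$n, via_tally$n)
```

Because the grouping is dropped at the end, any later per-group calculation needs its own group_by() call.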

Example data

set.seed(1)
mydata <- data.frame(
  Job = sample(c("Acct", "Manager"), size = 50, replace = TRUE),
  Company = sample(c("Bank", "Insurance"), size = 50, replace = TRUE)
)

> head(mydata)
      Job   Company
1    Acct      Bank
2    Acct Insurance
3 Manager      Bank
4 Manager      Bank
5    Acct      Bank
6 Manager      Bank

Code

count() counts the number of each job within each company:

library(dplyr)

mydata %>%
  count(Job, Company)

# A tibble: 4 x 3
  Job     Company       n
  <fctr>  <fctr>    <int>
1 Acct    Bank         17
2 Acct    Insurance     6
3 Manager Bank         12
4 Manager Insurance    15

spread() rearranges the data frame so that each job gets its own column. In this case, each company stays in its own row:

library(tidyr)

mydata %>%
  count(Job, Company) %>%
  spread(Job, n)

# A tibble: 2 x 3
  Company    Acct Manager
* <fctr>    <int>   <int>
1 Bank         17      12
2 Insurance     6      15

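If you want the ratio of accountants to managers, you can compute it directly and sort so that the company with the highest ratio comes first: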

mydata %>%
  count(Job, Company) %>%
  spread(Job, n) %>%
  mutate(p = Acct / Manager) %>%
  arrange(desc(p))

# A tibble: 2 x 4
  Company    Acct Manager     p
  <fctr>    <int>   <int> <dbl>
1 Bank         17      12 1.42 
2 Insurance     6      15 0.400
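
To look at one specific company such as Bank, the same pipeline can be followed by a filter. The sketch below is only an illustration under a few assumptions: it uses the five rows from the question (stored in a hypothetical data frame `jobs`), and it uses pivot_wider(), which supersedes spread() in tidyr 1.0.0 and later. A company with no managers gets NA for the ratio, and arrange() places NA rows last.

```r
library(dplyr)
library(tidyr)

# Toy data copied from the question (the name `jobs` is an assumption)
jobs <- data.frame(
  Job     = c("accountant", "accountant", "Manager", "Manager", "accountant"),
  company = c("Bank", "Insurance Co", "Bank", "Bank", "Insurance Co")
)

ratios <- jobs %>%
  count(company, Job) %>%
  # One row per company, one column per job; missing combinations become NA
  pivot_wider(names_from = Job, values_from = n) %>%
  mutate(ratio = accountant / Manager) %>%
  # Highest ratio first; NA ratios are sorted to the end
  arrange(desc(ratio))

# Ratio for a single company: 1 accountant / 2 managers at Bank
bank_ratio <- ratios %>%
  filter(company == "Bank") %>%
  pull(ratio)

bank_ratio  # 0.5
```

Whether a company without managers should count as NA, be dropped, or be treated some other way depends on what "highest ratio" is supposed to mean for your data.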