Calculations between two columns in a data frame

Time: 2017-08-08 14:42:38

Tags: r

I have a data frame named cleancc in the following format:

Education  Status
College    Default
College    No Default
HS         Default
PHD        No Default
HS         No Default
College    No Default

I want to calculate the default rate for each education level. For example, something like this:

Education  Def NDef  DefRate
HS         1   1     50.00%
College    1   2     33.33%
PHD        0   1     0.00%

The following code gives me the count for each education level:

table(cleancc$Education)

I am struggling to work out how to link these counts to the Status column and build a table that shows the default rate.

2 Answers:

Answer 0: (score: 1)

We can use the dplyr package to do this aggregation (dat is the data frame shown under "Data" below):

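library(dplyr)
dat %>%
    group_by(Education) %>%
    # Comparing Status to 'Default' gives a logical vector:
    # sum() counts the TRUE values, mean() gives their proportion
    summarise(Def = sum(Status == 'Default'),
              NDef = sum(Status != 'Default'),
              DefRate = mean(Status == 'Default'))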

  Education   Def  NDef   DefRate
      <chr> <int> <int>     <dbl>
1   College     1     2 0.3333333
2        HS     1     1 0.5000000
3       PHD     0     1 0.0000000

We can also use the base R aggregate function:

aggregate(Status ~ Education, data = dat, FUN = function(x) {
    c('Def' = sum(x == 'Default'),
      'NDef' = sum(x != 'Default'),
      'DefRate' = mean(x == 'Default'))
})

  Education Status.Def Status.NDef Status.DefRate
1   College  1.0000000   2.0000000      0.3333333
2        HS  1.0000000   1.0000000      0.5000000
3       PHD  0.0000000   1.0000000      0.0000000
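One thing to be aware of with the aggregate output above: because FUN returns a named vector, the result is a two-column data frame whose second column, Status, is a matrix. The sketch below is one possible way to flatten it into ordinary columns and format the rate as a percentage like the table in the question. The names agg and flat are my own placeholders, and the sprintf formatting is an assumption about how you want the rate displayed.

```r
# Same data as in the dput() output of this answer
dat <- data.frame(
  Education = c("College", "College", "HS", "PHD", "HS", "College"),
  Status = c("Default", "No Default", "Default",
             "No Default", "No Default", "No Default"),
  stringsAsFactors = FALSE
)

agg <- aggregate(Status ~ Education, data = dat, FUN = function(x) {
  c(Def = sum(x == "Default"),
    NDef = sum(x != "Default"),
    DefRate = mean(x == "Default"))
})

# agg has only two columns: Education and a matrix column named Status.
# do.call(data.frame, ...) expands the matrix into separate columns.
flat <- do.call(data.frame, agg)

# Drop the "Status." prefix added during the expansion
names(flat) <- sub("^Status\\.", "", names(flat))

# Format the proportion as a percentage string with two decimals
flat$DefRate <- sprintf("%.2f%%", 100 * flat$DefRate)

print(flat)
```

Note that after the last step DefRate is a character column, so keep the numeric version around if you need to sort or compute on it.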

Data

dput(dat)
structure(list(Education = c("College", "College", "HS", "PHD", 
"HS", "College"), Status = c("Default", "No Default", "Default", 
"No Default", "No Default", "No Default")), .Names = c("Education", 
"Status"), row.names = c(NA, -6L), class = "data.frame")
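Since the question already starts from table(), a cross-tabulation may be the smallest step from there. This is only a sketch using the sample data above: table() with two arguments counts every Education/Status combination, and prop.table() with margin = 1 turns each row into proportions. The names tab, rates, and out are placeholders of mine.

```r
# Same data as in the dput() output above
dat <- data.frame(
  Education = c("College", "College", "HS", "PHD", "HS", "College"),
  Status = c("Default", "No Default", "Default",
             "No Default", "No Default", "No Default"),
  stringsAsFactors = FALSE
)

# Rows are education levels, columns are the two Status values
tab <- table(dat$Education, dat$Status)

# margin = 1 divides each count by its row total
rates <- prop.table(tab, margin = 1)

# Assemble a plain data frame in the layout asked for in the question
out <- data.frame(
  Education = rownames(tab),
  Def = as.vector(tab[, "Default"]),
  NDef = as.vector(tab[, "No Default"]),
  DefRate = as.vector(rates[, "Default"])
)

print(out)
```

The column lookups by name ("Default", "No Default") assume both values occur somewhere in Status. If one is missing from the data entirely, table() will not create that column and the lookup will fail.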

Answer 1: (score: 1)
