Conditional count, adding the count as a column

Time: 2019-02-05 14:24:12

Tags: r

I have the following data frame:

[]   Group  State       County     Deaths

[1]  01     Nicaragua   County A   0 
[2]  01     Nicaragua   County B   13  
[3]  01     Nicaragua   County C   0
[4]  02     Mexico      County D   0 
[5]  02     Mexico      County F   0
[6]  02     Mexico      County E   0        

I want to count all cases within the same group where Deaths is 0, and then add that count as a new column. The desired result looks like this:

[]   Group  State       County     Deaths  Counties.without.Deaths  

[1]  01     Nicaragua   County A   0       2
[2]  01     Nicaragua   County B   13      2
[3]  01     Nicaragua   County C   0       2
[4]  02     Mexico      County D   0       3
[5]  02     Mexico      County F   0       3  
[6]  02     Mexico      County E   0       3  

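Is there a specific function for this? I tried using a loop but, being a beginner, failed. Thanks for your help!

2 answers:

Answer 0: (score: 0)

Something like this: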

library(dplyr)

df <- df %>%
  group_by(Group) %>%
  mutate(Counties.without.Deaths = sum(Deaths == 0))

You could also use length(Deaths[Deaths == 0]) instead of sum(Deaths == 0), but it may be slightly slower.
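As a rough illustration of the two counting idioms, the sketch below applies both to the Deaths values of one group. The vectors `deaths`, `group`, `x` and `y` are made up here for the example and are not part of the original answer. It also shows a case where the two disagree: when the column contains NA.

```r
# Illustrative vectors (assumed for this sketch), matching the answer's data
deaths <- c(0L, 13L, 0L, 0L, 0L, 0L)
group  <- c(1L, 1L, 1L, 2L, 2L, 2L)

# Deaths of the first group only
x <- deaths[group == 1L]

count_sum    <- sum(x == 0)        # TRUE counts as 1, FALSE as 0
count_length <- length(x[x == 0])  # subset first, then take the length

# With a missing value the two idioms no longer agree
y <- c(0L, NA, 5L)
na_sum      <- sum(y == 0)                 # NA, because NA propagates
na_sum_rm   <- sum(y == 0, na.rm = TRUE)   # counts only the known zero
na_length   <- length(y[y == 0])           # the NA comparison keeps an NA element
```

If the real data can contain missing values, sum(..., na.rm = TRUE) is probably the safer of the two.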

You can also do this in base R, without any additional packages; this would be the fastest option:

df$Counties.without.Deaths <- with(df, ave(Deaths, Group, FUN = function(x) sum(x == 0)))
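To make it clearer what ave() does in the one-liner above, here is a minimal sketch that computes the same column in two explicit steps with tapply() and a lookup, then compares it with the ave() result. The names `per_group`, `by_hand` and `with_ave` are introduced only for this illustration, and the data frame is a reduced copy of the answer's data.

```r
# Reduced copy of the example data (only the columns needed here)
df <- data.frame(
  Group  = c(1L, 1L, 1L, 2L, 2L, 2L),
  Deaths = c(0L, 13L, 0L, 0L, 0L, 0L)
)

# Step 1: one count per group, named by the group label
per_group <- tapply(df$Deaths, df$Group, function(x) sum(x == 0))

# Step 2: look up each row's group so the count is repeated on every row;
# as.vector() drops the names and dim attributes left over from tapply()
by_hand <- as.vector(per_group[as.character(df$Group)])

# ave() does the split, apply and put-back in a single call
with_ave <- ave(df$Deaths, df$Group, FUN = function(x) sum(x == 0))

# Both approaches give the same per-row counts
same_result <- all(by_hand == with_ave)
```

The practical point is that ave() returns a vector as long as its input, so it can be assigned directly to a new column.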

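A quick benchmark on this small data frame shows the base option to be roughly 7 to 9 times faster (by median and minimum time, respectively):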

Unit: microseconds
  expr      min        lq      mean    median       uq      max neval
 dplyr 1056.020 1091.3915 1267.1185 1121.2920 1318.019 2294.364   100
  base  113.771  132.9145  182.4703  149.6885  170.291 2769.136   100

Output of both the dplyr and base versions:

  Group     State   County Deaths Counties.without.Deaths
1     1 Nicaragua County A      0                       2
2     1 Nicaragua County B     13                       2
3     1 Nicaragua County C      0                       2
4     2    Mexico County D      0                       3
5     2    Mexico County F      0                       3
6     2    Mexico County E      0                       3

Answer 1: (score: 0)

merge(df, aggregate(Deaths ~ Group, df, FUN = function(x) sum(x == 0)), by = "Group", suffixes = c("", "counties.without"))

  Group     State   County Deaths Deathscounties.without
1     1 Nicaragua County A      0                      2
2     1 Nicaragua County B     13                      2
3     1 Nicaragua County C      0                      2
4     2    Mexico County D      0                      3
5     2    Mexico County F      0                      3
6     2    Mexico County E      0                      3
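The merge/aggregate one-liner above can also be split into two steps, which makes it possible to give the new column an explicit name instead of relying on suffixes. This is only a sketch: the intermediate names `counts` and `result` are assumptions made for the example, and the data frame is a reduced copy of the data given below.

```r
# Reduced copy of the example data
df <- data.frame(
  Group  = c(1L, 1L, 1L, 2L, 2L, 2L),
  County = c("County A", "County B", "County C",
             "County D", "County F", "County E"),
  Deaths = c(0L, 13L, 0L, 0L, 0L, 0L)
)

# Step 1: one row per group holding the number of zero-death counties
counts <- aggregate(Deaths ~ Group, data = df, FUN = function(x) sum(x == 0))
names(counts)[names(counts) == "Deaths"] <- "Counties.without.Deaths"

# Step 2: join the per-group counts back onto the original rows
result <- merge(df, counts, by = "Group")
```

Note that merge() sorts the result by the key column by default, so the row order may differ from the input when the groups are not already sorted.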

Data

df <- structure(list(Group = c(1L, 1L, 1L, 2L, 2L, 2L), State = c("Nicaragua", 
"Nicaragua", "Nicaragua", "Mexico", "Mexico", "Mexico"), County = c("County A", 
"County B", "County C", "County D", "County F", "County E"), 
    Deaths = c(0L, 13L, 0L, 0L, 0L, 0L)), row.names = c(NA, -6L
), class = "data.frame")