Question

I have the following data frame:

[]   Group  State       County     Deaths

[1]  01     Nicaragua   County A   0 
[2]  01     Nicaragua   County B   13  
[3]  01     Nicaragua   County C   0
[4]  02     Mexico      County D   0 
[5]  02     Mexico      County F   0
[6]  02     Mexico      County E   0

Within each group, I want to count the rows where Deaths is 0 and add that count as a new column. The desired result looks like this:

[]   Group  State       County     Deaths  Counties.without.Deaths  

[1]  01     Nicaragua   County A   0       2
[2]  01     Nicaragua   County B   13      2
[3]  01     Nicaragua   County C   0       2
[4]  02     Mexico      County D   0       3
[5]  02     Mexico      County F   0       3  
[6]  02     Mexico      County E   0       3

Is there a specific function for this? I tried using a loop, but as a beginner I didn't manage it. Thanks for your help!

Answer 1

Something like:

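library(dplyr)

df <- df %>%
  group_by(Group) %>%  # compute the count separately within each Group
  # Deaths == 0 is TRUE/FALSE; sum() counts the TRUEs, i.e. the zero-death rows
  mutate(Counties.without.Deaths = sum(Deaths == 0))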

You could also use length(Deaths[Deaths == 0]) instead of sum, but it may be slightly slower.
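A quick check that the two expressions give the same count, using the Deaths values of group 01 (Nicaragua) from the question. This is a small sketch; the variable names are only illustrative:

```r
# Deaths values of group 01 (Nicaragua) from the question
deaths <- c(0L, 13L, 0L)

# sum() over a logical vector counts the TRUE values
n_sum <- sum(deaths == 0)

# length() of the subset first keeps the zero values, then counts them
n_len <- length(deaths[deaths == 0])

print(c(n_sum = n_sum, n_len = n_len))  # both are 2
```

The sum() form avoids building the intermediate subset vector, which is why it is usually the one people reach for.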

You can also do this in base R without any additional packages, and it is the faster of the two options here:

# ave() applies FUN within each Group and returns one value per original row
df$Counties.without.Deaths <- with(df, ave(Deaths, Group, FUN = function(x) sum(x == 0)))
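ave() applies FUN to each group's values and recycles each group's result over that group's rows, so the new column lines up with the original rows. A rough base R equivalent using split() and unsplit(), sketched on a trimmed copy of the data (only Group and Deaths). It illustrates the idea and is not ave()'s actual source; pieces and per_row are illustrative names:

```r
# Trimmed copy of the example data: only the columns the count needs
df <- data.frame(Group  = c(1L, 1L, 1L, 2L, 2L, 2L),
                 Deaths = c(0L, 13L, 0L, 0L, 0L, 0L))

# Split Deaths by Group: one vector per group
pieces <- split(df$Deaths, df$Group)

# Count the zeros per group and repeat the count once per row of that group
per_row <- lapply(pieces, function(x) rep(sum(x == 0), length(x)))

# Put the values back in the original row order
df$Counties.without.Deaths <- unsplit(per_row, df$Group)

# Same values as the ave() one-liner
stopifnot(isTRUE(all.equal(
  df$Counties.without.Deaths,
  ave(df$Deaths, df$Group, FUN = function(x) sum(x == 0))
)))

print(df)
```

Because unsplit() uses the same grouping vector as split(), each value returns to the row it came from even if the groups are not stored contiguously.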

A quick benchmark shows the base option to be roughly 7.5 times faster by median time (about 9 times by minimum time):

Unit: microseconds
  expr      min        lq      mean    median       uq      max neval
 dplyr 1056.020 1091.3915 1267.1185 1121.2920 1318.019 2294.364   100
  base  113.771  132.9145  182.4703  149.6885  170.291 2769.136   100

Output of both the dplyr and the base version:

  Group     State   County Deaths Counties.without.Deaths
1     1 Nicaragua County A      0                       2
2     1 Nicaragua County B     13                       2
3     1 Nicaragua County C      0                       2
4     2    Mexico County D      0                       3
5     2    Mexico County F      0                       3
6     2    Mexico County E      0                       3
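The question mentions a failed attempt with a loop. For comparison only, a minimal base R loop sketch that builds the same column on a trimmed copy of the data; the vectorised ave() and group_by()/mutate() versions above remain the more idiomatic choices. The names in_group and n_zero are illustrative:

```r
# Trimmed copy of the example data
df <- data.frame(Group  = c(1L, 1L, 1L, 2L, 2L, 2L),
                 Deaths = c(0L, 13L, 0L, 0L, 0L, 0L))

# Pre-allocate the new column as integer NAs
df$Counties.without.Deaths <- NA_integer_

for (g in unique(df$Group)) {
  in_group <- df$Group == g                       # rows belonging to group g
  n_zero   <- sum(df$Deaths[in_group] == 0)       # zero-death rows in group g
  df$Counties.without.Deaths[in_group] <- n_zero  # same count on every row of g
}

print(df)
```

The loop visits each group once and writes the count to all of that group's rows with logical indexing, which is essentially what ave() does internally.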

Answer 2

# aggregate() counts the zero-death rows per Group; merge() joins those counts back onto df
merge(df, aggregate(Deaths ~ Group, df, FUN = function(x) sum(x == 0)), by = "Group", suffixes = c("", "counties.without"))

  Group     State   County Deaths Deathscounties.without
1     1 Nicaragua County A      0                      2
2     1 Nicaragua County B     13                      2
3     1 Nicaragua County C      0                      2
4     2    Mexico County D      0                      3
5     2    Mexico County F      0                      3
6     2    Mexico County E      0                      3
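The suffixes argument produces the name Deathscounties.without. If you prefer the column name from the question, one option is to rename the aggregated column before merging, so no suffix is needed. A sketch, where counts and result are illustrative names:

```r
df <- data.frame(Group  = c(1L, 1L, 1L, 2L, 2L, 2L),
                 State  = rep(c("Nicaragua", "Mexico"), each = 3),
                 County = c("County A", "County B", "County C",
                            "County D", "County F", "County E"),
                 Deaths = c(0L, 13L, 0L, 0L, 0L, 0L))

# One row per Group holding the number of rows with Deaths == 0
counts <- aggregate(Deaths ~ Group, df, FUN = function(x) sum(x == 0))

# Rename the count column so it no longer clashes with df$Deaths
names(counts)[names(counts) == "Deaths"] <- "Counties.without.Deaths"

# Join the per-group counts back onto every row of df
result <- merge(df, counts, by = "Group")
print(result)
```

Note that merge() sorts the result by the key column by default (sort = TRUE), so the row order can differ from df when the groups are not already in order.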

Data:

df <- structure(list(Group = c(1L, 1L, 1L, 2L, 2L, 2L), State = c("Nicaragua", 
"Nicaragua", "Nicaragua", "Mexico", "Mexico", "Mexico"), County = c("County A", 
"County B", "County C", "County D", "County F", "County E"), 
    Deaths = c(0L, 13L, 0L, 0L, 0L, 0L)), row.names = c(NA, -6L
), class = "data.frame")

Count conditionally and add the count as a column
