Conditional calculation on an R data.table

Date: 2019-07-17 09:44:04

Tags: r data.table

For each id, I want to sum value over the rows where flag == 'Y' and store the result in a new variable 'sm'. How can I do this with data.table?

library(data.table)

data <- data.table(
  id    = rep(c(1, 2, 3), each = 4),
  value = c(12, 10, 17, 19, 21, 22, 34, 18, 14, 12, 32, 18),
  flag  = c(NA, 'Y', 'Y', NA, 'Y', NA, NA, NA, 'Y', NA, 'Y', NA)
)



    id value flag
 1:  1    12 <NA>
 2:  1    10    Y
 3:  1    17    Y
 4:  1    19 <NA>
 5:  2    21    Y
 6:  2    22 <NA>
 7:  2    34 <NA>
 8:  2    18 <NA>
 9:  3    14    Y
10:  3    12 <NA>
11:  3    32    Y
12:  3    18 <NA>

This is the result I would like to see:

    id value flag sm
 1:  1    12 <NA> 27
 2:  1    10    Y 27
 3:  1    17    Y 27
 4:  1    19 <NA> 27
 5:  2    21    Y 21
 6:  2    22 <NA> 21
 7:  2    34 <NA> 21
 8:  2    18 <NA> 21
 9:  3    14    Y 46
10:  3    12 <NA> 46
11:  3    32    Y 46
12:  3    18 <NA> 46

2 answers:

Answer 0 (score: 2)

Using data.table's join syntax, aggregate the flagged rows per id and join the result back onto data:

data[data[!is.na(flag) & flag == "Y", .(sm = sum(value)), by = id], on = "id"]
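
Note that the join above returns a new table and would drop any id that has no rows with flag == "Y". If the column should instead be added in place with every id kept, an update join is one option. This is a minimal sketch that assumes the data from the question; the name agg is only illustrative:

```r
library(data.table)

data <- data.table(
  id    = rep(c(1, 2, 3), each = 4),
  value = c(12, 10, 17, 19, 21, 22, 34, 18, 14, 12, 32, 18),
  flag  = c(NA, "Y", "Y", NA, "Y", NA, NA, NA, "Y", NA, "Y", NA)
)

# Per-id sum of value over rows flagged "Y" (agg is an illustrative name).
# In i, data.table treats NA results of flag == "Y" as FALSE.
agg <- data[flag == "Y", .(sm = sum(value)), by = id]

# Update join: add sm to data by reference, matching on id.
# Ids absent from agg would get NA in sm rather than being dropped.
data[agg, sm := i.sm, on = "id"]
```

With this form data itself gains the sm column, so no reassignment is needed.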

Answer 1 (score: 1)

We can sum value where flag == "Y" for each id and assign the result by reference with :=

library(data.table)
data[, sm := sum(value[flag == "Y"], na.rm = TRUE), by = id]

data
#    id value flag sm
# 1:  1    12 <NA> 27
# 2:  1    10    Y 27
# 3:  1    17    Y 27
# 4:  1    19 <NA> 27
# 5:  2    21    Y 21
# 6:  2    22 <NA> 21
# 7:  2    34 <NA> 21
# 8:  2    18 <NA> 21
# 9:  3    14    Y 46
#10:  3    12 <NA> 46
#11:  3    32    Y 46
#12:  3    18 <NA> 46
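
The na.rm = TRUE matters here because flag contains NA: comparing NA with "Y" gives NA, and subsetting a vector with NA produces NA elements. The following base R sketch, using the values of id 1 from the question, shows the effect and one possible alternative with %in% (variable names are illustrative):

```r
value <- c(12, 10, 17, 19)
flag  <- c(NA, "Y", "Y", NA)

# flag == "Y" is NA TRUE TRUE NA, so the subset keeps NA placeholders.
picked <- value[flag == "Y"]

# Summing without na.rm would give NA; with na.rm = TRUE the NAs are dropped.
with_na_rm <- sum(picked, na.rm = TRUE)

# %in% never returns NA, so the subset contains only the flagged values.
with_in <- sum(value[flag %in% "Y"])
```

The two forms differ if value itself contains NA in a flagged row: na.rm = TRUE silently skips it, while the %in% version returns NA.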

Or using dplyr:

library(dplyr)
data %>%
  group_by(id) %>%
  mutate(sm = sum(value[flag == "Y"], na.rm = TRUE))
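
Unlike :=, the dplyr pipeline does not modify data; it returns a new grouped tibble, so the result has to be assigned to keep it. A minimal sketch under that assumption, again using the question's data (result is an illustrative name, and ungroup() is added here so later operations are not grouped by id):

```r
library(data.table)
library(dplyr)

data <- data.table(
  id    = rep(c(1, 2, 3), each = 4),
  value = c(12, 10, 17, 19, 21, 22, 34, 18, 14, 12, 32, 18),
  flag  = c(NA, "Y", "Y", NA, "Y", NA, NA, NA, "Y", NA, "Y", NA)
)

# Compute the per-id sum of flagged values and keep the returned table.
result <- data %>%
  group_by(id) %>%
  mutate(sm = sum(value[flag == "Y"], na.rm = TRUE)) %>%
  ungroup()

# data is unchanged; only result carries the sm column.
has_sm_in_data <- "sm" %in% names(data)
```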