I have data like this:
library(data.table)
group <- c("a","a","a","b","b","b")
cond <- c("N","Y","N","Y","Y","N")
value <- c(2,1,3,4,2,5)
dt <- data.table(group, cond, value)
group cond value
a N 2
a Y 1
a N 3
b Y 4
b Y 2
b N 5
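For each group, I want to return the maximum value among the rows where cond is Y, repeated on every row of that group. Like this: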
group cond value max
a N 2 1
a Y 1 1
a N 3 1
b Y 4 4
b Y 2 4
b N 5 4
I tried adding an ifelse condition to the grouped max, but rows that do not meet the condition just end up with the NA from the "no" argument:
dt[, max := ifelse(cond=="Y", max(value), NA), by = group]
Answer 0 (score: 2)
You can do...
dt[CJ(group = group, cond = "Y", unique=TRUE), on=.(group, cond),
.(mv = max(value))
, by=.EACHI]
# group cond mv
# 1: a Y 1
# 2: b Y 4
Joins like this will eventually have optimization for functions like max.
Another way (originally included in @akrun's answer):
dt[cond == "Y", mv := max(value), by=group]
From the previous link, we can see that this way is already optimized, except for the := part.
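Note that the := form above only assigns to the rows where cond is "Y", so the remaining rows of mv stay NA. One possible way to spread the per-group result onto every row is an update join; the following is only a sketch under that assumption, and the intermediate table name mx is hypothetical, not something from the answer.

```r
library(data.table)

dt <- data.table(
  group = c("a", "a", "a", "b", "b", "b"),
  cond  = c("N", "Y", "N", "Y", "Y", "N"),
  value = c(2, 1, 3, 4, 2, 5)
)

# Per-group maximum over the rows where cond is "Y" (one row per group).
# `mx` is a hypothetical helper table introduced for this sketch.
mx <- dt[cond == "Y", .(mv = max(value)), by = group]

# Update join: copy mv from mx onto every row of dt with a matching group.
# Groups with no "Y" rows would not appear in mx and would keep NA.
dt[mx, on = .(group), mv := i.mv]
```

With the sample data, every row of group a should end up with mv equal to 1 and every row of group b with mv equal to 4.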
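Answer 1 (score: 2)
Assuming that for each "group" we need the max of "value" where "cond" is "Y": after grouping by "group", subset "value" with the logical condition (cond == 'Y') and take the max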
dt[, max := max(value[cond == 'Y']), by = group]
dt
# group cond value max
#1: a N 2 1
#2: a Y 1 1
#3: a N 3 1
#4: b Y 4 4
#5: b Y 2 4
#6: b N 5 4
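One edge case the sample data does not cover: if a group has no rows with cond == 'Y', the subset is empty and base R's max() returns -Inf with a warning. A minimal sketch of a guard follows, assuming NA is the desired result for such groups; the table dt2, the group "c" and the column name max_y are made up for illustration.

```r
library(data.table)

# Hypothetical data: group "c" has no rows with cond == "Y".
dt2 <- data.table(
  group = c("a", "a", "c", "c"),
  cond  = c("N", "Y", "N", "N"),
  value = c(2, 1, 7, 8)
)

# Take the max over the "Y" rows, or NA when the group has none.
dt2[, max_y := {
  v <- value[cond == "Y"]
  if (length(v) > 0) max(v) else NA_real_
}, by = group]
```

Using NA_real_ rather than a plain NA keeps the column type consistent (double) across groups.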