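Replace values with NA based on conditions in other columns:

Asked: 2013-04-02 15:44:25

Tags: r data.table

I am new to the data.table package, so please excuse my simple question. I have a dataset that looks like DT: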

library(data.table)

DT <- data.table(a = sample(c("C", "M", "Y", "K"), 100, replace = TRUE),
                 b = sample(c("A", "S"), 100, replace = TRUE),
                 f = round(rnorm(n = 100, mean = 0.90, sd = 0.08), digits = 2))
DT

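I want to replace any value in column f with NA if a certain condition is met. For example, to blank out f whenever it falls outside the range 0.85 to 0.90 (that is, f < 0.85 or f > 0.90) for one combination of a and b, I would have the following condition: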

DT$a == "C" & DT$b == "S" & DT$f < .85| DT$a == "C" & DT$b == "S" & DT$f >.90

I would also like to set a different condition for each category in columns a and b.

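1 Answer:

Answer 0: (score: 3)

Use the condition you have already stated, but without the DT$ prefix, as the i argument of the data.table to select the rows that satisfy it. Then, in the j argument, assign NA to f by reference with the := operator. That is,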

DT[a == "C" & b == "S" & f < .85 | a == "C" & b == "S" & f >.90, f := NA]
which(is.na(DT$f))
# [1]  3 16 31 89
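
The row numbers printed above depend on the random draw, so they will differ from run to run. A minimal reproducible sketch of the same update follows, assuming an arbitrary seed and using NA_real_ so the replacement matches the numeric type of f (both are choices made here for illustration, not part of the original answer):

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so that repeated runs agree
DT <- data.table(a = sample(c("C", "M", "Y", "K"), 100, replace = TRUE),
                 b = sample(c("A", "S"), 100, replace = TRUE),
                 f = round(rnorm(100, mean = 0.90, sd = 0.08), 2))

# Row numbers that satisfy the condition, computed before the update.
target <- DT[, which(a == "C" & b == "S" & (f < 0.85 | f > 0.90))]

# i selects the rows, j assigns by reference; DT is modified in place.
DT[a == "C" & b == "S" & (f < 0.85 | f > 0.90), f := NA_real_]

# The NA positions after the update are exactly the rows selected beforehand.
identical(which(is.na(DT$f)), target)
```

Because := modifies DT in place, there is no need to assign the result back to DT.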

Edit (following the OP's comments and a good suggestion from @Joshua):

`%between%` <- function(x, vals) { x >= vals[1] & x <= vals[2]}
`%nbetween%` <- Negate(`%between%`)
DT[a %in% c("C", "M", "Y", "K") & b == "S" & f %nbetween% c(0.85, 0.90), f := NA]
%nbetween%, the negation of %between%, gives the desired result (f < 0.85 or f > 0.90). Also note the use of %in% to check a against multiple values.
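
To see how the two operators treat the boundaries, here is a small sketch on a hand-written vector (the values are made up for illustration). Both bounds are inclusive in %between% as defined above, so values equal to 0.85 or 0.90 are kept rather than blanked:

```r
# Same definitions as in the answer; vals is c(lower, upper).
`%between%` <- function(x, vals) x >= vals[1] & x <= vals[2]
`%nbetween%` <- Negate(`%between%`)

f <- c(0.80, 0.85, 0.88, 0.90, 0.95)

f %between% c(0.85, 0.90)
# [1] FALSE  TRUE  TRUE  TRUE FALSE

f %nbetween% c(0.85, 0.90)
# [1]  TRUE FALSE FALSE FALSE  TRUE
```

Note that data.table exports its own %between% operator, which is also inclusive by default; defining one in the global environment as above masks it.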

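Edit 2: After the OP's complete rewrite of the question, I am afraid there is not much you can do beyond grouping b == "A" and b == "S" together and writing one statement per value of a.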

`%nbetween%` <- Negate(`%between%`)
DT[a == "M" & b %in% c("A", "S") & f %nbetween% c(.85, .90), f := NA]
DT[a == "Y" & b %in% c("A", "S") & f %nbetween% c(.90, .95), f := NA]  # bounds are c(lower, upper)
DT[a == "K" & b %in% c("A", "S") & f %nbetween% c(.95, 1.10), f := NA]
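
If the number of categories grows, one possible alternative to writing a statement per category is to keep the limits in a small lookup table and apply them with an update join. The sketch below is only an illustration under stated assumptions: the table name bounds, its columns lo and hi, the sample data, and the specific limits are all hypothetical and not taken from the question.

```r
library(data.table)

# Small hand-made table so the result is easy to check by eye.
DT <- data.table(
  a = c("M", "M", "Y", "Y", "K", "K"),
  b = c("A", "S", "A", "S", "A", "S"),
  f = c(0.80, 0.88, 0.92, 0.99, 1.00, 1.20)
)

# Hypothetical lookup table: one allowed range per level of a.
bounds <- data.table(
  a  = c("M", "Y", "K"),
  lo = c(0.85, 0.90, 0.95),
  hi = c(0.90, 0.95, 1.10)
)

# Update join: rows of DT are matched to bounds on a, and f is set to NA
# wherever it lies outside that category's range. i.lo and i.hi refer to
# the columns of bounds.
DT[bounds, on = "a", f := ifelse(f < i.lo | f > i.hi, NA_real_, f)]
DT
```

Rows whose value of a has no entry in bounds are left untouched. To vary the limits by b as well, the lookup table could carry a b column and the join could use on = c("a", "b").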