I get wrong results when filtering a data.table.
Here is a reproducible example that shows the problem:
library(data.table)
set.seed(123)
results <- data.table(id = 1:1000000, v1 = rnorm(1000000))
rs <- data.table(type = 1:1000000 %% 5, prob = rnorm(1000000))
results[, `:=`(cls = rs$type, cls.prob = rs$prob)]
results[cls != 0 & abs(v1 - cls.prob) < 0.3, `:=`(cls = 3, cls.prob = 1)]
summary(results[cls == 4, cls])
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#       4       4       4       4       4       4
results[cls != 0 & abs(v1 - cls.prob) < 0.2, `:=`(cls = 3, cls.prob = 1)]
summary(results[cls == 4, cls])
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#       4       4       4       4       4       4
results[cls != 0 & abs(v1 - cls.prob) < 0.5, `:=`(cls = 3, cls.prob = 1)]
summary(results[cls == 4, cls])
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#    3.00    4.00    4.00    3.87    4.00    4.00
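As you can see, the last summary should report 4 for every statistic, because the subset is restricted to rows where cls == 4. Instead, the minimum is 3 and the mean is 3.87.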
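One way to narrow this down is to compare the data.table subset with a plain logical-vector subset of the same column. The following is only a sketch of such a cross-check: it repeats the setup from the example, and the names `n`, `tol`, `dt_subset` and `base_subset` are introduced here for illustration and are not part of the original code.

```r
library(data.table)

# Same setup as in the example.
set.seed(123)
n <- 1000000
results <- data.table(id = 1:n, v1 = rnorm(n))
rs <- data.table(type = 1:n %% 5, prob = rnorm(n))
results[, `:=`(cls = rs$type, cls.prob = rs$prob)]

# Same three updates, each followed by the cls == 4 filter, as in the example.
for (tol in c(0.3, 0.2, 0.5)) {
  results[cls != 0 & abs(v1 - cls.prob) < tol, `:=`(cls = 3, cls.prob = 1)]
  invisible(results[cls == 4, cls])
}

# data.table subsetting, as used in the example.
dt_subset <- results[cls == 4, cls]
# Plain vector subsetting that bypasses data.table's `[` method.
base_subset <- results$cls[results$cls == 4]

# base_subset holds only 4s by construction. If dt_subset is not identical
# to it, the discrepancy comes from the data.table subset, not from the data.
print(all(base_subset == 4))
print(identical(dt_subset, base_subset))
```

If the second check prints FALSE, the filter `results[cls == 4, cls]` is returning rows whose current cls value is not 4, which would match the summary shown in the example.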
Does anyone know what is going on?
I am using this version of R:
> version
               _
platform       x86_64-suse-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          3
minor          1.3
year           2015
month          03
day            09
svn rev        67962
language       R
version.string R version 3.1.3 (2015-03-09)
nickname       Smooth Sidewalk
and this version of data.table:
> packageVersion('data.table')
[1] ‘1.9.4’