This is best explained with an example. Consider this data:
library(data.table)
dt1 <- data.table(id = rep(1L:3L, each = 3),
                  val = rep(1L:3L, times = 3))
## The 'val' column is an integer
str(dt1)
# Classes ‘data.table’ and 'data.frame': 9 obs. of 2 variables:
# $ id : int 1 1 1 2 2 2 3 3 3
# $ val: int 1 2 3 1 2 3 1 2 3
# - attr(*, ".internal.selfref")=<externalptr>
If we divide the 'val' column by its maximum value, the column is coerced to numeric without any problem:
dt1[, val := val / max(val)][]
# id val
# 1: 1 0.3333333
# 2: 1 0.6666667
# ...
str(dt1)
# Classes ‘data.table’ and 'data.frame': 9 obs. of 2 variables:
# $ id : int 1 1 1 2 2 2 3 3 3
# $ val: num 0.333 0.667 1 0.333 0.667 ...
# - attr(*, ".internal.selfref")=<externalptr>
Loading the same data again, and this time performing the same calculation by group, results in a 'coerce' error:
dt1 <- data.table(id = rep(1L:3L, each = 3),
                  val = rep(1L:3L, times = 3))
dt1[, val := val / max(val), by = id][]
# Error in `[.data.table`(dt1, , `:=`(val, val/max(val)), by = id) :
# Type of RHS ('double') must match LHS ('integer'). To check and coerce would
# impact performance too much for the fastest cases. Either change the type of
# the target column, or coerce the RHS of := yourself (e.g. by using 1L instead of 1)
I know the workaround is to convert the class of the column before updating it:
dt1[, val := as.double(val)][, val := val / max(val), by = id][]
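Another way around it, as a minimal sketch, seems to be writing the grouped result to a new column, so that no existing integer column has to change type. The column name 'val_scaled' is made up here for illustration:

```r
library(data.table)

# Same data as above: both columns are integer
dt1 <- data.table(id = rep(1L:3L, each = 3),
                  val = rep(1L:3L, times = 3))

# Assign the grouped result to a NEW column ('val_scaled' is a hypothetical
# name); 'val' itself stays integer, so there is no type mismatch on the LHS
dt1[, val_scaled := val / max(val), by = id][]

# The original column keeps its type, the new one is double
str(dt1)
```

This avoids the error but leaves two columns, so it does not answer the underlying question.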
However, why is the update allowed when no grouping is used, yet it throws an error when grouping is included?