Question

This question is best explained with an example.

Consider this data:

library(data.table)
dt1 <- data.table(id = rep(1L:3L, each = 3),
                  val = rep(1L:3L, times = 3))

## The 'val' column is an integer
str(dt1)

# Classes ‘data.table’ and 'data.frame':    9 obs. of  2 variables:
# $ id : int  1 1 1 2 2 2 3 3 3
# $ val: int  1 2 3 1 2 3 1 2 3
# - attr(*, ".internal.selfref")=<externalptr>

If we divide the val column by its maximum value, the column is coerced to numeric without any problem:

dt1[, val := val / max(val)][]
#     id       val
# 1:  1 0.3333333
# 2:  1 0.6666667
# ...

str(dt1)
# Classes ‘data.table’ and 'data.frame':    9 obs. of  2 variables:
# $ id : int  1 1 1 2 2 2 3 3 3
# $ val: num  0.333 0.667 1 0.333 0.667 ...
# - attr(*, ".internal.selfref")=<externalptr>

Loading the same data again and performing the same calculation by group this time results in a coercion error:

dt1 <- data.table(id = rep(1L:3L, each = 3),
                  val = rep(1L:3L, times = 3))

dt1[, val := val / max(val), by = id][]
# Error in `[.data.table`(dt1, , `:=`(val, val/max(val)), by = id) : 
#   Type of RHS ('double') must match LHS ('integer'). To check and coerce would 
#   impact performance too much for the fastest cases. Either change the type of 
#   the target column, or coerce the RHS of := yourself (e.g. by using 1L instead of 1)

I know that a workaround is to convert the class of the column before updating it:

dt1[, val := as.double(val)][, val := val / max(val), by = id][]
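
Another option that appears to avoid the error, assuming it is acceptable to keep the original integer column, is to write the grouped result to a new column instead of overwriting val. This is only a sketch; the column name val_scaled is a hypothetical name chosen for illustration.

```r
library(data.table)

dt1 <- data.table(id = rep(1L:3L, each = 3),
                  val = rep(1L:3L, times = 3))

## 'val_scaled' is a hypothetical new column name. Because the column does
## not exist yet, := creates it with the type of the RHS instead of trying
## to fit a double into an existing integer column.
dt1[, val_scaled := val / max(val), by = id]

class(dt1$val)         # the original column is left untouched
class(dt1$val_scaled)  # the new column takes the type of the RHS
```

This sidesteps the type mismatch rather than explaining it, so the question below still stands.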

But why is the update allowed when no grouping is used, yet an error is raised when a group is included?

r - data.table coercion error when updating a column by group

0 answers: