data.table - Grouping with a custom function that returns a matrix

Time: 2019-07-06 11:16:48

Tags: r data.table

I have the following data.table and function:

library(data.table)
DT = as.data.table(data.frame(Z=c("abc","abc","def","abc"), column=c(1,2,3,4), someOtherColumn=c(5,6,7,8)))

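Fn = function(DT1) {
    # Value: first row of the second column of the group
    Value = as.numeric(DT1[1, 2])
    # Drops Z by reference with :=, then sums the remaining columns
    Calc = sapply(DT1[, c("Z") := NULL], sum) - Value
    return(matrix(Calc, nrow = 1, ncol = length(Calc)))
}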

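Now I want to apply Fn() to each group defined by 'Z' and get a result matrix with 2 rows (because DT$Z has 2 unique values) and 2 columns (one per numeric column).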

DT[, Fn(.SD), by = Z, .SDcols = c('Z', 'column', 'someOtherColumn')]

However, this call fails with the following error:

Error in `[.data.table`(DT1, , `:=`(c("Z"), NULL)) : 
  .SD is locked. Using := in .SD's j is reserved for possible future use; a tortuously flexible way to modify by group. Use := in j directly to modify by group by reference.

I can get the desired result with lapply(), as follows:

do.call(rbind, lapply(split(DT, DT[['Z']]), Fn))

Any pointers to the right way to do this would be helpful.

My DT is very large, so I am looking for an efficient method.

1 answer:

Answer 0 (score: 1)

I tried to fix the code so that it runs. I am not a data.table expert, so I cannot explain in depth how it works. Maybe this is what you are after.

I think Fn cannot return a matrix, because 'j' must be a list or an atomic vector.
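As a minimal sketch of that point, assuming the goal is one numeric column per summed input column: when the function returns a named list, each element becomes a column of the grouped result. `FnList` is a hypothetical name, not something from the question. The sketch also avoids `:=` on `.SD`, which is what the error message complains about, and it relies on `.SD` leaving out the grouping column `Z` by default when `.SDcols` is not given.

```r
library(data.table)

DT <- data.table(Z = c("abc", "abc", "def", "abc"),
                 column = c(1, 2, 3, 4),
                 someOtherColumn = c(5, 6, 7, 8))

# FnList is a hypothetical replacement for Fn: it returns a named list
# (one element per column) instead of a 1-row matrix, and never modifies .SD.
FnList <- function(sd) {
  value <- sd[[1L]][1L]               # first value of the first column in this group
  as.list(sapply(sd, sum) - value)    # column sums minus that value, as a named list
}

# Without .SDcols, .SD holds every column except the grouping column Z,
# so nothing has to be dropped inside the function.
res <- DT[, FnList(.SD), by = Z]
print(res)
```

With the sample data, `res` should have one row per value of `Z` and the columns `Z`, `column` and `someOtherColumn`, holding the same numbers as the lapply()/split() approach in the question.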

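Fn = function(DT1) {
  # First row of the second column of the group
  Value = as.numeric(DT1[1, 2])
  # Sum every column except Z without modifying DT1, then subtract Value
  Calc = DT1[, lapply(.SD, sum), .SDcols = -"Z"] - Value
  # Wrap the matrix in a list so that j returns a list
  list(matrix(Calc, nrow = 1, ncol = length(Calc)))
}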

out <- DT[, .(Fn(.SD)), by = Z, .SDcols = c("Z", "column", "someOtherColumn")]

> out
# Z       V1
# 1: abc <matrix>
# 2: def <matrix>

# > out$V1
# [[1]]
# [,1] [,2]
# [1,] 6    18  
# 
# [[2]]
# [,1] [,2]
# [1,] 0    4
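
The list column above still has to be unpacked to get the 2 x 2 matrix the question asks for. The following is a sketch of one way to get there directly, assuming plain numeric columns are acceptable as an intermediate result: compute the per-group values with lapply(.SD, ...) in j, then convert with as.matrix(). The names `first_val`, `res` and `mat` are illustrative and do not appear in the question, and this has not been benchmarked on a large table.

```r
library(data.table)

DT <- data.table(Z = c("abc", "abc", "def", "abc"),
                 column = c(1, 2, 3, 4),
                 someOtherColumn = c(5, 6, 7, 8))

num_cols <- c("column", "someOtherColumn")

# For each group of Z: sum every numeric column and subtract the first
# value of 'column' in that group (the role of Value in the original Fn).
res <- DT[, {
  first_val <- column[1L]
  lapply(.SD, function(x) sum(x) - first_val)
}, by = Z, .SDcols = num_cols]

# Convert the numeric columns to a matrix, one row per group.
mat <- as.matrix(res[, num_cols, with = FALSE])
rownames(mat) <- res$Z
print(mat)
```

With the sample data, `mat` should contain the rows 6, 18 (for abc) and 0, 4 (for def), the same values as the output above. It does so without split() or a per-group matrix, which may matter for a large DT.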