Question

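I am trying to apply the scale() function to multiple columns of a data.table in order to define new columns. Grouping with by gives the following error: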

dt = data.table( id = rep( 1:10, each = 10 ), 
             A = rnorm( 100, 1, 2 ), 
             B = runif( 100, 0, 1 ),
             C = rnorm( 100, 10, 20 ) )


cols_to_use    = c( "A", "B", "C" )
cols_to_define = paste0( cols_to_use, "_std" )

# working
dt[ , ( cols_to_define ) := lapply( .SD, scale ), .SDcols = cols_to_use ]

# not working
dt[ , ( cols_to_define ) := lapply( .SD, scale ), by = id, .SDcols = cols_to_use ]
## Error in `[.data.table`(dt, , `:=`((cols_to_define), lapply(.SD, scale)),  : 
## All items in j=list(...) should be atomic vectors or lists. 
## If you are trying something like j=list(.SD,newcol=mean(colA)) then
## use := by group instead (much quicker), or cbind or merge afterwards.

Any idea why it works when the by operation is removed?

Answer 1

The problem is the output of scale, which is a matrix:

dim(scale(dt$A))
#[1] 100   1

So we need to change it to a vector by dropping the dim attribute. Either as.vector or c will do that:

dt[ , ( cols_to_define ) := lapply( .SD, function(x) 
          c(scale(x)) ), by = id, .SDcols = cols_to_use ]
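
As a quick sanity check of the fix above, here is a minimal sketch. It assumes a smaller table with only id and A, and the seed, the names m, v, check and the helper std are hypothetical, introduced only for illustration. It shows that scale() returns a one-column matrix, that c() strips the dim attribute, and that the grouped assignment then standardizes each id separately.

```r
library(data.table)

set.seed(1)  # hypothetical seed, only to make the sketch reproducible
dt <- data.table(id = rep(1:10, each = 10),
                 A  = rnorm(100, 1, 2))

# scale() returns a one-column matrix that also carries the
# "scaled:center" and "scaled:scale" attributes
m <- scale(dt$A)
is.matrix(m)
names(attributes(m))

# c() (or as.vector()) drops the attributes, leaving a plain numeric vector
v <- c(m)
is.null(dim(v))

# With a plain vector in j, the grouped := goes through
dt[, A_std := c(scale(A)), by = id]

# Each group should now have mean ~0 and standard deviation ~1
check <- dt[, .(m = mean(A_std), s = sd(A_std)), by = id]
all(abs(check$m) < 1e-8)
all(abs(check$s - 1) < 1e-8)

# Hypothetical helper: the same standardization without ever building a matrix
std <- function(x) (x - mean(x)) / sd(x)
isTRUE(all.equal(std(dt$A), c(scale(dt$A))))
```

The helper std is just one alternative. It avoids the matrix entirely, but unlike scale() it does not record the centering and scaling values as attributes.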

When there is no by, the dim attribute of the matrix is dropped while the other attributes are kept.
