data.table: creating multiple columns with lapply and .SD

Asked: 2019-02-14 17:18:53

Tags: r data.table

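I am trying to apply the scale() function to multiple columns of a data.table in order to define new columns. It works without grouping, but with by I get the following error: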

dt = data.table( id = rep( 1:10, each = 10 ), 
             A = rnorm( 100, 1, 2 ), 
             B = runif( 100, 0, 1 ),
             C = rnorm( 100, 10, 20 ) )


cols_to_use    = c( "A", "B", "C" )
cols_to_define = paste0( cols_to_use, "_std" )

# working
dt[ , ( cols_to_define ) := lapply( .SD, scale ), .SDcols = cols_to_use ]

# not working
dt[ , ( cols_to_define ) := lapply( .SD, scale ), by = id, .SDcols = cols_to_use ]
## Error in `[.data.table`(dt, , `:=`((cols_to_define), lapply(.SD, scale)),  : 
## All items in j=list(...) should be atomic vectors or lists. 
## If you are trying something like j=list(.SD,newcol=mean(colA)) then
## use := by group instead (much quicker), or cbind or merge afterwards.

Any idea why this works when the by operation is removed?

1 Answer:

Answer 0 (score: 2)

The problem is with the output of scale, which is a matrix:

dim(scale(dt$A))
#[1] 100   1
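To see this on a smaller input, here is a minimal base R sketch. The vector x and the name s are illustrative only and are not part of the question's data.

```r
# Illustrative input (hypothetical, not from the question)
x <- c(1, 2, 3, 4, 5)

# scale() returns a one-column matrix, even for a plain vector
s <- scale(x)

is.matrix(s)   # TRUE
dim(s)         # 5 1

# Besides dim, the result also carries the centering and scaling values
attr(s, "scaled:center")   # the mean of x
attr(s, "scaled:scale")    # the standard deviation of x
```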

So we need to change it to a vector by removing the dim attribute. Either as.vector or c will do this:

dt[ , ( cols_to_define ) := lapply( .SD, function(x) 
          c(scale(x)) ), by = id, .SDcols = cols_to_use ]
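As a quick check of the conversion used above, the following base R sketch compares c and as.vector on the output of scale. The input x and the names s, v1 and v2 are illustrative assumptions.

```r
# Illustrative input (hypothetical, not from the question)
x <- c(2, 4, 6, 8)
s <- scale(x)        # one-column matrix with extra attributes

v1 <- c(s)           # c() drops dim and the "scaled:*" attributes
v2 <- as.vector(s)   # as.vector() does the same for an atomic matrix

is.null(dim(v1))          # TRUE
is.null(attributes(v1))   # TRUE
identical(v1, v2)         # TRUE
```

Both forms give a plain numeric vector of the same length as the input, which is what := expects for each new column within a group.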

When there is no by, the matrix's dim attribute is dropped while the other attributes are kept.
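One way to confirm that the grouped version now behaves as intended is sketched below. It assumes the data.table package is installed. The seed, the reduced set of columns and the summary table check are illustrative choices, not part of the original post.

```r
library(data.table)

set.seed(42)  # arbitrary seed so the sketch is reproducible
dt <- data.table(id = rep(1:10, each = 10),
                 A  = rnorm(100, 1, 2),
                 B  = runif(100, 0, 1))

cols_to_use    <- c("A", "B")
cols_to_define <- paste0(cols_to_use, "_std")

# Group-wise standardization, stripping the matrix structure with c()
dt[, (cols_to_define) := lapply(.SD, function(x) c(scale(x))),
   by = id, .SDcols = cols_to_use]

# The new columns are plain numeric vectors, not matrices
is.null(dim(dt$A_std))

# Within each id, the standardized values should have mean ~0 and sd ~1
check <- dt[, .(mean_A = mean(A_std), sd_A = sd(A_std)), by = id]
check
```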