Parsing arguments with lapply(.SD)

Time: 2019-04-19 13:00:45

Tags: r data.table

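I want to apply a user-defined function to a set of variables in a data.table using .SD. The function I am applying needs to retrieve the argument name of the selected variable and convert it to a character object. I think this is the root of the problem.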

Here is the user-defined function:

qcut <- function(variable, k){
  k <- k[deparse(substitute(variable))]
  breaksV <- quantile(variable, 
                      probs = (0 : k) / k)
  labelsV <- vapply(1:(length(breaksV) - 1), 
                    function(x) paste0(breaksV[x], ' : ', breaksV[x + 1]),
                    FUN.VALUE = character(1))
  cut(variable, 
      breaks = breaksV,
      labels = labelsV,
      include.lowest = TRUE)
}

I want to apply it to a data.table as follows:

library('data.table')
k <- c('x' = 4, 'y' = 4, 'z' = 4)
testDT[, lapply(.SD, function(v) qcut(variable = v, k = k)),
       .SDcols = c('x', 'y', 'z')]

However, I get the following error message:

Error in 0:k : NA/NaN argument 
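
One likely explanation for this error, sketched in base R below: inside the anonymous wrapper the argument handed to qcut is the symbol `v`, so `deparse(substitute(variable))` should yield "v" rather than the column name, and indexing `k` by a name it does not contain gives NA. The helper `arg_name` is hypothetical and exists only to isolate that step.

```r
# Minimal illustration in base R; data.table is not needed to see the effect.
k <- c(x = 4, y = 4, z = 4)

# Hypothetical helper: returns the expression supplied as `variable`, as text.
arg_name <- function(variable) deparse(substitute(variable))

x <- runif(10)

# Called directly, the supplied expression is the symbol `x`.
direct <- arg_name(x)

# Called through a wrapper, as in the lapply(.SD, function(v) ...) call,
# the supplied expression is the wrapper's own parameter `v`.
wrapped <- (function(v) arg_name(variable = v))(x)

print(direct)
print(wrapped)

# `k` has no element named after the wrapper's parameter, so the lookup is NA,
# and a sequence such as 0:NA cannot be built.
k_lookup <- k[wrapped]
print(is.na(k_lookup))
seq_attempt <- tryCatch(0:k_lookup, error = function(e) conditionMessage(e))
print(seq_attempt)
```

If this is what is happening, the problem would not be specific to .SD: any wrapper function that renames the argument would hide the column name in the same way.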

Here is the data (dput output):

structure(list(ID = c("GRWJ", "JEAT", "OYZY", "XXTR", "FYHS", 
"XSRW", "YJJS", "RUYW", "QAIL", "BYAR", "FZJD", "EJKT", "RTJB", 
"JUYH", "USJK", "MMOY", "SMYZ", "ZIXB", "JSGP", "UVSA", "YLJO", 
"FNOC", "QRTQ", "DDVV", "GIWJ", "KKAD", "ACFC", "JYJJ", "WVHZ", 
"IWSN", "MYSI", "PBXI", "MJVJ", "ENUA", "VCKA", "RUOW", "UTBK", 
"CBWM", "SMYK", "KXNS", "VBYK", "QHDN", "UNGA", "OPMH", "NGMZ", 
"ULLY", "AJBY", "MYQU", "TDST", "SBJB"), x = c(3.1e-05, 0.044495, 
0.82244, 0.322291, 0.393595, 0.309097, 0.826368, 0.729424, 0.317649, 
0.599793, 0.647603, 0.547048, 0.529873, 0.90804, 0.835195, 0.068696, 
0.984329, 0.945783, 0.017137, 0.772506, 0.49308, 0.919386, 0.964342, 
0.864672, 0.786249, 0.123862, 0.990535, 0.455714, 0.345516, 0.482433, 
0.0631, 0.494563, 0.386052, 0.156384, 0.99985, 0.585455, 0.361887, 
0.350248, 0.126752, 0.812634, 0.369723, 0.437286, 0.771568, 0.697878, 
0.826174, 0.530293, 0.968455, 0.415824, 0.793458, 0.622709), 
    y = c(0.000183, 0.155732, 0.873416, 0.648545, 0.826873, 0.92659, 
    0.30854, 0.741526, 0.393468, 0.846041, 0.281525, 0.94879, 
    0.348011, 0.013456, 0.814513, 0.275943, 0.927687, 0.689675, 
    0.166494, 0.282393, 0.943686, 0.618783, 0.025198, 0.711721, 
    0.961377, 0.810826, 0.706806, 0.020492, 0.800801, 0.160464, 
    0.488463, 0.180498, 0.482467, 0.276557, 0.198618, 0.129442, 
    0.743469, 0.897698, 0.190162, 0.245063, 0.248908, 0.268675, 
    0.821389, 0.217688, 0.623633, 0.852871, 0.569763, 0.696233, 
    0.429293, 0.75561), z = c(0.000824, 0.533939, 0.838542, 0.990648, 
    0.418881, 0.777664, 0.413932, 0.884338, 0.501968, 0.678107, 
    0.860718, 0.769314, 0.319211, 0.90838, 0.370327, 0.037394, 
    0.707165, 0.626002, 0.844727, 0.741801, 0.224398, 0.438229, 
    0.47211, 0.488282, 0.692023, 0.750198, 0.326013, 0.021528, 
    0.695158, 0.620887, 0.36288, 0.631916, 0.420333, 0.251881, 
    0.193051, 0.507559, 0.203826, 0.233957, 0.000203, 0.156666, 
    0.165943, 0.676477, 0.984216, 0.025225, 0.306233, 0.344595, 
    0.702484, 0.434983, 0.434638, 0.929275)), class = c("data.table", 
"data.frame"), row.names = c(NA, -50L), .internal.selfref = <pointer: 0x000002a000eb1ef0>)

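This seems to be an issue specific to the .SD implementation, because when I apply the function to each variable separately, as shown below, it works fine: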

testDT[, qcut(x, k = k)]
testDT[, qcut(y, k = k)]
testDT[, qcut(z, k = k)]

^ These all work fine.
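
A possible workaround, offered as a sketch rather than a confirmed fix: pass the column name explicitly instead of recovering it with `substitute`. The function `qcut_named` and its `name` parameter are hypothetical and not part of the code above. The stand-in data is a plain data.frame with random values, so the block runs without data.table.

```r
# Sketch: a variant of qcut() that receives the column name as an argument.
# `qcut_named` and `name` are illustrative names, not from the question.
qcut_named <- function(variable, name, k) {
  n <- k[[name]]  # [[ ]] raises an error if `name` is not an element of `k`
  breaksV <- quantile(variable, probs = (0:n) / n)
  # Label each bin with its lower and upper break.
  labelsV <- paste0(head(breaksV, -1), " : ", tail(breaksV, -1))
  cut(variable, breaks = breaksV, labels = labelsV, include.lowest = TRUE)
}

k <- c(x = 4, y = 4, z = 4)

# Stand-in data, not the asker's table.
set.seed(1)
df <- data.frame(x = runif(50), y = runif(50), z = runif(50))

# Iterate over the columns and their names in parallel.
cols <- c("x", "y", "z")
res <- Map(function(col, nm) qcut_named(col, nm, k), df[cols], cols)
str(res)
```

With data.table, the same idea would presumably go in j, for example `testDT[, Map(function(col, nm) qcut_named(col, nm, k), .SD, names(.SD)), .SDcols = c('x', 'y', 'z')]`. That form is untested against the data above.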

0 answers:

No answers yet.