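I want to use .SD to apply a user-defined function to a set of variables in a data.table. The function I am applying needs to retrieve the argument name of the selected variable and convert it to a character object. I think this is the root of the problem.
Here is the user-defined function: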
qcut <- function(variable, k){
  k <- k[deparse(substitute(variable))]
  breaksV <- quantile(variable,
                      probs = (0 : k) / k)
  labelsV <- vapply(1:(length(breaksV) - 1),
                    function(x) paste0(breaksV[x], ' : ', breaksV[x + 1]),
                    FUN.VALUE = character(1))
  cut(variable,
      breaks = breaksV,
      labels = labelsV,
      include.lowest = TRUE)
}
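To illustrate how the name lookup in the first line of the function behaves, here is a minimal sketch. The helper `arg_name` and the toy vector `x` are hypothetical and only isolate the `deparse(substitute(...))` step; they are not part of my actual code.

```r
# Hypothetical helper: returns the expression supplied for `variable`, as a string
arg_name <- function(variable) deparse(substitute(variable))

x <- c(0.1, 0.2, 0.3)

# Direct call: the argument expression is the symbol `x`
direct <- arg_name(x)

# Call through an anonymous wrapper: the argument expression is now the symbol `v`
wrapped <- (function(v) arg_name(variable = v))(x)

print(direct)   # "x"
print(wrapped)  # "v"
```

So the string that comes back depends on how the function is called, not on which column the values came from.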
I want to apply it to the data.table as follows:
library('data.table')
k <- c('x' = 4, 'y' = 4, 'z' = 4)
testDT[, lapply(.SD, function(v) qcut(variable = v, k = k)),
.SDcols = c('x', 'y', 'z')]
However, I get the following error message:
Error in 0:k : NA/NaN argument
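The same message can be reproduced without data.table. The sketch below assumes that `k` gets indexed by a name it does not contain; the name `"v"` is chosen only for illustration.

```r
k <- c(x = 4, y = 4, z = 4)

# Indexing by an existing name returns the stored value
hit <- k["x"]

# Indexing by a name that is not present returns NA rather than raising an error
miss <- k["v"]

# Using that NA as the end of a sequence is what fails
msg <- tryCatch(0:miss, error = function(e) conditionMessage(e))

print(hit)
print(miss)
print(msg)
```

If this is what happens inside `qcut`, the error comes from `(0 : k)` receiving `NA` rather than from `quantile` or `cut`.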
Here is the data (testDT), as dput output:
structure(list(ID = c("GRWJ", "JEAT", "OYZY", "XXTR", "FYHS",
"XSRW", "YJJS", "RUYW", "QAIL", "BYAR", "FZJD", "EJKT", "RTJB",
"JUYH", "USJK", "MMOY", "SMYZ", "ZIXB", "JSGP", "UVSA", "YLJO",
"FNOC", "QRTQ", "DDVV", "GIWJ", "KKAD", "ACFC", "JYJJ", "WVHZ",
"IWSN", "MYSI", "PBXI", "MJVJ", "ENUA", "VCKA", "RUOW", "UTBK",
"CBWM", "SMYK", "KXNS", "VBYK", "QHDN", "UNGA", "OPMH", "NGMZ",
"ULLY", "AJBY", "MYQU", "TDST", "SBJB"), x = c(3.1e-05, 0.044495,
0.82244, 0.322291, 0.393595, 0.309097, 0.826368, 0.729424, 0.317649,
0.599793, 0.647603, 0.547048, 0.529873, 0.90804, 0.835195, 0.068696,
0.984329, 0.945783, 0.017137, 0.772506, 0.49308, 0.919386, 0.964342,
0.864672, 0.786249, 0.123862, 0.990535, 0.455714, 0.345516, 0.482433,
0.0631, 0.494563, 0.386052, 0.156384, 0.99985, 0.585455, 0.361887,
0.350248, 0.126752, 0.812634, 0.369723, 0.437286, 0.771568, 0.697878,
0.826174, 0.530293, 0.968455, 0.415824, 0.793458, 0.622709),
y = c(0.000183, 0.155732, 0.873416, 0.648545, 0.826873, 0.92659,
0.30854, 0.741526, 0.393468, 0.846041, 0.281525, 0.94879,
0.348011, 0.013456, 0.814513, 0.275943, 0.927687, 0.689675,
0.166494, 0.282393, 0.943686, 0.618783, 0.025198, 0.711721,
0.961377, 0.810826, 0.706806, 0.020492, 0.800801, 0.160464,
0.488463, 0.180498, 0.482467, 0.276557, 0.198618, 0.129442,
0.743469, 0.897698, 0.190162, 0.245063, 0.248908, 0.268675,
0.821389, 0.217688, 0.623633, 0.852871, 0.569763, 0.696233,
0.429293, 0.75561), z = c(0.000824, 0.533939, 0.838542, 0.990648,
0.418881, 0.777664, 0.413932, 0.884338, 0.501968, 0.678107,
0.860718, 0.769314, 0.319211, 0.90838, 0.370327, 0.037394,
0.707165, 0.626002, 0.844727, 0.741801, 0.224398, 0.438229,
0.47211, 0.488282, 0.692023, 0.750198, 0.326013, 0.021528,
0.695158, 0.620887, 0.36288, 0.631916, 0.420333, 0.251881,
0.193051, 0.507559, 0.203826, 0.233957, 0.000203, 0.156666,
0.165943, 0.676477, 0.984216, 0.025225, 0.306233, 0.344595,
0.702484, 0.434983, 0.434638, 0.929275)), class = c("data.table",
"data.frame"), row.names = c(NA, -50L))
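This seems to be a problem specific to the .SD implementation, because the function works fine when I apply it to each variable separately, as follows: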
testDT[, qcut(x, k = k)]
testDT[, qcut(y, k = k)]
testDT[, qcut(z, k = k)]
^ These all work fine.
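One possible workaround, sketched on the assumption that the name lookup inside `qcut` is the part that breaks under `lapply`: pass the bin count for each column explicitly instead of recovering the column name inside the function. The function `qcut2`, its parameter `n_bins`, and the random table `demoDT` are hypothetical stand-ins, not my original code or data.

```r
library(data.table)

# Hypothetical variant of qcut: takes the number of bins directly,
# so it never needs to know the name of the variable it was given
qcut2 <- function(variable, n_bins) {
  breaksV <- quantile(variable, probs = (0:n_bins) / n_bins)
  labelsV <- paste0(head(breaksV, -1), " : ", tail(breaksV, -1))
  cut(variable, breaks = breaksV, labels = labelsV, include.lowest = TRUE)
}

set.seed(1)
demoDT <- data.table(x = runif(50), y = runif(50), z = runif(50))
k <- c(x = 4, y = 4, z = 4)
cols <- names(k)

# Map pairs each column in .SD with the matching entry of k[cols]
res <- demoDT[, Map(qcut2, .SD, k[cols]), .SDcols = cols]
print(head(res))
```

Because `k[cols]` is ordered by the same names as `.SDcols`, each column gets its own bin count even if the values in `k` differ.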