I wrote a function that uses data.table programmatically. The function is as follows:
transformVariables4 <- function(df_1n_data,
                                c_1n_variablesToTransform,
                                c_1n_newVariableNames,
                                f_01_functionToTransform,
                                ...) {
  for (i in 1:length(c_1n_variablesToTransform)) {
    df_1n_data[, c(c_1n_newVariableNames[i]) := list(forceAndCall(n = 1, FUN = f_01_functionToTransform, df_1n_data[[c_1n_variablesToTransform[i]]], ...))]
  }
  return(df_1n_data)
}
The function works in this scenario:
library(data.table)
df <- data.frame(abcd = (as.Date("1991-12-22") + 1:10), e = 1, f = 2)
df4 <- transformVariables4(
  df_1n_data = data.table(df),
  c_1n_variablesToTransform = "e",
  c_1n_newVariableNames = "new",
  f_01_functionToTransform = sum,
  na.rm = TRUE
)
But it does not work in the following case:
df <- data.frame(abcd = (as.Date("1991-12-22") + 1:10), e = 1, f = 2, g = 3, h = 4, i = 5, j = 6, k = 7, l = 8)
df4 <- transformVariables4(
  df_1n_data = data.table(df),
  c_1n_variablesToTransform = "e",
  c_1n_newVariableNames = "new",
  f_01_functionToTransform = sum,
  na.rm = TRUE
)
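It throws this error:
Error in .subset2(x, i, exact = exact) : no such index at level 1
The only difference between the two cases is that the data contains more columns in the second case.
I am trying to figure out what the problem might be and fix it, which will take some time. If there is another way to get this done quickly, that would be great :)
I tried debugging it. Below is part of the traceback output: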
Error in .subset2(x, i, exact = exact) : no such index at level 1
12 (function(x, i, exact) if (is.matrix(i)) as.matrix(x)[[i]] else .subset2(x,
i, exact = exact))(x, ..., exact = exact)
11 `[[.data.frame`(df_1n_data, c_1n_variablesToTransform[i])
10 df_1n_data[[c_1n_variablesToTransform[i]]]
9 eval(expr, envir, enclos)
8 eval(jsub, SDenv, parent.frame())
7 `[.data.table`(df_1n_data, , `:=`(c(c_1n_newVariableNames[i]),
list(forceAndCall(n = 1, FUN = f_01_functionToTransform,
df_1n_data[[c_1n_variablesToTransform[i]]], ...)))) at abcd.R#75
6 df_1n_data[, `:=`(c(c_1n_newVariableNames[i]), list(forceAndCall(n = 1,
FUN = f_01_functionToTransform, df_1n_data[[c_1n_variablesToTransform[i]]],
...)))] at abcd.R#75
5 transformVariables4(df_1n_data = data.table(df), c_1n_variablesToTransform = "e",
c_1n_newVariableNames = "new", f_01_functionToTransform = sum,
na.rm = TRUE) at abcd.R#90
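The line of code at the top (marked 12) is in the source of the function `[[.data.frame`. In the first scenario, the value of i at that line is
"e"
but in the second scenario it is
c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_)
which makes .subset2(x, i, exact = exact) fail. The next step is to find out the cause of this behavior.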
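Update
I found the cause of this behavior: the i on the RHS of := in
df_1n_data[, c(c_1n_newVariableNames[i]) := list(forceAndCall(n = 1, FUN = f_01_functionToTransform, df_1n_data[[c_1n_variablesToTransform[i]]], ...))]
matches a column name in the data. The next step is to figure out why this happens and what the right approach is.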
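Update
Thanks to Roland for helping me understand why this happens and how to do it correctly.
The problem with i is a scoping issue. Inside j, data.table uses the first i on its search path, which is the column i in the data rather than the loop counter. Indexing the length-one c_1n_variablesToTransform with that column's values (all 5) yields all NAs, which in turn causes .subset2 to fail. The second function in Roland's answer does what I intended correctly.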
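The scoping issue can be reproduced in isolation. The sketch below is only an illustration under stated assumptions: the toy table dt and the vector vars are made-up names, not part of the original code. It compares how i is resolved outside and inside the j expression when the table has a column named i.

```r
library(data.table)

# Hypothetical toy data: one column is deliberately named "i"
dt <- data.table(e = rep(1, 3), i = rep(5, 3))
vars <- "e"  # length-one vector of column names, as in the question

for (i in seq_along(vars)) {
  # Evaluated outside of [.data.table: i is the loop counter (1)
  outside <- vars[i]
  # Evaluated inside j: i is the column dt$i, i.e. c(5, 5, 5),
  # so vars is indexed at position 5, which does not exist
  inside <- dt[, vars[i]]
}

print(outside)  # the column name "e"
print(inside)   # a character vector of NAs, one per row
```

Resolving the name before entering the data.table call, as the accepted answer does with var, or avoiding the index entirely with .SDcols, sidesteps the collision.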
Answer 0 (score: 4)
I would rewrite it like this:
transformVariables4 <- function(df_1n_data,
                                c_1n_variablesToTransform,
                                c_1n_newVariableNames,
                                f_01_functionToTransform,
                                ...) {
  for (i in seq_along(c_1n_variablesToTransform)) {
    var <- c_1n_variablesToTransform[i] # to force evaluation
    df_1n_data[, (c_1n_newVariableNames[i]) := f_01_functionToTransform(get(var), ...)]
  }
  df_1n_data[]
}
library(data.table)
df <- data.frame(abcd = (as.Date("1991-12-22") + 1:10), e = 1, f = 2, g = 3, h = 4, i = 5, j = 6, k = 7, l = 8)
df4 <- transformVariables4(
  df_1n_data = data.table(df),
  c_1n_variablesToTransform = "e",
  c_1n_newVariableNames = "new",
  f_01_functionToTransform = sum,
  na.rm = TRUE
)
Of course, the following is more idiomatic:
transformVariables4 <- function(df_1n_data,
                                c_1n_variablesToTransform,
                                c_1n_newVariableNames,
                                f_01_functionToTransform,
                                ...) {
  df_1n_data[, (c_1n_newVariableNames) := lapply(.SD, f_01_functionToTransform, ...),
             .SDcols = c_1n_variablesToTransform]
  df_1n_data[]
}
I would also use shorter argument names to improve readability.
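As a rough sketch of what shorter names might look like, here is the idiomatic version again with illustrative names. transform_vars, dt, vars, new_names and fun are assumptions chosen for this example, not names from the original code, and the sample data is made up to include a column called i.

```r
library(data.table)

# Hypothetical shorter names: dt, vars, new_names, fun
transform_vars <- function(dt, vars, new_names, fun, ...) {
  # Apply fun to every column listed in vars and store each result
  # under the corresponding entry of new_names
  dt[, (new_names) := lapply(.SD, fun, ...), .SDcols = vars]
  dt[]
}

# Made-up data with a column named "i", which broke the original loop
df <- data.frame(e = 1:10, f = 2, i = 5)
res <- transform_vars(data.table(df), c("e", "i"), c("e_sum", "i_sum"),
                      sum, na.rm = TRUE)
print(res)
```

Because := updates by reference, the table passed in is modified in place; passing copy(dt) keeps the caller's table untouched if that matters.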