"no such index at level 1" error in a specific scenario when trying to use data.table programmatically

Posted: 2016-05-27 10:02:06

Tags: r data.table subset

Question

I wrote a function that uses data.table programmatically. It is as follows:

transformVariables4              <- function(df_1n_data,
                                             c_1n_variablesToTransform,
                                             c_1n_newVariableNames,
                                             f_01_functionToTransform,
                                             ...) {

  for (i in 1:length(c_1n_variablesToTransform)) {
    df_1n_data[, c(c_1n_newVariableNames[i]) := list(forceAndCall(n = 1, FUN = f_01_functionToTransform, df_1n_data[[c_1n_variablesToTransform[i]]], ...))]
  }

  return(df_1n_data)
}

The function works in this scenario:

library(data.table)

df <- data.frame(abcd = (as.Date("1991-12-22") + 1:10), e = 1, f = 2)
df4 <- transformVariables4(
  df_1n_data = data.table(df),
  c_1n_variablesToTransform = "e",
  c_1n_newVariableNames = "new",
  f_01_functionToTransform = sum,
  na.rm = TRUE
)

but not in the following one:

df <- data.frame(abcd = (as.Date("1991-12-22") + 1:10), e = 1, f = 2, g = 3, h = 4, i = 5, j = 6, k = 7, l = 8)
df4 <- transformVariables4(
  df_1n_data = data.table(df),
  c_1n_variablesToTransform = "e",
  c_1n_newVariableNames = "new",
  f_01_functionToTransform = sum,
  na.rm = TRUE
)

It throws an error:

Error in .subset2(x, i, exact = exact) : no such index at level 1

The only difference between the two scenarios is that the data in the second one has more columns.

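I'm trying to figure out what the problem might be and fix it, which will take a while. If there is another way to get this working quickly, that would be great :)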

What I have figured out so far

I tried debugging it. Below is part of the traceback output:

Error in .subset2(x, i, exact = exact) : no such index at level 1 

12 (function(x, i, exact) if (is.matrix(i)) as.matrix(x)[[i]] else .subset2(x, 
    i, exact = exact))(x, ..., exact = exact) 

11 `[[.data.frame`(df_1n_data, c_1n_variablesToTransform[i]) 

10 df_1n_data[[c_1n_variablesToTransform[i]]] 

9 eval(expr, envir, enclos) 

8 eval(jsub, SDenv, parent.frame()) 

7 `[.data.table`(df_1n_data, , `:=`(c(c_1n_newVariableNames[i]), 
    list(forceAndCall(n = 1, FUN = f_01_functionToTransform, 
        df_1n_data[[c_1n_variablesToTransform[i]]], ...)))) at abcd.R#75

6 df_1n_data[, `:=`(c(c_1n_newVariableNames[i]), list(forceAndCall(n = 1, 
    FUN = f_01_functionToTransform, df_1n_data[[c_1n_variablesToTransform[i]]], 
    ...)))] at abcd.R#75

5 transformVariables4(df_1n_data = data.table(df), c_1n_variablesToTransform = "e", 
    c_1n_newVariableNames = "new", f_01_functionToTransform = sum, 
    na.rm = TRUE) at abcd.R#90

The line of code at the top (frame 12) is in the source of the function [[.data.frame. In the first scenario, the value of i on that line is

"e"

but in the second scenario it is

c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_)

which makes .subset2(x, i, exact = exact) fail. The next step is to find out what causes this behaviour.
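A minimal base-R sketch of how a vector of ten NAs can arise and why [[ then fails, assuming the index i had somehow become a vector of ten 5s (the value of column i in the second data set). The names vars, idx, df and failed are illustrative only.

```r
vars <- "e"         # length-1 vector, like c_1n_variablesToTransform
idx  <- rep(5, 10)  # ten 5s, as a 10-row column full of 5s would be
vars[idx]           # positions past length(vars) give NA: ten NA_character_

df <- data.frame(e = 1)
# [[ with a length-10 character vector is recursive indexing; the first
# element is NA, which matches no name, so .subset2() signals an error
failed <- tryCatch({ df[[vars[idx]]]; FALSE }, error = function(e) TRUE)
failed
```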

Update

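Found the cause of this behaviour. It is because the i on the RHS of := in

df_1n_data[, c(c_1n_newVariableNames[i]) := list(forceAndCall(n = 1, FUN = f_01_functionToTransform, df_1n_data[[c_1n_variablesToTransform[i]]], ...))]

matches a column name in the data (the second data set has a column called i). The next step is to figure out why this happens and what the correct way to do it is.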
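A minimal sketch of the mismatch with two toy tables (the names vars, dt_ok, dt_bad, ok and bad are illustrative): inside j, data.table looks names up among the table's columns before the calling scope, so once the table has a column called i, vars[i] is indexed by that column instead of by the local i.

```r
library(data.table)

vars <- "e"  # plays the role of c_1n_variablesToTransform
i    <- 1L   # plays the role of the loop index

dt_ok  <- data.table(e = rep(1, 3))          # no column named i
dt_bad <- data.table(e = rep(1, 3), i = 5)   # has a column named i

# j is evaluated with the table's columns first on the search path
ok  <- dt_ok[, vars[i]]   # no column i, so the local i (1L) is used: "e"
bad <- dt_bad[, vars[i]]  # column i (5, 5, 5) wins: three NAs
ok
bad
```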

Update

Thanks to Roland for helping me understand why this happens and what the right approach is.

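The problem is a scoping issue. Inside j, data.table uses the first i on its search path, which is the column i of the data (ten 5s) rather than the loop index. c_1n_variablesToTransform[i] therefore evaluates to ten NAs, which in turn makes .subset2 fail. Using the second function from Roland's answer does exactly what I intended.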
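Another way to sidestep the column lookup, which was not suggested in the thread: compute the new values outside [.data.table and assign them by reference with data.table::set(), which takes the target column as a string and evaluates nothing in the table's scope. transform_with_set is a hypothetical helper, and this is a sketch rather than a drop-in replacement.

```r
library(data.table)

# Hypothetical helper: the loop index never appears inside a data.table j
transform_with_set <- function(dt, vars, new_names, fun, ...) {
  for (k in seq_along(vars)) {
    # dt[[vars[k]]] is evaluated in this function's own frame, so a column
    # named "k" or "i" in dt cannot shadow the loop index
    set(dt, j = new_names[k], value = fun(dt[[vars[k]]], ...))
  }
  dt[]  # return the table (modified by reference) with printing re-enabled
}

dt  <- data.table(e = rep(1, 10), i = 5)  # includes the troublesome column i
res <- transform_with_set(dt, "e", "new", sum, na.rm = TRUE)
res
```

set() recycles a length-1 value such as the result of sum() to every row, just as := does.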

1 Answer:

Answer (score: 4)

I would rewrite it like this:

transformVariables4              <- function(df_1n_data,
                                             c_1n_variablesToTransform,
                                             c_1n_newVariableNames,
                                             f_01_functionToTransform,
                                             ...) {

  for (i in seq_along(c_1n_variablesToTransform)) {
    var <- c_1n_variablesToTransform[i] #to force evaluation
    df_1n_data[, (c_1n_newVariableNames[i]) := f_01_functionToTransform(get(var), ...)]
  }

  df_1n_data[]
}
library(data.table)

df <- data.frame(abcd = (as.Date("1991-12-22") + 1:10), e = 1, f = 2, g = 3, h = 4, i = 5, j = 6, k = 7, l = 8)
df4 <- transformVariables4(
  df_1n_data = data.table(df),
  c_1n_variablesToTransform = "e",
  c_1n_newVariableNames = "new",
  f_01_functionToTransform = sum,
  na.rm = TRUE
)

Of course, it would be more idiomatic to do:

transformVariables4              <- function(df_1n_data,
                                             c_1n_variablesToTransform,
                                             c_1n_newVariableNames,
                                             f_01_functionToTransform,
                                             ...) {

  df_1n_data[, (c_1n_newVariableNames) := lapply(.SD, f_01_functionToTransform, ...), 
              .SDcols = c_1n_variablesToTransform]

  df_1n_data[]
}

I would also use shorter argument names to improve readability.
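For example, the .SDcols version with shorter argument names might look like the sketch below. The names transform_vars, dt, vars, new_names and fun are my own choices, not the answer's. The usage transforms two columns in one call on the data that broke the original function.

```r
library(data.table)

# Same logic as the idiomatic .SDcols version, with shorter argument names
transform_vars <- function(dt, vars, new_names, fun, ...) {
  # .SDcols receives column names as strings and the LHS in parentheses is
  # evaluated in this function's frame, so a column called "i" is harmless
  dt[, (new_names) := lapply(.SD, fun, ...), .SDcols = vars]
  dt[]
}

df  <- data.frame(abcd = as.Date("1991-12-22") + 1:10,
                  e = 1, f = 2, g = 3, h = 4, i = 5, j = 6, k = 7, l = 8)
res <- transform_vars(data.table(df), vars = c("e", "f"),
                      new_names = c("sum_e", "sum_f"), fun = sum, na.rm = TRUE)
res[, .(e, f, sum_e, sum_f)]
```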