Function for binning a continuous variable in a data.table

Date: 2019-08-23 09:53:04

Tags: r data.table grouping

I am trying to write a function that bins a variable in a data.table in R. This is my attempt so far:

fun_group = function(data,col_a,lower,upper,by){
  data2 <- data.table(data)

  data2[, .SD, .SDcols = c(col_a)]

  #function to make categories:
  fun_cat_var <- function(x, lower = 0, upper, by = 10,
                          sep = "-", above.char = "") {

    x[x<lower] <- lower

    labs <- c(paste(seq(lower, upper - by, by = by)),
              paste(upper, above.char, sep = ""))

    cut(floor(x), breaks = c(seq(lower, upper, by = by), Inf),
        right = FALSE, labels = labs)
  }


  data2[, ("grp") := lapply(.SD, fun_cat_var), .SDcols = c(predictor,lower,upper,by)]
}

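The problem is that I am not sure how the syntax works: where should I put the lower, upper, and by arguments? The function gives me this error: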

Error in `[.data.table`(data2, , `:=`(("grp"), lapply(.SD, fun_cat_var)),  : 
  Some items of .SDcols are not column names: [100, 200, 10]

when I run

fun_group(mtcars,"hp")

1 answer:

Answer 0 (score: 1)

Too long for a comment. Is this what you are looking for?

library(data.table)

fun_group = function(data, col_a, lower, upper, by){
    data2 <- data.table(data)

    #function to make categories:
    fun_cat_var <- function(x, lower = 0, upper, by = 10,
        sep = "-", above.char = "") {

        x[x<lower] <- lower

        labs <- c(paste(seq(lower, upper - by, by = by)),
            paste(upper, above.char, sep = ""))

        cut(floor(x), breaks = c(seq(lower, upper, by = by), Inf),
            right=FALSE, labels=labs)
    }

    # .SDcols takes only column names; lower/upper/by are passed on to fun_cat_var by lapply
    data2[, paste0("OUT_", col_a) := lapply(.SD, fun_cat_var, lower=lower, upper=upper, by=by), .SDcols=col_a]
}
fun_group(mtcars, "hp", 0, 400, 100)
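
To see what the inner helper produces, it can be run on its own on a few values. This is only a sketch: the input vector `x` and the `"+"` suffix are made up for illustration, and the helper body is copied from the code above.

```r
# Same helper as above, pulled out so it can be run without data.table.
fun_cat_var <- function(x, lower = 0, upper, by = 10,
                        sep = "-", above.char = "") {
  x[x < lower] <- lower  # clamp values below the first break

  # one label per interval: the lower edge of each bin, plus the open top bin
  labs <- c(paste(seq(lower, upper - by, by = by)),
            paste(upper, above.char, sep = ""))

  # left-closed bins [lower, lower + by), ..., [upper, Inf)
  cut(floor(x), breaks = c(seq(lower, upper, by = by), Inf),
      right = FALSE, labels = labs)
}

x <- c(-5, 52, 110, 245, 400, 1000)  # made-up values
res <- fun_cat_var(x, lower = 0, upper = 400, by = 100, above.char = "+")

as.character(res)
# "0" "0" "100" "200" "400+" "400+"
levels(res)
# "0" "100" "200" "300" "400+"
```

Values below `lower` fall into the first bin, and everything at or above `upper` lands in the last, open-ended bin.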

If so, you may want to close the question, since it is just a syntax issue.
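
For reference, the error in the question comes from `.SDcols`, which accepts only column names or positions, so the numeric arguments cannot go there. Extra arguments reach the function through the `...` of `lapply` instead. Below is a minimal sketch of that pattern. The wrapper name `bin_col` and the `_grp` suffix are hypothetical, and the helper is reduced to a plain `cut()` call with default interval labels.

```r
library(data.table)

# Hypothetical wrapper: bins one column with cut() and stores it as <col>_grp.
bin_col <- function(data, col, lower, upper, by) {
  dt <- copy(as.data.table(data))   # work on a copy, leave the input alone
  brks <- c(seq(lower, upper, by = by), Inf)
  new_col <- paste0(col, "_grp")

  # .SDcols selects the column; breaks and right are forwarded to cut() by lapply
  dt[, (new_col) := lapply(.SD, cut, breaks = brks, right = FALSE),
     .SDcols = col]
  dt[]
}

out <- bin_col(mtcars, "hp", lower = 0, upper = 400, by = 100)
table(out$hp_grp)
```

The parentheses around `new_col` tell data.table to use the value of the variable as the column name rather than creating a column literally called `new_col`.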