Custom ML function not working: undefined columns selected

Time: 2018-06-15 03:43:41

Tags: r machine-learning rlang

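I am trying to write a custom function that runs logistic-regression-based ML using the caTools package, but I keep getting the error: undefined columns selected

I checked the inputs to the xlearn and ylearn arguments passed to LogitBoost inside logit_boost and, as described in the documentation, they are a data frame containing the features and a vector of labels, respectively. So I am not sure what I am doing wrong.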

# needed libraries
library(dplyr)
library(rlang)
library(caTools)

# function body
logit_boost <- function(data, x, y, split_size = 0.8) {
  # creating a dataframe
  data <-
    dplyr::select(.data = data,
                  !!rlang::enquo(x),
                  !!rlang::enquo(y))

  # for reproducibility
  set.seed(123)

  # creating indices to choose rows from the data
  train_indices <-
    base::sample(x = base::seq_len(length.out = nrow(data)),
                 size = floor(split_size * nrow(data)))

  # training dataset
  train <- data[train_indices, ]

  # testing dataset
  test <- data[-train_indices, ]

  # defining label column we are interested in and everything else
  label_train <-
    train %>% dplyr::select(.data = ., !!rlang::enquo(x))

  data_train <-
    train %>% dplyr::select(.data = ., -!!rlang::enquo(x))

  # training model (y ~ x)
  logit_model <-
    caTools::LogitBoost(xlearn = data_train,
                        ylearn = label_train)

  # prediction
  # stats::predict(object = logit_model, test, type = "raw")
}

logit_boost(data = mtcars, x = am, y = mpg)
#> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)): undefined columns selected

1 Answer:

Answer 0 (score: 1)

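In the Examples section of help(LogitBoost), Label = iris[, 5] yields a vector, which is what the ylearn argument of LogitBoost() expects.

In your code, label_train <- train %>% dplyr::select(.data = ., !!rlang::enquo(x)) yields a data.frame. By design, dplyr defaults to drop = FALSE (and even ignores the argument) when only one column is selected, so the result is never simplified to a vector.

We can do: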
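The difference can be checked directly on mtcars. This is a small illustrative sketch, not part of the original answer; the variable names label_df, label_vec_base and label_vec are made up for the example.

```r
library(dplyr)

# select() always returns a data frame, even for a single column
label_df <- dplyr::select(mtcars, am)
class(label_df)
#> [1] "data.frame"

# base subsetting with [, j] drops a single column to a plain vector
label_vec_base <- mtcars[, "am"]

# pull() is the dplyr way to extract one column as a vector
label_vec <- dplyr::pull(mtcars, am)
class(label_vec)
#> [1] "numeric"
```

Only the last two forms give the kind of object that ylearn expects.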

logit_model <- caTools::LogitBoost(xlearn = data_train, ylearn = dplyr::pull(label_train))
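For completeness, here is a minimal sketch of the question's function with that fix applied. The name logit_boost_fixed is hypothetical, the sketch assumes dplyr, rlang and caTools are installed, and it keeps the question's convention that x names the label column while the remaining selected column is used as the feature.

```r
library(dplyr)
library(rlang)
library(caTools)

# hypothetical name; same arguments as the function in the question
logit_boost_fixed <- function(data, x, y, split_size = 0.8) {
  # keep only the two columns of interest
  data <- dplyr::select(data, !!rlang::enquo(x), !!rlang::enquo(y))

  # for reproducibility
  set.seed(123)

  # row indices for the training split
  train_indices <- sample(seq_len(nrow(data)),
                          size = floor(split_size * nrow(data)))
  train <- data[train_indices, ]

  # labels as a plain vector (pull), features as a data frame (select)
  label_train <- dplyr::pull(train, !!rlang::enquo(x))
  data_train <- dplyr::select(train, -!!rlang::enquo(x))

  # fit and return the model
  caTools::LogitBoost(xlearn = data_train, ylearn = label_train)
}

model <- logit_boost_fixed(data = mtcars, x = am, y = mpg)
```

Extracting the label with pull() in one step avoids building the intermediate one-column data frame at all.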