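I am trying to write a custom function that uses the caTools package to do machine learning based on logistic regression, but I keep getting the error: undefined columns selected.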
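I checked the inputs to the xlearn and ylearn arguments inside my logit_boost function and, as described in the documentation, they are respectively a data frame containing the features and the labels. So I am not sure what I am doing wrong.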
# needed libraries
library(dplyr)
library(rlang)
library(caTools)
# function body
logit_boost <- function(data, x, y, split_size = 0.8) {
  # creating a dataframe
  data <-
    dplyr::select(.data = data,
                  !!rlang::enquo(x),
                  !!rlang::enquo(y))
  # for reproducibility
  set.seed(123)
  # creating indices to choose rows from the data
  train_indices <-
    base::sample(x = base::seq_len(length.out = nrow(data)),
                 size = floor(split_size * nrow(data)))
  # training dataset
  train <- data[train_indices, ]
  # testing dataset
  test <- data[-train_indices, ]
  # defining label column we are interested in and everything else
  label_train <-
    train %>% dplyr::select(.data = ., !!rlang::enquo(x))
  data_train <-
    train %>% dplyr::select(.data = ., -!!rlang::enquo(x))
  # training model (y ~ x)
  logit_model <-
    caTools::LogitBoost(xlearn = data_train,
                        ylearn = label_train)
  # prediction
  # stats::predict(object = logit_model, test, type = "raw")
}
logit_boost(data = mtcars, x = am, y = mpg)
#> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)): undefined columns selected
Answer 0 (score: 1)
In the Examples section of help(LogitBoost), Label = iris[, 5] produces a vector, which is what the ylearn argument of LogitBoost() expects.
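As a small illustration of that point, the sketch below uses only base R and the built-in iris data to compare the two shapes a single-column subset can take. The variable names label_vec and label_df are made up for this example.

```r
# Base R drops a single selected column to a plain vector (here a factor)
label_vec <- iris[, 5]
is.factor(label_vec)      # TRUE
is.data.frame(label_vec)  # FALSE

# Asking for drop = FALSE keeps the one-column data frame instead
label_df <- iris[, 5, drop = FALSE]
is.data.frame(label_df)   # TRUE
```

The first form is the kind of object the help page example passes as ylearn.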
In your code, label_train <- train %>% dplyr::select(.data = ., !!rlang::enquo(x)) produces a data.frame instead. By design, dplyr behaves as if drop = FALSE (and even ignores the argument) when only one column is selected.
One fix is to pass the label column as a vector with dplyr::pull():
logit_model <- caTools::LogitBoost(xlearn = data_train, ylearn = dplyr::pull(label_train))
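To see the difference outside the function, here is a minimal sketch on mtcars, assuming only that dplyr is installed. It does not call LogitBoost; it just compares what select() and pull() return for the am column. The names label_df and label_vec are illustrative.

```r
library(dplyr)

# select() keeps a one-column data frame, even for a single column
label_df <- dplyr::select(mtcars, am)
is.data.frame(label_df)

# pull() extracts the column itself as a plain vector
label_vec <- dplyr::pull(mtcars, am)
is.data.frame(label_vec)
is.numeric(label_vec)

# the pulled vector matches base R's [[ extraction
identical(label_vec, mtcars[["am"]])
```

Inside logit_boost, the same idea could be applied when label_train is created, so that a vector rather than a data frame reaches ylearn.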