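I am using rfe from the caret package to do feature selection for a linear regression.
One of my regressors is a logical variable, and whenever I include this variable in feature selection I get the error: Error in { : task 1 failed - "undefined columns selected".
How can I use rfe for feature selection when one of the predictors is a logical variable?
Is it necessary to convert it to 0/1?
Here is a reproducible example: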
library(caret)
x <- mtcars[-1]
y <- mtcars$mpg
set.seed(2017)
ctrl <- rfeControl(functions = lmFuncs,
                   method = "repeatedcv",
                   repeats = 5,
                   verbose = FALSE)
lmProfile1 <- rfe(x, y, sizes = 1:5, rfeControl = ctrl)
# > lmProfile1
#
# Recursive feature selection
#
# Outer resampling method: Cross-Validated (10 fold, repeated 5 times)
#
# Resampling performance over subset size:
#
#  Variables  RMSE Rsquared RMSESD RsquaredSD Selected
#          1 3.503   0.8338  1.627     0.2393
#          2 3.197   0.8841  1.347     0.1783
#          3 3.214   0.8788  1.327     0.1815
#          4 3.050   0.8861  1.341     0.1603        *
#          5 3.063   0.8842  1.254     0.1670
#         10 3.332   0.8638  1.404     0.1926
#
# The top 4 variables (out of 4):
# wt, am, qsec, hp
# am is one of the best features; now I turn it into a logical variable
x <- mtcars[-1]
x$am <- x$am == 1
y <- mtcars$mpg
set.seed(2017)
ctrl <- rfeControl(functions = lmFuncs,
                   method = "repeatedcv",
                   repeats = 5,
                   verbose = FALSE)
lmProfile2 <- rfe(x, y, sizes = 1:5, rfeControl = ctrl)
# Error in { : task 1 failed - "undefined columns selected"
# > packageVersion('caret')
# [1] ‘6.0.73’
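
For reference, here is a minimal base-R sketch of what may be going on and of the 0/1 conversion asked about above. It relies on one fact that can be checked directly: lm() names the coefficient of a logical predictor amTRUE rather than am. That this name mismatch is what makes rfe fail with "undefined columns selected" is an assumption here and has not been verified against the caret source for this version. The helper names (coef_names, unmatched, is_lgl, x_num) are illustrative only.

```r
# Same data as in the failing call above (mtcars ships with base R).
x <- mtcars[-1]
x$am <- x$am == 1                      # logical predictor
y <- mtcars$mpg

# Fit the full linear model and compare coefficient names to column names.
fit <- lm(y ~ ., data = cbind(x, y = y))
coef_names <- setdiff(names(coef(fit)), "(Intercept)")
unmatched <- setdiff(coef_names, names(x))
print(unmatched)                       # coefficient names that are not columns of x

# Workaround sketch: convert every logical column to numeric 0/1,
# so that coefficient names and column names agree again.
is_lgl <- vapply(x, is.logical, logical(1))
x_num <- x
x_num[is_lgl] <- lapply(x_num[is_lgl], as.numeric)
str(x_num$am)
```

With x_num in place of x, the rfe call from the example can be rerun unchanged; since x_num$am is then the same 0/1 column as in the first call, it should behave like lmProfile1.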