Question

I am using caret's rfe for a regression application. My data (a data.table) has 176 predictors (including 49 factor predictors). When I run the function, I get this error:

Error in { :  task 1 failed - "rfe is expecting 176 importance values but only has 2"

I then used model.matrix( ~ . - 1, data = as.data.frame(train_model_sell_single_bid)) to convert the factor predictors to dummy variables. However, I got a similar error:

Error in { :  task 1 failed - "rfe is expecting 184 importance values but only has 2"
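For readers unfamiliar with the conversion step, here is a minimal sketch of what a model.matrix( ~ . - 1, ...) call does to a factor column. The toy data frame and its column names are made up for illustration and are not taken from the linked datasets.

```r
# Toy data (hypothetical): one numeric predictor and one 3-level factor
toy <- data.frame(
  price = c(10.5, 11.0, 9.8, 10.2),
  side  = factor(c("buy", "sell", "hold", "buy"))
)

# "- 1" drops the intercept, so the first factor gets one dummy column per level
mm <- model.matrix(~ . - 1, data = toy)

colnames(mm)  # "price" plus one dummy column for each level of `side`
dim(mm)       # rows are unchanged, columns grow with the number of levels
```

The result is a numeric matrix, so it usually needs to be wrapped in as.data.frame() again before being passed on as a predictor data frame.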

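I am using R version 3.1.1 on Windows 7 (64-bit) with caret version 6.0-41. I also have Revolution R Enterprise version 7.3 (64-bit) installed. The same error was reproduced on an Amazon EC2 (c3.8xlarge) Linux instance with R version 3.0.1 and caret version 6.0-24.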

The datasets used (to reproduce my error):

https://www.dropbox.com/s/utuk9bpxl2996dy/train_model_sell_single_bid.RData?dl=0
https://www.dropbox.com/s/s9xcgfit3iqjffp/train_model_bid_outcomes_sell_single.RData?dl=0

My code:

library(caret)
library(data.table)
library(bit64)
library(doMC)

load("train_model_sell_single_bid.RData")
load("train_model_bid_outcomes_sell_single.RData")

subsets <- seq(from = 4, to = 184, by= 4)

registerDoMC(cores = 32)

set.seed(1015498)
ctrl <- rfeControl(functions = lmFuncs,
                   method = "repeatedcv",
                   repeats = 1,
                   #saveDetails = TRUE,
                   verbose = FALSE)

x <- as.data.frame(train_model_sell_single_bid[,!"security_id", with=FALSE])
y <- train_model_bid_outcomes_sell_single[,bid100]

lmProfile_single_bid100 <- rfe(x, y,
                               sizes = subsets,
                               preProc = c("center", "scale"),
                               rfeControl = ctrl)

Answer 1

You may have highly correlated predictors. Before selecting features, you should run:

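# `correlations` is the correlation matrix of the predictors,
# e.g. cor(x) when every column of x is numeric
crrltn <- findCorrelation(correlations, cutoff = .90)
if (length(crrltn) != 0)
  x <- x[, -crrltn]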
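As a possible illustration of why correlated predictors matter here (this mechanism is an assumption; the thread does not confirm it as the cause of the error): when one predictor is an exact linear combination of others, lm cannot estimate a coefficient for it, so the fitted model reports fewer coefficient rows than there are predictors. The toy data below are made up.

```r
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2                      # exact linear combination of x1 and x2
y  <- 2 * x1 - x2 + rnorm(n)
toy <- data.frame(y, x1, x2, x3)

fit <- lm(y ~ ., data = toy)

coef(fit)                          # x3 is reported as NA (aliased)
nrow(summary(fit)$coefficients)    # aliased terms are dropped from this table
```

Any ranking built from the coefficient table would then see fewer values than the number of columns in the predictor data frame, which is the kind of mismatch the error message describes.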

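If the problem persists after that, it may be related to high correlation among the predictors within the automatically generated folds. You can try controlling the generated folds with: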

set.seed(12213)
index <- createFolds(y, k = 10, returnTrain = TRUE)

Then supply these as an argument to the rfeControl function:

lmctrl <- rfeControl(functions = lmFuncs, 
                     method = "repeatedcv", 
                     index = index,
                     verbose = TRUE)

set.seed(111333)
# x = predictors, y = outcome, subsets = candidate sizes, as defined in the question
lrprofile <- rfe(x, y,
                 sizes = subsets,
                 rfeControl = lmctrl)

If you still get the same problem, check whether the predictors are highly correlated within each fold:

for (i in seq_along(index)) {
  crrltn <- cor(x[index[[i]], ])
  # print() is needed because values are not auto-printed inside a for loop
  print(findCorrelation(crrltn, cutoff = .90, names = TRUE, verbose = TRUE))
}
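The loop above assumes every column of x is numeric, since cor() fails on factor columns. A rough base-R sketch of the same per-fold check, restricted to numeric columns and listing the offending pairs explicitly, might look like this. The toy data and the hand-rolled folds are stand-ins for x and for the output of createFolds, and high_cor_pairs is a hypothetical helper, not a caret function.

```r
# Hypothetical helper: predictor pairs whose |correlation| exceeds the cutoff
high_cor_pairs <- function(df, cutoff = 0.90) {
  num <- df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
  cm  <- abs(cor(num))
  cm[lower.tri(cm, diag = TRUE)] <- 0        # keep each pair only once
  hits <- which(cm > cutoff, arr.ind = TRUE)
  data.frame(var1 = rownames(cm)[hits[, "row"]],
             var2 = colnames(cm)[hits[, "col"]],
             stringsAsFactors = FALSE)
}

set.seed(42)
n <- 100
toy_x <- data.frame(a = rnorm(n), b = rnorm(n))
toy_x$c <- toy_x$a + rnorm(n, sd = 0.01)     # nearly a copy of `a`
toy_x$g <- factor(sample(c("u", "v"), n, replace = TRUE))  # skipped by the helper

# Hand-rolled stand-in for createFolds(..., returnTrain = TRUE):
# each element holds the training row indices for one fold
held_out  <- split(sample(n), rep(1:5, length.out = n))
toy_index <- lapply(held_out, function(test) setdiff(seq_len(n), test))

for (i in seq_along(toy_index)) {
  cat("Fold", i, "\n")
  print(high_cor_pairs(toy_x[toy_index[[i]], ]))
}
```

Unlike findCorrelation, which returns the columns it suggests removing, this sketch only reports the pairs, leaving the choice of which member to drop to the reader.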

R caret's rfe [Error in { : task 1 failed - "rfe is expecting 184 importance values but only has 2"]