R caret: Combining rfe() and train()

Time: 2019-01-08 09:05:42

Tags: r random-forest r-caret

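I want to combine recursive feature elimination via rfe() with hyperparameter tuning and model selection via trainControl(), using the method rf (random forest). Instead of the standard summary statistics, I would like to use MAPE (mean absolute percentage error). I therefore tried the following code on the ChickWeight data set: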

library(caret)
library(randomForest)
library(MLmetrics)

# Compute MAPE instead of other metrics
mape <- function(data, lev = NULL, model = NULL){
  mape <- MAPE(y_pred = data$pred, y_true = data$obs)
  c(MAPE = mape)
}

# specify trainControl
trc <- trainControl(method = "repeatedcv", number = 10, repeats = 3, search = "grid",
                    savePredictions = TRUE, summaryFunction = mape)
# set up grid
tunegrid <- expand.grid(.mtry=c(1:3))

# specify rfeControl
rfec <- rfeControl(functions=rfFuncs, method="cv", number=10, saveDetails = TRUE)

set.seed(42)
results <- rfe(weight ~ Time + Chick + Diet, 
           sizes=c(1:3), # candidate subset sizes (numbers of predictors) for rfe to evaluate
           data = ChickWeight, 
           method="rf",
           ntree = 250, 
           metric= "RMSE", 
           tuneGrid=tunegrid,
           rfeControl=rfec,
           trControl = trc)

The code runs without errors. But where can I find the MAPE that I passed to trainControl as summaryFunction? Is trainControl executed or ignored?

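How can I rewrite the code so that rfe performs recursive feature elimination, the hyperparameter mtry is tuned via trainControl inside rfe, and an additional error measure (MAPE) is computed at the same time?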

1 Answer:

Answer 0 (score: 1)

trainControl is ignored, as its description

  Control the computational nuances of the train function

would suggest. To use MAPE, you need

rfec$functions$summary <- mape

and then

rfe(weight ~ Time + Chick + Diet, 
    sizes = c(1:3),
    data = ChickWeight, 
    method ="rf",
    ntree = 250, 
    metric = "MAPE", # Modified
    maximize = FALSE, # Modified
    rfeControl = rfec)
#
# Recursive feature selection
#
# Outer resampling method: Cross-Validated (10 fold) 
#
# Resampling performance over subset size:
#
#  Variables   MAPE  MAPESD Selected
#          1 0.1903 0.03190         
#          2 0.1029 0.01727        *
#          3 0.1326 0.02136         
#         53 0.1303 0.02041         
#
# The top 2 variables (out of 2):
#    Time, Chick.L
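
The call above reports MAPE but does not tune mtry, because rfFuncs fits randomForest() directly rather than going through train(). One possible way to get both, sketched here under assumptions that the original thread does not confirm, is to switch to caretFuncs, whose fit step calls train(), and to override that fit step so the inner tuning also minimizes MAPE. The base-R mape definition, the 5-fold settings, and the name results_tuned are illustrative choices, not part of the original post.

```r
library(caret)
library(randomForest)

# MAPE summary function in base R (assumed equivalent to MLmetrics::MAPE)
mape <- function(data, lev = NULL, model = NULL) {
  c(MAPE = mean(abs((data$obs - data$pred) / data$obs)))
}

# Inner resampling: used by train() to pick mtry for each predictor subset
trc <- trainControl(method = "cv", number = 5, summaryFunction = mape)

# Outer resampling: used by rfe(); caretFuncs fits each subset via train()
rfec <- rfeControl(functions = caretFuncs, method = "cv", number = 5)
rfec$functions$summary <- mape

# rfe() keeps `metric` and `maximize` for itself, so set them explicitly
# for the inner train() call (illustrative override of the fit step)
rfec$functions$fit <- function(x, y, first, last, ...) {
  train(x, y, metric = "MAPE", maximize = FALSE, ...)
}

set.seed(42)
results_tuned <- rfe(weight ~ Time + Chick + Diet,
                     data = ChickWeight,
                     sizes = 1:3,
                     metric = "MAPE",
                     maximize = FALSE,
                     rfeControl = rfec,
                     # the remaining arguments are forwarded to train()
                     method = "rf",
                     ntree = 250,
                     tuneGrid = expand.grid(mtry = 1:3),
                     trControl = trc)

print(results_tuned)
results_tuned$fit$bestTune  # mtry chosen by train() for the final subset
```

If this runs as intended, the printed table should list MAPE per subset size, and results_tuned$fit should be the train object for the selected subset. Expect warnings from randomForest whenever an mtry value in the grid exceeds the number of predictors in the current subset, and noticeably longer run times than with rfFuncs, since every subset is now tuned by nested resampling.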