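I would like to combine recursive feature elimination using rfe() with hyperparameter tuning and model selection using trainControl(), with rf (random forest) as the method. Instead of the standard summary statistics, I would like to use the MAPE (mean absolute percentage error). So I tried the following code with the ChickWeight dataset: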
library(caret)
library(randomForest)
library(MLmetrics)
# Compute MAPE instead of the default metrics
mape <- function(data, lev = NULL, model = NULL){
  mape <- MAPE(y_pred = data$pred, y_true = data$obs)
  c(MAPE = mape)
}
# specify trainControl
trc <- trainControl(method = "repeatedcv", number = 10, repeats = 3, search = "grid",
                    savePredictions = TRUE, summaryFunction = mape)
# set up grid
tunegrid <- expand.grid(.mtry=c(1:3))
# specify rfeControl
rfec <- rfeControl(functions=rfFuncs, method="cv", number=10, saveDetails = TRUE)
set.seed(42)
results <- rfe(weight ~ Time + Chick + Diet,
               sizes = c(1:3), # subset sizes (numbers of predictors) to evaluate
               data = ChickWeight,
               method = "rf",
               ntree = 250,
               metric = "RMSE",
               tuneGrid = tunegrid,
               rfeControl = rfec,
               trControl = trc)
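The code runs without errors. But where can I find the MAPE that I defined as the summaryFunction in trainControl? Is trainControl executed or ignored?
How can I rewrite the code to do recursive feature elimination with rfe, tune the hyperparameter mtry using trainControl within rfe, and compute an additional error measure (MAPE) at the same time?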
Answer 0 (score: 1)
trainControl is ignored, as its description, "Control the computational nuances of the train function", would suggest. To use MAPE, you need
rfec$functions$summary <- mape
Then
rfe(weight ~ Time + Chick + Diet,
    sizes = c(1:3),
    data = ChickWeight,
    method = "rf",
    ntree = 250,
    metric = "MAPE", # Modified
    maximize = FALSE, # Modified
    rfeControl = rfec)
#
# Recursive feature selection
#
# Outer resampling method: Cross-Validated (10 fold)
#
# Resampling performance over subset size:
#
# Variables MAPE MAPESD Selected
# 1 0.1903 0.03190
# 2 0.1029 0.01727 *
# 3 0.1326 0.02136
# 53 0.1303 0.02041
#
# The top 2 variables (out of 2):
# Time, Chick.L
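
The call above reports MAPE but still does not tune mtry. One possible way to get both, offered as a sketch and not a tested recipe, is to swap rfFuncs for caretFuncs. As far as I know, caretFuncs fits each candidate subset with train(), so the extra arguments given to rfe() (method, tuneGrid, trControl) reach train() instead of being ignored. The names mape_summary, inner_ctrl and outer_ctrl are made up for this example, MAPE is computed in base R instead of with MLmetrics, and the inner 5-fold CV and ntree = 100 are arbitrary choices to keep the run time down.

```r
library(caret)
library(randomForest)

# Summary function returning caret's default regression metrics plus MAPE.
# MAPE is computed in base R here (assumes no zero values in data$obs).
mape_summary <- function(data, lev = NULL, model = NULL) {
  c(defaultSummary(data, lev, model),
    MAPE = mean(abs((data$obs - data$pred) / data$obs)))
}

# Inner resampling: used by train() to tune mtry for each candidate subset.
# train() keeps its default metric (RMSE) for choosing mtry.
inner_ctrl <- trainControl(method = "cv", number = 5,
                           summaryFunction = mape_summary)

# Outer resampling: caretFuncs fits each subset with train(), so method,
# tuneGrid and trControl passed to rfe() are forwarded to train().
outer_ctrl <- rfeControl(functions = caretFuncs, method = "cv", number = 10)
outer_ctrl$functions$summary <- mape_summary

set.seed(42)
results <- rfe(weight ~ Time + Chick + Diet,
               data = ChickWeight,
               sizes = 1:3,
               metric = "MAPE",      # outer criterion for picking the subset size
               maximize = FALSE,
               rfeControl = outer_ctrl,
               method = "rf",        # from here on: forwarded to train()
               ntree = 100,
               tuneGrid = expand.grid(mtry = 1:3),
               trControl = inner_ctrl)

results$results       # resampled metrics per subset size, including MAPE
results$fit$bestTune  # mtry chosen by train() for the final subset
```

Expect this to be much slower than the rfFuncs version, since every subset in every outer fold triggers a full inner tuning run. For subsets with fewer than three predictors, randomForest will likely warn that mtry is out of range and reset it; restricting the grid or the sizes avoids that.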