I am trying to implement some functionality to compare five different machine learning models for predicting values in a regression problem.
My intention is to develop a set of functions that trains the different models and gathers the results into one set. The models I chose as examples are: lasso, random forest, support vector machine, linear model, and neural network. To tune some of the models I plan to use Max Kuhn's reference: https://topepo.github.io/caret/available-models.html. However, since each model requires different tuning parameters, I am not sure how to set them up:
First, I set up the grid for tuning the 'nnet' model. Here I chose different numbers of nodes in the hidden layer and different decay coefficients:
my.grid <- expand.grid(size=seq(from = 1, to = 10, by = 1), decay = seq(from = 0.1, to = 0.5, by = 0.1))
Then I build the function that will run the five models with 6-fold cross-validation repeated 5 times:
my_list_model <- function(model) {
  set.seed(1)
  train.control <- trainControl(method = "repeatedcv",
                                number = 6,
                                repeats = 5,
                                returnResamp = "all",
                                savePredictions = "all")
  # The tuning configurations of the machine learning models:
  set.seed(1)
  fit_m <- train(ST1 ~ .,
                 data = train,          # my original dataframe, not shown in this code
                 method = model,
                 metric = "RMSE",
                 preProcess = "scale",
                 trControl = train.control,
                 linout = 1,            # linear activation function for the output
                 trace = FALSE,
                 maxit = 1000,
                 tuneGrid = my.grid)    # here is how I pass the 'nnet' tuning grid
  return(fit_m)
}
Finally, I execute the five models:
lapply(list(Lass = "lasso",
            RF   = "rf",
            SVM  = "svmLinear",
            OLS  = "lm",
            NN   = "nnet"),
       my_list_model) -> model_list
However, when I run it, it shows:
Error: The tuning parameter grid should not contain the column fraction
As far as I understand it, I do not know how to specify the tuning parameters properly. If I drop the 'nnet' model and, for example, change the second-to-last line to an XGBoost model, it seems to work fine and computes results. So the problem appears to lie in the 'nnet' tuning parameters.
I therefore think my real question is: how do I configure the tuning parameters of these different models, in particular the 'nnet' model? Also, since I do not need to set parameters for lasso, random forest, svmLinear, and the linear model, how does the caret package tune them?
Answer 0 (score: 1):
my_list_model <- function(model, grd = NULL) {
  train.control <- trainControl(method = "repeatedcv",
                                number = 6,
                                returnResamp = "all",
                                savePredictions = "all")
  # The tuning configurations of the machine learning models:
  set.seed(1)
  fit_m <- train(Y ~ .,
                 data = df,             # my original dataframe, not shown in this code
                 method = model,
                 metric = "RMSE",
                 preProcess = "scale",
                 trControl = train.control,
                 linout = 1,            # linear activation function for the output
                 trace = FALSE,
                 maxit = 1000,
                 tuneGrid = grd)        # here is how each model's tuning grid is passed in
  return(fit_m)
}
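One note on grd = NULL: when tuneGrid is not supplied, train() falls back to building its own grid with tuneLength candidate values per tuning parameter (3 by default), which is how the models you do not tune by hand, such as 'lm', still run. A minimal sketch of that fallback, assuming the same df/Y placeholders as in the function above:
ctrl <- trainControl(method = "repeatedcv", number = 6)
set.seed(1)
fit_default <- train(Y ~ ., data = df,
                     method = "rf",
                     metric = "RMSE",
                     trControl = ctrl,
                     tuneLength = 5)    # caret picks 5 candidate values of mtry itself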
First run the code below to see all the relevant tuning parameters for a model:
modelLookup('rf')
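The same lookup works for the other models in the question, for example:
modelLookup('lasso')      # tuning parameter: fraction
modelLookup('nnet')       # tuning parameters: size, decay
modelLookup('svmLinear')  # tuning parameter: C
modelLookup('lm')         # tuning parameter: intercept (nothing to tune in practice)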
Now, based on the lookup above, build a grid for every model you want to tune:
svmGrid <- expand.grid(C=c(3,2,1))
rfGrid <- expand.grid(mtry=c(5,10,15))
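Following the same pattern, grids for the 'nnet' and 'lasso' models from the question could look like this (the nnet values are the ones from the question, the lasso values are only illustrative); they can be added to grd_all below under the names 'nnet' and 'lasso':
nnetGrid  <- expand.grid(size  = seq(from = 1, to = 10, by = 1),
                         decay = seq(from = 0.1, to = 0.5, by = 0.1))
lassoGrid <- expand.grid(fraction = seq(0.1, 0.9, by = 0.1))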
Now create a list of all the model grids and make sure the names in the list are the same as the model names:
grd_all <- list(svmLinear = svmGrid,
                rf        = rfGrid)
model_list <- lapply(c("rf", "svmLinear"),
                     function(x) my_list_model(x, grd_all[[x]]))
model_list
[[1]]
Random Forest
17 samples
3 predictor
Pre-processing: scaled (3)
Resampling: Cross-Validated (6 fold, repeated 1 times)
Summary of sample sizes: 14, 14, 15, 14, 14, 14, ...
Resampling results across tuning parameters:
mtry RMSE Rsquared MAE
5 63.54864 0.5247415 55.72074
10 63.70247 0.5255311 55.35263
15 62.13805 0.5765130 54.53411
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was mtry = 15.
[[2]]
Support Vector Machines with Linear Kernel
17 samples
3 predictor
Pre-processing: scaled (3)
Resampling: Cross-Validated (6 fold, repeated 1 times)
Summary of sample sizes: 14, 14, 15, 14, 14, 14, ...
Resampling results across tuning parameters:
C RMSE Rsquared MAE
1 59.83309 0.5879396 52.26890
2 66.45247 0.5621379 58.74603
3 67.28742 0.5576000 59.55334
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was C = 1.
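Since each call uses set.seed(1) with the same trainControl, the resampling folds line up across models, so the fitted models can be compared side by side with caret's resamples() helper (a minimal sketch, assuming the model_list from above):
names(model_list) <- c("rf", "svmLinear")
res <- resamples(model_list)       # collect the cross-validation results
summary(res)                       # RMSE, R-squared and MAE for each model
dotplot(res, metric = "RMSE")      # visual side-by-side comparison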