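Below is an excerpt of my code. I am working with the German Credit dataset.
I am trying to build a set of generic functions for my shinydashboard.
The problem is with gbm. If the response variable is not converted to a factor, the R session crashes.
If the response variable is converted to a factor, randomForest does not produce the OOB error rate and confusion matrix in its output components.
Please advise.
The response variable is "default". Before the models are applied, the response variable is treated as follows: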
## load the dataset
data_x = read.csv("credit.csv")
## Preprocessing the dataset
data_x$default <- ifelse(data_x$default == "yes", 1, 0)
## Load packages
pacman::p_load(shiny, shinydashboard, gbm,
               randomForest, ggplot2, ipred, caret, ROCR, dplyr, ModelMetrics)
User-defined function:
model = function(algo = gbm, distribution = 'bernoulli',
                 type = 'response', set = 'AUC', n.trees = 10000){
  ## Fit the model (train and test are defined outside this excerpt)
  fit <- algo(formula = default ~ .,
              distribution = distribution,
              data = train,
              n.trees = n.trees,
              cv.folds = 3)
  ## Generate the predictions on the test set
  pred <- predict(object = fit,
                  newdata = test,
                  n.trees = n.trees,
                  type = type)
  ## Compute the test-set AUC from the predictions
  AUC <- auc(actual = test$default, predicted = pred)
  ## Return the requested component, or NULL for an unknown 'set'
  if (set == 'AUC'){
    return(AUC)
  }
  if (set == 'predictions'){
    return(pred)
  }
  if (set == 'model'){
    return(fit)
  }
  return(NULL)
}
Now call the different models.
List of different models:
get_model <- function(algo, type = 'response', ntrees = 10000){
  ## Return the fitted model object for the given algorithm
  model(algo = algo, type = type, set = 'model', n.trees = ntrees)
}
Bag_model<- get_model(algo = bagging, type='prob')
RF_model<- get_model(algo = randomForest)
GBM_model<- get_model(algo = gbm)
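
One possible way to reconcile the two requirements described above is to keep a single copy of the data and recode the response per algorithm just before fitting. gbm with distribution = 'bernoulli' expects a numeric 0/1 response, whereas randomForest only runs classification (and so only reports an OOB error rate and confusion matrix) when the response is a factor. The sketch below uses base R only; the helper name prepare_response, its as_factor argument, and the toy data frame are hypothetical and not part of the original code.

```r
## Hypothetical helper (not part of the original code): return a copy of the
## data with the response coded the way the chosen algorithm expects.
prepare_response <- function(data, as_factor, response = "default") {
  y <- data[[response]]
  if (as_factor) {
    ## Factor coding, e.g. for randomForest classification
    data[[response]] <- factor(y, levels = c(0, 1))
  } else {
    ## Numeric 0/1 coding, e.g. for gbm with distribution = 'bernoulli'
    data[[response]] <- as.numeric(as.character(y))
  }
  data
}

## Toy data standing in for the credit data (values are made up)
toy <- data.frame(default = c(1, 0, 0, 1), amount = c(10, 20, 30, 40))

rf_data  <- prepare_response(toy, as_factor = TRUE)
gbm_data <- prepare_response(toy, as_factor = FALSE)

is.factor(rf_data$default)
is.numeric(gbm_data$default)
```

Inside model(), a helper like this could be applied to train and test before the call to algo, with as_factor chosen according to the algorithm being fitted.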