Caret: problem with the predict function when using a caretStack object

Time: 2017-08-01 12:16:51

Tags: r r-caret ensemble-learning

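I have been using the caretEnsemble and caret packages for stacking. My data is a document-term matrix with some additional features, such as POS tags, and the goal is two-class sentiment analysis. "sentitr" is the vector of sentiment labels for the training observations, and "sentitest" is the corresponding vector for the test set.
I use a 60:40 train/test split.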

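control <- trainControl(method="cv", number=10, savePredictions = "final",  classProbs = TRUE,
                        summaryFunction = twoClassSummary,
                        # because `index` is supplied, these 10 bootstrap resamples
                        # define the resampling instead of 10-fold CV folds
                        index=createResample(sentitr, 10))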


algorithmList <- c('pda', 'nnet', 'gbm', 'svmLinear', 'rf', 'C5.0', 'glmnet')



models <- caretList(trainset, sentitr, trControl=control, methodList=algorithmList)


# some model info
summary(models)
res = resamples(models)
summary(res)
modelCor(res)
# lda and nnet extremely closely correlated



stackcontrol <- trainControl(method="cv", number=5, savePredictions = "final",  classProbs = TRUE,
                        summaryFunction = twoClassSummary)

# stacks

stack.c5.0 <- caretStack(models, method="C5.0", metric="ROC", trControl=stackcontrol)
summary(stack.c5.0)

stack.c50.pred = predict(stack.c5.0, newdata = testset, type = "raw")
stackc50.conf = confusionMatrix(stack.c50.pred, sentitest)
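Each of the 10 runs described next repeats the whole split, caretList, caretStack and predict cycle. A minimal base-R sketch of such a harness follows; `repeat_split_accuracy` and the user-supplied `fit_predict(train_idx, test_idx)` are hypothetical names, not caret or caretEnsemble functions. In the question's setup, each returned value corresponds to `confusionMatrix(pred, ref)$overall["Accuracy"]`.

```r
# Hypothetical harness: repeat a random 60/40 split n_rep times and record the
# test-set accuracy of whatever model `fit_predict` trains.
# `fit_predict(train_idx, test_idx)` is an assumed user-supplied function that
# fits on the training rows and returns predicted labels for the test rows
# (in the question it would wrap caretList(), caretStack() and predict()).
repeat_split_accuracy <- function(labels, fit_predict, n_rep = 10, p = 0.6, seed = 1) {
  set.seed(seed)
  n <- length(labels)
  vapply(seq_len(n_rep), function(i) {
    train_idx <- sort(sample.int(n, size = round(p * n)))
    test_idx <- setdiff(seq_len(n), train_idx)
    pred <- fit_predict(train_idx, test_idx)
    # Compare as character strings, which also works if the two factors
    # carry different level sets
    mean(as.character(pred) == as.character(labels[test_idx]))
  }, numeric(1))
}

# Usage with a toy "oracle" model that returns the true test labels
labels <- factor(rep(c("neg", "pos"), times = 50))
acc <- repeat_split_accuracy(labels, function(train_idx, test_idx) labels[test_idx])
```

Fixing the seed inside the harness makes the 10 splits reproducible, so a run with unusually low accuracy can be re-created and inspected on its own.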

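I ran the model 10 times, each time randomly splitting the data into 60/40 training/test sets. I got the following classification accuracies on the test set (extracted from the confusion matrix):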

Run  Accuracy
1    0.3225
2    0.2550
3    0.7500
4    0.2675
5    0.2950
6    0.7825
7    0.2575
8    0.2875
9    0.2900
10   0.3275

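These are the outputs. As you can see, two of the iterations reach roughly 75-80% accuracy. This is expected and matches what I get from fitting the individual models. The remaining iterations, however, give very poor accuracy. It looks to me as if the model randomly swaps test error and accuracy. Any ideas what causes this behavior?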
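For a two-class problem, an accuracy well below 0.5 means that flipping every prediction would do better: flipping turns accuracy a into 1 - a. The complements of the poor runs fall between about 0.67 (1 - 0.3275) and 0.75 (1 - 0.2550), somewhat below but near the good runs. One hypothesis worth checking, not a confirmed diagnosis, is that in the poor runs the stack's predicted labels are inverted relative to "sentitest". A minimal base-R sketch; `flip_binary` and `accuracy` are hypothetical helpers, not caret functions:

```r
# Hypothetical helpers for checking whether a two-class prediction vector is
# label-inverted: flipping every prediction turns accuracy a into 1 - a.
flip_binary <- function(pred) {
  pred <- as.factor(pred)
  stopifnot(nlevels(pred) == 2)
  lv <- levels(pred)
  # Swap each label for the other level, keeping the original level order
  factor(ifelse(pred == lv[1], lv[2], lv[1]), levels = lv)
}

accuracy <- function(pred, ref) mean(as.character(pred) == as.character(ref))

# Accuracies reported in the table above, and the accuracy each run would
# have had if every prediction were flipped
reported <- c(0.3225, 0.2550, 0.7500, 0.2675, 0.2950,
              0.7825, 0.2575, 0.2875, 0.2900, 0.3275)
complement <- 1 - reported
```

On a poor run, comparing `accuracy(flip_binary(stack.c50.pred), sentitest)` with the unflipped value, and checking that `levels(stack.c50.pred)` and `levels(sentitest)` list the classes in the same order, would show whether the drop is a label inversion rather than a genuinely bad model.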

In every iteration where the predictions are this poor, the following warnings appear while training caretStack:

2: In predict.C5.0(modelFit, newdata, trial = submodels$trials[j]) :
  'trials' should be <= 9 for this object. Predictions generated using 9 trials
3: In predict.C5.0(modelFit, newdata, type = "prob", trials = submodels$trials[j]) :
  'trials' should be <= 9 for this object. Predictions generated using 9 trials
4: In predict.C5.0(modelFit, newdata, trial = submodels$trials[j]) :
  'trials' should be <= 9 for this object. Predictions generated using 9 trials
5: In predict.C5.0(modelFit, newdata, type = "prob", trials = submodels$trials[j]) :
  'trials' should be <= 9 for this object. Predictions generated using 9 trials
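These warnings come from `predict.C5.0` in the C50 package: caret asked for more boosting trials than the fitted C5.0 object contains (here 9, likely because C5.0 stopped boosting early), so predictions fall back to 9 trials. A minimal sketch, assuming the caretEnsemble 2.x API where `caretStack` forwards extra arguments to `caret::train`, is to supply a tuning grid that never requests more than 9 trials. The grid values are illustrative assumptions, the cap of 9 may not hold for every resample, and whether this warning is related to the accuracy drop is not established.

```r
# Hypothetical tuning grid for a C5.0 model; caret's "C5.0" method tunes
# trials, model and winnow. Capping trials at 9 matches the warnings above.
c50_grid <- expand.grid(trials = c(1, 3, 5, 7, 9),
                        model = c("tree", "rules"),
                        winnow = FALSE)

# Assumed usage (requires caretEnsemble and the `models`/`stackcontrol`
# objects from the question); the grid is passed through to caret::train:
# stack.c5.0 <- caretStack(models, method = "C5.0", metric = "ROC",
#                          trControl = stackcontrol, tuneGrid = c50_grid)
#
# For the C5.0 base learner in caretList, the same grid could be supplied via
# tuneList = list(c50 = caretModelSpec(method = "C5.0", tuneGrid = c50_grid))
```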
