I have been using the caretEnsemble and caret packages for stacking. My data is a document-term matrix with some additional features such as POS tags, and the goal is sentiment analysis with two classes. `sentitr` is the vector of sentiments corresponding to the training observations, and `sentitest` is the vector for the test set.
I use a 60:40 train/test split.
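For reference, a minimal sketch of how I create that split (assuming `dtm` is the full document-term matrix and `senti` the full sentiment factor; those two names are illustrative, not from my actual script):

```r
library(caret)

set.seed(42)
# createDataPartition does a stratified split, keeping the class
# proportions roughly equal in the training and test halves
idx <- createDataPartition(senti, p = 0.6, list = FALSE)

trainset  <- dtm[idx, ]
testset   <- dtm[-idx, ]
sentitr   <- senti[idx]
sentitest <- senti[-idx]
```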
control <- trainControl(method="cv", number=10, savePredictions = "final", classProbs = TRUE,
summaryFunction = twoClassSummary,
index=createResample(sentitr, 10))
algorithmList <- c('pda', 'nnet', 'gbm', 'svmLinear', 'rf', 'C5.0', 'glmnet')
models <- caretList(trainset, sentitr, trControl=control, methodList=algorithmList)
# some model info
summary(models)
res = resamples(models)
summary(res)
modelCor(res)
# pda and nnet extremely closely correlated
stackcontrol <- trainControl(method="cv", number=5, savePredictions = "final", classProbs = TRUE,
summaryFunction = twoClassSummary)
# stacks
stack.c5.0 <- caretStack(models, method="C5.0", metric="ROC", trControl=stackcontrol)
summary(stack.c5.0)
stack.c50.pred = predict(stack.c5.0, newdata = testset, type = "raw")
stackc50.conf = confusionMatrix(stack.c50.pred, sentitest)
I ran the model 10 times, each time randomly assigning the data to the 60/40 train/test split. On the test set I got the following classification accuracies (extracted from the confusion matrices):
Iteration  Accuracy
 1         0.3225
 2         0.2550
 3         0.7500
 4         0.2675
 5         0.2950
 6         0.7825
 7         0.2575
 8         0.2875
 9         0.2900
10         0.3275
These are the outputs. As you can see, in two of the ten iterations the model achieves roughly 75-80% accuracy; this is what I expect and matches the results I get from fitting the individual models. The remaining iterations, however, produce very poor accuracy. It looks to me as if the model is randomly swapping accuracy and error rate, i.e. the predicted class labels appear to be flipped. Any idea what causes this behaviour?
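Since the bad accuracies look roughly like mirror images of the good ones, one quick check I can run is whether inverting every predicted label recovers the expected accuracy (a sketch, assuming the two factor levels are named `neg` and `pos` — the level names are illustrative):

```r
# accuracy of the raw stacked predictions
acc <- mean(stack.c50.pred == sentitest)

# accuracy if every predicted label were swapped
flipped <- factor(ifelse(stack.c50.pred == "pos", "neg", "pos"),
                  levels = levels(sentitest))
acc_flipped <- mean(flipped == sentitest)

# if acc_flipped is high while acc is low, the labels are being inverted
c(accuracy = acc, flipped = acc_flipped)
```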
In every iteration where the predictions come out this badly, training the caretStack produces the following warnings:
2: In predict.C5.0(modelFit, newdata, trial = submodels$trials[j]) :
'trials' should be <= 9 for this object. Predictions generated using 9 trials
3: In predict.C5.0(modelFit, newdata, type = "prob", trials = submodels$trials[j]) :
'trials' should be <= 9 for this object. Predictions generated using 9 trials
4: In predict.C5.0(modelFit, newdata, trial = submodels$trials[j]) :
'trials' should be <= 9 for this object. Predictions generated using 9 trials
5: In predict.C5.0(modelFit, newdata, type = "prob", trials = submodels$trials[j]) :
'trials' should be <= 9 for this object. Predictions generated using 9 trials