Question

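I am using the caret package to train a model and want to obtain the model's accuracy. A common approach I have heard of is to use confusionMatrix. However, when I run the code below, the trained model reports accuracy values that differ slightly from what confusionMatrix() reports. So my question is: which accuracy should I use, and how should I interpret the accuracy that the model prints directly in the console?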

library(caret)
ModelRF_ALL_b <- train(price ~ ., method = "rf", data = datatraining_b)
ModelRF_ALL_b

The console reports the following:

Random Forest 

8143 samples
   8 predictor
   2 classes: '0', '1' 

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 8143, 8143, 8143, 8143, 8143, 8143, ... 
Resampling results across tuning parameters:

  mtry  Accuracy   Kappa    
  2     0.9948108  0.9843501
  4     0.9945824  0.9836512
  7     0.9940732  0.9821099

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was mtry = 2.

I can also run confusionMatrix():

# confusionMatrix() expects the predictions first and the true labels second
confusionMatrix(data = predict(ModelRF_ALL_b, datatraining_b),
                reference = datatraining_b$price)

It reports an accuracy of 1.

Confusion Matrix and Statistics

      Reference
Prediction    0    1
     0 6414    0
     1    0 1729

           Accuracy : 1          
             95% CI : (0.9995, 1)
No Information Rate : 0.7877     
P-Value [Acc > NIR] : < 2.2e-16  

              Kappa : 1          
 Mcnemar's Test P-Value : NA         

        Sensitivity : 1.0000     
        Specificity : 1.0000     
     Pos Pred Value : 1.0000     
     Neg Pred Value : 1.0000     
         Prevalence : 0.7877     
     Detection Rate : 0.7877     
   Detection Prevalence : 0.7877     
  Balanced Accuracy : 1.0000     

   'Positive' Class : 0

Answer 1

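You can read the two values as accuracy estimated with resampling (the train() output) and without resampling (the confusionMatrix() call on the training data), respectively.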

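When you fit the model, caret performs bootstrap resampling with 25 repetitions, as the model output shows. For each value of mtry, a model is fitted to each of the 25 bootstrap samples (8143 rows drawn with replacement) and scored on the rows that were left out of that sample; the reported accuracy is the average over those 25 resamples. To build the confusion matrix, you instead use the final model (the one with mtry = 2) to predict the outcome for the training sample itself, which has 8143 observations. A small difference between the two accuracies is therefore normal.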
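To see where each number comes from, here is a minimal sketch. It does not use the asker's datatraining_b; as a stand-in (an assumption for illustration only) it keeps two species from the built-in iris data so that the outcome has two classes. It also assumes the randomForest package is installed, since method = "rf" relies on it.

```r
library(caret)  # method = "rf" also needs the randomForest package installed

set.seed(42)
# Stand-in two-class data (assumption): two iris species, not datatraining_b
dat <- droplevels(subset(iris, Species != "setosa"))

fit <- train(Species ~ ., data = dat, method = "rf",
             trControl = trainControl(method = "boot", number = 25))

# One row per bootstrap replicate for the selected mtry; each accuracy is
# computed on the rows left out of that replicate
head(fit$resample)
resampled_acc <- mean(fit$resample$Accuracy)

# The Accuracy that train() prints for the selected mtry
printed_acc <- fit$results$Accuracy[fit$results$mtry == fit$bestTune$mtry]

# Resubstitution accuracy: the final model scored on its own training rows
resub <- confusionMatrix(data = predict(fit, dat), reference = dat$Species)
resub_acc <- unname(resub$overall["Accuracy"])

c(resampled = resampled_acc, printed = printed_acc, resubstitution = resub_acc)
```

The first two numbers should agree, because the printed accuracy is the mean of the per-resample accuracies. The third is usually higher, since a random forest can reproduce its own training rows almost perfectly.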

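Be careful when assessing goodness of fit this way, because you are training and evaluating the model on the same data set. It is no surprise that you get very high accuracy. It is better to evaluate the final model on an unseen data set, to confirm its performance and to detect possible overfitting.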
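One way to evaluate on unseen data is to hold out part of the rows before training. The sketch below again uses two iris species as a stand-in for the asker's data, and the 70/30 split is an arbitrary choice for illustration; both are assumptions, not part of the original question.

```r
library(caret)  # method = "rf" also needs the randomForest package installed

set.seed(42)
# Stand-in two-class data (assumption): two iris species, not datatraining_b
dat <- droplevels(subset(iris, Species != "setosa"))

# Stratified split: about 70% of rows for training, the rest held out
in_train  <- createDataPartition(dat$Species, p = 0.7, list = FALSE)
train_set <- dat[in_train, ]
test_set  <- dat[-in_train, ]

# The model is tuned and fitted on the training rows only
fit <- train(Species ~ ., data = train_set, method = "rf")

# Predictions go in `data`, true labels in `reference`
cm <- confusionMatrix(data = predict(fit, test_set),
                      reference = test_set$Species)
cm$overall["Accuracy"]
```

The accuracy reported here comes from rows the model never saw, so it is the figure to compare against the resampled accuracy printed by train().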
