Getting different performance measures from an rpart model using caret. Why?

Time: 2016-05-07 17:34:06

Tags: r classification r-caret rpart

I have been using caret in R to build a decision tree model with the "rpart2" method and repeated 10-fold cross-validation.

set.seed(888)
DT_up_rpart <- train(DiagDM1 ~ ., data = training, method = "rpart2", trControl = ctrl2, metric = "ROC", tuneGrid = maxdepthGrid)

This call gave me the following result:

## CART
## 56662 samples
##    11 predictor
##     2 classes: 'No', 'Si'
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times)
## Summary of sample sizes: 50995, 50995, 50996, 50996, 50996, 50996, ...
## Resampling results across tuning parameters:
##
##   maxdepth  ROC        Sens       Spec       Accuracy   Kappa
##   3         0.7378576  0.7408495  0.6449825  0.6929158  0.3858318
##   4         0.7382515  0.7367128  0.6502273  0.6934699  0.3869400
##   5         0.7382515  0.7367128  0.6502273  0.6934699  0.3869400
##   6         0.7382515  0.7367128  0.6502273  0.6934699  0.3869400
##
## ROC was used to select the optimal model using  the largest value.
## The final value used for the model was maxdepth = 4

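As you can see, the final value chosen for the model is maxdepth = 4. So now, using the same training data set, the same seed, the same performance metric and so on, I train a new model with maxdepth fixed at 4 (a constant value):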

set.seed(888)
DT_up_rpart2 <- train(DiagDM1 ~ ., data = training, method = "rpart2", trControl = ctrl2, metric = "ROC", tuneGrid = data.frame(maxdepth = 4))

And I got these results:

## Resampling: Cross-Validated (10 fold, repeated 5 times)
## Summary of sample sizes: 50995, 50995, 50996, 50996, 50996, 50996, ...
## Resampling results:
##   ROC        Sens       Spec       Accuracy   Kappa
##   0.7379012  0.7391975  0.6464791  0.6928382  0.3856765
##
## Tuning parameter 'maxdepth' was held constant at a value of 4

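Note that in the first result shown above, maxdepth = 4 gives ROC = 0.7382515, whereas in the second model maxdepth = 4 gives ROC = 0.7379012, and the same discrepancy shows up in every other performance measure. I was expecting identical results from the two models, which is why I set the seed in both cases.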

Why am I getting different results? I am using the same data set, the same seed, and so on. Or am I setting the seed in the wrong way?
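For reference, one way to rule out fold assignment and per-resample seeding as the source of the difference is to pin both explicitly through the `index` and `seeds` arguments of `trainControl()`, instead of relying only on `set.seed()` before each `train()` call. The following is only a minimal sketch under assumptions: `ctrl2` is not shown in this post, so the control object below is a stand-in; `make_seeds`, `fold_index` and `ctrl_fixed` are hypothetical names; and `twoClassSummary` is a placeholder for whatever summary function actually produced the Accuracy and Kappa columns above. It does not establish what causes the discrepancy.

```r
# Hypothetical helper: build a `seeds` list in the shape trainControl() documents:
# one integer vector per resample, plus a single integer for the final model fit.
make_seeds <- function(n_resamples, n_candidates, seed = 888) {
  set.seed(seed)
  seeds <- lapply(seq_len(n_resamples),
                  function(i) sample.int(1000000, n_candidates))
  seeds[[n_resamples + 1]] <- sample.int(1000000, 1)
  seeds
}

n_resamples <- 10 * 5                               # 10 folds, repeated 5 times
seeds <- make_seeds(n_resamples, n_candidates = 4)  # 4 = maxdepth values 3 to 6

# Only runs if caret is installed and a `training` data frame exists.
if (requireNamespace("caret", quietly = TRUE) && exists("training")) {
  set.seed(888)
  # Fix the resampling indices once so every train() call sees the same folds.
  fold_index <- caret::createMultiFolds(training$DiagDM1, k = 10, times = 5)
  ctrl_fixed <- caret::trainControl(
    method = "repeatedcv", number = 10, repeats = 5,
    classProbs = TRUE,
    summaryFunction = caret::twoClassSummary,  # assumption: stand-in summary
    index = fold_index,
    seeds = seeds
  )
}
```

The same `ctrl_fixed` could then be passed as `trControl` to both `train()` calls. As far as I know, caret only requires each seed vector to be at least as long as the number of models fitted per resample, so the length-4 vectors should also be accepted for the single-value grid, but that is an assumption worth checking.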

Any help would be appreciated.

0 answers:

No answers yet