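I built a model with H2ORandomForestEstimator; the results are shown below.
The threshold keeps changing (0.5 on training, 0.313725489027 on validation). I would like to fix the threshold in H2ORandomForestEstimator so that I can compare results while fine-tuning. Is there a way to set the threshold?
If it cannot be set, how do we know which threshold the model is using?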
rf_v1
** Reported on train data. **
MSE: 2.75013548238e-05
RMSE: 0.00524417341664
LogLoss: 0.000494320913199
Mean Per-Class Error: 0.0188802936476
AUC: 0.974221763605
Gini: 0.948443527211
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.5:
       0       1    Error    Rate
-----  ------  ---  -------  --------------
0      161692  1    0        (1.0/161693.0)
1      3       50   0.0566   (3.0/53.0)
Total  161695  51   0        (4.0/161746.0)
Maximum Metrics: Maximum metrics at their respective thresholds
metric                       threshold    value     idx
---------------------------  -----------  --------  -----
max f1                       0.5          0.961538  19
max f2                       0.25         0.955056  21
max f0point5                 0.571429     0.983936  18
max accuracy                 0.571429     0.999975  18
max precision                1            1         0
max recall                   0            1         69
max specificity              1            1         0
max absolute_mcc             0.5          0.961704  19
max min_per_class_accuracy   0.25         0.962264  21
max mean_per_class_accuracy  0.25         0.98112   21
Gains/Lift Table: Avg response rate: 0.03 %
** Reported on validation data. **
MSE: 1.00535766226e-05
RMSE: 0.00317073755183
LogLoss: 4.53885183426e-05
Mean Per-Class Error: 0.0
AUC: 1.0
Gini: 1.0
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.313725489027:
       0      1    Error    Rate
-----  -----  ---  -------  -------------
0      53715  0    0        (0.0/53715.0)
1      0      16   0        (0.0/16.0)
Total  53715  16   0        (0.0/53731.0)
Maximum Metrics: Maximum metrics at their respective thresholds
metric                       threshold    value    idx
---------------------------  -----------  -------  -----
max f1                       0.313725     1        5
max f2                       0.313725     1        5
max f0point5                 0.313725     1        5
max accuracy                 0.313725     1        5
max precision                1            1        0
max recall                   0.313725     1        5
max specificity              1            1        0
max absolute_mcc             0.313725     1        5
max min_per_class_accuracy   0.313725     1        5
max mean_per_class_accuracy  0.313725     1        5
Answer 0 (score: 2)
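The threshold is the max-F1 threshold, that is, the one that maximizes F1 on the data being scored. This is why the reported value differs between the training and validation metrics.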
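To illustrate why a max-F1 threshold moves from one data set to another, here is a minimal sketch in plain Python. It is not H2O's implementation (H2O evaluates a binned set of candidate thresholds, as the `idx` column in the output suggests); the function names and the toy probabilities are hypothetical and chosen only for illustration.

```python
def f1_at(probs, labels, threshold):
    """F1 score when predicting 1 for every probability >= threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def max_f1_threshold(probs, labels):
    """Return (threshold, f1) for the candidate threshold with the highest F1.

    Simplification: every distinct predicted probability is a candidate.
    """
    candidates = sorted(set(probs), reverse=True)
    best = max(candidates, key=lambda t: f1_at(probs, labels, t))
    return best, f1_at(probs, labels, best)


# Toy data (hypothetical): positive-class probabilities and true labels.
train_probs = [0.05, 0.10, 0.40, 0.50, 0.80, 0.90]
train_labels = [0, 0, 0, 1, 1, 1]
valid_probs = [0.02, 0.10, 0.20, 0.35, 0.70]
valid_labels = [0, 0, 0, 1, 1]

train_threshold, train_f1 = max_f1_threshold(train_probs, train_labels)
valid_threshold, valid_f1 = max_f1_threshold(valid_probs, valid_labels)
print("train:", train_threshold, train_f1)
print("valid:", valid_threshold, valid_f1)
```

The two data sets yield different best thresholds even though the scoring rule is identical, which mirrors the 0.5 versus 0.313725 difference in the question.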
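If you want to apply your own threshold, you have to take the predicted probability of the positive class and compare it against your threshold yourself to produce the labels you want.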
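A minimal sketch of that manual comparison, in plain Python so that it runs without an H2O cluster. It assumes the positive-class probabilities have already been pulled out of the prediction frame (for a binomial H2O model with 0/1 labels this is normally the `p1` column). The helper names, the fixed threshold of 0.5, and the toy values are assumptions made for illustration.

```python
def apply_threshold(p1_values, threshold):
    """Label a row 1 when its positive-class probability is >= threshold."""
    return [1 if p >= threshold else 0 for p in p1_values]


def confusion_matrix(actual, predicted):
    """2x2 matrix indexed as matrix[actual][predicted] for 0/1 labels."""
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


FIXED_THRESHOLD = 0.5  # assumed value; use the same one for every run you compare

# Toy data (hypothetical): positive-class probabilities and true labels.
p1 = [0.02, 0.10, 0.20, 0.35, 0.70]
actual = [0, 0, 0, 1, 1]

predicted = apply_threshold(p1, FIXED_THRESHOLD)
cm = confusion_matrix(actual, predicted)
print(predicted)
print(cm)
```

Because the threshold is held constant, confusion matrices computed this way on training and validation data, or across tuning runs, are directly comparable.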
If you connect a web browser to the H2O Flow web UI built into H2O-3, you can hover over the ROC curve and visually browse the confusion matrix at each threshold.