Question

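I have built an H2O random forest model for fraud prediction and am now scoring test data with the predict function. The predict output gives me the data frame shown below.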

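For the second record the prediction is 1, yet p1 is far smaller than p0. Which probability score (p0 or p1) and which classification are the correct ones to use for a fraud prediction model?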

If these are not the right probabilities, will the calibrated probabilities computed with the parameter shown below (calibrate_model = True) be the correct ones?

    import h2o

    nfolds = 5  # defined but not passed below; the estimator itself is given nfolds=3
    rf1 = h2o.estimators.H2ORandomForestEstimator(
        model_id="rf_df1",
        ntrees=200,
        max_depth=4,
        sample_rate=0.30,
        # stopping_metric="misclassification",
        # stopping_rounds=2,
        # stopping_tolerance=0.005,
        mtries=6,
        min_rows=12,
        nfolds=3,
        distribution="multinomial",
        fold_assignment="Modulo",
        keep_cross_validation_predictions=True,
        calibrate_model=True,
        calibration_frame=calib,  # H2OFrame held out for calibration (defined elsewhere)
        weights_column="weight",
        balance_classes=True,
    )

       predict  p0         p1
    1  0        0.9986012  0.000896514
    2  1        0.9985695  0.000448676
    3  0        0.9981387  0.000477767

Answer 1

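The predicted label is based on a threshold applied to the class probability, not on whichever of p0 and p1 is larger. The threshold used is usually the one that maximizes the F1 score, so on a heavily imbalanced problem such as fraud it can be far below 0.5. See the linked post for more on how to interpret the probability output.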
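To make the threshold idea concrete, here is a minimal sketch in plain Python, not H2O's own implementation. It assumes the rule "predict 1 when p1 >= threshold" and picks the cutoff with the highest F1 from a small set of made-up scores and labels; the function names and the data are hypothetical.

```python
from typing import List, Tuple


def f1_at_threshold(p1: List[float], y: List[int], threshold: float) -> float:
    """F1 score of the rule 'predict 1 when p1 >= threshold'."""
    tp = sum(1 for p, t in zip(p1, y) if p >= threshold and t == 1)
    fp = sum(1 for p, t in zip(p1, y) if p >= threshold and t == 0)
    fn = sum(1 for p, t in zip(p1, y) if p < threshold and t == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def max_f1_threshold(p1: List[float], y: List[int]) -> Tuple[float, float]:
    """Try every distinct score as a cutoff and return (threshold, best F1)."""
    best = max(sorted(set(p1)), key=lambda t: f1_at_threshold(p1, y, t))
    return best, f1_at_threshold(p1, y, best)


def label(p1: List[float], threshold: float) -> List[int]:
    """Turn class-1 probabilities into 0/1 labels using the cutoff."""
    return [1 if p >= threshold else 0 for p in p1]


# Hypothetical class-1 probabilities and true labels for a rare-event problem.
scores = [0.0004, 0.0009, 0.0030, 0.0100, 0.0450, 0.2000]
truth = [0, 0, 0, 1, 0, 1]

threshold, f1 = max_f1_threshold(scores, truth)
labels = label(scores, threshold)
print(threshold, f1, labels)
```

In this toy data the chosen cutoff is 0.01, so a row with p1 = 0.01 and p0 = 0.99 is still labelled 1. That is the same pattern the question describes: a positive label alongside a very small p1.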

Details on how the calibration frame and the calibrated model work can be found here and here.
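As a rough illustration of what calibration does, the sketch below fits Platt scaling, i.e. a one-variable logistic regression that maps a raw score to a calibrated probability, on a held-out set. It assumes Platt scaling is the method behind calibrate_model, and it is a generic plain-Python version rather than H2O's internal code; the helper names, learning rate, and data are all hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores: List[float], labels: List[int],
              lr: float = 0.1, steps: int = 5000) -> Tuple[float, float]:
    """Fit p = sigmoid(a * score + b) by gradient descent on the log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score: float, a: float, b: float) -> float:
    """Map a raw score to a calibrated probability."""
    return sigmoid(a * score + b)


# Hypothetical raw class-1 scores and true labels from a calibration set.
raw = [0.05, 0.10, 0.20, 0.30, 0.60, 0.70, 0.80, 0.90]
y = [0, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_platt(raw, y)
cal = [calibrate(s, a, b) for s in raw]
print(a, b, cal)
```

Because the mapping is monotone, calibration changes the probability values but not the ranking of the records. Whether a record is flagged still depends on the threshold applied to the score.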

H2O predict function probability scores on test data
