Question

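Has anyone been able to match the sklearn confusion matrix to the one from h2o?

They never match...

Doing something similar with Keras produces a perfect match.

But in h2o they are always off. I have tried every way I can think of...

I borrowed some code from: Any difference between H2O and Scikit-Learn metrics scoring?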

        'use strict';
        const crypto = require('crypto');
        const ALGORITHM = 'AES-256-ECB';
        const secretString = 'AAABBBCCC'

        // missing part in JS (how to convert secretString to key)

        function encrypt(plaintext, key) {
            const cipher = crypto.createCipheriv(ALGORITHM, key, Buffer.alloc(0));
            return cipher.update(plaintext, 'utf8', 'base64') + cipher.final('base64');
        }

Answer 1

This did the trick, thanks to Vivek's hunch. It is still not an exact match, but it is extremely close.

perf = model.model_performance(train)
threshold = perf.find_threshold_by_max_metric('f1')
model.model_performance(test).confusion_matrix(thresholds=threshold)
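To make the idea behind those three lines concrete, here is a minimal pure-Python sketch: pick the cutoff that maximizes F1 on the training data, then apply that same cutoff to the test data. The helper names (`f1_at`, `best_f1_threshold`, `confusion`) and the toy data are made up for illustration; h2o's own threshold search may differ in detail, so treat this as a picture of the concept rather than of h2o's implementation.

```python
def f1_at(threshold, y_true, p1):
    """F1 of the positive class when predicting 1 for p1 >= threshold."""
    tp = sum(1 for y, p in zip(y_true, p1) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, p1) if p >= threshold and y == 0)
    fn = sum(1 for y, p in zip(y_true, p1) if p < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, p1):
    """Try every distinct predicted probability as a cutoff; keep the best F1."""
    return max(sorted(set(p1)), key=lambda t: f1_at(t, y_true, p1))


def confusion(y_true, p1, threshold):
    """2x2 matrix in sklearn's layout: rows = actual, columns = predicted."""
    cm = [[0, 0], [0, 0]]
    for y, p in zip(y_true, p1):
        cm[y][1 if p >= threshold else 0] += 1
    return cm


# Toy data, invented for this sketch (not from the question).
train_y = [0, 0, 1, 0, 1, 1]
train_p = [0.10, 0.30, 0.35, 0.40, 0.70, 0.90]
test_y = [0, 1, 0, 1]
test_p = [0.20, 0.36, 0.50, 0.80]

threshold = best_f1_threshold(train_y, train_p)  # chosen on train only
cm = confusion(test_y, test_p, threshold)        # applied to test
```

The point is that the threshold comes from one dataset and is reused on another, so a matrix built with a different cutoff (for example a fixed 0.5) will not line up with it.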

Answer 2

I ran into the same problem. Here is what I would do to make a fair comparison:

model.train(x=x, y=y, training_frame=train, validation_frame=test)
cm1 = model.confusion_matrix(metrics=['F1'], valid=True)

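Since the model is trained with both a training frame and a validation frame, pred['predict'] will use the threshold that maximizes the F1 score on the validation data. To make sure of that, you can apply the threshold yourself: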

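# Ask the model (not a separate perf object) for the validation-set F1 threshold
threshold = model.find_threshold_by_max_metric(metric='F1', valid=True)
# pred_df is assumed to be the prediction frame converted to pandas, with a 'p1' column
pred_df['predict'] = pred_df['p1'].apply(lambda x: 0 if x < threshold else 1)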

To get the other confusion matrix from scikit-learn:

from sklearn.metrics import confusion_matrix

cm2 = confusion_matrix(y_true, pred_df['predict'])

In my case, I don't understand why I still get slightly different results. For example:

print(cm1)
>> [[3063  176]
    [  94  146]]

print(cm2)
>> [[3063  176]
    [  95  145]]
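The answer does not identify the cause of the one-row difference. One possibility, offered here purely as an assumption, is a boundary effect: a row whose probability sits exactly at the cutoff (or between the exact cutoff and a rounded copy of it) lands in a different cell depending on how the comparison is done. The sketch below uses made-up data and a hypothetical `confusion` helper to show how a single such row moves one count between two cells while the rest of the matrix stays the same.

```python
def confusion(y_true, p1, predict_positive):
    """2x2 matrix in sklearn's layout: rows = actual, columns = predicted."""
    cm = [[0, 0], [0, 0]]
    for y, p in zip(y_true, p1):
        cm[y][1 if predict_positive(p) else 0] += 1
    return cm


# Toy data, invented for this sketch; one positive row sits exactly on the cutoff.
y_true = [0, 0, 1, 1, 1]
p1 = [0.10, 0.60, 0.30, 0.42, 0.90]
threshold = 0.42

# Same data, same cutoff, two boundary conventions.
cm_inclusive = confusion(y_true, p1, lambda p: p >= threshold)
cm_exclusive = confusion(y_true, p1, lambda p: p > threshold)
```

If this is what is happening, listing the rows whose `p1` is within a tiny tolerance of the threshold should reveal the row that flips.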

h2o vs scikit-learn confusion matrix
