Question

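I have started evaluating a random forest classifier using precision and recall. However, even though the CPU and GPU implementations of the classifier are trained and tested on the same training and test sets, the evaluation scores they return are different. Is this a known bug in the library?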

Both code samples are below for reference.

Scikit-Learn (CPU)

from sklearn.metrics import recall_score, precision_score
from sklearn.ensemble import RandomForestClassifier

rf_cpu = RandomForestClassifier(n_estimators=5000, n_jobs=-1)
rf_cpu.fit(X_train, y_train)
rf_cpu_pred = rf_cpu.predict(X_test)

recall_score(rf_cpu_pred, y_test)
precision_score(rf_cpu_pred, y_test)

CPU Recall: 0.807186
CPU Precision: 0.82095

H2O4GPU (GPU)

from h2o4gpu.metrics import recall_score, precision_score
from h2o4gpu import RandomForestClassifier

rf_gpu = RandomForestClassifier(n_estimators=5000, n_gpus=1)
rf_gpu.fit(X_train, y_train)
rf_gpu_pred = rf_gpu.predict(X_test)

recall_score(rf_gpu_pred, y_test)
precision_score(rf_gpu_pred, y_test)

GPU Recall: 0.714286
GPU Precision: 0.809988

Answer 1

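Correction: the arguments to the precision and recall functions were passed in the wrong order. According to the Scikit-Learn documentation, the order is always (y_true, y_pred).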

Correct evaluation code

recall_score(y_test, rf_gpu_pred)
precision_score(y_test, rf_gpu_pred)
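To see why the argument order matters, here is a minimal, dependency-free sketch of the binary precision and recall definitions, assuming labels in {0, 1} with 1 as the positive class (the default `pos_label=1` of `precision_score` and `recall_score`). The helpers `confusion_counts`, `precision` and `recall` and the toy labels are hypothetical, for illustration only. Swapping `y_true` and `y_pred` turns every false positive into a false negative and vice versa, so with the arguments reversed the value reported as "recall" is really the precision, and the reverse.

```python
# Hypothetical helpers: a plain-Python sketch of binary precision/recall
# (positive class = 1), not the scikit-learn or H2O4GPU implementation.

def confusion_counts(y_true, y_pred, pos_label=1):
    """Return (tp, fp, fn) counts for a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == pos_label and p == pos_label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != pos_label and p == pos_label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == pos_label and p != pos_label)
    return tp, fp, fn


def precision(y_true, y_pred):
    """TP / (TP + FP); returns 0.0 when nothing was predicted positive."""
    tp, fp, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


def recall(y_true, y_pred):
    """TP / (TP + FN); returns 0.0 when there are no actual positives."""
    tp, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical toy labels: tp = 4, fp = 2, fn = 1.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 1, 1, 0]

# Correct order (y_true, y_pred).
print(f"precision={precision(y_true, y_pred):.3f}, recall={recall(y_true, y_pred):.3f}")
# → precision=0.667, recall=0.800

# Reversed order (y_pred, y_true): the two values trade places.
print(f"precision={precision(y_pred, y_true):.3f}, recall={recall(y_pred, y_true):.3f}")
# → precision=0.800, recall=0.667
```

Because the reversed call only exchanges false positives and false negatives, it relabels the two metrics rather than producing new numbers; passing (y_true, y_pred) consistently is what keeps the CPU and GPU scores comparable.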

Classification scores differ between H2O4GPU and Scikit-Learn
