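I have started evaluating a random forest classifier using precision and recall. However, even though the training and test sets are identical for the CPU and GPU implementations of the classifier, the evaluation scores they return differ. Is this a known bug in the library?
Both code samples are below for reference.
Scikit-Learn (CPU)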
from sklearn.metrics import recall_score, precision_score
from sklearn.ensemble import RandomForestClassifier
rf_cpu = RandomForestClassifier(n_estimators=5000, n_jobs=-1)
rf_cpu.fit(X_train, y_train)
rf_cpu_pred = rf_cpu.predict(X_test)
recall_score(rf_cpu_pred, y_test)
precision_score(rf_cpu_pred, y_test)
CPU Recall: 0.807186
CPU Precision: 0.82095
H2O4GPU (GPU)
from h2o4gpu.metrics import recall_score, precision_score
from h2o4gpu import RandomForestClassifier
rf_gpu = RandomForestClassifier(n_estimators=5000, n_gpus=1)
rf_gpu.fit(X_train, y_train)
rf_gpu_pred = rf_gpu.predict(X_test)
recall_score(rf_gpu_pred, y_test)
precision_score(rf_gpu_pred, y_test)
GPU Recall: 0.714286
GPU Precision: 0.809988
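Answer 0 (score: 0)
Correction: I found that the inputs to the precision and recall functions were in the wrong order. According to the Scikit-Learn documentation, the order is always (y_true, y_pred).
Correct evaluation code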
recall_score(y_test, rf_gpu_pred)
precision_score(y_test, rf_gpu_pred)
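To see why the argument order matters, here is a minimal plain-Python sketch. The `precision` and `recall` helpers and the toy labels are hypothetical stand-ins written for illustration; they are not the scikit-learn or h2o4gpu functions, though they follow the same (y_true, y_pred) convention for binary labels. Under that assumption, swapping the two arguments makes each function return the other metric's value.

```python
def precision(y_true, y_pred, positive=1):
    # Fraction of predicted positives that are actually positive.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    predicted_pos = sum(1 for p in y_pred if p == positive)
    return tp / predicted_pos if predicted_pos else 0.0


def recall(y_true, y_pred, positive=1):
    # Fraction of actual positives that were predicted positive.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual_pos = sum(1 for t in y_true if t == positive)
    return tp / actual_pos if actual_pos else 0.0


# Toy data (made up for this example): 4 actual positives, 3 predicted positives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

# Correct order: (y_true, y_pred)
correct_precision = precision(y_true, y_pred)
correct_recall = recall(y_true, y_pred)

# Swapped order: (y_pred, y_true), as in the question
swapped_precision = precision(y_pred, y_true)
swapped_recall = recall(y_pred, y_true)

print("correct:", correct_precision, correct_recall)
print("swapped:", swapped_precision, swapped_recall)
```

With the arguments swapped, the value reported as precision is really the recall and vice versa, so the numbers in the question were labelled with the wrong metric names.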