How to set a threshold for a scikit-learn random forest model

Time: 2018-04-11 23:40:50

Tags: python scikit-learn

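After looking at precision_recall_curve, if I want to set threshold = 0.4, how do I apply 0.4 to my random forest model (binary classification), so that any probability < 0.4 is labeled 0 and any probability >= 0.4 is labeled 1?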

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

random_forest = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=12)
random_forest.fit(X_train, y_train)

predicted = random_forest.predict(X_test)
accuracy = accuracy_score(y_test, predicted)

Documentation: Precision recall

3 answers:

Answer 0 (score: 3)

Assuming you are doing binary classification, this is straightforward:

threshold = 0.4

predicted_proba = random_forest.predict_proba(X_test)
predicted = (predicted_proba[:, 1] >= threshold).astype('int')

accuracy = accuracy_score(y_test, predicted)
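
For a version that runs end to end, here is a minimal sketch of the same thresholding idea. The synthetic data from make_classification and the 70/30 split are assumptions made only so the example is self-contained; substitute your own X_train, X_test, y_train and y_test. It also looks up the positive-class column through classes_ rather than assuming it is column 1, since the columns of predict_proba follow the order of classes_.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; replace with your own X and y).
X, y = make_classification(n_samples=500, n_features=10, random_state=12)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=12
)

random_forest = RandomForestClassifier(n_estimators=100, random_state=12)
random_forest.fit(X_train, y_train)

threshold = 0.4

# Columns of predict_proba follow random_forest.classes_, so find the
# column that belongs to the positive label (1) explicitly.
positive_col = list(random_forest.classes_).index(1)
proba_positive = random_forest.predict_proba(X_test)[:, positive_col]

# Label 1 when P(class 1) >= threshold, otherwise 0. The result is 1-D.
predicted = (proba_positive >= threshold).astype(int)

# For comparison: predict() picks the most probable class.
default_predicted = random_forest.predict(X_test)

accuracy = accuracy_score(y_test, predicted)
print("accuracy at threshold 0.4:", accuracy)
print("positives at 0.4:", predicted.sum())
print("positives from predict():", default_predicted.sum())
```

Lowering the threshold below 0.5 can only add positive predictions relative to predict(), which typically trades precision for recall.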

Answer 1 (score: 0)

random_forest = RandomForestClassifier(n_estimators=100)
random_forest.fit(X_train, y_train)

threshold = 0.4

predicted = random_forest.predict_proba(X_test)
predicted [:,0] = (predicted [:,0] < threshold).astype('int')
predicted [:,1] = (predicted [:,1] >= threshold).astype('int')


accuracy = accuracy_score(y_test, predicted)
print(round(accuracy,4,)*100, "%")

This raises an error at the final accuracy step: "ValueError: Can't handle mix of binary and multilabel-indicator"

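Answer 2 (score: 0)

sklearn.metrics.accuracy_score expects a 1-D array of labels here, but your predicted array is 2-D (one column per class). That mismatch is what causes the error. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html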
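
A minimal sketch of the fix, under the assumption that class 1 is the second column of the probability matrix: threshold only that column, so the prediction is 1-D like y_test. The proba and y_test values below are hypothetical numbers standing in for random_forest.predict_proba(X_test) and the real labels.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical values standing in for random_forest.predict_proba(X_test):
# one row per sample, one column per class (class 0, class 1).
proba = np.array([
    [0.70, 0.30],
    [0.55, 0.45],
    [0.20, 0.80],
    [0.65, 0.35],
])
# Hypothetical true labels for the same four samples.
y_test = np.array([0, 1, 1, 0])

threshold = 0.4

# Keep only the class-1 column so the result is 1-D, matching y_test.
predicted = (proba[:, 1] >= threshold).astype(int)

print(proba.shape, predicted.shape)
accuracy = accuracy_score(y_test, predicted)
print(round(accuracy * 100, 2), "%")
```

Thresholding both columns in place, as in Answer 1, keeps the array 2-D, which scikit-learn interprets as a multilabel indicator matrix rather than a vector of binary labels.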