Tuning Random Forest Classifier hyperparameters with GridSearchCV, scoring on predicted probabilities

Date: 2018-02-01 02:48:03

Tags: python random-forest hyperparameters

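I'm just getting started with hyperparameter tuning for a random forest binary classifier, and I was wondering whether anyone knows, or can suggest, how to make the scoring use the predicted probabilities rather than the predicted classes. Ideally, I'd like roc_auc to be computed on the probabilities (e.g. [0.2, 0.6, 0.7, 0.1, 0.0]) rather than on the classes (e.g. [0, 1, 1, 0, 0]).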
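To make the difference concrete, here is a minimal sketch that scores the probabilities from the question and their 0.5-thresholded classes with `roc_auc_score`. The true labels `y_true` are hypothetical, invented only for this illustration. Hard 0/1 predictions give the ROC curve a single interior point, so the AUC is usually lower than when the model's full ranking is used.

```python
from sklearn.metrics import roc_auc_score

# Probabilities from the question and the classes you get by thresholding at 0.5.
proba = [0.2, 0.6, 0.7, 0.1, 0.0]
hard = [0, 1, 1, 0, 0]

# Hypothetical ground-truth labels, made up for this example only.
y_true = [1, 0, 1, 0, 0]

auc_proba = roc_auc_score(y_true, proba)  # uses the full ranking; 5/6 here
auc_hard = roc_auc_score(y_true, hard)    # only two distinct scores, many ties; 7/12 here

print(auc_proba, auc_hard)
```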

from sklearn.metrics import roc_auc_score
from sklearn.ensemble import RandomForestClassifier as rfc
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search was removed in scikit-learn 0.20

# max_features='sqrt' is what 'auto' meant for classifiers; 'auto' was removed in scikit-learn 1.3
rfbase = rfc(n_jobs=3, max_features='sqrt', n_estimators=100, bootstrap=False)

param_grid = {
    'n_estimators': [200, 500],
    'max_features': [.5, .7],
    'bootstrap': [False, True],
    'max_depth': [3, 6]
}

rf_fit = GridSearchCV(estimator=rfbase, param_grid=param_grid, scoring='roc_auc')
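GridSearchCV fits every combination in `param_grid` on every cross-validation fold, so it can help to size the grid before running it. A small sketch using `ParameterGrid`, assuming the default 5-fold cross-validation (the default since scikit-learn 0.22; older versions used 3 folds):

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the question.
param_grid = {
    'n_estimators': [200, 500],
    'max_features': [.5, .7],
    'bootstrap': [False, True],
    'max_depth': [3, 6]
}

candidates = list(ParameterGrid(param_grid))  # every combination of the values
n_folds = 5                                   # assumed: GridSearchCV's default cv
n_fits = len(candidates) * n_folds            # excludes the final refit on all data

print(len(candidates), n_fits)
```

With `refit=True` (the default), GridSearchCV also performs one extra fit of the best candidate on the full training data.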

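I think roc_auc is currently being computed from the predicted classes. Before I start writing a custom scoring function, I wanted to check whether there is a more efficient approach. Thanks in advance for any help!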
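This assumption is worth checking before writing a custom scorer. In scikit-learn, the built-in `'roc_auc'` scorer is threshold-based: it calls `decision_function` or, for estimators that lack it such as `RandomForestClassifier`, `predict_proba`. A sketch to confirm this on your installed version, using synthetic data from `make_classification` as a stand-in for the real dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import get_scorer, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for the asker's (unknown) dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# The scorer GridSearchCV(scoring='roc_auc') applies to each validation fold.
builtin_auc = get_scorer("roc_auc")(clf, X_test, y_test)

# AUC computed by hand from the positive-class probabilities.
proba_auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

# AUC from hard 0/1 predictions, for comparison.
label_auc = roc_auc_score(y_test, clf.predict(X_test))

print(builtin_auc, proba_auc, label_auc)
```

If `builtin_auc` matches `proba_auc` rather than `label_auc`, the plain `scoring='roc_auc'` string is already scoring probabilities.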

Final solution, based on the reference Jarad provided:

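from sklearn.metrics import make_scorer, roc_auc_score  # make_scorer was used but never imported
from sklearn.ensemble import RandomForestClassifier as rfc
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search was removed in scikit-learn 0.20

# max_features='sqrt' is what 'auto' meant for classifiers; 'auto' was removed in scikit-learn 1.3
rfbase = rfc(n_jobs=3, max_features='sqrt', n_estimators=100, bootstrap=False)

param_grid = {
    'n_estimators': [200, 500],
    'max_features': [.5, .7],
    'bootstrap': [False, True],
    'max_depth': [3, 6]
}

def roc_auc_scorer(y_true, y_pred):
    # Older scikit-learn passes the full (n_samples, 2) predict_proba output;
    # newer releases already pass only the positive-class column for binary targets.
    if y_pred.ndim == 2:
        y_pred = y_pred[:, 1]
    return roc_auc_score(y_true, y_pred)

# scikit-learn >= 1.4: response_method='predict_proba' (use needs_proba=True on older versions)
scorer = make_scorer(roc_auc_scorer, response_method='predict_proba')

rf_fit = GridSearchCV(estimator=rfbase, param_grid=param_grid, scoring=scorer)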
