Grid search for the parameters that maximize AUC

Time: 2016-06-07 21:46:39

Tags: python scikit-learn svm grid-search

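I'm trying to find the parameters for my SVM that give me the best AUC, but I can't find any AUC scoring function in sklearn. Does anyone have an idea? Here is my code: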

    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    parameters = {"C": [0.1, 1, 10, 100, 1000], "gamma": [0.1, 0.01, 0.001, 0.0001, 0.00001]}
    clf = SVC(kernel="rbf")
    clf = GridSearchCV(clf, parameters, scoring=???)  # what goes here to optimize AUC?
    clf.fit(features_train, labels_train)
    print(clf.best_params_)

So what can I use to find the parameters that give the highest AUC score?

4 answers:

Answer 0 (score: 18)

You can simply use:

clf = GridSearchCV(clf, parameters, scoring='roc_auc')
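A self-contained sketch of the question's search with scoring='roc_auc', assuming synthetic data from make_classification in place of features_train / labels_train and a smaller grid to keep it fast. best_params_ is the combination with the highest mean cross-validated AUC:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic binary problem standing in for features_train / labels_train (assumption)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

parameters = {"C": [0.1, 1, 10], "gamma": [0.1, 0.01, 0.001]}
# 'roc_auc' ranks samples with SVC.decision_function, so probability=True is not needed
clf = GridSearchCV(SVC(kernel="rbf"), parameters, scoring="roc_auc", cv=5)
clf.fit(X, y)

print(clf.best_params_)  # parameter combination with the highest mean CV AUC
print(clf.best_score_)   # that mean AUC
```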

Answer 1 (score: 3)

You can build any scorer yourself:

import xgboost as xgb
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics import auc, make_scorer, roc_curve
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.pipeline import Pipeline

# define the scoring function
def custom_auc(ground_truth, predictions):
    # For a binary target, recent scikit-learn versions pass only the positive-class
    # column of predict_proba here (a 1-D array), so predictions[:, 1] would fail
    fpr, tpr, _ = roc_curve(ground_truth, predictions, pos_label=1)
    return auc(fpr, tpr)

# wrap it as a standard sklearn scorer (on scikit-learn < 1.4, use needs_proba=True)
my_auc = make_scorer(custom_auc, greater_is_better=True, response_method="predict_proba")

pipeline = Pipeline(
    [("transformer", TruncatedSVD(n_components=70)),
     ("classifier", xgb.XGBClassifier(scale_pos_weight=1.0, learning_rate=0.1,
                                      max_depth=5, n_estimators=50, min_child_weight=5))])

parameters_grid = {"transformer__n_components": [60, 40, 20]}

grid_cv = GridSearchCV(pipeline, parameters_grid, scoring=my_auc, n_jobs=-1,
                       cv=StratifiedShuffleSplit(n_splits=5, test_size=0.3, random_state=0))
grid_cv.fit(X, y)  # X, y: your feature matrix and binary labels

For details, see the documentation: sklearn make_scorer
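As a quick sanity check, computing the curve with roc_curve and integrating it with auc gives the same number as roc_auc_score, so a hand-written scorer like custom_auc is equivalent to the built-in metric. The four-sample arrays below are made-up toy data:

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])  # predicted probability of class 1

fpr, tpr, _ = roc_curve(y_true, y_score, pos_label=1)
print(auc(fpr, tpr))                   # 0.75
print(roc_auc_score(y_true, y_score))  # 0.75
```

Three of the four (positive, negative) pairs are ranked correctly, which is where 0.75 comes from.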

Answer 2 (score: 2)

Use the code below to get the list of all available scoring names:

from sklearn.metrics import get_scorer_names

print(get_scorer_names())  # scikit-learn >= 1.0; older versions: sklearn.metrics.SCORERS.keys()

Choose the appropriate name from that list.

In your case, the following code will work:

clf = GridSearchCV(clf, parameters, scoring = 'roc_auc')
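To see what the built-in 'roc_auc' scorer computes, the sketch below (synthetic data from make_classification, an assumption) compares it with roc_auc_score applied to SVC.decision_function. For estimators that provide a decision_function, the scorer uses it, which is why SVC works here without probability=True:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import get_scorer, roc_auc_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=8, random_state=1)
clf = SVC(kernel="rbf").fit(X, y)  # note: no probability=True

roc_auc = get_scorer("roc_auc")  # the same object GridSearchCV builds from the string
print(roc_auc(clf, X, y))
print(roc_auc_score(y, clf.decision_function(X)))  # same value
```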

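Answer 3 (score: 1)

I haven't tried it, but I believe you want to use sklearn.metrics.roc_auc_score.

The problem is that it is a metric, not a scorer that takes a model, so you need to build one. Something like: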

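from sklearn.metrics import roc_auc_score

def score_auc(estimator, X, y):
    # Column 1 of predict_proba is the positive-class probability; passing both columns
    # raises an error for a binary target. SVC needs probability=True for predict_proba.
    # You could also use the binary predict(), but probabilities give a more realistic score.
    y_score = estimator.predict_proba(X)[:, 1]
    return roc_auc_score(y, y_score)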

Then pass this function as the scoring argument to GridSearchCV.
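A self-contained sketch of this approach, assuming synthetic data from make_classification in place of the question's features_train / labels_train, a small grid, and SVC(probability=True) so that predict_proba is available. GridSearchCV accepts any callable with the signature (estimator, X, y) as scoring:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def score_auc(estimator, X, y):
    # Keep only the probability of estimator.classes_[1] (the positive class)
    y_score = estimator.predict_proba(X)[:, 1]
    return roc_auc_score(y, y_score)


X, y = make_classification(n_samples=200, n_features=8, random_state=0)
parameters = {"C": [1, 10], "gamma": [0.1, 0.01]}

# probability=True is required for SVC.predict_proba
search = GridSearchCV(SVC(kernel="rbf", probability=True, random_state=0),
                      parameters, scoring=score_auc, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Note that probability=True makes SVC fit an extra internal calibration step, so this is slower than scoring='roc_auc', which ranks by decision_function directly.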