I'm using RandomizedSearchCV in sklearn with a random forest classifier. To see additional metrics, I'm using custom scorers:
from sklearn.metrics import make_scorer, roc_auc_score, recall_score, matthews_corrcoef, balanced_accuracy_score, accuracy_score
acc = make_scorer(accuracy_score)
auc_score = make_scorer(roc_auc_score)
recall = make_scorer(recall_score)
mcc = make_scorer(matthews_corrcoef)
bal_acc = make_scorer(balanced_accuracy_score)
scoring = {"roc_auc_score": auc_score, "recall": recall, "MCC" : mcc, 'Bal_acc' : bal_acc, "Accuracy": acc }
These custom scorers are used in the randomized search:
rf_random = RandomizedSearchCV(estimator=rf, param_distributions=random_grid, n_iter=100, cv=split, verbose=2,
random_state=42, n_jobs=-1, error_score=np.nan, scoring = scoring, iid = True, refit="roc_auc_score")
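The problem is that with my custom split, AUC throws an exception, because one particular fold contains only a single class label.
I don't want to change the split, so is it possible to catch these exceptions in RandomizedSearchCV or in make_scorer? For example, if one of the metrics cannot be computed (because of an exception), it should just record NaN and move on to the next model.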
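Edit: Apparently error_score covers exceptions raised while fitting the model, but not exceptions raised while computing a metric. If I use e.g. Accuracy, everything works and I only get warnings for the folds with a single class label. With AUC as a metric, the exception is still thrown.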
Any ideas would be great!
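To see in advance which folds of a custom split will trip up AUC, one can count the distinct labels in each test fold. A minimal sketch, assuming `y` is a NumPy array; `single_class_folds` is a hypothetical helper, and an unshuffled `KFold` on sorted toy labels stands in for the question's `split`:

```python
import numpy as np
from sklearn.model_selection import KFold


def single_class_folds(cv, X, y):
    """Hypothetical helper: indices of folds whose test set has fewer than two classes.

    `cv` may be a splitter object (anything with .split) or an iterable of
    (train_idx, test_idx) pairs, like the custom split in the question.
    """
    splits = cv.split(X, y) if hasattr(cv, "split") else cv
    bad = []
    for i, (_, test_idx) in enumerate(splits):
        if np.unique(y[test_idx]).size < 2:
            bad.append(i)
    return bad


# Toy data (assumption): sorted binary labels, so an unshuffled KFold
# puts only one class into some of the test folds.
y = np.array([0] * 6 + [1] * 6)
X = np.arange(len(y)).reshape(-1, 1)
print(single_class_folds(KFold(n_splits=3), X, y))  # [0, 2]
```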
Solution: define a custom scorer that catches the exception:
def custom_scorer(y_true, y_pred, actual_scorer):
    score = np.nan
    try:
        score = actual_scorer(y_true, y_pred)
    except ValueError:
        pass
    return score
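Called directly, the wrapper returns the metric when it is defined and NaN otherwise. A quick check with toy labels (assumed for illustration). In the scikit-learn versions this question targets, `roc_auc_score` on a single-class `y_true` raises `ValueError`; newer releases may instead warn and return NaN, and both cases end up as NaN here:

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def custom_scorer(y_true, y_pred, actual_scorer):
    # Fall back to NaN when the wrapped metric raises ValueError.
    score = np.nan
    try:
        score = actual_scorer(y_true, y_pred)
    except ValueError:
        pass
    return score


# Both classes present: the metric is defined.
print(custom_scorer([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], roc_auc_score))  # 0.75
# Only one class present: ROC AUC is undefined, so the wrapper yields NaN.
print(custom_scorer([1, 1, 1], [0.2, 0.4, 0.9], roc_auc_score))  # nan
```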
Wrapping each metric with it via make_scorer gives the new scorers:
acc = make_scorer(accuracy_score)
recall = make_scorer(custom_scorer, actual_scorer=recall_score)
new_auc = make_scorer(custom_scorer, actual_scorer=roc_auc_score)
mcc = make_scorer(custom_scorer, actual_scorer=matthews_corrcoef)
bal_acc = make_scorer(custom_scorer, actual_scorer=balanced_accuracy_score)
scoring = {"roc_auc_score": new_auc, "recall": recall, "MCC" : mcc, 'Bal_acc' : bal_acc, "Accuracy": acc }
These can in turn be passed to the scoring parameter of RandomizedSearchCV.
A second solution I found:
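Putting it together, a minimal end-to-end sketch. Assumptions: toy data, a hand-made `split` (a list of `(train, test)` index pairs) standing in for the question's custom split, only two of the metrics, and `refit="accuracy"` so that the refit metric is defined on every fold. The AUC entries for the single-class fold come out as NaN instead of aborting the search:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, make_scorer, roc_auc_score
from sklearn.model_selection import RandomizedSearchCV


def custom_scorer(y_true, y_pred, actual_scorer):
    # Return NaN instead of propagating a ValueError from the metric.
    score = np.nan
    try:
        score = actual_scorer(y_true, y_pred)
    except ValueError:
        pass
    return score


# Toy binary data (assumption): 20 samples of class 0, then 20 of class 1.
rng = np.random.RandomState(0)
y = np.array([0] * 20 + [1] * 20)
X = rng.rand(40, 3) + y[:, None]

# Hypothetical custom split: fold 0 tests on class 0 only,
# fold 1 tests on both classes. Every training set has both classes.
split = [
    (np.r_[0:10, 20:40], np.r_[10:20]),
    (np.r_[0:15, 20:35], np.r_[15:20, 35:40]),
]

scoring = {
    "roc_auc": make_scorer(custom_scorer, actual_scorer=roc_auc_score),
    "accuracy": make_scorer(accuracy_score),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": [5, 10]},
    n_iter=2,
    cv=split,
    scoring=scoring,
    refit="accuracy",  # refit on a metric that is defined on every fold
    random_state=42,
)
search.fit(X, y)
res = search.cv_results_
print(res["split0_test_roc_auc"])  # NaN for every candidate
print(res["split1_test_roc_auc"])  # finite AUC values
```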
def custom_auc(clf, X, y_true):
    score = np.nan
    y_pred = clf.predict_proba(X)
    try:
        score = roc_auc_score(y_true, y_pred[:, 1])
    except Exception:
        pass
    return score
This can also be passed to the scoring parameter:
scoring = {"roc_auc_score": custom_auc, "recall": recall, "MCC" : mcc, 'Bal_acc' : bal_acc, "Accuracy": acc }
(adapted from this answer)
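The callable form works because scikit-learn accepts any function with the signature `(estimator, X, y)` as a scorer. A minimal standalone check, assuming toy data and a `LogisticRegression` in place of the random forest (any fitted classifier with `predict_proba` would do):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score


def custom_auc(clf, X, y_true):
    # Callable scorer: (estimator, X, y) -> float, NaN if AUC is undefined.
    score = np.nan
    y_pred = clf.predict_proba(X)
    try:
        score = roc_auc_score(y_true, y_pred[:, 1])
    except Exception:
        pass
    return score


# Toy data (assumption): one feature that separates the classes perfectly.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)

print(custom_auc(clf, X, y))          # both classes present: 1.0
print(custom_auc(clf, X[:3], y[:3]))  # only class 0: nan
```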
Answer:
You can write a generic scorer that takes another scorer as input, checks the result, catches any exception it raises, and returns a fixed value instead.
def custom_scorer(y_true, y_pred, actual_scorer):
    score = np.nan
    try:
        score = actual_scorer(y_true, y_pred)
    except Exception:
        pass
    return score
You can then use it like this:
acc = make_scorer(custom_scorer, actual_scorer=accuracy_score)
auc_score = make_scorer(custom_scorer, actual_scorer=roc_auc_score,
                        needs_threshold=True)  # <== added this to get the correct ROC AUC
recall = make_scorer(custom_scorer, actual_scorer=recall_score)
mcc = make_scorer(custom_scorer, actual_scorer=matthews_corrcoef)
bal_acc = make_scorer(custom_scorer, actual_scorer=balanced_accuracy_score)
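The `needs_threshold=True` comment matters: without it, `make_scorer` passes the hard labels from `predict` to the metric, and ROC AUC computed on 0/1 labels throws away the ranking information. A small illustration with made-up scores, where thresholding at 0.5 stands in for `predict`:

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]
# Continuous scores, e.g. predict_proba(X)[:, 1] (made-up values).
scores = [0.1, 0.4, 0.45, 0.8]
# Hard labels, roughly what predict() returns at a 0.5 threshold: [0, 0, 0, 1].
labels = [int(s >= 0.5) for s in scores]

print(roc_auc_score(y_true, scores))  # 1.0: every positive outranks every negative
print(roc_auc_score(y_true, labels))  # 0.75: ties introduced by thresholding
```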
Reproducible example:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import make_scorer, roc_auc_score, accuracy_score
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeClassifier

def custom_scorer(y_true, y_pred, actual_scorer):
    score = np.nan
    try:
        score = actual_scorer(y_true, y_pred)
    except Exception:
        pass
    return score

acc = make_scorer(custom_scorer, actual_scorer=accuracy_score)
auc_score = make_scorer(custom_scorer, actual_scorer=roc_auc_score,
                        needs_threshold=True)  # <== added this to get the correct ROC AUC

X, y = load_iris(return_X_y=True)

cvv = KFold(3)
params = {'criterion': ['gini', 'entropy']}
gc = GridSearchCV(DecisionTreeClassifier(), param_grid=params, cv=cvv,
                  scoring={"roc_auc": auc_score, "accuracy": acc},
                  refit="roc_auc", n_jobs=-1,
                  return_train_score=True)
gc.fit(X, y)
print(gc.cv_results_)
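A variation on the same idea, sketched under assumptions: instead of wrapping the metric function, wrap one of scikit-learn's built-in scorer objects, so the library itself chooses between `decision_function` and `predict_proba`. `nan_on_error` is a hypothetical helper name and the data are toy values:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer


def nan_on_error(scorer):
    """Hypothetical helper (not part of scikit-learn): wrap a scorer so that
    any exception raised while scoring becomes NaN."""
    def wrapped(estimator, X, y):
        try:
            return scorer(estimator, X, y)
        except Exception:
            return np.nan
    return wrapped


# The built-in "roc_auc" scorer uses continuous scores, not hard labels.
safe_auc = nan_on_error(get_scorer("roc_auc"))

# Toy binary data (assumption).
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)

print(safe_auc(clf, X, y))          # both classes present: 1.0
print(safe_auc(clf, X[:3], y[:3]))  # single class: nan
```

The wrapped scorer can go straight into the `scoring` dict, e.g. `scoring={"roc_auc": safe_auc, "accuracy": acc}`.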