Question

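I am using GridSearchCV with a RandomForestClassifier for parameter tuning. For evaluation purposes, I want the confusion matrix of the best_estimator, which, as far as I know, GridSearchCV does not save.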

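from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# `scoring` is defined elsewhere; refit='Accuracy' requires it to contain an 'Accuracy' entry
gs = GridSearchCV(
    RandomForestClassifier(n_estimators=1000, random_state=42),
    param_grid={'max_depth': range(5, 25, 4),
                'min_samples_leaf': range(5, 40, 5),
                'criterion': ['entropy', 'gini']},
    scoring=scoring, cv=3, refit='Accuracy', n_jobs=-1)
gs.fit(X_Distances, Y)
results = gs.cv_results_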

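I initialize the grid search with the given parameters to obtain the best parameters. Finally, I use those best parameters to replicate the grid search's best_estimator with stratified cross-validation. I assume that I am training and validating the best_estimator on the same train/test data as GridSearchCV, since I use the same parameters and the same cross-validation options (stratified, 3-fold).

import numpy as np
from sklearn.metrics import (accuracy_score, classification_report, confusion_matrix,
                             precision_recall_fscore_support as score)  # assumed alias for `score`
from sklearn.model_selection import StratifiedKFold

rf = RandomForestClassifier(n_estimators=1000, min_samples_leaf=7, max_depth=18, criterion='entropy', random_state=42)
metrics = {'accuracy': [], 'precision': [], 'recall': [], 'fscore': [], 'support': []}
counter = 0
print('################################################### RandomForest ###################################################')
# With shuffle=False the split is deterministic; recent scikit-learn versions reject random_state here
skf = StratifiedKFold(n_splits=3, shuffle=False)
for train_index, test_index in skf.split(X_Distances, Y):
    X_train, X_test = X_Distances[train_index], X_Distances[test_index]
    y_train, y_test = Y[train_index], Y[test_index]
    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)
    # Per-class precision, recall, F-score and support for this fold
    precision, recall, fscore, support = np.round(score(y_test, y_pred), 2)
    metrics['accuracy'].append(round(accuracy_score(y_test, y_pred), 2))
    metrics['precision'].append(precision)
    metrics['recall'].append(recall)
    metrics['fscore'].append(fscore)
    metrics['support'].append(support)
    print(classification_report(y_test, y_pred))
    matrix = confusion_matrix(y_test, y_pred)
    # `methods` is a project-local helper module (not shown)
    methods.saveConfusionMatrix(matrix, 'confusion_matrix_randomforest_distances_' + str(counter) + '.png')
    counter = counter + 1

meanAcc = round(np.mean(np.asarray(metrics['accuracy'])), 2) * 100
print('meanAcc: ', meanAcc)

I have some concerns about overfitting. Is there a simpler way to get the confusion matrix of the best_estimator? If not, is my approach correct?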
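The approach described above (re-running the same stratified 3-fold split with the best parameters) can probably be written more compactly with `cross_val_predict`. The following is only a minimal sketch, not a confirmed answer: the synthetic data from `make_classification`, the reduced grid, and the small `n_estimators` are stand-ins chosen so the example runs quickly, and are not taken from the question. Passing one explicit `StratifiedKFold` object to both the search and the re-evaluation is one way to make sure both use identical folds.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_predict

# ASSUMPTION: synthetic stand-in for X_Distances / Y, which are not shown in the question.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=42)

# One explicit, unshuffled splitter shared by the search and the re-evaluation,
# so both see exactly the same train/test folds.
skf = StratifiedKFold(n_splits=3)

# ASSUMPTION: a deliberately small grid and forest, only to keep the sketch fast.
gs = GridSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=42),
    param_grid={'max_depth': [5, 9], 'min_samples_leaf': [5, 10]},
    scoring='accuracy', cv=skf)
gs.fit(X, y)

# clone() returns an unfitted copy that carries the best parameters;
# cross_val_predict refits it on each training fold and predicts the held-out fold.
y_pred = cross_val_predict(clone(gs.best_estimator_), X, y, cv=skf)

# Every sample is predicted exactly once, so this is one pooled out-of-fold matrix.
matrix = confusion_matrix(y, y_pred)
print(gs.best_params_)
print(matrix)
```

Note that this yields a single confusion matrix pooled over all three test folds rather than one matrix per fold. Calling `gs.best_estimator_.predict(X)` directly would instead score a model that was refit on the full data, which is where the overfitting concern would apply.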

Edit: I just tested the following:

gs = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=42),
    param_grid={'max_depth': range(5, 25, 4),
                'min_samples_leaf': range(5, 40, 5),
                'criterion': ['entropy', 'gini']},
    scoring=scoring, cv=3, refit='Accuracy', n_jobs=-1)
gs.fit(X_Distances, Y)

This yields best_score = 0.5362903225806451 at best_index = 28. When I check the accuracies of the 3 folds at index 28, I get:

split0: 0.5185929648241207
split1: 0.526686807653575
split2: 0.5637651821862348

This gives a mean test accuracy of 0.5362903225806451, with best_params:

{'criterion': 'entropy', 'max_depth': 21, 'min_samples_leaf': 5}

Now I run the cross-validation loop shown earlier, using these best_params and a stratified 3-fold split (as GridSearchCV does).

The metrics dictionary yields exactly the same accuracies (split0: 0.5185929648241207, split1: 0.526686807653575, split2: 0.5637651821862348).

However, the computed mean is slightly off: 0.5363483182213101
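One plausible explanation for this gap, offered as a hedged sketch rather than a confirmed diagnosis: scikit-learn versions before 0.22 defaulted to `iid=True` in GridSearchCV, which weights each fold's score by the size of its test set, whereas a plain `np.mean` weights every fold equally. The fold sizes used below are an assumption; they are inferred from the reported fractions (516/995, 523/993, 557/988) and are not stated in the question.

```python
# Per-fold accuracies reported in the question.
fold_acc = [0.5185929648241207, 0.526686807653575, 0.5637651821862348]

# ASSUMPTION: test-fold sizes inferred from the fractions above
# (516/995, 523/993, 557/988); they are not given in the question.
fold_sizes = [995, 993, 988]

# Plain mean: every fold counts equally (what np.mean computes).
unweighted = sum(fold_acc) / len(fold_acc)

# Size-weighted mean: every sample counts equally.
weighted = sum(a * n for a, n in zip(fold_acc, fold_sizes)) / sum(fold_sizes)

print(f"unweighted mean: {unweighted:.6f}")
print(f"weighted mean:   {weighted:.6f}")
```

Under these assumed fold sizes, the unweighted mean comes out at about 0.536348 (the manually computed value) and the weighted mean at about 0.536290 (the best_score reported by the grid search), so the two numbers would describe the same per-fold results averaged in two different ways.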

How to get the confusion matrix of GridSearchCV's best_estimator
