Question

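I wrote a simple script that applies a grid search to a random forest classifier. It worked when I used it in the past, but it now seems to be broken and I can't figure out why.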

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rfc = RandomForestClassifier(n_estimators=100, n_jobs=-1).fit(X, y)


grid_values = {'criterion':['gini','entropy'], 'max_features':['log2', 5, 10, 15, 20, 25], 'max_depth':[None, 5, 10, 15, 20],
               'min_samples_split':[2, 3],'n_jobs':[-1], 'class_weight': [{0 : 1., 1: 30.}, {0 : 1., 1: 50.}, {0 : 1., 1: 100.}]}

for eval_metric in ('precision', 'accuracy'):
  rfc_custom = GridSearchCV(rfc, param_grid=grid_values, scoring=eval_metric)
  rfc_custom.fit(X_train, y_train)
  rfc_custom.best_params_
  print('Grid best parameter (max. {0}): {1}'
         .format(eval_metric, rfc_custom.best_params_))
  print('Grid best score ({0}): {1}'
         .format(eval_metric, rfc_custom.best_score_))

When I run this, I get the following warning: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples.

After searching online, I added this code, which silenced the warning:

import warnings
import sklearn.exceptions

warnings.filterwarnings("ignore",category=sklearn.exceptions.UndefinedMetricWarning)

After running the algorithm, I get a precision of 0.0.

Is this expected, given the warning I was getting? Am I missing something?

Answer 1

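I think that in some CV folds there are no TP and no FP samples, so a division by zero occurs inside GridSearchCV. This happens when the model predicts no positive samples for a validation fold, for example because the fold contains no positive-labeled data or because every sample is classified into the other class.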
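To make this concrete, here is a minimal sketch that reproduces the warning outside of GridSearchCV. The labels are made up for illustration (a hypothetical imbalanced fold where the model predicts the negative class for every sample), and the `zero_division` argument is assumed to be available, which requires scikit-learn 0.22 or later.

```python
import warnings

from sklearn.exceptions import UndefinedMetricWarning
from sklearn.metrics import precision_score

# Hypothetical validation fold: one positive sample, but the model
# predicts the negative class (0) for everything, so TP + FP == 0.
y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0]

# Record warnings instead of printing them so we can inspect them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    score = precision_score(y_true, y_pred)

undefined = [w for w in caught
             if issubclass(w.category, UndefinedMetricWarning)]
print(score)                # precision falls back to 0.0
print(len(undefined) >= 1)  # the warning was raised

# Choosing the fallback value explicitly avoids the warning
# (assumes scikit-learn >= 0.22).
quiet_score = precision_score(y_true, y_pred, zero_division=0)
print(quiet_score)
```

Note that silencing the warning, whether through `warnings.filterwarnings` or `zero_division`, only hides the symptom: the score is still 0.0 for any fold in which nothing is predicted positive.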

Note: precision is defined as TP / (TP + FP), where TP is the number of true positives and FP is the number of false positives.
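The definition above can be sketched in plain Python to show exactly where it breaks down. The function name `precision` and the choice to return `None` for the undefined case are illustrative assumptions, not how scikit-learn handles it (scikit-learn substitutes 0.0 and warns).

```python
def precision(tp, fp):
    """Return TP / (TP + FP), or None when nothing was predicted positive."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        # 0 / 0: precision is undefined, which is the case the warning reports.
        return None
    return tp / predicted_positive


print(precision(8, 2))  # 8 true positives out of 10 predicted positives
print(precision(0, 5))  # positives were predicted, but all were wrong
print(precision(0, 0))  # no positives predicted at all: undefined
```

The last two cases are worth telling apart: a precision of 0.0 with predicted positives means the model was wrong, while the undefined case means the model never predicted the positive class in the first place.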

GridSearch 'UndefinedMetricWarning' and poor results
