GridSearchCV and RandomizedSearchCV (sklearn): TypeError: __call__() missing 1 required positional argument: 'y_true'

Asked: 2019-03-04 12:25:41

Tags: python-3.x scikit-learn grid-search

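I am trying to use GridSearchCV and RandomizedSearchCV to find the best parameters for two unsupervised learning algorithms (used for novelty detection), namely OneClassSVM and LocalOutlierFactor, both from sklearn.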

Below is the function I wrote (adapted from this example):

from time import time

import numpy as np
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV


def gridsearch(clf, param_dist_rand, param_grid_exhaustive, X):

    def report(results, n_top=3):
        # Print the n_top best-ranked parameter settings.
        for i in range(1, n_top + 1):
            candidates = np.flatnonzero(results['rank_test_score'] == i)
            for candidate in candidates:
                print("Model with rank: {0}".format(i))
                print("Mean validation score: {0:.3f} (std: {1:.3f})".format(
                    results['mean_test_score'][candidate],
                    results['std_test_score'][candidate]))
                print("Parameters: {0}".format(results['params'][candidate]))
                print("")

    # Randomized search over the parameter distributions.
    n_iter_search = 20
    random_search = RandomizedSearchCV(
        clf, param_distributions=param_dist_rand, n_iter=n_iter_search,
        cv=5, error_score=np.nan, scoring='accuracy')

    start = time()
    random_search.fit(X)
    print("RandomizedSearchCV took %.2f seconds for %d candidates"
          " parameter settings." % ((time() - start), n_iter_search))
    report(random_search.cv_results_)

    # Exhaustive search over the parameter grid.
    grid_search = GridSearchCV(
        clf, param_grid=param_grid_exhaustive,
        cv=5, error_score=np.nan, scoring='accuracy')
    start = time()
    grid_search.fit(X)

    print("GridSearchCV took %.2f seconds for %d candidate parameter settings."
          % (time() - start, len(grid_search.cv_results_['params'])))
    report(grid_search.cv_results_)

To test the function above, I use the following code:

import numpy as np
import scipy.stats
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.neighbors import LocalOutlierFactor

# all_data is my full, unlabeled dataset.
X, W = train_test_split(all_data, test_size=0.2, random_state=42)

clf_lof = LocalOutlierFactor(novelty=True, contamination='auto')
lof_param_dist_rand = {'n_neighbors': np.arange(6, 101, 1),
                       'leaf_size': np.arange(30, 101, 10)}
lof_param_grid_exhaustive = {'n_neighbors': np.arange(6, 101, 1),
                             'leaf_size': np.arange(30, 101, 10)}
gridsearch(clf=clf_lof, param_dist_rand=lof_param_dist_rand,
           param_grid_exhaustive=lof_param_grid_exhaustive, X=X)

clf_svm = svm.OneClassSVM()
svm_param_dist_rand = {'nu': np.arange(0, 1.1, 0.01),
                       'kernel': ['rbf', 'linear', 'poly', 'sigmoid'],
                       'degree': np.arange(0, 7, 1),
                       'gamma': scipy.stats.expon(scale=.1)}
svm_param_grid_exhaustive = {'nu': np.arange(0, 1.1, 0.01),
                             'kernel': ['rbf', 'linear', 'poly', 'sigmoid'],
                             'degree': np.arange(0, 7, 1),
                             'gamma': [0.25]}  # grid values must be sequences
gridsearch(clf=clf_svm, param_dist_rand=svm_param_dist_rand,
           param_grid_exhaustive=svm_param_grid_exhaustive, X=X)

Initially, I did not set the scoring parameter for either of the two search methods, and I got this error:

TypeError: If no scoring is specified, the estimator passed should have a 'score' method.
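This message is consistent with the two estimators not providing a `score` method, which is what the search classes fall back to when `scoring` is left unset. A minimal check, assuming a reasonably recent scikit-learn (the dictionary names below are illustrative only):

```python
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

# The two novelty detectors from the question.
estimators = {
    "OneClassSVM": OneClassSVM(),
    "LocalOutlierFactor": LocalOutlierFactor(novelty=True),
}

# With scoring=None the search falls back to estimator.score,
# so it is worth checking whether that method exists at all.
has_score = {name: hasattr(est, "score") for name, est in estimators.items()}

# Both estimators do expose label-free outputs such as score_samples.
has_score_samples = {
    name: hasattr(est, "score_samples") for name, est in estimators.items()
}

print(has_score)
print(has_score_samples)
```

If `score` is indeed absent, some explicit `scoring` argument is needed for the search to run at all.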

Then I added scoring='accuracy', because I want to use test accuracy to judge the performance of the different model configurations. Now I get this error:

TypeError: __call__() missing 1 required positional argument: 'y_true'

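I have no labels: my data comes from a single class, and I have none from the counter class, so I don't know how to deal with this error. I also looked at the suggestions in this question, but they didn't help. Any help would be greatly appreciated.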
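One possible direction, sketched under assumptions rather than offered as a definitive fix: `scoring` also accepts a callable with the signature `scorer(estimator, X, y)`, and such a callable can simply ignore `y`. The `'accuracy'` scorer needs true labels, which is why it fails when `fit` is called without `y`. The scorer `inlier_rate` below is hypothetical, as is the synthetic `X_demo` standing in for the four PCA columns. It measures the fraction of held-out samples predicted as inliers, which is only a rough proxy for accuracy when all data is assumed to be normal.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import OneClassSVM


def inlier_rate(estimator, X, y=None):
    """Hypothetical label-free scorer: share of samples predicted as inliers (+1)."""
    return float(np.mean(estimator.predict(X) == 1))


# Synthetic stand-in for the real data (assumption: 4 numeric feature columns).
rng = np.random.RandomState(42)
X_demo = rng.normal(size=(200, 4))

search = GridSearchCV(
    OneClassSVM(kernel="rbf"),
    param_grid={"nu": [0.05, 0.1, 0.2], "gamma": [0.1, 0.25]},
    scoring=inlier_rate,  # callable scorer, so no y_true is required
    cv=5,
    error_score=np.nan,
)
search.fit(X_demo)  # fit without labels

print(search.best_params_)
print(search.best_score_)
```

A caveat on the design: because this score sees only normal data, it tends to reward permissive models (for example a small `nu`), so it says nothing about how well outliers are rejected. A handful of labeled anomalies, if they can be obtained, would allow a more meaningful metric.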

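Edit: As suggested by @FChm, I am providing sample data; the sample .csv data file can be found here. Brief description of the file: it consists of four feature columns (generated by PCA) that I feed to the model.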

0 answers:

No answers