Question

I am building a dynamic pipeline in scikit-learn and passing the scoring function to GridSearchCV as a string parameter:

gs = GridSearchCV(pipeline, grid, scoring='accuracy')

However, when I try to retrieve the scoring function in order to evaluate my predictions, this is what I get:

  File "app/experimenter/sklearn/sklearn-dask-tests.py", line 127, in run_pipeline
    print(evaluator(expected, predicted))
TypeError: __call__() takes at least 4 arguments (3 given)

Here is the code:

gs.fit(train_data, train_target)

predicted = gs.predict(test_data)

evaluator = gs.scorer_

print(evaluator(expected, predicted))

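From what I can see, the problem is that the evaluator is actually make_scorer(accuracy_score). I suppose I could get print(evaluator(expected, predicted)) to work by adding the estimator as the first argument, but how do I properly get it from the pipeline?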

Because when I inspect gs.best_estimator_, I get this:

Pipeline(steps=[('mapper', DataFrameMapper(default=False, df_out=False,
        features=[('Sex', LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False))],
        sparse=False)), ('DecisionTree', DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=3,
            max_features=1, max_...      min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best'))])

Answer 1

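Yes, this is the intended behavior of GridSearchCV. The pipeline object returned by gs.best_estimator_ has already been fitted on the whole of train_data, using the best parameters found in the grid search.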

You need to pass that pipeline object to evaluator. However, the way you are currently calling evaluator is wrong.

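What the scorer_ produced by make_scorer does is take the test data, run predictions on it, and then compute the score by comparing those predictions with the actual values.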

Its signature is therefore:

scorer(estimator, X_test, y_test)
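
To make the signature concrete, here is a minimal sketch of what a scorer does internally. The iris dataset and the standalone DecisionTreeClassifier are assumptions chosen only for illustration; they are not part of the original question.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative data and model (assumptions, not from the question).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

scorer = make_scorer(accuracy_score)

# The scorer takes the fitted estimator plus raw X and true y;
# it calls clf.predict(X_test) itself before applying the metric.
via_scorer = scorer(clf, X_test, y_test)

# Doing the same two steps by hand gives the same number.
by_hand = accuracy_score(y_test, clf.predict(X_test))

print(via_scorer, by_hand)
```

Both values should agree, since the scorer is just the prediction step and the metric bundled into one callable.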

But you are trying to use it as:

evaluator(expected, predicted)

This does not work because:

It does not match the scorer's signature.
The scorer needs the X data, not the predicted y values.

So if you already have the actual and predicted values for the data, you can use:

accuracy_score(expected, predicted)

If you want to use scorer_ (your evaluator), you should pass test_data (the data that predicted was computed from) rather than predicted:

evaluator(gs.best_estimator_, test_data, expected)
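
As a rough end-to-end sketch, the two options can be compared side by side. The dataset (iris), the StandardScaler step, and the max_depth grid are assumptions made so the example runs on its own; the variable names mirror those in the question.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Illustrative data split (assumption); names follow the question.
X, y = load_iris(return_X_y=True)
train_data, test_data, train_target, expected = train_test_split(
    X, y, random_state=0
)

# Hypothetical pipeline and grid standing in for the asker's dynamic ones.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("DecisionTree", DecisionTreeClassifier(random_state=0)),
])
grid = {"DecisionTree__max_depth": [2, 3, 4]}

gs = GridSearchCV(pipeline, grid, scoring="accuracy")
gs.fit(train_data, train_target)

# Option 1: predict first, then apply the plain metric function.
predicted = gs.predict(test_data)
from_metric = accuracy_score(expected, predicted)

# Option 2: hand the fitted estimator and the raw test data to the scorer.
evaluator = gs.scorer_
from_scorer = evaluator(gs.best_estimator_, test_data, expected)

# gs.score applies the same scorer to best_estimator_ for you.
from_search = gs.score(test_data, expected)

print(from_metric, from_scorer, from_search)
```

All three values should match. Going through gs.scorer_ (or gs.score) has the advantage that the evaluation automatically follows whatever scoring string was given to GridSearchCV, which suits a dynamically built pipeline.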

