Question

I am trying to use the scikit-learn package with Python 3.4 to run a grid search. Here is the script, followed by the output it produces:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model.logistic import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.grid_search import GridSearchCV
import pandas as pd
from sklearn.cross_validation import train_test_split
from sklearn.metrics import precision_score, recall_score, accuracy_score
from sklearn.preprocessing import LabelBinarizer
import numpy as np

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression)
])

parameters = {
    'vect__max_df': (0.25, 0.5, 0.75),
    'vect__stop_words': ('english', None),
    'vect__max_features': (2500, 5000, 10000, None),
    'vect__ngram_range': ((1, 1), (1, 2)),
    'vect__use_idf': (True, False),
    'vect__norm': ('l1', 'l2'),
    'clf__penalty': ('l1', 'l2'),
    'clf__C': (0.01, 0.1, 1, 10)
}

if __name__ == '__main__':
    grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, scoring='accuracy', cv = 3)
    df = pd.read_csv('SMS Spam Collection/SMSSpamCollection', delimiter='\t', header=None)
    lb = LabelBinarizer()
    X, y = df[1], np.array([number[0] for number in lb.fit_transform(df[0])])
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    grid_search.fit(X_train, y_train)
    print('Best score: ', grid_search.best_score_)
    print('Best parameter set:')
    best_parameters = grid_search.best_estimator_.get_params()
    for param_name in sorted(best_parameters):
        print(param_name, best_parameters[param_name])

Fitting 3 folds for each of 1536 candidates, totalling 4608 fits
Traceback (most recent call last):
  File "/home/xiangru/PycharmProjects/machine_learning_note_with_sklearn/grid search.py", line 36, in <module>
    grid_search.fit(X_train, y_train)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/grid_search.py", line 732, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "/usr/local/lib/python3.4/dist-packages/sklearn/grid_search.py", line 493, in _fit
    base_estimator = clone(self.estimator)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 47, in clone
    new_object_params[name] = clone(param, safe=False)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in clone
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in <listcomp>
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in clone
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in <listcomp>
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 45, in clone
    new_object_params = estimator.get_params(deep=False)
TypeError: get_params() missing 1 required positional argument: 'self'

However, as the traceback above shows, it does not run successfully.

I also tried running only:

if __name__ == '__main__':
    pipeline.get_params()

It gives the same error message. Does anyone know how to fix this?

Answer 1

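This error is almost always misleading. What it actually means is that you are calling an instance method on the class rather than on an instance (like calling dict.keys() instead of d.keys() on a dict named d).*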
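
As a minimal sketch of that failure mode, the snippet below uses a hypothetical ToyEstimator class (an assumption for illustration, not part of scikit-learn). Calling its method on the class itself should reproduce the same kind of message:

```python
class ToyEstimator:
    """Hypothetical stand-in for an estimator; not a scikit-learn class."""

    def __init__(self, C=1.0):
        self.C = C

    def get_params(self, deep=True):
        return {"C": self.C}


# Called on an instance: Python passes the instance as `self`.
instance_params = ToyEstimator(C=0.1).get_params()

# Called on the class: nothing is bound to `self`, so the call fails.
try:
    ToyEstimator.get_params()
except TypeError as exc:
    class_call_error = str(exc)

print(instance_params)
print(class_call_error)
```

The exact wording of the message can vary slightly between Python versions (newer ones prefix the class name), but it still complains about the missing self argument.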

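That is exactly what is going on here. The docs imply that the best_estimator_ attribute, like the estimator parameter to the initializer, is not an estimator instance but an estimator type, and that "an object of that type is instantiated for each grid point."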

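So, if you want to call methods, you have to construct an object of that type for some particular grid point.

However, from a quick glance at the docs, if you are trying to get the parameters that were used for the particular instance of the best estimator, the one that returned the best score, isn't that just going to be best_params_? (I apologize that this part is a bit of a guess.)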

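For the Pipeline call, you definitely have an instance. The only documentation for that method is a parameter spec showing that it takes one optional argument, deep. But under the covers, it is probably forwarding the get_params() call to one of its attributes. And with ('clf', LogisticRegression), it looks like you are constructing it with the class LogisticRegression rather than an instance of that class, so if that is what it ends up forwarding to, that would explain the problem.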
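
The forwarding guess can be illustrated with a toy model. ToyPipeline and ToyClassifier below are hypothetical stand-ins written for this sketch; they are not scikit-learn's actual implementation, only an assumption about the general shape of the behaviour:

```python
class ToyClassifier:
    """Hypothetical estimator; stands in for LogisticRegression."""

    def get_params(self, deep=True):
        return {"C": 1.0}


class ToyPipeline:
    """Hypothetical container that forwards get_params() to each step."""

    def __init__(self, steps):
        self.steps = steps

    def get_params(self, deep=True):
        params = {}
        for name, step in self.steps:
            # If `step` is a class, no instance is bound to `self` here.
            for key, value in step.get_params(deep=False).items():
                params[name + "__" + key] = value
        return params


good = ToyPipeline([("clf", ToyClassifier())])  # instance
bad = ToyPipeline([("clf", ToyClassifier)])     # class, parentheses missing

good_params = good.get_params()

try:
    bad.get_params()
except TypeError as exc:
    bad_error = str(exc)

print(good_params)
print(bad_error)
```

In this sketch the container itself is a perfectly valid instance, yet the error only surfaces when the call reaches the step that was passed as a class.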

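* The reason the error says "missing 1 required positional argument: 'self'" instead of "must be called on an instance" or something similar is that in Python, d.keys() is effectively turned into dict.keys(d), and it is perfectly legal (and sometimes useful) to call it that way explicitly. So Python cannot really tell you that dict.keys() is illegal, only that it is missing the self argument.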
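
A short sketch of that equivalence, using only built-in types (the variable names are chosen for illustration):

```python
d = {"b": 2, "a": 1}

# The usual bound call and the explicit call through the class are equivalent.
bound_result = list(d.keys())
explicit_result = list(dict.keys(d))

# Calling through the class is handy when a plain function is needed,
# for example as the first argument to map().
shouted = list(map(str.upper, ["spam", "ham"]))

print(bound_result == explicit_result)
print(shouted)
```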

Answer 2

I finally solved the problem. The reason is exactly what abarnert described.

First I tried:

pipeline = LogisticRegression()

parameters = {
    'penalty': ('l1', 'l2'),
    'C': (0.01, 0.1, 1, 10)
}

and it worked well.

With that intuition, I changed the pipeline to:

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression())
])

Note the () after LogisticRegression. This time it works.
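
Because the missing parentheses are easy to overlook, one possible safeguard is to scan the steps before building the pipeline. The helper find_uninstantiated_steps and the two toy classes below are hypothetical names made up for this sketch; nothing here is a scikit-learn API:

```python
import inspect


def find_uninstantiated_steps(steps):
    """Return the names of steps whose estimator is a class, not an instance."""
    return [name for name, estimator in steps if inspect.isclass(estimator)]


class ToyVectorizer:
    """Hypothetical stand-in for TfidfVectorizer."""


class ToyClassifier:
    """Hypothetical stand-in for LogisticRegression."""


# The second step is the class itself: the parentheses were forgotten.
steps = [("vect", ToyVectorizer()), ("clf", ToyClassifier)]
missing_parens = find_uninstantiated_steps(steps)

print(missing_parens)  # ['clf']
```

The same list of (name, estimator) tuples that would be handed to Pipeline can be passed to the helper first, so the check costs one extra line.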

Answer 3

Change LogisticRegression in

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression)
])

to

LogisticRegression()

and the problem is solved.

TypeError: get_params() missing 1 required positional argument: 'self'

3 answers: