TypeError: get_params() missing 1 required positional argument: 'self'

Time: 2015-05-04 09:40:02

Tags: python scikit-learn

I am trying to run a grid search with the scikit-learn package on Python 3.4, using the following code:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model.logistic import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.grid_search import GridSearchCV
import pandas as pd
from sklearn.cross_validation import train_test_split
from sklearn.metrics import precision_score, recall_score, accuracy_score
from sklearn.preprocessing import LabelBinarizer
import numpy as np

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression)
])

parameters = {
    'vect__max_df': (0.25, 0.5, 0.75),
    'vect__stop_words': ('english', None),
    'vect__max_features': (2500, 5000, 10000, None),
    'vect__ngram_range': ((1, 1), (1, 2)),
    'vect__use_idf': (True, False),
    'vect__norm': ('l1', 'l2'),
    'clf__penalty': ('l1', 'l2'),
    'clf__C': (0.01, 0.1, 1, 10)
}

if __name__ == '__main__':
    grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, scoring='accuracy', cv = 3)
    df = pd.read_csv('SMS Spam Collection/SMSSpamCollection', delimiter='\t', header=None)
    lb = LabelBinarizer()
    X, y = df[1], np.array([number[0] for number in lb.fit_transform(df[0])])
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    grid_search.fit(X_train, y_train)
    print('Best score: ', grid_search.best_score_)
    print('Best parameter set:')
    best_parameters = grid_search.best_estimator_.get_params()
    for param_name in sorted(best_parameters):
        print(param_name, best_parameters[param_name])

Fitting 3 folds for each of 1536 candidates, totalling 4608 fits
Traceback (most recent call last):
  File "/home/xiangru/PycharmProjects/machine_learning_note_with_sklearn/grid search.py", line 36, in <module>
    grid_search.fit(X_train, y_train)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/grid_search.py", line 732, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "/usr/local/lib/python3.4/dist-packages/sklearn/grid_search.py", line 493, in _fit
    base_estimator = clone(self.estimator)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 47, in clone
    new_object_params[name] = clone(param, safe=False)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in clone
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in <listcomp>
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in clone
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in <listcomp>
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 45, in clone
    new_object_params = estimator.get_params(deep=False)
TypeError: get_params() missing 1 required positional argument: 'self'

However, it does not run successfully, and fails with the error message shown above.

I also tried running only:

if __name__ == '__main__':
    pipeline.get_params()

It gives the same error message. Does anyone know how to solve this?

3 Answers:

Answer 0: (score: 28)

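This error is almost always misleading. What it actually means is that you are calling an instance method on the class rather than on an instance (like calling dict.keys() instead of d.keys() on a dict named d). *

That is exactly what is happening here. The docs imply that the best_estimator_ attribute, like the estimator parameter to the initializer, is not an estimator instance; it is an estimator type, and "an object of that type is instantiated for each grid point."

So, if you want to call methods, you have to construct an object of that type for some specific grid point.

However, from a quick glance at the docs, if you are trying to get the parameters that were used for the particular instance of the best estimator, the one that returned the best score, wouldn't that just be best_params_? (I apologize that this part is a bit of a guess...)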

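For the Pipeline call, you definitely have an instance. The only documentation for that method is a parameter spec, which shows that it takes one optional argument, deep. Under the covers, though, it is probably forwarding the get_params() call to one of its attributes. With ('clf', LogisticRegression), it looks like you are constructing the pipeline with the class LogisticRegression rather than an instance of that class, so if that is what the call ends up being forwarded to, that would explain the problem.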
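The forwarding described above can be illustrated with a small sketch. ToyClassifier and ToyPipeline are hypothetical stand-ins written only for this example; they are not scikit-learn classes and do not reproduce the library's real implementation. The sketch only assumes that a container asks each of its steps for its parameters.

```python
class ToyClassifier:
    """Hypothetical stand-in for an estimator; not a scikit-learn class."""

    def __init__(self, C=1.0):
        self.C = C

    def get_params(self, deep=True):
        return {"C": self.C}


class ToyPipeline:
    """Hypothetical container that forwards get_params() to each step."""

    def __init__(self, steps):
        self.steps = steps

    def get_params(self, deep=True):
        params = {}
        for name, step in self.steps:
            # If `step` is a class instead of an instance, this call has
            # nothing to bind to `self`, and Python raises TypeError.
            for key, value in step.get_params(deep=False).items():
                params[name + "__" + key] = value
        return params


# An instance as the step: forwarding works.
good = ToyPipeline([("clf", ToyClassifier())])
good_params = good.get_params()

# The class itself as the step: the forwarded call fails.
bad = ToyPipeline([("clf", ToyClassifier)])
try:
    bad.get_params()
    forwarded_error = None
except TypeError as exc:
    forwarded_error = str(exc)
```

The outer call is made on a real instance, yet the error still surfaces, because the failing call happens one level down, on the step.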

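* The reason the error says "missing 1 required positional argument: 'self'" instead of "must be called on an instance" or something similar is that, in Python, d.keys() is effectively turned into dict.keys(d), and it is perfectly legal (and sometimes useful) to call it that way explicitly. So Python cannot really tell you that dict.keys() is illegal, only that it is missing the self argument.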
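A minimal sketch of the footnote's point, using a hypothetical ToyEstimator class defined only for illustration. It shows that calling through the class with an explicit instance is legal, and that omitting the instance produces the "missing self" message for a method written in Python (built-in types such as dict may word their error differently).

```python
d = {"a": 1, "b": 2}

# Calling through the instance binds d as self automatically...
via_instance = list(d.keys())
# ...which is equivalent to calling through the class and passing d explicitly.
via_class = list(dict.keys(d))


class ToyEstimator:
    """Hypothetical class used only to demonstrate the error message."""

    def get_params(self, deep=True):
        return {"deep": deep}


est = ToyEstimator()

# Legal: the instance is passed explicitly as self.
explicit_self = ToyEstimator.get_params(est)

# Not legal: nothing is supplied for self, so Python reports a missing argument.
try:
    ToyEstimator.get_params()
    message = None
except TypeError as exc:
    message = str(exc)
```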

Answer 1: (score: 17)

I finally solved the problem. The cause is exactly what abarnert said.

First I tried:

pipeline = LogisticRegression()

parameters = {
    'penalty': ('l1', 'l2'),
    'C': (0.01, 0.1, 1, 10)
}

and it worked well.

With that intuition, I modified the pipeline to:

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression())
])

Note the () after LogisticRegression. This time it works.
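Since the missing parentheses are easy to overlook, one possible guard is to check the step list before building the pipeline. The helper below, uninstantiated_steps, is a hypothetical name introduced here and is not part of scikit-learn; the Toy classes are placeholders so that the sketch runs without any third-party package.

```python
import inspect


class ToyVectorizer:
    """Placeholder standing in for a transformer class."""


class ToyClassifier:
    """Placeholder standing in for an estimator class."""


def uninstantiated_steps(steps):
    """Return the names of steps that are classes rather than instances."""
    return [name for name, step in steps if inspect.isclass(step)]


# 'clf' is given as a class (no parentheses), mirroring the bug in the question.
steps = [("vect", ToyVectorizer()), ("clf", ToyClassifier)]
missing_parens = uninstantiated_steps(steps)

# After adding the parentheses, nothing is flagged.
fixed_steps = [("vect", ToyVectorizer()), ("clf", ToyClassifier())]
still_missing = uninstantiated_steps(fixed_steps)
```

The same list of (name, step) tuples that would be passed to Pipeline could be passed to such a check first, under the assumption that every step is meant to be an instance.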

Answer 2: (score: -1)

Change LogisticRegression in

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression)
])

to LogisticRegression()

and the problem is solved.