I am trying to use the sklearn package with Python 3.4 to do a grid search:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model.logistic import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.grid_search import GridSearchCV
import pandas as pd
from sklearn.cross_validation import train_test_split
from sklearn.metrics import precision_score, recall_score, accuracy_score
from sklearn.preprocessing import LabelBinarizer
import numpy as np

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression)
])

parameters = {
    'vect__max_df': (0.25, 0.5, 0.75),
    'vect__stop_words': ('english', None),
    'vect__max_features': (2500, 5000, 10000, None),
    'vect__ngram_range': ((1, 1), (1, 2)),
    'vect__use_idf': (True, False),
    'vect__norm': ('l1', 'l2'),
    'clf__penalty': ('l1', 'l2'),
    'clf__C': (0.01, 0.1, 1, 10)
}

if __name__ == '__main__':
    grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, scoring='accuracy', cv=3)
    df = pd.read_csv('SMS Spam Collection/SMSSpamCollection', delimiter='\t', header=None)
    lb = LabelBinarizer()
    X, y = df[1], np.array([number[0] for number in lb.fit_transform(df[0])])
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    grid_search.fit(X_train, y_train)
    print('Best score: ', grid_search.best_score_)
    print('Best parameter set:')
    best_parameters = grid_search.best_estimator_.get_params()
    for param_name in sorted(best_parameters):
        print(param_name, best_parameters[param_name])

However, it does not run successfully. The error message is:

Fitting 3 folds for each of 1536 candidates, totalling 4608 fits
Traceback (most recent call last):
File "/home/xiangru/PycharmProjects/machine_learning_note_with_sklearn/grid search.py", line 36, in <module>
grid_search.fit(X_train, y_train)
File "/usr/local/lib/python3.4/dist-packages/sklearn/grid_search.py", line 732, in fit
return self._fit(X, y, ParameterGrid(self.param_grid))
File "/usr/local/lib/python3.4/dist-packages/sklearn/grid_search.py", line 493, in _fit
base_estimator = clone(self.estimator)
File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 47, in clone
new_object_params[name] = clone(param, safe=False)
File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in clone
return estimator_type([clone(e, safe=safe) for e in estimator])
File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in <listcomp>
return estimator_type([clone(e, safe=safe) for e in estimator])
File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in clone
return estimator_type([clone(e, safe=safe) for e in estimator])
File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in <listcomp>
return estimator_type([clone(e, safe=safe) for e in estimator])
File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 45, in clone
new_object_params = estimator.get_params(deep=False)
TypeError: get_params() missing 1 required positional argument: 'self'

I also tried using only:

if __name__ == '__main__':
    pipeline.get_params()

It gives the same error message. Does anyone know how to fix this?

Answer 0 (score: 28)
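This error is almost always misleading. It actually means that you are calling an instance method on the class rather than on an instance (like calling dict.keys() instead of d.keys() on a dict named d). *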
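That is exactly what is happening here. The docs imply that the best_estimator_ attribute, like the estimator parameter of the initializer, is not an estimator instance but an estimator type, and that "A object of that type is instantiated for each grid point."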
So, if you want to call methods, you have to construct an object of that type for some particular grid point.
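However, from a quick glance at the docs, if you are trying to get the parameters that were used for the particular instance of the best estimator that returned the best score, isn't that just going to be best_params_? (I apologize that this part is a bit of a guess...)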
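For the Pipeline call, you definitely have an instance. The only documentation for that method is a parameter spec showing that it takes one optional argument, deep. But under the covers, it is probably forwarding the get_params() call to one of its attributes. And with ('clf', LogisticRegression), it looks like you are constructing it with the class LogisticRegression rather than an instance of that class, so if that is what it ends up forwarding to, that would explain the problem.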
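
The forwarding described above can be illustrated without scikit-learn. The following is a minimal sketch, assuming a simplified recursive clone that walks the list of steps and asks each element for its parameters, much as the traceback in the question suggests. ToyEstimator and toy_clone are hypothetical names made up for this illustration; they are not scikit-learn's actual implementation.

```python
class ToyEstimator:
    """Hypothetical stand-in for an estimator; not part of scikit-learn."""

    def __init__(self, C=1.0):
        self.C = C

    def get_params(self, deep=False):
        return {"C": self.C}


def toy_clone(obj):
    """Simplified, assumed imitation of a recursive clone."""
    if isinstance(obj, (list, tuple)):
        # Clone every element of a list or tuple, keeping the container type.
        return type(obj)(toy_clone(item) for item in obj)
    if hasattr(obj, "get_params"):
        # Both the class and its instances have this attribute, but only an
        # instance can supply `self` when the method is called.
        return type(obj)(**obj.get_params(deep=False))
    return obj  # plain values such as the step name are returned unchanged


good_steps = [("clf", ToyEstimator(C=10))]  # an instance
bad_steps = [("clf", ToyEstimator)]         # the class itself

cloned = toy_clone(good_steps)

try:
    toy_clone(bad_steps)
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
```

With the instance, the clone succeeds and yields a new object with the same parameter. With the class, the call to get_params has no instance to bind to, so it fails with the same kind of TypeError as in the question.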
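* The reason the error says "missing 1 required positional argument: 'self'" instead of "must be called on an instance" or something similar is that in Python, d.keys() is effectively turned into dict.keys(d), and it is perfectly legal (and sometimes useful) to call it that way explicitly. So Python cannot really tell you that dict.keys() is illegal, only that it is missing the self argument.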
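
The footnote can be checked with a few lines of plain Python. This is a minimal sketch using a made-up class, Greeter, which is an assumption for illustration only. It uses a user-defined class because built-in types such as dict may word the error differently.

```python
class Greeter:
    """Hypothetical class used only to demonstrate the error."""

    def greet(self):
        return "hello"


g = Greeter()

# Calling through the instance and calling through the class with the
# instance passed explicitly are equivalent.
via_instance = g.greet()
via_class = Greeter.greet(g)

# Calling through the class without an instance leaves `self` unfilled.
try:
    Greeter.greet()
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
```

The explicit form Greeter.greet(g) is legal, which is why Python reports a missing argument rather than an illegal call.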
Answer 1 (score: 17)
I finally solved the problem. The reason is exactly what abarnert said.

First I tried:

pipeline = LogisticRegression()
parameters = {
    'penalty': ('l1', 'l2'),
    'C': (0.01, 0.1, 1, 10)
}

and it worked well.

With that intuition, I modified the pipeline to:

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression())
])

Note that there is a () after LogisticRegression.

This time it works.
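
For readers on a recent scikit-learn release, here is a minimal end-to-end sketch of the corrected pipeline. It assumes a version where GridSearchCV is imported from sklearn.model_selection, the newer home of the grid search and cross-validation utilities, rather than the sklearn.grid_search module used in the question. The tiny texts and labels lists are invented for illustration and stand in for the SMS Spam Collection file; the parameter grid is cut down so that the example runs quickly.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Made-up toy data (1 = spam, 0 = ham); not the dataset from the question.
texts = [
    "win a free prize now",
    "free cash offer click now",
    "claim your free prize today",
    "urgent offer win cash now",
    "are we meeting for lunch today",
    "see you at the office tomorrow",
    "can you call me after lunch",
    "the meeting moved to tomorrow",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("vect", TfidfVectorizer()),
    ("clf", LogisticRegression()),  # an instance, not the class
])

# Reduced grid, chosen only to keep the sketch small.
parameters = {
    "vect__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1, 10],
}

grid_search = GridSearchCV(pipeline, parameters, scoring="accuracy", cv=2)
grid_search.fit(texts, labels)

# best_params_ holds only the searched parameters of the best candidate.
best_params = grid_search.best_params_
for param_name in sorted(best_params):
    print(param_name, best_params[param_name])
```

Reading best_params_ gives just the tuned settings, whereas best_estimator_.get_params() lists every parameter of the refitted pipeline.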
Answer 2 (score: -1)
Change LogisticRegression in

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression)
])

to LogisticRegression() and the problem is solved.