I want to try different pipeline configurations for text classification.
This is what I did:
pipe = Pipeline([('c_vect', CountVectorizer()),
                 ('feat_select', SelectKBest()),
                 ('ridge', RidgeClassifier())])
parameters = {'c_vect__max_features': [10, 50, 100, None],
              'feat_select__score_func': [chi2, f_classif, mutual_info_classif, SelectFdr, SelectFwe, SelectFpr],
              'ridge__solver': ['sparse_cg', 'lsqr', 'sag'],
              'ridge__tol': [1e-2, 1e-3],
              'ridge__alpha': [0.01, 0.1, 1]}
gs_clf = GridSearchCV(pipe, parameters, n_jobs=5)
gs_clf = gs_clf.fit(clean_train_data, train_labels_list)
But I get the error below, even though SelectFdr should be one of the available feature selection functions according to the SelectKBest documentation: http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html
Traceback (most recent call last):
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 350, in __call__
    return self.func(*args, **kwargs)
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/model_selection/_validation.py", line 437, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/pipeline.py", line 257, in fit
    Xt, fit_params = self._fit(X, y, **fit_params)
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/pipeline.py", line 222, in _fit
    **fit_params_steps[name])
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/memory.py", line 362, in __call__
    return self.func(*args, **kwargs)
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/pipeline.py", line 589, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/base.py", line 521, in fit_transform
    return self.fit(X, y, **fit_params).transform(X)
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/feature_selection/base.py", line 76, in transform
    mask = self.get_support()
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/feature_selection/base.py", line 47, in get_support
    mask = self._get_support_mask()
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/feature_selection/univariate_selection.py", line 503, in _get_support_mask
    scores = _clean_nans(self.scores_)
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/feature_selection/univariate_selection.py", line 30, in _clean_nans
    scores = as_float_array(scores, copy=True)
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/utils/validation.py", line 93, in as_float_array
    return X.astype(return_dtype)
TypeError: float() argument must be a string or a number, not 'SelectFdr'
Any idea why this happens?
Answer 0 (score: 1)
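SelectFdr, SelectFwe, and SelectFpr are classes, like SelectKBest. They are not scoring functions, so they cannot be passed as score_func.
The available scoring functions are given in the documentation:
For regression: f_regression, mutual_info_regression
For classification: chi2, f_classif, mutual_info_classif
By default, these classes (SelectFdr, SelectFwe, SelectFpr) use the scoring function f_classif. So you need to remove them from the 'feat_select__score_func' list in your parameters.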
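The difference between a scoring function and a selector class can be checked directly. The snippet below is only a minimal sketch: the count matrix X and the labels y are made-up toy data, not the asker's data.

```python
import numpy as np
from sklearn.feature_selection import SelectFdr, SelectKBest, chi2

# Made-up count matrix (6 documents, 4 terms) and binary labels.
X = np.array([[3, 0, 1, 0],
              [2, 0, 0, 1],
              [4, 1, 1, 0],
              [0, 3, 1, 0],
              [0, 2, 0, 1],
              [1, 4, 1, 0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A scoring function is a plain callable: it takes (X, y) and
# returns one score (and here one p-value) per feature.
scores, pvalues = chi2(X, y)

# SelectFdr is an estimator class, not a function that returns scores,
# which is why it fails when used as score_func.
is_class = isinstance(SelectFdr, type)

# Valid use: hand a scoring function to the selector.
selector = SelectKBest(score_func=chi2, k=2)
X_reduced = selector.fit_transform(X, y)
```

Here X_reduced keeps the two columns with the highest chi2 scores, while SelectFdr would have to be instantiated and used as a pipeline step of its own.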
If you do want to use them, you can change the parameter grid like this:
parameters = {'c_vect__max_features': [10, 50, 100, None],
              'feat_select': [SelectKBest(), SelectFdr(), SelectFwe(), SelectFpr()],
              'feat_select__score_func': [chi2, f_classif, mutual_info_classif],
              'ridge__solver': ['sparse_cg', 'lsqr', 'sag'],
              'ridge__tol': [1e-2, 1e-3],
              'ridge__alpha': [0.01, 0.1, 1]}
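Note the new parameter "feat_select". Yes, you can even swap the transformer object inside the pipeline through the grid you pass to GridSearchCV. Hope this helps. Feel free to ask if anything is unclear.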
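For reference, here is a small end-to-end sketch of swapping the "feat_select" step through the grid. It rests on several assumptions that are not in the question: the documents and labels are invented toy data, the grid is shrunk so it runs quickly, k=2 is chosen only to suit the tiny vocabulary, and the grid is split into two dictionaries so that mutual_info_classif is paired only with SelectKBest.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import (SelectFdr, SelectFpr, SelectFwe,
                                       SelectKBest, chi2, f_classif,
                                       mutual_info_classif)
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Invented toy corpus: 48 short documents, two balanced classes.
pos = ["good great movie", "great fun movie", "good fun story", "great good story"]
neg = ["bad boring movie", "awful dull movie", "bad dull story", "awful boring story"]
docs = (pos + neg) * 6
labels = ([1] * 4 + [0] * 4) * 6

pipe = Pipeline([("c_vect", CountVectorizer()),
                 ("feat_select", SelectKBest()),
                 ("ridge", RidgeClassifier())])

# A list of grids: each dictionary is searched independently.
param_grid = [
    {   # top-k selection works with any scoring function
        "feat_select": [SelectKBest(k=2)],
        "feat_select__score_func": [chi2, f_classif, mutual_info_classif],
        "ridge__alpha": [0.1, 1.0],
    },
    {   # p-value based selectors, paired with functions that return p-values
        "feat_select": [SelectFdr(), SelectFwe(), SelectFpr()],
        "feat_select__score_func": [chi2, f_classif],
        "ridge__alpha": [0.1, 1.0],
    },
]

gs_clf = GridSearchCV(pipe, param_grid)
gs_clf.fit(docs, labels)
print(gs_clf.best_params_)
```

The split into two grids is a design choice of this sketch: SelectFdr, SelectFwe, and SelectFpr threshold on p-values, and mutual_info_classif returns only scores, so combining them in a single dictionary would make those candidate fits fail.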