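I am trying to run GridSearchCV and I get the following error:
TypeError: Singleton array array(None, dtype=object) cannot be considered a valid collection.
I am not sure what causes this. Any help with the error would be greatly appreciated.
The code is as follows: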
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# text pipeline
text_steps = [('feature extractor', SelectColumnsTransfomer(text_features)),
              ('tf-idf', Tfidf),
              ('classifier', MLPclf)]

# build the pipeline from the steps
pl_text = Pipeline(text_steps)

parameters = {
    'tf-idf__max_df': (0.5, 0.75, 1.0),
    'tf-idf__max_features': (5000, 10000, 50000),
    'tf-idf__ngram_range': ((1, 1), (1, 2)),  # unigrams or bigrams
    'classifier__alpha': (0.00001, 0.000001, 1e-05),
    'classifier__hidden_layer_sizes': ((1, 5), (1, 5, 10))
}

# find the best parameters for both the feature extraction and the
# classifier
grid_search = GridSearchCV(pl_text, parameters, verbose=1, refit=True)
grid_search.fit(X_train)
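
For a sense of how much work this grid requests, the sketch below expands the same `parameters` dictionary with `itertools.product`. This is a stand-in for how a grid search enumerates candidates, not scikit-learn's own code. One thing it surfaces: `0.00001` and `1e-05` are the same float, so the `classifier__alpha` tuple holds only two distinct values.

```python
from itertools import product

# Same grid as in the question, copied verbatim.
parameters = {
    'tf-idf__max_df': (0.5, 0.75, 1.0),
    'tf-idf__max_features': (5000, 10000, 50000),
    'tf-idf__ngram_range': ((1, 1), (1, 2)),
    'classifier__alpha': (0.00001, 0.000001, 1e-05),
    'classifier__hidden_layer_sizes': ((1, 5), (1, 5, 10)),
}

names = list(parameters)
# One candidate per element of the Cartesian product of all value tuples.
candidates = [dict(zip(names, values)) for values in product(*parameters.values())]

# Candidates that differ only by the repeated alpha value collapse together.
distinct = {tuple(c[name] for name in names) for c in candidates}

print(len(candidates))  # 108
print(len(distinct))    # 72
```

Each candidate is fitted once per cross-validation fold, and the default number of folds depends on the installed scikit-learn version, so removing the duplicate alpha saves a third of the fits.
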
The SelectColumnsTransfomer class:
from sklearn.base import BaseEstimator, TransformerMixin


class SelectColumnsTransfomer(BaseEstimator, TransformerMixin):
    """ A DataFrame transformer that provides column selection

    Allows selecting columns by name from pandas DataFrames in scikit-learn
    pipelines.

    Parameters
    ----------
    columns : list of str, names of the DataFrame columns to select
        Default: []
    """

    def __init__(self, columns=[]):
        self.columns = columns

    def transform(self, X, **transform_params):
        """ Selects columns of a DataFrame

        Parameters
        ----------
        X : pandas DataFrame

        Returns
        -------
        trans : pandas DataFrame
            contains selected columns of X
        """
        trans = X[self.columns].copy()
        return trans

    def fit(self, X, y=None, **fit_params):
        """ Do nothing function

        Parameters
        ----------
        X : pandas DataFrame
        y : default None

        Returns
        -------
        self
        """
        return self
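
Whether `transform` returns a DataFrame or a Series depends on what `columns` holds, which matters for the step that follows it in the pipeline. The sketch below uses a hypothetical two-row frame (the real data and `text_features` are not shown in the post) and plain pandas indexing, the same `X[self.columns]` expression that `transform` uses.

```python
import pandas as pd

# Hypothetical data; the column names are assumed for illustration only.
X = pd.DataFrame({"age": [31, 45], "text": ["first document", "second document"]})

as_frame = X[["text"]].copy()   # list of names -> 2-D DataFrame
as_series = X["text"].copy()    # single name   -> 1-D Series

print(type(as_frame).__name__, as_frame.shape)    # DataFrame (2, 1)
print(type(as_series).__name__, as_series.shape)  # Series (2,)

# Iterating a DataFrame yields column labels; iterating a Series yields values.
print(list(as_frame))   # ['text']
print(list(as_series))  # ['first document', 'second document']
```

A vectorizer that iterates over its input would see only the column label in the first case, so if `text_features` is a list, passing the single text column name as a string may be what the tf-idf step needs.
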
The DataFrame has the following features (columns):
Region, Gender, Age, Text
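
One thing that stands out in the code above is that `fit` receives only `X_train`. A pipeline ending in a classifier normally needs the labels as a second argument, and a missing `y` is a plausible, though unconfirmed, source of the `None` in the error message. The following is a minimal sketch under that assumption. The toy frame, the label list `y_train`, the reduced grid, and the use of `TfidfVectorizer` and `MLPClassifier` in place of the post's `Tfidf` and `MLPclf` are all guesses made for illustration.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline

# Hypothetical training data with the four columns listed in the post.
X_train = pd.DataFrame({
    "Region": ["north", "south"] * 4,
    "Gender": ["f", "m"] * 4,
    "Age": [25, 34, 41, 52, 29, 38, 47, 60],
    "Text": ["good movie", "bad movie", "great film", "awful film",
             "good film", "bad plot", "great plot", "awful movie"],
})
# Hypothetical labels; the post does not show a target variable.
y_train = [1, 0, 1, 0, 1, 0, 1, 0]

pl_text = Pipeline([
    ("tf-idf", TfidfVectorizer()),
    ("classifier", MLPClassifier(hidden_layer_sizes=(5,), max_iter=200, random_state=0)),
])

# Deliberately tiny grid so the sketch runs quickly.
parameters = {"tf-idf__ngram_range": [(1, 1), (1, 2)]}

grid_search = GridSearchCV(pl_text, parameters, cv=2, refit=True)
# Pass the text column as a 1-D Series together with the labels.
grid_search.fit(X_train["Text"], y_train)

print(grid_search.best_params_)
```

If supplying the labels does not resolve the error, the next thing to check would be the shape of what each pipeline step hands to the following one.
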