GridSearchCV - TypeError: Singleton array array(None, dtype=object) cannot be considered a valid collection

Time: 2017-12-02 20:37:56

Tags: scikit-learn

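I am trying to run GridSearchCV and I get the following error:

TypeError: Singleton array array(None, dtype=object) cannot be considered a valid collection.

I am not sure what is causing this. Any help with the error would be much appreciated.

The code is as follows: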

# text pipeline
text_steps = [('feature extractor', SelectColumnsTransfomer(text_features)),
              ('tf-idf', Tfidf),
              ('classifier', MLPclf)]

# define steps
pl_text = Pipeline(text_steps)

parameters = {
    'tf-idf__max_df': (0.5, 0.75, 1.0),
    'tf-idf__max_features': (5000, 10000, 50000),
    'tf-idf__ngram_range': ((1, 1), (1, 2)),  # unigrams or bigrams
    'classifier__alpha': (0.00001, 0.000001, 1e-05),
    'classifier__hidden_layer_sizes': ((1, 5), (1, 5, 10))
}

# find the best parameters for both the feature extraction and the
# classifier
grid_search = GridSearchCV(pl_text, parameters, verbose=1, refit=True)
grid_search.fit(X_train)
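
One likely cause, offered as an assumption rather than a confirmed diagnosis: `fit` is called with `X_train` only, so `y` defaults to `None`, and the classifier's input validation cannot treat `None` as a collection of samples. (It is also worth checking that `X_train` itself is not `None`.) The sketch below uses hypothetical toy data and `LogisticRegression` as a fast stand-in for the MLP classifier; the exact exception type and message depend on the scikit-learn version.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy data: eight short documents with binary labels.
texts = ["good movie", "bad movie", "great film", "awful film",
         "good plot", "bad plot", "great cast", "awful cast"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# LogisticRegression is only a quick stand-in for the question's MLP classifier.
pipeline = Pipeline([("tfidf", TfidfVectorizer()),
                     ("clf", LogisticRegression())])
param_grid = {"tfidf__ngram_range": [(1, 1), (1, 2)]}

# error_score="raise" makes a failing fit surface immediately
# instead of being recorded as a NaN score.
search = GridSearchCV(pipeline, param_grid, cv=2, error_score="raise")

# Calling fit without y: the supervised final step receives y=None and fails.
try:
    search.fit(texts)
    error_message = None
except (TypeError, ValueError) as exc:
    error_message = str(exc)
print("fit(X) failed with:", error_message)

# Passing the labels alongside the features lets the search run.
search.fit(texts, labels)
print("best params:", search.best_params_)
```

Applied to the code in the question, this would mean calling `grid_search.fit(X_train, y_train)`, where `y_train` is a hypothetical name for the label vector that matches `X_train` row for row.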

The SelectColumnsTransfomer class:

class SelectColumnsTransfomer(BaseEstimator, TransformerMixin):
    """ A DataFrame transformer that provides column selection

    Allows to select columns by name from pandas dataframes in scikit-learn
    pipelines.

    Parameters
    ----------
    columns : list of str, names of the dataframe columns to select
        Default: []

    """

    def __init__(self, columns=[]):
        self.columns = columns

    def transform(self, X, **transform_params):
        """ Selects columns of a DataFrame

        Parameters
        ----------
        X : pandas DataFrame

        Returns
        ----------

        trans : pandas DataFrame
            contains selected columns of X
        """
        trans = X[self.columns].copy()
        return trans

    def fit(self, X, y=None, **fit_params):
        """ Do nothing function

        Parameters
        ----------
        X : pandas DataFrame
        y : default None


        Returns
        ----------
        self
        """
        return self
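
A separate point that may matter once the pipeline runs: a selector like this returns a DataFrame when given a list of column names, whereas a text vectorizer generally expects a one-dimensional sequence of strings. The sketch below is a minimal illustration with a hypothetical `ColumnSelector` stand-in and made-up rows; only the column names come from the question.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnSelector(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for the column-selecting transformer."""

    def __init__(self, columns=None):
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing to learn; selection is stateless.
        return self

    def transform(self, X):
        # A list of names yields a DataFrame, a single name yields a Series.
        return X[self.columns].copy()


# Made-up rows; the column names mirror those listed in the question.
df = pd.DataFrame({
    "region": ["north", "south"],
    "gender": ["f", "m"],
    "age": [31, 45],
    "text": ["first document", "second document"],
})

as_frame = ColumnSelector(["text"]).fit(df).transform(df)
as_series = ColumnSelector("text").fit(df).transform(df)

print(type(as_frame).__name__, as_frame.shape)    # DataFrame (2, 1)
print(type(as_series).__name__, as_series.shape)  # Series (2,)

# Iterating a DataFrame yields its column labels, not its rows,
# so a vectorizer fed the DataFrame would see one "document": the label.
print(list(as_frame))    # ['text']
print(list(as_series))   # ['first document', 'second document']
```

If the text step receives a DataFrame, selecting the single text column by name (so that a Series is passed on) is one way to give the vectorizer one string per row.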

The DataFrame has the following features (columns):

region, gender, age, text

0 answers:

No answers yet.