I'm trying to build an sklearn pipeline from my own FeatureSelector and a TF-IDF vectorizer, but it doesn't work.
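```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.svm import SVC

class FeatureSelector(BaseEstimator, TransformerMixin):
    def __init__(self, feature_names):
        self.feature_names = feature_names

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Select the requested columns from the input DataFrame
        return X[self.feature_names]

SVM = SVC()  # assumed definition; any sklearn classifier fits here
tfidf_vect = TfidfVectorizer(max_features=50000, ngram_range=(1, 2))
feature_pipeline = make_pipeline(FeatureSelector(['text']))
full_pipeline = Pipeline(steps=[('feature_pipeline', feature_pipeline),
                                ('tfidf', tfidf_vect),
                                ('clf', SVM)])
full_pipeline.fit(train_x, y_train)
```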
It raises the following error:

"ValueError: Found input variables with inconsistent numbers of samples: [1, 30597]"
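A likely cause, assuming `train_x` is a pandas DataFrame with a `text` column: `X[['text']]` (a list of labels) returns a one-column DataFrame, not a column of strings. `TfidfVectorizer` simply iterates over its input, and iterating a DataFrame yields its column labels, so the vectorizer sees a single "document" (the string `'text'`). The classifier then receives 1 row of features against 30597 labels, which matches the `[1, 30597]` in the error. The sketch below is a hedged fix, not the only one: the selector is changed to take a single column name so it returns a 1-D Series. The toy data, the `feature_name` parameter, and the use of `LinearSVC` in place of `SVM` are assumptions for illustration.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


class FeatureSelector(BaseEstimator, TransformerMixin):
    """Select one text column and return it as a 1-D Series of strings."""

    def __init__(self, feature_name):
        self.feature_name = feature_name  # hypothetical: a single label, not a list

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # A scalar label gives a Series (one string per row);
        # a list such as ['text'] would give a DataFrame instead.
        return X[self.feature_name]


# Hypothetical toy data standing in for train_x / y_train.
train_x = pd.DataFrame({"text": ["good movie", "bad movie", "great film", "awful film"]})
y_train = [1, 0, 1, 0]

full_pipeline = Pipeline(steps=[
    ("select", FeatureSelector("text")),
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", LinearSVC()),  # stand-in for the SVM classifier
])
full_pipeline.fit(train_x, y_train)

# Why the list version failed: iterating a DataFrame yields its column labels.
print(list(train_x[["text"]]))  # ['text'], i.e. a single "document"
print(full_pipeline.predict(pd.DataFrame({"text": ["good film"]})))
```

Alternatively, sklearn's `ColumnTransformer` with a scalar column name, e.g. `ColumnTransformer([("tfidf", TfidfVectorizer(), "text")])`, passes that column to the vectorizer as 1-D and removes the need for a custom selector.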