Question

While trying out the new ColumnTransformer feature, I attempted to build a pipeline with scikit-learn 0.20.2. My problem is that I keep getting this error:

AttributeError: 'numpy.ndarray' object has no attribute 'lower'

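I have one text column called text. All my other columns are numeric. I'm trying to use CountVectorizer in the pipeline, and I think that is where the trouble lies. Any help would be much appreciated.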

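from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline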

# mapped to column names from dataframe
numeric_features = ['hasDate', 'iterationCount', 'hasItemNumber', 'isEpic']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median'))
])

# mapped to column names from dataframe
text_features = ['text']
text_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('vect', CountVectorizer())
])

preprocessor = ColumnTransformer(
    transformers=[('num', numeric_transformer, numeric_features),('text', text_transformer, text_features)]
)

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', MultinomialNB())
                     ])

x_train, x_test, y_train, y_test = train_test_split(features, labels, test_size=0.33)
clf.fit(x_train,y_train)

Answer 1

@SergeyBushmanov helped me diagnose the error in the title: it was caused by running SimpleImputer on the text column.
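The diagnosis can be reproduced in isolation. The sketch below assumes that the imputer hands CountVectorizer a 2-D array with one column (which is what SimpleImputer returns); the sample strings and the names column_2d and message are invented for illustration. Iterating over a 2-D array yields one ndarray per row rather than a string, so the vectorizer's lowercasing step fails.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for the imputer's output: shape (n_samples, 1), not (n_samples,)
column_2d = np.array([['fix login bug'], ['add export feature']], dtype=object)

message = ''
try:
    # Each "document" seen by the vectorizer is a row (an ndarray), not a str
    CountVectorizer().fit(column_2d)
except AttributeError as exc:
    message = str(exc)

print(message)

# Flattening to 1-D gives the vectorizer the iterable of strings it expects
vectorizer = CountVectorizer().fit(column_2d.ravel())
print(sorted(vectorizer.vocabulary_))
```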

I still have another error, and I will write a new question for that.
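For reference, here is a minimal sketch of one way to avoid the title error, not necessarily the setup the asker ended up with. It assumes the imputer is dropped from the text branch, that missing text is filled with an empty string beforehand, and that the text column is selected with a plain string ('text') instead of a list, so ColumnTransformer passes a 1-D column to CountVectorizer. The toy DataFrame and labels are invented for illustration and use only two of the original numeric columns.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy data, invented for illustration only
features = pd.DataFrame({
    'hasDate': [1, 0, 1, 0],
    'iterationCount': [3.0, np.nan, 1.0, 2.0],
    'text': ['fix login bug', 'add export feature',
             'fix export bug', 'add login page'],
})
labels = [0, 1, 0, 1]

# Assumption: handle missing text up front instead of with SimpleImputer
features['text'] = features['text'].fillna('')

numeric_features = ['hasDate', 'iterationCount']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median'))
])

preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, numeric_features),
    # A scalar column name selects a 1-D column, which CountVectorizer expects
    ('text', CountVectorizer(), 'text'),
])

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', MultinomialNB())])
clf.fit(features, labels)
predictions = clf.predict(features)

# Two numeric columns plus one column per vocabulary word
transformed = clf.named_steps['preprocessor'].transform(features)
print(transformed.shape)
```

The key design choice is the column selector: a list such as ['text'] produces a 2-D block, while the bare string 'text' produces the 1-D sequence of documents that text vectorizers work on.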

SKLearn pipeline with ColumnTransformer: 'numpy.ndarray' object has no attribute 'lower'
