How to transform new data with sklearn.pipeline

Time: 2017-11-14 03:51:54

Tags: python-2.7 scikit-learn tf-idf multilabel-classification

I have built a pipeline with a TfidfVectorizer transformer and a OneVsRestClassifier estimator, and trained it on the training data as follows:

# Split data using train_test_split
print "Split data into train and test sets"
x_train, x_test, y_train, y_test = train_test_split(
    data_x, data_y, test_size=0.33)

# transform matrix of plots into lists to pass to a TfidfVectorizer
train_x = [x[0].strip() for x in x_train.tolist()]
test_x = [x[0].strip() for x in x_test.tolist()]

# Pipeline fit and transform
print "Learn the model using train data"
model = text_clf.fit(train_x, y_train)

# Predict the test data
print "Predict the recipients on test data"
predictions = model.predict(test_x)

Now I want to use the trained model to predict the classes of new, unlabeled data. I tried the following and got an error:

# Read text from input
text = raw_input()
print "Input : ", text
new_data = text_clf.transform([text])
predict = model.predict(new_data) 

Here is the error. What am I doing wrong?

AttributeError: 'OneVsRestClassifier' object has no attribute 'transform'

1 answer:

Answer 0 (score: 1)

If text_clf (or model) is the pipeline you describe, there is no need to call transform and then predict. Just call:

predictions = model.predict([text]) 

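The pipeline automatically converts the data into a usable format internally, by calling transform() on each intermediate transformer before passing the result to the final estimator.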

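When you call transform() on the pipeline explicitly, as in text_clf.transform([text]), the pipeline assumes that every estimator inside it, including the final one, has a transform() method, which is not the case here: OneVsRestClassifier only provides predict(). Note that model and text_clf are the same object, because fit() returns the pipeline itself.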
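
The point above can be illustrated with a minimal sketch. The question does not show how text_clf was built, so the pipeline definition, the step names "tfidf" and "clf", the LogisticRegression base estimator, and the toy corpus below are all assumptions made for illustration. For brevity it uses single-label targets rather than the multilabel setup from the question. It shows that predict() on the pipeline accepts raw text, and that this gives the same result as transforming with the vectorizer step and predicting with the classifier step by hand.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline

# Toy corpus and labels, invented purely for illustration.
train_x = [
    "the spaceship landed on mars",
    "aliens attack the space station",
    "a detective solves the murder",
    "the police chase a murder suspect",
    "two friends fall in love in paris",
    "a wedding brings old lovers together",
]
train_y = ["scifi", "scifi", "crime", "crime", "romance", "romance"]

# Assumed pipeline layout: a vectorizer followed by the classifier.
text_clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", OneVsRestClassifier(LogisticRegression())),
])

# fit() returns the pipeline itself, so model and text_clf are one object.
model = text_clf.fit(train_x, train_y)

new_text = "a spaceship near mars"

# Recommended: pass raw text to predict(); the pipeline runs
# TfidfVectorizer.transform() internally before classifying.
predictions = model.predict([new_text])

# Equivalent manual route: transform with the vectorizer step only,
# then predict with the classifier step.
features = model.named_steps["tfidf"].transform([new_text])
manual = model.named_steps["clf"].predict(features)
```

If the TF-IDF features themselves are needed, they can be obtained from the vectorizer step through named_steps as shown, rather than from the pipeline as a whole.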