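I am using Scikit-learn for text classification. I use a Naive Bayes classifier to assign unstructured text (the Details column in the dataset below) to one of a set of labeled targets (the Category column). I can get the accuracy on the test data, but can someone tell me how to print which category each unstructured text (from the Details column in the dataset below) belongs to?
Here is what my sample dataset looks like.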
Details                    | Category
-------------------------------------------------------------
Tanishq Jwellery Bangalore | jwellery
ODESK***BAL-28APR13        | Others
AEGON RELIGARE LIFE IN     | Others
INTERNET PAYMENT #999999   | Transfer in for Card Payment
WWW.VISTAPRINT.IN          | Others
Khazana Jwellery           | jwellery
INTERNET PAYMENT #999999   | Transfer in for Card Payment
Indian Oil                 | Fuel
Touch foot wear            | Clothing
import pandas as pd
import numpy as np
import scipy as sp
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer

u_cols = ['Details', 'Category']
k = pd.read_csv('mydatset.csv', delimiter='\t', usecols=u_cols)
data = k[1:1000]
target_one = data['Category']

def train(classifier, X, y):
    # Hold out half of the rows as a test set.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.50, random_state=33)
    classifier.fit(X_train, y_train)
    print("Accuracy: %s" % classifier.score(X_test, y_test))
    return classifier

trial1 = Pipeline([('vectorizer', TfidfVectorizer()),
                   ('classifier', MultinomialNB())])
train(trial1, data.Details.values.astype('U'), target_one)
Answer 0 (score: 0)
You need to store the returned classifier in a variable and then call predict() on it. Like this:
trained_clf = train(trial1, data.Details.values.astype('U'), target_one)
predictions = trained_clf.predict(unstructured_data.Details.values.astype('U'))
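To see which category each text was assigned, the predictions can be paired back up with the input texts. The following is a minimal sketch, not the asker's actual setup: it trains on the nine sample rows from the question, and the two entries in new_texts are hypothetical descriptions invented purely for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Training rows copied from the sample dataset in the question.
train_texts = [
    "Tanishq Jwellery Bangalore", "ODESK***BAL-28APR13",
    "AEGON RELIGARE LIFE IN", "INTERNET PAYMENT #999999",
    "WWW.VISTAPRINT.IN", "Khazana Jwellery",
    "INTERNET PAYMENT #999999", "Indian Oil", "Touch foot wear",
]
train_labels = [
    "jwellery", "Others", "Others", "Transfer in for Card Payment",
    "Others", "jwellery", "Transfer in for Card Payment", "Fuel", "Clothing",
]

clf = Pipeline([("vectorizer", TfidfVectorizer()),
                ("classifier", MultinomialNB())])
clf.fit(train_texts, train_labels)

# Hypothetical unseen descriptions (assumption: not part of the original data).
new_texts = ["Khazana Jwellery Chennai", "INTERNET PAYMENT #123456"]
predicted = clf.predict(new_texts)

# predict() returns one label per input, in the same order as the inputs.
pairs = list(zip(new_texts, predicted))
for text, category in pairs:
    print("%-28s -> %s" % (text, category))
```

Because predict() preserves input order, zip() is enough to line up each text with its label; no index bookkeeping is needed.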
Update: if you mean the test data that the accuracy is computed on inside the train() function, you can print the predicted categories like this:
def train(classifier, X, y):
    ...
    ...
    print("Accuracy: %s" % classifier.score(X_test, y_test))
    print("Predictions: %s" % classifier.predict(X_test))
    return classifier
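Printing the raw prediction array does not show which text each label belongs to. One possible variant, sketched here under the assumption that the data is just the nine sample rows from the question (the variable names are illustrative, not from the original code), prints each held-out text next to its actual and predicted category:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Sample rows copied from the question's dataset.
texts = [
    "Tanishq Jwellery Bangalore", "ODESK***BAL-28APR13",
    "AEGON RELIGARE LIFE IN", "INTERNET PAYMENT #999999",
    "WWW.VISTAPRINT.IN", "Khazana Jwellery",
    "INTERNET PAYMENT #999999", "Indian Oil", "Touch foot wear",
]
labels = [
    "jwellery", "Others", "Others", "Transfer in for Card Payment",
    "Others", "jwellery", "Transfer in for Card Payment", "Fuel", "Clothing",
]

# Same split settings as the train() function in the question.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.50, random_state=33)

clf = Pipeline([("vectorizer", TfidfVectorizer()),
                ("classifier", MultinomialNB())])
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# One row per held-out text: (text, actual category, predicted category).
rows = list(zip(X_test, y_test, y_pred))
for text, actual, predicted in rows:
    print("%-28s actual=%-30s predicted=%s" % (text, actual, predicted))
```

With so few rows the predictions themselves are not meaningful; the point is only the pairing of X_test, y_test, and the output of predict().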