Finding correctly and incorrectly classified data

Time: 2018-02-07 10:34:57

Tags: python machine-learning scikit-learn data-science text-classification

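I want to find the raw data that was classified correctly, and the raw data that was misclassified, after applying the Multinomial Naive Bayes classification algorithm. For example, after applying Multinomial Naive Bayes I get an accuracy of 88%. I want to see the 12% of the data that was misclassified as well as the 88% that was classified correctly. Thanks in advance.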

My dataset:

+----------------------+------------+
| Details              | Category   |
+----------------------+------------+
| Any raw text1        | cat1       |
+----------------------+------------+
| any raw text2        | cat1       |
+----------------------+------------+
| any raw text5        | cat2       |
+----------------------+------------+
| any raw text7        | cat1       |
+----------------------+------------+
| any raw text8        | cat2       |
+----------------------+------------+
| Any raw text4        | cat4       |
+----------------------+------------+
| any raw text5        | cat4       |
+----------------------+------------+
| any raw text6        | cat3       |
+----------------------+------------+

My code:

import pandas as pd
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the two columns used for classification (tab-delimited file)
data = pd.read_csv('mydat.xls', delimiter='\t',
                   usecols=['Details', 'Category'], encoding='utf-8')
target_one = data['Category']
target_list = data['Category'].unique()
x_train, x_test, y_train, y_test = train_test_split(
    data.Details, data.Category, random_state=42)

vect = CountVectorizer(ngram_range=(1, 2))
# Convert the training texts into a numeric count matrix
X_train = vect.fit_transform(x_train.values.astype('U'))
# Convert the test texts using the vocabulary learned from the training set
X_test = vect.transform(x_test.values.astype('U'))
# start = time.clock()

mnb = MultinomialNB(alpha=0.13)
mnb.fit(X_train, y_train)
result = mnb.predict(X_test)

# mnb.predict_proba(x_test)[0:10,1]
# accuracy_score expects the true labels first, then the predictions
accuracy_score(y_test, result)

1 answer:

Answer 0 (score: 0)

Just iterate over your data and compare each prediction with its true label.

You can add them to two different lists, or just print the ids, or do whatever you want with them.
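The answer's original snippet did not survive in this copy of the thread, so the following is only a minimal sketch of the idea, not the answerer's code. The helper name `split_by_correctness` and the toy data are made up for illustration. It pairs items by position with `zip`, which sidesteps the fact that `y_test` from `train_test_split` is a pandas Series that keeps its original, shuffled index labels.

```python
def split_by_correctness(texts, y_true, y_pred):
    """Split samples into correctly and incorrectly classified ones.

    Hypothetical helper: returns two lists of
    (text, true_label, predicted_label) tuples.
    """
    correct, wrong = [], []
    # zip pairs items by position, so a shuffled pandas index does not matter
    for text, true_label, predicted in zip(texts, y_true, y_pred):
        row = (text, true_label, predicted)
        if true_label == predicted:
            correct.append(row)
        else:
            wrong.append(row)
    return correct, wrong


# Toy stand-in data (made up for illustration, not taken from the asker's file)
texts = ["any raw text1", "any raw text2", "any raw text5", "any raw text6"]
y_true = ["cat1", "cat1", "cat2", "cat3"]
y_pred = ["cat1", "cat2", "cat2", "cat3"]

correct, wrong = split_by_correctness(texts, y_true, y_pred)
print(len(correct), "correct,", len(wrong), "misclassified")
for text, true_label, predicted in wrong:
    print(text, "| true:", true_label, "| predicted:", predicted)
```

With the variables from the question, the call would presumably be `split_by_correctness(x_test, y_test, result)`, since `x_test` holds the raw texts, `y_test` the true categories, and `result` the predictions in the same order.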