Scoring precision and recall in a multiclass Bayesian classifier

Time: 2018-06-11 14:27:46

Tags: python machine-learning scikit-learn bayesian

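I am a beginner in machine learning. I am trying to apply a multiclass Bayesian classification algorithm and then compute an accuracy score. The first part (classification) works, but the second part (scoring) does not. I would like to be able to compute the precision and recall of the classifier's predictions.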

import sklearn
import numpy as np
from sklearn import svm
from pprint import pprint
from sklearn.pipeline import Pipeline
from sklearn.datasets import load_files 
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.metrics import precision_recall_fscore_support as score
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import accuracy_score

labels = ["price", "personal", "delivery", "store", "product"]
# Load data from the local 'train' directory (one sub-folder per label)
docs_to_train = load_files('train')

train_X, test_X, train_y, test_y = train_test_split(docs_to_train.data, docs_to_train.target,test_size = 3)

count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(docs_to_train.data)

tf_transformer = TfidfTransformer(use_idf=False).fit(X_train_counts)
X_train_tf = tf_transformer.transform(X_train_counts)
X_train_tf.shape

tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
X_train_tfidf.shape

# Train Bayes classifier
clf = MultinomialNB().fit(X_train_tfidf, docs_to_train.target)

docs_test = ["the shop is beautiful", "the price is expensive", "they are nice the staff",
         "I loved the shop", "there are not many products", "choose the products on the site",
         "more choice in the store", "the packaging is beautiful", "the packaging is very ugly",
         "lack of products", "there are problems on the site", "the shop is very beautiful",
         "the botique is very beautiful", "I love this store", "the pri are not good", "it's expensive",
         "my package did not arrive", "the delivery time is too long", "the cashiers are nice",
         "the value for money is average", "I have not received my order",
         'very disappointed, and I will not come back', 'the delivery is very long',
         "my package did not always arrive", "help sellers to look for a specific product",
         'all very nice and professional cashiers', 'the saleswoman is disagreeable',
         "she's very well advised", 'a little cheaper']

X_new_counts = count_vect.transform(docs_test)
X_new_tfidf = tfidf_transformer.transform(X_new_counts)

prediction = clf.predict(X_new_tfidf)

# Print results: this part works
for doc, category in zip(docs_test, prediction):
    print('COMMENT:', '%r \nLABEL: %s' % (doc, docs_to_train.target_names[category]))
  

COMMENT: 'the shop is beautiful'
LABEL: delivery

COMMENT: 'the price is expensive'
LABEL: price

COMMENT: 'they are nice the staff'
LABEL: personal

COMMENT: 'I loved the shop'
LABEL: store

(...)

But the scoring does not work:

#score
print('Test accuracy is {}'.format(accuracy_score(X_new_tfidf[category], prediction)))

Error:

ValueError                                Traceback (most recent call last)
<ipython-input-34-5e2d40fed3ab> in <module>()
 85 
 86 #score
---> 87 print('Test accuracy is {}'.format(accuracy_score(X_new_tfidf[category], prediction)))
~/anaconda3/lib/python3.6/site-packages/sklearn/metrics/classification.py in accuracy_score(y_true, y_pred, normalize, sample_weight)
174 
175     # Compute accuracy for each possible representation
--> 176     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
177     if y_type.startswith('multilabel'):
178         differing_labels = count_nonzero(y_true - y_pred, axis=1)

~/anaconda3/lib/python3.6/site-packages/sklearn/metrics/classification.py  in _check_targets(y_true, y_pred)
 69     y_pred : array or indicator matrix
 70     """
---> 71     check_consistent_length(y_true, y_pred)
 72     type_true = type_of_target(y_true)
 73     type_pred = type_of_target(y_pred)

~/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
202     if len(uniques) > 1:
203         raise ValueError("Found input variables with inconsistent numbers of"
--> 204                          " samples: %r" % [int(l) for l in lengths])
205 
206 

ValueError: Found input variables with inconsistent numbers of samples: [1, 29]
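
One possible reading of the traceback: `accuracy_score(y_true, y_pred)` compares two label sequences of equal length. `X_new_tfidf[category]` is a single row of the feature matrix (one sample), while `prediction` holds 29 labels, hence `[1, 29]`. Scoring needs the true label of every test comment, which the hand-written `docs_test` list does not carry. The sketch below assumes such labels are available. The `y_true` and `y_pred` values are invented for illustration, `class_names` is a hypothetical helper list, and the `zero_division` argument requires scikit-learn 0.22 or later.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical true labels, assigned by hand for five test comments.
y_true = ["store", "price", "personal", "store", "product"]
# Hypothetical predictions for the same five comments, in the same order.
y_pred = ["delivery", "price", "personal", "store", "product"]

# Fixed label order so the per-class arrays below can be read by position.
class_names = ["delivery", "personal", "price", "product", "store"]

# Both arguments are label sequences of the same length.
acc = accuracy_score(y_true, y_pred)

# Per-class precision, recall, F-score and number of true samples.
precision, recall, fscore, support = precision_recall_fscore_support(
    y_true, y_pred, labels=class_names, zero_division=0)

print("accuracy:", acc)
for name, p, r, n in zip(class_names, precision, recall, support):
    print("%-9s precision=%.2f recall=%.2f support=%d" % (name, p, r, n))
```

The key point is that the first argument is a list of true labels, not a slice of the TF-IDF matrix.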

I have added the training data:

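Label: delivery

deliver within narrower time slots
deliver on time
announce the delivery date and approximate time by SMS
keep the customer informed about their order
reduce the delay between order and delivery
take care with fragile packages
my package has still not arrived at home

Label: store

I am very dissatisfied with how the store is organized
I am very satisfied with your store
enlarge the store
open other stores
everything about this store is perfect
small furniture displayed in the store

Label: personal (staff)

the seller's friendliness was very pleasant
more sellers
have more people on the floor to advise us
very good welcome
helpful staff
the employees are each nicer than the other
more staff on Saturdays
the people who work there are nice
the sellers are very welcoming
the saleswoman is not welcoming to customers

Label: price

lower prices
lower the prices
promotions adapted to my requests
value for money, cheaper
almost 50 euros for the rug
do not raise product prices
it is not expensive

Label: product

bring back gift wrapping
I would have liked gift wrapping
have more stock
check the availability of items
furniture availability
more choice in the store
many items are unavailable
effective products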
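
With labeled comments like these, one way to get true labels to score against is to hold out part of the labeled data instead of writing unlabeled test sentences by hand. The sketch below is only an illustration under assumptions: the strings are English renderings of a few of the comments listed above, the split size and `random_state` are arbitrary, and the pipeline (TF-IDF followed by `MultinomialNB`) is a compact stand-in for the separate vectorizer and transformer steps in the question.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Four comments per label, taken from the training data (English renderings).
data = {
    "delivery": ["deliver on time", "reduce the delay between order and delivery",
                 "take care with fragile packages",
                 "my package has still not arrived at home"],
    "store": ["I am very satisfied with your store", "enlarge the store",
              "open other stores", "everything about this store is perfect"],
    "personal": ["more sellers", "helpful staff", "more staff on Saturdays",
                 "the sellers are very welcoming"],
    "price": ["lower the prices", "do not raise product prices",
              "it is not expensive", "promotions adapted to my requests"],
    "product": ["have more stock", "check the availability of items",
                "more choice in the store", "many items are unavailable"],
}
docs = [text for texts in data.values() for text in texts]
targets = [label for label, texts in data.items() for _ in texts]

# Hold out 25% of the labeled comments; stratify keeps one comment per label.
train_X, test_X, train_y, test_y = train_test_split(
    docs, targets, test_size=0.25, stratify=targets, random_state=0)

# Fit on the training part only, then predict the held-out part.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_X, train_y)
pred = model.predict(test_X)

# Per-class precision, recall and F1, computed against the held-out labels.
report = classification_report(test_y, pred, zero_division=0)
print(report)
```

With so few comments per label the numbers themselves mean little; the point is that `test_y` and `pred` have the same length and the same kind of values.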

Reproducible example:

import sklearn
import numpy as np
from sklearn import svm
from pprint import pprint
from sklearn.pipeline import Pipeline
from sklearn.datasets import load_files 
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.metrics import precision_recall_fscore_support as score
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import accuracy_score

X_train = np.array(["prix à baisser",
                "l'amabilité du vendeur était très agréable",
                "baisser les prix",
                "je n'ai pas encore reçu ma commande",
                "plus de vendeur conseil",
                "avoir plus de personnes sur le plancher pour nous conseiller",
                "des promotions adaptées à mes demandes",
                "rapport qualité prix",
                "moins chers",
                "très bonne accueil",
                "durée entre la commande et la livraison assez long",
                "personnel serviable",
                "pratiquement 50 euros le tapis ça fait cher",
                "les employés sont plus gentils les uns que les autres",
                "plus de personnel le samedi",
                "ne pas augmenter le tarif des produits",
                "la remise des produits est en retard",
                "ça vaut le coût",
                "le personnel est parfait mais les produits sont trop chèrs",
                "la somme demandé est correcte et le magasinier était agréable",
                "en plus les gens qu'y travaillent sont sympas",
                "très bon accueil mais le montant des produits sont exorbitant",
                "mon colis n'est pas arrivé"])
y_train_text = [["prix"],["personnel"],["prix"],["livraison"],["personnel"],
            ["personnel"],["prix"],["prix"],["prix"],["livraison"],["personnel"],
            ["personnel"],["prix"],["personnel"],["personnel"],["livraison"],["prix"],["prix"],
            ["personnel", "prix"],["prix", "personnel"],["personnel"],["personnel","prix"],["livraison"]]

X_test = np.array(['un peu moins cher',
               'le passage à la caisse est parfois fort long',
               'il pourrait avoir plus souvent des prix ou offres promotionnels',
               'aide des vendeurs pour chercher un produit spécifique',
               'moins coûteux pour les frais de port',
               'un service de livraison plus compétant',
               'toutes caissières très gentilles et professionnelles',
               'la vendeuse est désagreable',
               'sensibiliser les livreurs aux colis qu?ils transportent',
               'les employés ne connaissaient pas le prix',
               'elle nous a tres bien conseillé'])

count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(X_train)

tf_transformer = TfidfTransformer(use_idf=False).fit(X_train_counts)
X_train_tf = tf_transformer.transform(X_train_counts)
X_train_tf.shape

tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
X_train_tfidf.shape

# Train Bayes classifier
clf = MultinomialNB().fit(X_train_tfidf, y_train_text)


X_new_counts = count_vect.transform(X_test)
X_new_tfidf = tfidf_transformer.transform(X_new_counts)

prediction = clf.predict(X_new_tfidf)

# Print results: this part works
for doc, category in zip(X_test, prediction):
    print('COMMENT:', '%r \nLABEL: %s' % (doc, y_train_text_names[category]))

print('Test accuracy is {}'.format(accuracy_score(X_new_tfidf[category], prediction)))
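
A note on this example: some entries of `y_train_text` carry two labels, `MultinomialNB` on its own expects one label per sample, and `y_train_text_names` is not defined anywhere. If the two-label entries are meant as multi-label targets (an assumption, since the question does not say), one possible approach is `MultiLabelBinarizer` with `OneVsRestClassifier`. The sketch below uses a subset of the training sentences with their labels from the example; `y_test_text` is a hypothetical hand-labeling of three test sentences, added only so that there is something to score against.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import precision_recall_fscore_support
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Subset of the training sentences and labels from the example above.
X_train = ["baisser les prix",
           "moins chers",
           "personnel serviable",
           "plus de personnel le samedi",
           "je n'ai pas encore reçu ma commande",
           "mon colis n'est pas arrivé",
           "le personnel est parfait mais les produits sont trop chèrs"]
y_train_text = [["prix"], ["prix"], ["personnel"], ["personnel"],
                ["livraison"], ["livraison"], ["personnel", "prix"]]

X_test = ["un peu moins cher",
          "un service de livraison plus compétant",
          "la vendeuse est désagreable"]
# Hypothetical true labels for the test sentences (not in the original post).
y_test_text = [["prix"], ["livraison"], ["personnel"]]

# Turn the label lists into a binary indicator matrix, one column per label.
mlb = MultiLabelBinarizer()
Y_train = mlb.fit_transform(y_train_text)
Y_test = mlb.transform(y_test_text)

# One binary naive Bayes classifier per label, on top of TF-IDF features.
model = Pipeline([("tfidf", TfidfVectorizer()),
                  ("clf", OneVsRestClassifier(MultinomialNB()))])
model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)

# Per-label precision and recall, computed on the indicator matrices.
precision, recall, fscore, support = precision_recall_fscore_support(
    Y_test, Y_pred, average=None, zero_division=0)
for name, p, r in zip(mlb.classes_, precision, recall):
    print("%-10s precision=%.2f recall=%.2f" % (name, p, r))
```

If each comment should have exactly one label instead, the two-label entries would need to be reduced to a single label and `y_train_text` flattened to a plain list of strings.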

0 answers:

No answers