I built a model that uses Naive Bayes to classify text as TRUE or FALSE:
from textblob.classifiers import NaiveBayesClassifier as NBC
from textblob import TextBlob
training_corpus = [
('Agree Completely Agree Strongly Agree Somewhat Disagree Somewhat Disagree Strongly Completely Disagree','TRUE'),
('Concerned 2 3 4 5 6 7 - Comfortable','TRUE'),
('1 - disagree strongly 2 - disagree somewhat 3 - neither agree nor disagree 4 - agree somewhat 5 - agree strongly','TRUE'),
('1 - doesn\'t apply at all 2 3 4 5 6 7 - applies completely','TRUE'),
('1 - extremely new and different 2 3 4 5 6 7 - not at all new & different','TRUE'),
('1 - extremely relevant 2 3 4 5 6 7 - not at all relevant','TRUE'),
('1 - I don\'t want brands to engage with me at all on social media 2 3 4 5 6 7 - I love to engage with brands on social media','TRUE'),
('1 - Most Important 2 3 4 5 - Least Important','TRUE'),
('pepsi','FALSE'),
('coca cola','FALSE'),
('hyundai','FALSE'),
('Audio quality','FALSE'),
('Product features ','FALSE'),
('Content ','FALSE')
]
test_corpus = [
('1 - Agree Completely 2 - Agree Strongly 3 - Agree Somewhat 4 - Disagree Somewhat 5 - Disagree Strongly 6 - Completely Disagree','TRUE'),
('1 - Concerned 2 3 4 5 6 7 - Comfortable','TRUE'),
('Content ','FALSE'),
('Ease of navigation','FALSE')
]
model = NBC(training_corpus)
print(model.classify('pepsi'))
print(model.accuracy(test_corpus)*100)
When I run this code, it reports 100% accuracy, but it returns FALSE every time. I'm not sure what the problem is, but this is not the output I expect.
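Answer:
Your model is fine; what you see comes from your data and the classifier's default features. Trained on the data you provided, it works well. Let's run some tests: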
def test(s):
    # Print the probability the trained model assigns to each label for input s
    prob_dist = model.prob_classify(s)
    print("classifying", s)
    print("probability of being FALSE:", round(prob_dist.prob("FALSE"), 2),
          "probability of being TRUE:", round(prob_dist.prob("TRUE"), 2))
    print('-' * 70)

test_cases = ['1', '1 - ', '2', '2 3 4 5', '1- 2 3 4 5', 'pepsi', 'coca', 'BMW']
for tc in test_cases:
    test(tc)
Here is the output, and it is quite sensible:
classifying 1
probability of being FALSE: 1.0 probability of being TRUE: 0.0
----------------------------------------------------------------------
classifying 1 -
probability of being FALSE: 1.0 probability of being TRUE: 0.0
----------------------------------------------------------------------
classifying 2
probability of being FALSE: 1.0 probability of being TRUE: 0.0
----------------------------------------------------------------------
classifying 2 3 4 5
probability of being FALSE: 0.05 probability of being TRUE: 0.95
----------------------------------------------------------------------
classifying 1- 2 3 4 5
probability of being FALSE: 0.0 probability of being TRUE: 1.0
----------------------------------------------------------------------
classifying pepsi
probability of being FALSE: 1.0 probability of being TRUE: 0.0
----------------------------------------------------------------------
classifying coca
probability of being FALSE: 1.0 probability of being TRUE: 0.0
----------------------------------------------------------------------
classifying BMW
probability of being FALSE: 1.0 probability of being TRUE: 0.0
----------------------------------------------------------------------
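So why does the classifier behave this way? Look at your code: where did you specify the feature vector? Nowhere, so it uses the default function to extract features, which produces one contains(word) feature for every word in the training set, as explained in the TextBlob documentation (you can also look at the source code).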
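As a rough illustration of that default behavior, here is a minimal sketch of a "contains(word)" extractor. It is a simplified re-implementation, not TextBlob's own code: the helpers build_vocabulary and contains_features are hypothetical, and they split on whitespace, whereas TextBlob uses a word tokenizer and strips punctuation, so its vocabulary differs slightly (for example, it would drop the "-" tokens).

```python
# Hypothetical sketch of a "contains(word)" bag-of-words extractor.
# Assumption: whitespace tokenization (TextBlob's real extractor tokenizes
# properly and strips punctuation).

def build_vocabulary(train_set):
    """Collect every token that appears in any training document."""
    vocab = set()
    for text, _label in train_set:
        vocab.update(text.split())
    return vocab


def contains_features(document, vocabulary):
    """One boolean 'contains(word)' feature per vocabulary word."""
    tokens = set(document.split())
    return {f"contains({word})": word in tokens for word in vocabulary}


# A small excerpt of the training corpus from the question
train_excerpt = [
    ("1 - Most Important 2 3 4 5 - Least Important", "TRUE"),
    ("pepsi", "FALSE"),
    ("coca cola", "FALSE"),
]

vocab = build_vocabulary(train_excerpt)
feats = contains_features("pepsi", vocab)
print(feats["contains(pepsi)"], feats["contains(4)"])
```

Every input, "pepsi" included, is described only by which training words it does or does not contain. For a short brand name, all the digit features come out False, which is exactly the evidence the listing below associates with FALSE.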
For example, you can list the model's most informative features:
>>> model.show_informative_features()
Most Informative Features
contains(4) = False FALSE : TRUE = 5.6 : 1.0
contains(3) = False FALSE : TRUE = 5.6 : 1.0
contains(5) = False FALSE : TRUE = 5.6 : 1.0
contains(2) = False FALSE : TRUE = 5.6 : 1.0
contains(1) = False FALSE : TRUE = 3.3 : 1.0
contains(7) = False FALSE : TRUE = 2.4 : 1.0
contains(6) = False FALSE : TRUE = 2.4 : 1.0
contains(at) = False FALSE : TRUE = 1.9 : 1.0
contains(all) = False FALSE : TRUE = 1.9 : 1.0
contains(not) = False FALSE : TRUE = 1.3 : 1.0
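These ratios can be reproduced by hand. TextBlob's NaiveBayesClassifier wraps NLTK's, which by default smooths each feature's probability with the expected likelihood estimate (add 0.5 to each count). The sketch below assumes whitespace tokenization and two possible values (True/False) per feature; the helper names p_absent and ratio_false_to_true are hypothetical.

```python
# Sketch: reproduce the "contains(w) = False" FALSE : TRUE ratios using
# add-0.5 (expected likelihood) smoothing with 2 bins (True, False).
# Assumption: whitespace tokenization approximates TextBlob's tokenizer
# closely enough for these particular words.

training_corpus = [
    ("Agree Completely Agree Strongly Agree Somewhat Disagree Somewhat Disagree Strongly Completely Disagree", "TRUE"),
    ("Concerned 2 3 4 5 6 7 - Comfortable", "TRUE"),
    ("1 - disagree strongly 2 - disagree somewhat 3 - neither agree nor disagree 4 - agree somewhat 5 - agree strongly", "TRUE"),
    ("1 - doesn't apply at all 2 3 4 5 6 7 - applies completely", "TRUE"),
    ("1 - extremely new and different 2 3 4 5 6 7 - not at all new & different", "TRUE"),
    ("1 - extremely relevant 2 3 4 5 6 7 - not at all relevant", "TRUE"),
    ("1 - I don't want brands to engage with me at all on social media 2 3 4 5 6 7 - I love to engage with brands on social media", "TRUE"),
    ("1 - Most Important 2 3 4 5 - Least Important", "TRUE"),
    ("pepsi", "FALSE"),
    ("coca cola", "FALSE"),
    ("hyundai", "FALSE"),
    ("Audio quality", "FALSE"),
    ("Product features ", "FALSE"),
    ("Content ", "FALSE"),
]


def p_absent(word, label, corpus):
    """Smoothed P(contains(word) = False | label)."""
    docs = [text.split() for text, lab in corpus if lab == label]
    absent = sum(1 for tokens in docs if word not in tokens)
    bins = 2  # the feature can take the values True or False
    return (absent + 0.5) / (len(docs) + 0.5 * bins)


def ratio_false_to_true(word, corpus):
    """Likelihood ratio FALSE : TRUE for the feature contains(word) = False."""
    return p_absent(word, "FALSE", corpus) / p_absent(word, "TRUE", corpus)


for w in ["4", "3", "5", "2", "1", "7", "6", "at", "all", "not"]:
    r = ratio_false_to_true(w, training_corpus)
    print(f"contains({w}) = False    FALSE : TRUE = {r:.1f} : 1.0")
```

For contains(4), none of the 6 FALSE examples contain "4" while only 1 of the 8 TRUE examples lacks it, giving (6.5/7) / (1.5/9) ≈ 5.6. The other rows follow the same arithmetic and match the listing above. Because several digit features each multiply the odds toward FALSE when absent, a short input like "pepsi" ends up with a FALSE probability close to 1.0.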