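I am trying to get the precision, recall, and F-measure for the positive and negative predictions of a sentiment analysis. I am using Python 3.6 as follows: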
import nltk
from nltk.tokenize import word_tokenize
# Training data
train = [('I love this sandwich.', 'pos'),
('This is an amazing place!', 'pos'),
('I feel very good about these beers.', 'pos'),
('This is my best work.', 'pos'),
("What an awesome view", 'pos'),
('I do not like this restaurant', 'neg'),
('I am tired of this stuff.', 'neg'),
("I can't deal with this", 'neg'),
('He is my sworn enemy!', 'neg'),
('My boss is horrible.', 'neg') ]
# Test data
test = [('The beer was good.', 'pos'),
('I do not enjoy my job', 'neg'),
("I ain't feeling dandy today.", 'neg'),
("I feel amazing!", 'pos'),
('Gary is a friend of mine.', 'pos'),
("I can't believe I'm doing this.", 'neg') ]
# Tokenize Training words
Training_words = set(word.lower() for passage in train for word in word_tokenize(passage[0]))
# Training feature sets
training_set = [({word: (word in word_tokenize(x[0].lower())) for word in Training_words}, x[1]) for x in train]
# Tokenize Test words
Test_words = set(word.lower() for passage in test for word in word_tokenize(passage[0]))
# Test feature sets
test_set = [({word: (word in word_tokenize(x[0].lower())) for word in Test_words}, x[1]) for x in test]
# Naive Bayes classifier
classifier = nltk.NaiveBayesClassifier.train(training_set)
# Informative Features
classifier.show_most_informative_features()
# print the accuracy
print("accuracy %",(nltk.classify.accuracy(classifier, test_set))*100)
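The code above shows the most informative features and the accuracy of the Naive Bayes classifier. I then tried the code below to get the precision, recall, and F-measure for the positive and negative predictions.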
from nltk.metrics.scores import (precision, recall)
import collections
import nltk.metrics
refsets = collections.defaultdict(set)
testsets = collections.defaultdict(set)
for i, (feats, label) in enumerate(test_set):
    refsets[label].add(i)
    observed = classifier.classify(feats)
    testsets[observed].add(i)
print ('NB pos precision %', nltk.metrics.precision(refsets['pos'], testsets['pos'])*100)
print ('NB pos recall %', nltk.metrics.recall(refsets['pos'], testsets['pos'])*100)
print ('NB pos F-measure %', nltk.metrics.f_measure(refsets['pos'], testsets['pos'])*100)
print ('NB neg precision %', nltk.metrics.precision(refsets['neg'], testsets['neg'])*100)
print ('NB neg recall %', nltk.metrics.recall(refsets['neg'], testsets['neg'])*100)
print ('NB neg F-measure %', nltk.metrics.f_measure(refsets['neg'], testsets['neg'])*100)
I need your help.
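For reference, here is a minimal sketch of the set-based definitions that the refsets/testsets approach relies on, written in plain Python so it runs without NLTK. The helper names `precision_recall_f` and `fmt` and the gold/predicted labels are made up for illustration; they are not part of NLTK and not the output of the classifier above. One thing it makes visible: when a label is never predicted, its predicted set is empty and precision is undefined. As far as I know, `nltk.metrics.precision` returns `None` in that case, so multiplying the result by 100 would raise a `TypeError`; that may or may not be what is happening here.

```python
import collections


def precision_recall_f(reference, predicted):
    """Return (precision, recall, F1) for one label.

    Both arguments are sets of item indices. A metric is None when its
    denominator is zero (nothing predicted, or nothing in the reference).
    """
    true_pos = len(reference & predicted)
    precision = true_pos / len(predicted) if predicted else None
    recall = true_pos / len(reference) if reference else None
    if precision is None or recall is None:
        f1 = None
    elif precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def fmt(value):
    """Format a score as a percentage string, or 'n/a' when undefined."""
    return "n/a" if value is None else "{:.1f}".format(value * 100)


# Hypothetical gold labels and predictions, for illustration only.
gold = ['pos', 'neg', 'neg', 'pos', 'pos', 'neg']
predicted = ['pos', 'neg', 'pos', 'pos', 'neg', 'neg']

# Same bookkeeping as in the question: one set of indices per label.
refsets = collections.defaultdict(set)
testsets = collections.defaultdict(set)
for i, (gold_label, predicted_label) in enumerate(zip(gold, predicted)):
    refsets[gold_label].add(i)
    testsets[predicted_label].add(i)

for label in ('pos', 'neg'):
    p, r, f = precision_recall_f(refsets[label], testsets[label])
    print('NB', label, 'precision %', fmt(p),
          'recall %', fmt(r), 'F-measure %', fmt(f))
```

Guarding against `None` before formatting, as `fmt` does, keeps the report from crashing when the classifier predicts only one class, which seems plausible with a training set this small.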