NLTK - Error when getting classifier accuracy

Time: 2018-01-27 20:25:45

Tags: python nlp nltk

I'm fairly new to Python / NLTK, so forgive me if this is a basic question.

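The classifier seems to be running/working fine, but when I try to retrieve its accuracy I get a ValueError.

Is this related to the training set being contained in [({xxx})] while the test set is contained in [xxx]?

Error:

results = classifier.classify_many([fs for (fs, l) in gold])
ValueError: too many values to unpack (expected 2)

Code: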

import nltk
from nltk.tokenize import word_tokenize

train = [('train', 'train'),
('next train in', 'train'),
('When is the next train', 'train'),
('How long until the next train', 'train'),
("Where is the next train", 'train'),
('dart', 'train'),
('next dart in', 'train'),
('When is the next dart', 'train'),
('How long until the next dart', 'train'),
("Where is the next dart", 'train'),
("Show me where", 'map'),
("Directions to", 'map'),
('map', 'map')]


all_words = set(word.lower() for passage in train for word in word_tokenize(passage[0]))
t = [({word: (word in word_tokenize(x[0])) for word in all_words}, x[1]) for x in train]
classifier = nltk.NaiveBayesClassifier.train(t)
classifier.show_most_informative_features()


test_sentence = 'Whatever my message is, hopefully something about trains'

test_sent_features = {word.lower(): (word in word_tokenize(test_sentence.lower())) for word in all_words}

print(classifier.classify(test_sent_features))
print(nltk.classify.accuracy(classifier, test_sent_features))


I'm sure there's something simple I'm overlooking, but I can't seem to spot it. Any input on this is greatly appreciated, thanks.

2 answers:

Answer 0 (score: 1)

Use the enumerate function in your for loop.
for index, item in enumerate(yourlist):

Answer 1 (score: 0)

Yes, you're doing it wrong. Think about it: how could the classifier module calculate accuracy unless you give it the answers?
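To see where the ValueError itself comes from, here is a minimal pure-Python sketch. It assumes the only relevant part of accuracy() is the comprehension shown in the traceback; the helper name feature_sets and the sample dictionary are made up for illustration. Iterating over a dict yields its keys, so each word string gets unpacked as if it were a (featureset, label) pair.

```python
# A single feature dict, shaped like test_sent_features in the question.
features = {"train": True, "map": False}

def feature_sets(gold):
    # Hypothetical helper: same comprehension shape as the line in the traceback.
    return [fs for (fs, l) in gold]

try:
    # Iterating a dict yields its keys, so "train" is unpacked into (fs, l).
    feature_sets(features)
except ValueError as err:
    message = str(err)
    print("ValueError:", message)

# Wrapping the dict in a list of (featureset, label) pairs unpacks cleanly.
labeled = [(features, "train")]
print(feature_sets(labeled))
```

Any key that is not exactly two characters long fails to unpack, which is why a plain feature dict cannot be passed where a list of pairs is expected.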

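The accuracy() function must be called with a list of labeled data (the "label" being the desired classification), the same way train() is called. It needs a whole list (not just one sentence) so that it can tell you what percentage of the answers it computed are correct.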
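A rough sketch of what that call might look like follows. The labeled test sentences are invented for illustration and are not from the question, and str.split() is used in place of word_tokenize so the example runs without downloading tokenizer data.

```python
import nltk

train = [("next train in", "train"),
         ("when is the next dart", "train"),
         ("show me where", "map"),
         ("directions to", "map")]

# Hypothetical held-out sentences with known labels (not from the question).
test = [("how long until the next train", "train"),
        ("show me the map", "map")]

all_words = {w for text, _ in train for w in text.lower().split()}

def features(text):
    # Assumption: whitespace splitting stands in for word_tokenize here.
    tokens = set(text.lower().split())
    return {w: (w in tokens) for w in all_words}

# Both sets are lists of (featureset, label) pairs.
train_set = [(features(text), label) for text, label in train]
test_set = [(features(text), label) for text, label in test]

classifier = nltk.NaiveBayesClassifier.train(train_set)
acc = nltk.classify.accuracy(classifier, test_set)
print(acc)
```

With so few examples the resulting number means little; the point is only that test_set has the same shape as the list passed to train().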