Error message:
Traceback (most recent call last):
  File "/Users/ABHINAV/Documents/test2.py", line 58, in <module>
    classifier = NaiveBayesClassifier.train(trainfeats)
  File "/Library/Python/2.7/site-packages/nltk/classify/naivebayes.py", line 194, in train
    for featureset, label in labeled_featuresets:
ValueError: too many values to unpack
[Finished in 17.0s with exit code 1]
I get this error when I try to run Naive Bayes on a set of data. Here is the code:
import nltk.classify.util
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import movie_reviews
def word_feats(words):
    return dict([(word, True) for word in words])
negids = movie_reviews.fileids('neg')
posids = movie_reviews.fileids('pos')
negfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'neg') for f in negids]
posfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'pos') for f in posids]
negcutoff = len(negfeats)*3/4
poscutoff = len(posfeats)*3/4
trainfeats=[('good'),('pos'),
('quick'),('pos'),
('easy'),('pos'),
('big'),('pos'),
('iterested'),('pos'),
('important'),('pos'),
('new'),('pos'),
('patient'),('pos'),
('few'),('neg'),
('bad'),('neg'),
]
test=[
('general'),('pos'),
('many'),('pos'),
('efficient'),('pos'),
('great'),('pos'),
('interested'),('pos'),
('top'),('pos'),
('easy'),('pos'),
('big'),('pos'),
('new'),('pos'),
('wonderful'),('pos'),
('important'),('pos'),
('best'),('pos'),
('more'),('pos'),
('patient'),('pos'),
('last'),('pos'),
('worse'),('neg'),
('terrible'),('neg'),
('awful'),('neg'),
('bad'),('neg'),
('minimal'),('neg'),
('incomprehensible'),('neg'),
]
classifier = NaiveBayesClassifier.train(trainfeats)
print 'accuracy:', nltk.classify.util.accuracy(classifier, test)
classifier.show_most_informative_features()
Answer 0 (score: 2)
TL;DR
You need this:
trainfeats=[('good','pos'),
('quick','pos'),
...
instead of this:
trainfeats=[('good'),('pos'),
('quick'),('pos'),
...
Explanation
The key error is ValueError: too many values to unpack, raised inside NaiveBayesClassifier.train, which you call on this line:
classifier = NaiveBayesClassifier.train(trainfeats)
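"Too many values to unpack" means the program expected a certain number of values from each item of an iterable and received more than that. For example, according to your error message, this line raised the error: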
for featureset, label in labeled_featuresets:
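This for loop expects every item in labeled_featuresets to be a pair: it assigns the first member of each pair to featureset and the second to label. If labeled_featuresets actually contained triples, e.g. [(1,2,3),(1,2,3)...], the program would not know what to do with the third element, so it would raise this error. Your list contains plain strings rather than pairs, and unpacking a string iterates over its characters, so 'good' supplies four values where two are expected.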
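The unpacking rule can be reproduced without NLTK. The sketch below uses a hypothetical helper, unpack_all, whose loop has the same shape as the line from naivebayes.py quoted above; it succeeds on pairs and raises ValueError on triples and on bare strings.

```python
def unpack_all(items):
    """Hypothetical helper: unpack each item into (featureset, label),
    the same way the for loop in NLTK's train() does."""
    result = []
    for featureset, label in items:
        result.append((featureset, label))
    return result


# Pairs unpack cleanly.
print(unpack_all([(1, 2), (3, 4)]))  # [(1, 2), (3, 4)]

# A triple has one value too many.
try:
    unpack_all([(1, 2, 3)])
except ValueError as exc:
    print("triple: %s" % exc)

# A bare string is iterated character by character, so 'good'
# supplies four values where two are expected.
try:
    unpack_all([('good'), ('pos')])
except ValueError as exc:
    print("string: %s" % exc)
```

The exact wording of the exception differs slightly between Python 2 and Python 3, but the cause is the same.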
Here is what you pass into the function, which I believe ends up as labeled_featuresets:
trainfeats=[('good'),('pos'),
('quick'),('pos'),
('easy'),('pos'),
...
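It looks like you are trying to create a list of tuples by laying the items out in pairs, one pair per line (which would have prevented the error you are getting), but that is not how Python reads it. Python does not infer tuples from line layout, and parentheses around a single value do not create one either: ('good') is just the string 'good'. A tuple is formed by the comma between the values. I think this is what you are going for: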
trainfeats=[('good','pos'),
('quick','pos'),
('easy','pos'),
...
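Putting both members of each pair inside one set of parentheses, separated by a comma, creates a list of tuples instead of a flat list of individual strings.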
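The point about parentheses can be checked directly: the comma, not the parentheses, makes a tuple. The sketch below also shows one way, not part of the original answer, to recover pairs from the flat list in the question without retyping it, by zipping the even and odd positions. One caveat: NLTK's train also calls .items() on each featureset, so for NaiveBayesClassifier each word still has to be wrapped in a dict such as {'good': True}, which is the format the next answer uses.

```python
# ('good') is just a parenthesized string; the comma makes a tuple.
assert isinstance(('good'), str)
assert isinstance(('good',), tuple)
assert isinstance(('good', 'pos'), tuple)

# The flat list from the question (shortened): the parentheses change
# nothing, so words and labels simply alternate.
flat = [('good'), ('pos'), ('quick'), ('pos'), ('few'), ('neg'), ('bad'), ('neg')]

# Pair element 0 with 1, element 2 with 3, and so on.
pairs = list(zip(flat[0::2], flat[1::2]))
print(pairs)
# [('good', 'pos'), ('quick', 'pos'), ('few', 'neg'), ('bad', 'neg')]

# NaiveBayesClassifier expects each featureset to be a dict,
# so wrap each word as {word: True} (as word_feats does) before training.
labeled_featuresets = [({word: True}, label) for word, label in pairs]
print(labeled_featuresets[0])  # ({'good': True}, 'pos')
```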
Answer 1 (score: 0)
The trainfeats variable should be:
trainfeats=[({'good':True,'quick':True,'easy':True,
'big':True,'interested':True,'important':True,
'new':True,'patient':True},'pos'),({'few':True,'bad':True},'neg')]
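This is the correct format for labeled featuresets in NLTK: a list of (featureset, label) tuples in which each featureset is a dict mapping feature names to values.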
Similarly, the test variable should be:
test=[({'general':True,'many':True,'efficient':True,'great':True,'interested':True,'top':True,'easy':True,'big':True,'new':True,'wonderful':True,'important':True,'best':True,'more':True,'patient':True,'last':True},'pos'),({'worse':True,'terrible':True,'awful':True,'bad':True,'minimal':True,'incomprehensible':True},'neg')]
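Typing these dicts by hand is error-prone, so here is a minimal sketch that builds the same structure from (word, label) pairs and then sanity-checks its shape before it reaches NLTK. Both helpers, group_by_label and check_labeled_featuresets, are hypothetical (not part of NLTK). group_by_label reproduces this answer's choice of one merged featureset per label, which means only two training examples; check_labeled_featuresets mirrors what NaiveBayesClassifier.train does with each item, namely unpacking it into (featureset, label) and treating featureset as a dict.

```python
from collections import OrderedDict


def group_by_label(pairs):
    """Hypothetical helper: merge (word, label) pairs into one
    {word: True} featureset per label, keeping first-seen label order."""
    grouped = OrderedDict()
    for word, label in pairs:
        grouped.setdefault(label, {})[word] = True
    # Labeled-featureset format: a list of (featureset, label) tuples.
    return [(feats, label) for label, feats in grouped.items()]


def check_labeled_featuresets(data):
    """Hypothetical sanity check (not part of NLTK): each item must be a
    (featureset, label) pair whose featureset is a dict."""
    for i, item in enumerate(data):
        if not (isinstance(item, (tuple, list)) and len(item) == 2):
            raise ValueError("item %d is not a (featureset, label) pair: %r" % (i, item))
        if not isinstance(item[0], dict):
            raise TypeError("item %d: featureset should be a dict, got %r" % (i, item[0]))
    return True


test_pairs = [('great', 'pos'), ('best', 'pos'), ('awful', 'neg'), ('bad', 'neg')]
test = group_by_label(test_pairs)
print(test)
# [({'great': True, 'best': True}, 'pos'), ({'awful': True, 'bad': True}, 'neg')]
print(check_labeled_featuresets(test))  # True

# The flat list from the question now fails with a readable message
# instead of failing deep inside NLTK.
try:
    check_labeled_featuresets([('good'), ('pos')])
except ValueError as exc:
    print(exc)
```

The printed dict order assumes Python 3.7 or later; the comparison of the data itself does not depend on it.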