When I try to run the last part of the following code, I get an error and I can't figure out why.
import random
combined_list = h_sub_text + s_sub_text
print(len(combined_list))
random.shuffle(combined_list)
training_part = int(len(combined_list) * .7)
print(len(combined_list))
training_set = combined_list[:training_part]
test_set = combined_list[training_part:]
print(len(training_set))
print(len(test_set))
import nltk.classify.util
from nltk.classify import NaiveBayesClassifier
classifier = NaiveBayesClassifier.train(training_set)
accuracy = nltk.classify.util.accuracy(classifier, test_set)
print("Accuracy is: ", accuracy * 100)
I get this error:
ValueError Traceback (most recent call last)
<ipython-input-57-151936e75238> in <module>()
2 from nltk.classify import NaiveBayesClassifier
----> 4 classifier = NaiveBayesClassifier.train(training_set)
C:\Program Files (x86)\Anaconda3\lib\site-packages\nltk\classify\naivebayes.py in train(cls, labeled_featuresets, estimator)
--> 194 for featureset, label in labeled_featuresets:
195 label_freqdist[label] += 1
196 for fname, fval in featureset.items():
ValueError: too many values to unpack (expected 2)
Thanks in advance.
Answer 0 (score: 1)
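The root of the problem is the value passed to NaiveBayesClassifier.train() (training_set in the traceback). We would need to see what it actually looks like to be sure, but whatever it contains, it is what makes the nltk module raise the error.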
From the NLTK source code at http://www.nltk.org/_modules/nltk/classify/naivebayes.html:
@classmethod
def train(cls, labeled_featuresets, estimator=ELEProbDist):
    """
    :param labeled_featuresets: A list of classified featuresets,
        i.e., a list of tuples ``(featureset, label)``.
    """
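The argument to train() must be a list of tuples, each of the form (featureset, label). The loop at line 194 of the traceback unpacks every item into exactly two names, so a "too many values to unpack (expected 2)" error means that what you passed in does not have that shape. It is either a plain list whose items are not pairs, or a list of sequences with more than 2 elements each.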
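As a minimal sketch of the shape train() expects, the snippet below builds (featureset, label) pairs and reproduces the unpacking failure without importing NLTK. The sample documents, the labels "ham" and "spam", and the helper names word_features, label_documents and check_labeled_featuresets are all hypothetical; it assumes h_sub_text and s_sub_text are lists of token lists, which the question does not confirm.

```python
def word_features(words):
    """Turn a list of tokens into a bag-of-words feature dict."""
    return {word: True for word in words}


def label_documents(documents, label):
    """Pair each tokenized document with its label as (featureset, label)."""
    return [(word_features(doc), label) for doc in documents]


def check_labeled_featuresets(data):
    """Mimic the unpacking loop in train() and describe the first bad item.

    Returns None when every item is a (dict, label) pair.
    """
    for index, item in enumerate(data):
        try:
            featureset, label = item
        except (TypeError, ValueError) as exc:
            return f"item {index} is not a (featureset, label) pair: {exc}"
        if not isinstance(featureset, dict):
            return (f"item {index}: featureset is "
                    f"{type(featureset).__name__}, expected dict")
    return None


# Hypothetical stand-ins for h_sub_text and s_sub_text.
ham_docs = [["see", "you", "at", "lunch"], ["meeting", "moved", "to", "noon"]]
spam_docs = [["win", "cash", "now"], ["free", "prize", "click", "here"]]

# Concatenating the raw documents gives items with more than 2 elements.
raw_combined = ham_docs + spam_docs

# Labeling each document first gives the list of 2-tuples train() expects.
labeled = label_documents(ham_docs, "ham") + label_documents(spam_docs, "spam")

print(check_labeled_featuresets(raw_combined))
print(check_labeled_featuresets(labeled))
```

If the check returns None, the list should be safe to shuffle, split, and pass to NaiveBayesClassifier.train() as in the question.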