ValueError: too many values to unpack in scikitlearn.train

Time: 2014-05-13 07:11:22

Tags: python nltk

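I am working on sentiment analysis and want to test the accuracy of several classifiers. If I do not convert trainset to a dict, I get the error "AttributeError: 'tuple' object has no attribute 'iterkeys'". However, after converting it to a dict, I get this error instead: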

Traceback (most recent call last):
  File "E:\Python27\accuracy.py", line 204, in <module>
    print 'BernoulliNB`s accuracy is %f' %score(BernoulliNB())
  File "E:\Python27\accuracy.py", line 200, in score
    classifier.train(trainset)
  File "E:\Python27\lib\site-packages\nltk\classify\scikitlearn.py", line 93, in train
    for fs, label in labeled_featuresets:
ValueError: too many values to unpack

Relevant code:

trainset = extracted_pos_features[50:]+extracted_neg_features[50:]
testset = extracted_pos_features[:50]+extracted_neg_features[:50]
dict1 = {}
for i,j in trainset:
    dict1.setdefault(j,[]).append(i)

trainset = dict1

test, tag_test = zip(*testset)

def score(classifier):
    classifier = SklearnClassifier(classifier)
    classifier.train(trainset)
    pred = classifier.batch_classify(test)
    return accuracy_score(tag_test, pred)

print 'BernoulliNB`s accuracy is %f' %score(BernoulliNB())

dict1 has two keys, 'neg' and 'pos', each with multiple values:

dict1

{'neg': [('tone', 'ultimately'), ('tragedy', 'core'), ('ultimately', 'dulls'), ('update', 'dreary'), ('version', 'looks'), ('voice', 'lack'), ('worst', 'film'), ('yarn', 'eloquent'), ('makes', 'little'), ('makes', 'maryam'), ('remain', 'true'), ('screen', 'time'), ('sluggish', 'time'), ('thesis', 'makes'), ('time', 'machine'), ('true', 'chan'), ('true', 'original'), ('unashamedly', 'makes'), ('time', 'true')], 

'pos': [('rock', 'destined'), ('schwarzenegger', 'van'), ('screenplay', 'curls'), ('segal', 'gorgeously'), ('slice', 'asian'), ('snappy', 'screenplay'), ('somehow', 'pulls'), ('sometimes', 'movies'), ('splash', 'arnold'), ('start', 'emerges'), ('steers', 'snappy'), ('steven', 'segal'), ('top', 'game'), ('trilogy', 'huge'), ('van', 'damme'), ('vision', 'effective'), ('wasabi', 'start'), ('words', 'adequately'), ('cat', 'offers'), ('emerges', 'rare'), ('game', 'offers'), ('offers', 'refreshingly'), ('rare', 'combination'), ('rare', 'issue'), ('offers', 'rare')]}

Does anyone know how to fix this? Thank you very much.

1 answer:

Answer 0 (score: 0)

This is the typical mistake I make when I forget to call items() while looping over a dict:

dct = {"aaa": 11, "bbb": 22, "ccc": 33}

for key, val in dct.items():
    print "key", key
    print "val", val

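Without items(), iterating over the dict yields only the keys, and the loop then tries to unpack each key as if it were a sequence.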
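The difference can be seen in a minimal sketch that reuses the dct example above. The names keys_only and pairs are introduced here purely for illustration, and print() is written in the single-argument form so the snippet runs on both Python 2 and Python 3.

```python
dct = {"aaa": 11, "bbb": 22, "ccc": 33}

# Plain iteration over a dict yields its keys only.
keys_only = sorted(key for key in dct)

# items() yields (key, value) pairs, which unpack cleanly into two names.
pairs = sorted((key, val) for key, val in dct.items())

print(keys_only)
print(pairs)
```

The results are sorted only to make the output order predictable; the point is that the first loop never sees the values at all.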

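In your case, each key is a string, so it is treated as a sequence of characters. Your keys ('neg' and 'pos') are three characters long rather than two, so there are too many items (characters) to unpack into the two variables fs and label.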
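The following sketch reproduces the failure without NLTK and then shows one possible shape for the training data. It rests on two assumptions. First, the three-item trainset is a made-up stand-in for the asker's extracted features. Second, the traceback line "for fs, label in labeled_featuresets" together with the earlier 'iterkeys' error suggests that train() wants a list of (featureset, label) pairs in which each featureset is a dict. Joining each bigram into a single string key is an illustrative choice, not something taken from the question.

```python
# Hypothetical stand-in for the question's data: (bigram, label) pairs.
trainset = [
    (("worst", "film"), "neg"),
    (("screen", "time"), "neg"),
    (("van", "damme"), "pos"),
]

# Grouping by label, as in the question, produces a dict keyed by 'neg' / 'pos'.
dict1 = {}
for feature, label in trainset:
    dict1.setdefault(label, []).append(feature)

# Iterating the dict yields its keys; each key has three characters,
# so unpacking it into two names raises ValueError.
error_message = None
try:
    for fs, label in dict1:
        pass
except ValueError as exc:
    error_message = str(exc)

# Assumed fix: keep a list of pairs, but make each featureset a dict.
# The key format (bigram joined by a space) is an illustrative assumption.
labeled_featuresets = [
    ({" ".join(feature): True}, label) for feature, label in trainset
]

print(error_message)
print(labeled_featuresets)
```

If that assumption about the expected input holds, passing labeled_featuresets to classifier.train() instead of dict1 would avoid both the AttributeError and the ValueError; the test set would need the same tuple-to-dict conversion before classification.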