Question

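I am new to programming, but I have gone over my code again and again and cannot see any mistakes. I don't know how to proceed, because this error keeps popping up no matter what I try. I'll post the full code here.

Any help would be greatly appreciated, thanks!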

Traceback (most recent call last):
  File "code/test.py", line 109, in <module>
    print("voted_classifier accuracy percent:", (nltk.classify.accuracy(voted_classifier, testing_set))*100)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/classify/util.py", line 87, in accuracy
    results = classifier.classify_many([fs for (fs, l) in gold])
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/classify/api.py", line 77, in classify_many
    return [self.classify(fs) for fs in featuresets]
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/classify/api.py", line 56, in classify
    raise NotImplementedError()
NotImplementedError

I also tried raising the NotImplementedError exception in the class at the top, but it did not change the output in Python.

Answer 1

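As mentioned in the comments, the ClassifierI API has some unpleasant, spaghetti-like behaviour: classify() calls classify_many() when the latter is overridden. That might not be a bad thing, considering that ClassifierI is strongly tied to the NaiveBayesClassifier object.

But for the particular use case in the OP, that spaghetti code is not welcome.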

TL; DR

See https://www.kaggle.com/alvations/sklearn-nltk-voteclassifier

In Long

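From the traceback, the error starts with nltk.classify.util.accuracy() calling ClassifierI.classify_many(), which in turn calls ClassifierI.classify() and hits the NotImplementedError.

ClassifierI.classify() is generally used to classify ONE document; its input is a featureset dictionary with binary values.

ClassifierI.classify_many() is supposed to classify MULTIPLE documents; its input is a list of featureset dictionaries with binary values.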
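To make the call chain in the traceback concrete, here is a minimal sketch that needs no NLTK install. BaseClassifier is a simplified stand-in that I wrote for illustration, not NLTK's actual ClassifierI code, and BrokenVote, WorkingVote and the toy classifier functions are all hypothetical names. It only mirrors the two lines visible in the traceback: classify_many() delegating to classify(), and the base classify() raising NotImplementedError.

```python
class BaseClassifier:
    """Simplified stand-in for an interface like ClassifierI (assumption, not NLTK code)."""

    def classify(self, featureset):
        # Subclasses are expected to override this method.
        raise NotImplementedError()

    def classify_many(self, featuresets):
        # Delegates to classify(), once per featureset.
        return [self.classify(fs) for fs in featuresets]


class BrokenVote(BaseClassifier):
    """Hypothetical subclass that never overrides classify()."""

    def __init__(self, *classifiers):
        self._classifiers = classifiers


class WorkingVote(BaseClassifier):
    """Hypothetical subclass that overrides classify() with a majority vote."""

    def __init__(self, *classifiers):
        self._classifiers = classifiers

    def classify(self, featureset):
        votes = [c(featureset) for c in self._classifiers]
        return max(set(votes), key=votes.count)


# Toy classifiers: plain functions mapping a featureset dict to a label.
def says_pos(fs):
    return "pos"

def says_neg(fs):
    return "neg"

def by_feature(fs):
    return "pos" if fs.get("good") else "neg"


featuresets = [{"good": True}, {"good": False}]

try:
    BrokenVote(says_pos, by_feature, says_neg).classify_many(featuresets)
    broken_raised = False
except NotImplementedError:
    broken_raised = True

working_predictions = WorkingVote(says_pos, by_feature, says_neg).classify_many(featuresets)
print(broken_raised, working_predictions)
```

If your own subclass ends up in the BrokenVote situation (for example because classify() is misspelled or indented outside the class), the base implementation is the one that runs.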

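So the quick hack is to write our own accuracy() function, so that VotedClassifier does not depend on ClassifierI's definition of classify() versus classify_many(). That also means we don't inherit from ClassifierI. IMHO, if you don't need anything other than classify(), there is no need to inherit the baggage that ClassifierI might come with: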

from statistics import mode  # mode() picks the most common vote

def my_accuracy(classifier, gold):
    # gold is a list of (featureset, label) pairs.
    documents, labels = zip(*gold)
    predictions = classifier.classify_documents(documents)
    correct = [y == y_hat for y, y_hat in zip(labels, predictions)]
    if correct:
        # float() keeps this a true division on Python 2 as well.
        return sum(correct) / float(len(correct))
    else:
        return 0

class VotraClassifier:
    def __init__(self, *classifiers):
        self._classifiers = classifiers

    def classify_documents(self, documents):
        # Classify MULTIPLE documents, one featureset at a time.
        return [self.classify_many(doc) for doc in documents]

    def classify_many(self, features):
        # Despite its name, this classifies ONE featureset by majority vote.
        votes = []
        for c in self._classifiers:
            v = c.classify(features)
            votes.append(v)
        return mode(votes)

    def confidence(self, features):
        # Fraction of classifiers that agree with the winning label.
        votes = []
        for c in self._classifiers:
            v = c.classify(features)
            votes.append(v)

        choice_votes = votes.count(mode(votes))
        conf = choice_votes / float(len(votes))
        return conf
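
As a small worked example of the vote and confidence arithmetic used above, here is a sketch with a made-up list of five votes (the labels "pos" and "neg" are assumptions for illustration, not taken from the OP's data):

```python
from statistics import mode

# Hypothetical votes from five classifiers for a single document.
votes = ["pos", "pos", "neg", "pos", "neg"]

winner = mode(votes)                           # most common label
confidence = votes.count(winner) / len(votes)  # share of classifiers that agree

print(winner, confidence)
```

Using an odd number of classifiers with two labels avoids ties, which matters because mode() on older Python 3 versions raises an error when there is no unique most common value.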

Now, if we call the new my_accuracy() with the new VotraClassifier object:

voted_classifier = VotraClassifier(nltk_nb, 
                                  NuSVC_classifier,
                                  LinearSVC_classifier,
                                  SGDClassifier_classifier,
                                  MNB_classifier,
                                  BernoulliNB_classifier,
                                  LogisticRegression_classifier)

my_accuracy(voted_classifier, testing_set)

[OUT]:

0.86

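Note: there is some randomness involved in shuffling the documents and then holding out a set to test the classifier's accuracy.

My suggestion is to do the following instead of a simple random.shuffle(documents):

Repeat the experiment with various random seeds.
For each random seed, do a 10-fold cross-validation.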
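
The two steps above could be sketched roughly as follows. This is only an illustration in plain Python: k_fold_indices, cross_validate, train_majority and simple_accuracy are hypothetical helpers I made up, and the toy majority-label "classifier" and synthetic documents stand in for your real training function and data.

```python
import random

def k_fold_indices(n_items, k):
    """Split range(n_items) into k contiguous folds of near-equal size."""
    sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(documents, train_fn, accuracy_fn, k=10, seed=0):
    """Shuffle with a fixed seed, then return the list of per-fold accuracies."""
    docs = list(documents)
    random.Random(seed).shuffle(docs)
    scores = []
    for fold in k_fold_indices(len(docs), k):
        held_out = set(fold)
        test = [docs[i] for i in fold]
        train = [d for i, d in enumerate(docs) if i not in held_out]
        scores.append(accuracy_fn(train_fn(train), test))
    return scores

# Toy stand-ins (assumptions): always predict the majority training label.
def train_majority(train):
    labels = [label for _, label in train]
    majority = max(sorted(set(labels)), key=labels.count)
    return lambda featureset: majority

def simple_accuracy(classifier, gold):
    correct = [classifier(fs) == label for fs, label in gold]
    return sum(correct) / len(correct) if correct else 0.0

# Synthetic (featureset, label) pairs, only to make the sketch runnable.
documents = [({"id": i}, "pos" if i % 4 else "neg") for i in range(100)]

mean_by_seed = {}
for seed in (0, 1, 2):
    scores = cross_validate(documents, train_majority, simple_accuracy, k=10, seed=seed)
    mean_by_seed[seed] = sum(scores) / len(scores)

print(mean_by_seed)
```

With a real classifier you would pass your own training function and my_accuracy-style scorer, then report the mean and spread of the accuracies across seeds rather than a single number.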

How to resolve NotImplementedError from nltk.classify ClassifierI?
