I am trying to run the decision tree code below, taken from the NLTK howto page: http://www.nltk.org/howto/classify.html
>>> train = [
... (dict(a=1,b=1,c=1), 'y'),
... (dict(a=1,b=1,c=1), 'x'),
... (dict(a=1,b=1,c=0), 'y'),
... (dict(a=0,b=1,c=1), 'x'),
... (dict(a=0,b=1,c=1), 'y'),
... (dict(a=0,b=0,c=1), 'y'),
... (dict(a=0,b=1,c=0), 'x'),
... (dict(a=0,b=0,c=0), 'x'),
... (dict(a=0,b=1,c=1), 'y'),
... ]
>>>
>>>
>>> test = [
... (dict(a=1,b=0,c=1)), # unseen
... (dict(a=1,b=0,c=0)), # unseen
... (dict(a=0,b=1,c=1)), # seen 3 times, labels=y,y,x
... (dict(a=0,b=1,c=0)), # seen 1 time, label=x
... ]
>>>
>>>
>>> import nltk
>>> classifier = nltk.classify.DecisionTreeClassifier.train(train, entropy_cutoff=0, support_cutoff=0)
>>> sorted(classifier.labels())
['x', 'y']
>>> print(classifier)
c=0? .................................................. x
  a=0? ................................................ x
  a=1? ................................................ y
c=1? .................................................. y
>>> classifier.batch_classify(test)
['y', 'y', 'y', 'x']
>>> for pdist in classifier.batch_prob_classify(test):
...     print('%.4f %.4f' % (pdist.prob('x'), pdist.prob('y')))
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "//anaconda/lib/python2.7/site-packages/nltk/classify/api.py", line 87, in batch_prob_classify
    return [self.prob_classify(fs) for fs in featuresets]
  File "//anaconda/lib/python2.7/site-packages/nltk/classify/api.py", line 67, in prob_classify
    raise NotImplementedError()
NotImplementedError
>>>
The problem is with the batch_prob_classify function. Can anyone suggest how to fix this and how to get the probability distribution values?
Answer 0 (score: 1)
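DecisionTreeClassifier does not implement prob_classify, so the call falls through to the base classifier interface in nltk/classify/api.py, which raises NotImplementedError, as your traceback shows. The tree does use the probability class MLEProbDist internally, and that class has a prob method, but the classifier never exposes a distribution over labels. NaiveBayesClassifier, on the other hand, uses the probability class ELEProbDist, which inherits from LidstoneProbDist, and it does implement prob_classify, so you get back an object with a prob method.
So unless you want to subclass DecisionTreeClassifier and add a prob_classify method yourself, you will probably want to use NaiveBayesClassifier instead: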
>>> classifier = nltk.classify.NaiveBayesClassifier.train(train) # note the use of NaiveBayesClassifier here
>>> for pdist in classifier.batch_prob_classify(test):
...     print('%.4f %.4f' % (pdist.prob('x'), pdist.prob('y')))
...
0.3104 0.6896
0.5746 0.4254
0.3685 0.6315
0.6365 0.3635
As @Mike pointed out, you are getting the expected result. You may have been confused by a very similar example earlier on the page.
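If you do want to go the subclassing route, here is a rough, NLTK-free sketch of the idea. It is not NLTK's actual code: BaseClassifier, HardLabelClassifier and CountingClassifier are hypothetical names made up for illustration, and the probabilities are plain label frequencies for each exact feature combination (falling back to the overall label frequencies for unseen combinations), which is only one of several reasonable choices. The first two classes mimic why the traceback ends in NotImplementedError; the third shows what overriding prob_classify could look like.

```python
from collections import Counter, defaultdict


class BaseClassifier(object):
    """Simplified stand-in for a classifier interface (NOT NLTK's code)."""

    def classify(self, featureset):
        raise NotImplementedError()

    def prob_classify(self, featureset):
        # Classifiers that can produce probabilities must override this.
        raise NotImplementedError()

    def batch_prob_classify(self, featuresets):
        # Delegates to prob_classify, so it fails when that is not overridden.
        return [self.prob_classify(fs) for fs in featuresets]


class HardLabelClassifier(BaseClassifier):
    """Returns labels only, mirroring the situation in the question."""

    def classify(self, featureset):
        return 'y' if featureset.get('c') == 1 else 'x'


class CountingClassifier(HardLabelClassifier):
    """Hypothetical subclass that adds prob_classify from label counts."""

    def __init__(self, labeled_featuresets):
        self._prior = Counter()
        self._by_features = defaultdict(Counter)
        for featureset, label in labeled_featuresets:
            key = frozenset(featureset.items())
            self._prior[label] += 1
            self._by_features[key][label] += 1

    def prob_classify(self, featureset):
        # Use the counts for this exact feature combination; fall back to
        # the overall label counts when the combination was never seen.
        key = frozenset(featureset.items())
        counts = self._by_features.get(key) or self._prior
        total = float(sum(counts.values()))
        return dict((label, counts[label] / total)
                    for label in sorted(self._prior))


# Same training and test data as in the question.
train = [
    (dict(a=1, b=1, c=1), 'y'), (dict(a=1, b=1, c=1), 'x'),
    (dict(a=1, b=1, c=0), 'y'), (dict(a=0, b=1, c=1), 'x'),
    (dict(a=0, b=1, c=1), 'y'), (dict(a=0, b=0, c=1), 'y'),
    (dict(a=0, b=1, c=0), 'x'), (dict(a=0, b=0, c=0), 'x'),
    (dict(a=0, b=1, c=1), 'y'),
]
test = [
    dict(a=1, b=0, c=1),  # unseen
    dict(a=1, b=0, c=0),  # unseen
    dict(a=0, b=1, c=1),  # seen 3 times, labels=y,y,x
    dict(a=0, b=1, c=0),  # seen 1 time, label=x
]

classifier = CountingClassifier(train)
for pdist in classifier.batch_prob_classify(test):
    print('%.4f %.4f' % (pdist['x'], pdist['y']))
# prints:
# 0.4444 0.5556
# 0.4444 0.5556
# 0.3333 0.6667
# 1.0000 0.0000
```

Calling HardLabelClassifier().batch_prob_classify(test) raises NotImplementedError, just like the decision tree in the question. Note that the sketch keeps the batch_* method names used in the question; in NLTK 3 and later the corresponding methods are called classify_many and prob_classify_many.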