Question

I am trying to build a simple maximum entropy classifier in Python using nltk.

I know the feature weights should be:

Berlin -> Germany: 3.72

Berlin -> America: -3.72

Hamburg -> Germany: 3.72

Hamburg -> America: -3.72

New York -> Germany: -7.44

New York -> America: 7.44

However, nltk gives me different weights. Can anyone help?

from nltk.classify import MaxentClassifier

def word_feats(words):
    # Map each word to a binary "present" feature.
    return {word: True for word in words}

# Four training examples; the last two share the same features but differ in label.
trainfeats = [
    (word_feats(['New York']), 'America'),
    (word_feats(['Berlin', 'Hamburg']), 'Germany'),
    (word_feats(['New York', 'Berlin', 'Hamburg']), 'America'),
    (word_feats(['New York', 'Berlin', 'Hamburg']), 'Germany'),
]

# Train with IIS for up to 1000 iterations.
classifier = MaxentClassifier.train(trainfeats, algorithm="iis", max_iter=1000)

classifier.show_most_informative_features(10)

Output:

           997          -0.28857        0.750
           998          -0.28857        0.750
           999          -0.28857        0.750
         Final          -0.28857        0.750

  -5.464 New York==True and label is 'Germany'
   4.086 New York==True and label is 'America'
  -2.965 Berlin==True and label is 'America'
  -2.965 Hamburg==True and label is 'America'
   1.810 Berlin==True and label is 'Germany'
   1.810 Hamburg==True and label is 'Germany'
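
One way to compare the two weight sets is to turn each into label probabilities and see whether they disagree on any input. The sketch below is only a hand calculation, not nltk code: `label_probs` is a hypothetical helper, and it assumes binary features, a score that is the sum of the active (feature, label) weights, and P(label) proportional to base ** score. The log base is an assumption too: base e for the expected weights and base 2 for the nltk weights.

```python
import math

LABELS = ("Germany", "America")

# Weights the question expects.
expected = {
    ("Berlin", "Germany"): 3.72, ("Berlin", "America"): -3.72,
    ("Hamburg", "Germany"): 3.72, ("Hamburg", "America"): -3.72,
    ("New York", "Germany"): -7.44, ("New York", "America"): 7.44,
}

# Weights copied from the nltk output above.
reported = {
    ("Berlin", "Germany"): 1.810, ("Berlin", "America"): -2.965,
    ("Hamburg", "Germany"): 1.810, ("Hamburg", "America"): -2.965,
    ("New York", "Germany"): -5.464, ("New York", "America"): 4.086,
}

def label_probs(weights, features, base):
    """Hypothetical helper: normalize base ** score over the labels."""
    scores = {lab: sum(weights[(f, lab)] for f in features) for lab in LABELS}
    top = max(scores.values())  # subtract the max for numerical stability
    unnorm = {lab: base ** (s - top) for lab, s in scores.items()}
    z = sum(unnorm.values())
    return {lab: v / z for lab, v in unnorm.items()}

inputs = (["New York"], ["Berlin", "Hamburg"], ["New York", "Berlin", "Hamburg"])

# Assumed bases: e for the expected weights, 2 for the nltk weights.
for name, weights, base in (("expected", expected, math.e), ("nltk", reported, 2.0)):
    for feats in inputs:
        probs = label_probs(weights, feats, base)
        print(name, feats, {lab: round(p, 4) for lab, p in probs.items()})
```

Under these assumptions both weight sets split the three-city input evenly between the two labels (the two scores are equal in each set, so the base does not matter there) and both strongly favor America for New York alone and Germany for Berlin plus Hamburg. So the different raw weights do not by themselves mean the two models predict differently.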

Python Maxent classifier feature weights
