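I am trying to build a simple maximum entropy classifier in Python using nltk.
As far as I know, the feature weights should be:
Berlin -> Germany: 3.72
Berlin -> America: -3.72
Hamburg -> Germany: 3.72
Hamburg -> America: -3.72
New York -> Germany: -7.44
New York -> America: 7.44
However, nltk gives me different results. Can anyone help?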
from nltk.classify import MaxentClassifier

def word_feats(words):
    # Map each word to a boolean "present" feature.
    return {word: True for word in words}

# Four training examples, each a (feature dict, label) pair.
trainfeats = [
    (word_feats(['New York']), 'America'),
    (word_feats(['Berlin', 'Hamburg']), 'Germany'),
    (word_feats(['New York', 'Berlin', 'Hamburg']), 'America'),
    (word_feats(['New York', 'Berlin', 'Hamburg']), 'Germany'),
]

# Train with Improved Iterative Scaling for up to 1000 iterations.
classifier = MaxentClassifier.train(trainfeats, algorithm="iis", max_iter=1000)
classifier.show_most_informative_features(10)
Output:
997 -0.28857 0.750
998 -0.28857 0.750
999 -0.28857 0.750
Final -0.28857 0.750
-5.464 New York==True and label is 'Germany'
4.086 New York==True and label is 'America'
-2.965 Berlin==True and label is 'America'
-2.965 Hamburg==True and label is 'America'
1.810 Berlin==True and label is 'Germany'
1.810 Hamburg==True and label is 'Germany'
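
As a sanity check, here is a minimal sketch that compares the two weight sets by the label probabilities they imply, rather than by the raw numbers. It assumes the usual conditional maxent form, where the score of a label is the sum of the weights of the active features and the probabilities are a normalized exponential of those scores. The names `expected`, `nltk_weights`, and `label_probs` are my own, and the exponent base is an assumption (nltk may use a different base internally), so it is left as a parameter.

```python
import math

# Weights I expected, keyed by (feature, label).
expected = {
    ('Berlin', 'Germany'): 3.72, ('Berlin', 'America'): -3.72,
    ('Hamburg', 'Germany'): 3.72, ('Hamburg', 'America'): -3.72,
    ('New York', 'Germany'): -7.44, ('New York', 'America'): 7.44,
}

# Weights copied from the nltk output above.
nltk_weights = {
    ('Berlin', 'Germany'): 1.810, ('Berlin', 'America'): -2.965,
    ('Hamburg', 'Germany'): 1.810, ('Hamburg', 'America'): -2.965,
    ('New York', 'Germany'): -5.464, ('New York', 'America'): 4.086,
}

def label_probs(weights, feats, labels=('Germany', 'America'), base=math.e):
    """Return P(label | feats), assuming score = sum of active feature weights."""
    scores = {lab: sum(weights[(f, lab)] for f in feats) for lab in labels}
    z = sum(base ** s for s in scores.values())
    return {lab: base ** s / z for lab, s in scores.items()}

examples = (
    ['New York'],
    ['Berlin', 'Hamburg'],
    ['New York', 'Berlin', 'Hamburg'],
)

for name, weights in (('expected', expected), ('nltk', nltk_weights)):
    for feats in examples:
        print(name, feats, label_probs(weights, feats))
```

Under these assumptions, both weight sets give an even split for the three-city example, where the Germany and America scores are equal, and both strongly favor the training label for the other two examples. So the two sets may describe nearly the same model even though the individual weights differ.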