NLTK NaiveBayesClassifier input format

Date: 2014-08-06 08:46:45

Tags: python nltk

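I'm really stumped by this. I'm fairly new to Python and NLTK. I'm trying to build a Naive Bayes classifier, and I'm not sure whether the input should be a list of tuples, a dictionary, or a tuple of two lists.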

The following returns AttributeError: 'str' object has no attribute 'items':

[('maggie: just a push button. and the electric car uses sensors to drive itself. \n', 'notending')]

This format returns AttributeError: 'list' object has no attribute 'items':

[([['the', 'fire', 'chief', 'says', 'someone', 'started', 'the', 'blaze', 'on', 'purpose', 'as', 'a', 'controlled', 'burn', ',', 'but', 'it', 'quickly', 'got', 'out', 'of', 'hand', '.']], 'notending')]

If I use a dictionary, I get ValueError: too many values to unpack:

{'everyone: bye!': 'ending'}

I call the Naive Bayes classifier with classifier = nltk.NaiveBayesClassifier.train(d_train)
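All three errors appear to share one cause. NLTK's `NaiveBayesClassifier.train` unpacks each element of its argument as `for featureset, label in labeled_featuresets` and then calls `featureset.items()`, so every element must be a `(dict, label)` pair. The sketch below reproduces just that loop in plain Python; `mimic_train_loop` is a hypothetical stand-in, not part of NLTK, used only to show which error each input shape from the question triggers.

```python
def mimic_train_loop(labeled_featuresets):
    """Hypothetical stand-in for the start of NaiveBayesClassifier.train:
    unpack each element into (featureset, label), then read featureset.items().
    Returns the name of the exception raised, or 'ok'."""
    try:
        for featureset, label in labeled_featuresets:
            for fname, fval in featureset.items():
                pass
    except (AttributeError, ValueError) as exc:
        return type(exc).__name__
    return "ok"


# (sentence string, label): a str has no .items()
print(mimic_train_loop([("everyone: bye!\n", "ending")]))        # -> AttributeError
# (list of token lists, label): a list has no .items()
print(mimic_train_loop([([["the", "fire", "chief"]], "notending")]))  # -> AttributeError
# A dict: iterating it yields the key strings, which are then unpacked
# character by character into (featureset, label)
print(mimic_train_loop({"everyone: bye!": "ending"}))             # -> ValueError
# The expected shape: (dict of features, label)
print(mimic_train_loop([({"everyone": True, "bye": True}, "ending")]))  # -> ok
```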

I'm not sure what's wrong here. Any help is much appreciated. Thanks.

1 answer:

Answer 0 (score: 6)

from nltk.classify import NaiveBayesClassifier
from nltk.corpus import stopwords

# Needs the stopword corpus: run nltk.download('stopwords') once beforehand.
stopset = set(stopwords.words('english'))  # a set gives fast membership tests

def word_feats(words):
    # Map a sentence to a {word: True} feature dict, skipping stopwords.
    return {word: True for word in words.split() if word not in stopset}

posids = ['I love this sandwich.', 'I feel very good about these beers.']
negids = ['I hate this sandwich.', 'I feel worst about these beers.']

# train() expects a list of (feature_dict, label) tuples.
pos_feats = [(word_feats(f), 'positive') for f in posids]
neg_feats = [(word_feats(f), 'negative') for f in negids]
print(pos_feats)
print(neg_feats)
trainfeats = pos_feats + neg_feats
classifier = NaiveBayesClassifier.train(trainfeats)

Here are the positive and negative feature sets it prints:

[({'I': True, 'love': True, 'sandwich.': True}, 'positive'), ({'I': True, 'feel': True, 'good': True, 'beers.': True}, 'positive')]
[({'I': True, 'hate': True, 'sandwich.': True}, 'negative'), ({'I': True, 'feel': True, 'beers.': True, 'worst': True}, 'negative')]
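The question's own data needs the same shape as these printed feature sets: each `(sentence, label)` pair, or each `sentence: label` dict entry, has to become a `(feature_dict, label)` tuple. A minimal sketch, assuming plain whitespace tokenization and a small hand-written stopword set in place of the NLTK corpus; `STOPWORDS` and `to_labeled_featuresets` are hypothetical names introduced here.

```python
# Hypothetical stopword set standing in for nltk.corpus.stopwords
STOPWORDS = {"a", "an", "the", "and", "on", "as", "but", "it", "of", "to"}

def word_feats(sentence):
    """Map a sentence to a {word: True} feature dict, skipping stopwords."""
    return {word: True for word in sentence.split() if word not in STOPWORDS}

def to_labeled_featuresets(data):
    """Hypothetical helper: accept a list of (sentence, label) tuples or a
    {sentence: label} dict and return [(feature_dict, label), ...]."""
    pairs = data.items() if isinstance(data, dict) else data
    return [(word_feats(sentence), label) for sentence, label in pairs]

# The dict format from the question
d_train = to_labeled_featuresets({"everyone: bye!": "ending"})
print(d_train)  # -> [({'everyone:': True, 'bye!': True}, 'ending')]

# The list-of-tuples format from the question (split() drops the trailing "\n")
print(to_labeled_featuresets([("maggie: just a push button. \n", "notending")]))
```

The resulting `d_train` has the `(dict, label)` shape that `nltk.NaiveBayesClassifier.train(d_train)` expects.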

So if you classify the sentence 'I hate everything':

print(classifier.classify(word_feats('I hate everything')))

you will get 'negative' as the result.
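Why 'negative'? Of the features in 'I hate everything', 'I' occurs in every training sentence and 'everything' never occurs, so only 'hate' separates the two labels. The sketch below is a simplified, hypothetical re-implementation, not NLTK's code: it scores each label as prior × P(feature present | label) with add-0.5 smoothing, loosely in the spirit of NLTK's default `ELEProbDist` estimator, and skips feature names never seen in training. Its probabilities are illustrative only and need not match NLTK's exactly.

```python
from collections import Counter, defaultdict

# Training data copied from the printed pos_feats / neg_feats above
train_data = [
    ({"I": True, "love": True, "sandwich.": True}, "positive"),
    ({"I": True, "feel": True, "good": True, "beers.": True}, "positive"),
    ({"I": True, "hate": True, "sandwich.": True}, "negative"),
    ({"I": True, "feel": True, "beers.": True, "worst": True}, "negative"),
]

def train_counts(labeled_featuresets):
    """Count documents per label and, per label, documents containing each feature."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)
    for featureset, label in labeled_featuresets:
        label_counts[label] += 1
        for fname in featureset:
            feature_counts[label][fname] += 1
    return label_counts, feature_counts

def posterior(featureset, label_counts, feature_counts):
    """Return normalized P(label | features) under the simplified model."""
    total = sum(label_counts.values())
    vocab = {f for counts in feature_counts.values() for f in counts}
    scores = {}
    for label, n in label_counts.items():
        score = n / total  # prior P(label)
        for fname in featureset:
            if fname not in vocab:
                continue  # unseen feature names carry no evidence
            # add-0.5 smoothing over two outcomes (present / absent)
            score *= (feature_counts[label][fname] + 0.5) / (n + 1.0)
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

label_counts, feature_counts = train_counts(train_data)
probs = posterior({"I": True, "hate": True, "everything": True},
                  label_counts, feature_counts)
print(max(probs, key=probs.get), {k: round(v, 2) for k, v in probs.items()})
# -> negative {'positive': 0.25, 'negative': 0.75}
```

The 'I' factor is identical for both labels and cancels, so the 3:1 odds come entirely from 'hate' (smoothed 1.5/3 for negative versus 0.5/3 for positive).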