Apart from ignoring repeated words, can anyone help me understand what the format(sentence) function in the code below is for? Why does it attach True to every word?
{'I': True, 'Oh': True, 'yes': True, 'got': True, 'that': True, '.': True}
{'?': True, 'mad': True, 'Are': True, 'you': True}
Can we pass the sentence to the classifier without marking each word as True?
import nltk
from nltk.tokenize import word_tokenize
from nltk.classify import NaiveBayesClassifier

# Turn a sentence into a featureset: every token becomes a feature with value True
def format(sentence):
    return {word: True for word in word_tokenize(sentence)}

s0 = 'Oh yes. I got that'
s1 = 'Are you mad?'
# Training data is a list of (featureset, label) pairs
trainData = [(format(s0), 'pos'), (format(s1), 'neg')]
model = NaiveBayesClassifier.train(trainData)
print(model.classify(format(s1)))
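NLTK classifiers such as NaiveBayesClassifier work on "featuresets": dictionaries that map feature names to feature values, and format(sentence) builds one where each token is a feature whose value is True. The sketch below shows only that dictionary-building step. It swaps in a simple regex tokenizer (simple_tokenize, a hypothetical helper) for word_tokenize so it runs without NLTK or its tokenizer data, and to_featureset is likewise a hypothetical name standing in for format.

```python
import re

# Hypothetical stand-in for nltk.word_tokenize: keeps runs of word
# characters together and emits each punctuation mark as its own token.
def simple_tokenize(sentence):
    return re.findall(r"\w+|[^\w\s]", sentence)

# Hypothetical equivalent of format(): map every token to True.
# A repeated token just reassigns the same key, so duplicates collapse
# into a single feature.
def to_featureset(sentence):
    return {word: True for word in simple_tokenize(sentence)}

print(to_featureset("Oh yes. I got that"))
print(to_featureset("yes yes yes"))
```

Because the features are stored as dictionary keys, "yes yes yes" produces a single 'yes' feature. That is why format() ignores repeated words: it records whether a word is present, not how many times it appears.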