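Suppose I want to know how religious a person is. Instead of classifying them as religious or non-religious, I want the probability that they are religious. So I built a simple Naive Bayes classifier with the nltk toolkit. But it doesn't seem to work: I get a probability of 100% for both test samples.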
Output:

```
Train Set Size = 5
1.0
1.0
```
```python
import nltk.classify.util
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Training texts; every one of them is labelled "positive" (religious)
dataPos = ['god jesus god church', 'buddha enlightenment love god',
           'jesus love krishna', 'Hare krishna Hare Krishna love',
           'god jesus church love']

# Load the stopword list once instead of once per word
STOPWORDS = set(stopwords.words("english"))

def create_word_features(words):
    # Bag-of-words featureset: {word: True} for every non-stopword
    useful_words = [word for word in words if word not in STOPWORDS]
    return {word: True for word in useful_words}

pos_views = []
for item in dataPos:
    words = item.split(' ')
    pos_views.append((create_word_features(words), "positive"))

train_set = pos_views[:]
print('Train Set Size = %d' % len(train_set))

# Train
classifier = NaiveBayesClassifier.train(train_set)

# Testing Sample 1
person1 = '''
Love god krishna jesus
'''
words = word_tokenize(person1)
words = create_word_features(words)
prob_dist = classifier.prob_classify(words)
print(prob_dist.prob("positive"))

# Testing Sample 2
person2 = '''
I hate god hate
'''
words = word_tokenize(person2)
words = create_word_features(words)
prob_dist = classifier.prob_classify(words)
print(prob_dist.prob("positive"))
```
I think the problem is that there is only one class, but that is how I want to train the classifier. Can anyone tell me how to fix this?
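To check the single-class suspicion, here is a minimal pure-Python sketch (not NLTK's implementation) of a Naive Bayes posterior. The point it illustrates: the posterior is normalized over the labels seen in training, so with only one label that label always gets exactly 1.0, whatever the input words are. The helpers `train_nb` and `posterior` are hypothetical, the smoothing is plain add-one (NLTK's own estimator differs), tokenization is `str.split` with no stopword removal, and the `negative` texts are invented purely for illustration.

```python
import math
from collections import Counter


def train_nb(examples):
    """Count, per label, how many training texts contain each word.

    `examples` is a list of (list_of_words, label) pairs. A word counts at
    most once per text, mirroring the {word: True} features in the question.
    """
    label_counts = Counter(label for _, label in examples)
    word_counts = {label: Counter() for label in label_counts}
    vocab = set()
    for words, label in examples:
        word_counts[label].update(set(words))
        vocab.update(words)
    return label_counts, word_counts, vocab


def posterior(model, words):
    """Return P(label | words) for every label seen in training."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        for w in set(words) & vocab:  # words never seen in training are ignored
            # Add-one smoothed estimate of P(word present | label)
            score += math.log((word_counts[label][w] + 1) / (n + 2))
        log_scores[label] = score
    # Normalize across labels: the results always sum to 1,
    # so with a single label that label always gets exactly 1.0.
    best = max(log_scores.values())
    unnorm = {label: math.exp(s - best) for label, s in log_scores.items()}
    z = sum(unnorm.values())
    return {label: v / z for label, v in unnorm.items()}


positive = [
    "god jesus god church",
    "buddha enlightenment love god",
    "jesus love krishna",
    "Hare krishna Hare Krishna love",
    "god jesus church love",
]
# Hypothetical non-religious examples, invented for illustration only
negative = [
    "I hate religion",
    "science logic evidence",
    "I love football",
]

person1 = "Love god krishna jesus".split()
person2 = "I hate god hate".split()

one_class = train_nb([(t.split(), "positive") for t in positive])
print(posterior(one_class, person1)["positive"])  # 1.0
print(posterior(one_class, person2)["positive"])  # 1.0

two_class = train_nb(
    [(t.split(), "positive") for t in positive]
    + [(t.split(), "negative") for t in negative]
)
print(round(posterior(two_class, person1)["positive"], 2))  # 0.97
print(round(posterior(two_class, person2)["positive"], 2))  # 0.29
```

With only the "positive" label, both people score 1.0, just like the output above. Once a second label exists, the scores spread out, but they are then relative to whatever contrasting data is supplied, so the choice of that data matters a lot.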