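I'm new to Python, and I've been trying to build a Naive Bayes classifier, but it seems to favor spam over ham. I know it's a lot to ask, but I'm hoping someone familiar with Naive Bayes can point out what I'm doing wrong. Side note: I skipped the denominator of the Naive Bayes equation. A common denominator shouldn't make a difference, right?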
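The reasoning about the denominator can be written out with Bayes' rule under the naive independence assumption. Write the message as words $w_1,\dots,w_n$ and the class as $C$. The denominator $P(w_1,\dots,w_n)$ does not depend on $C$, so it is the same positive number for ham and spam. Dividing both scores by it cannot change which one is larger:

```latex
\begin{aligned}
P(C \mid w_1,\dots,w_n) &= \frac{P(C)\,\prod_{i=1}^{n} P(w_i \mid C)}{P(w_1,\dots,w_n)} \\
\hat{C} &= \arg\max_{C \in \{\mathrm{ham},\,\mathrm{spam}\}} P(C)\,\prod_{i=1}^{n} P(w_i \mid C)
\end{aligned}
```

Dropping the denominator is therefore fine for choosing a label. It only matters if the actual posterior probabilities are needed.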
Here is a link to the guide I followed: https://towardsdatascience.com/unfolding-na%C3%AFve-bayes-from-scratch-2e86dcae4b01
Here is my code:
import csv
ham = 0
spam = 0
dictionarySpam = {}
dictionaryHam = {}
dictionaryTotal = {}
userString = input("Enter your string")
userString = userString.replace('.', ' ')
a = "!@#$%^&*()_+=-?,><':;[]/"
userStringList = userString.split()
print(userStringList)
totalNumHam = 0
with open('spam.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        if "ham" in row[0]:
            ham += 1
        elif "spam" in row[0]:
            spam += 1
        words = row[1].split()
        for word in words:
            word = word.replace('.', ' ')
            for char in a:
                word = word.replace(char, '')
            dictionaryTotal[word.lower()] = dictionaryTotal.get(word.lower(), 0) + 1
            if "ham" in row[0].lower():
                dictionaryHam[word.lower()] = dictionaryHam.get(word.lower(), 0) + 1
            elif "spam" in row[0].lower():
                dictionarySpam[word.lower()] = dictionarySpam.get(word.lower(), 0) + 1
probHam = 1
probSpam = 1
print("HAM cases: ", ham)
print("SPAM cases: ", spam)
print(dictionaryHam)
print(dictionarySpam)
for item in userStringList:
    if item in dictionaryHam:
        probHam = probHam * ((dictionaryHam[item] + 1) / (sum(dictionaryHam.values()) + len(dictionaryTotal) + 1))
    elif item not in dictionaryHam:
        probHam = probHam * (1 / (sum(dictionaryHam.values()) + len(dictionaryTotal) + 1))
    if item in dictionarySpam:
        probSpam = probSpam * ((dictionarySpam[item] + 1) / (sum(dictionarySpam.values()) + len(dictionaryTotal) + 1))
    elif item not in dictionaryHam:
        probHam = probSpam * (1 / (sum(dictionarySpam.values()) + len(dictionaryTotal) + 1))
print("OUT: ", probHam)
probHam = probHam * (ham / (ham + spam))
probSpam = probSpam * (spam / (ham + spam))
print(probHam)
print(probSpam)
if probHam > probSpam:
    print("This message is HAM")
else:
    print("This message is SPAM")
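One detail worth checking in the code above: the training words are lowercased and stripped of punctuation, but `userStringList` is only split on whitespace after replacing `'.'`. A query word such as `Free` or `prize!` will therefore never match a dictionary key and is always scored as unseen. The following is a minimal sketch of a single normalization step shared by training and querying. The helper names `tokenize` and `train` are hypothetical, and using `string.punctuation` in place of the hand-written character list is an assumption.

```python
import string
from collections import Counter

# Translation table: '.' becomes a space and every other ASCII punctuation
# character is deleted. This follows the question's intent, but uses
# string.punctuation (an assumption) instead of the hand-written list.
_TABLE = str.maketrans({".": " ", **{ch: None for ch in string.punctuation if ch != "."}})


def tokenize(text):
    """Lowercase the text, apply the punctuation table, and split on whitespace."""
    return text.lower().translate(_TABLE).split()


def train(rows):
    """Count messages per label and words per label.

    rows: iterable of (label, text) pairs, e.g. ("ham", "See you at lunch.").
    Returns (label_counts, word_counts), where word_counts[label] is a Counter.
    """
    label_counts = Counter()
    word_counts = {"ham": Counter(), "spam": Counter()}
    for label, text in rows:
        label = label.strip().lower()
        if label not in word_counts:
            continue  # skip header rows or unexpected labels
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return label_counts, word_counts
```

With the question's reader, this could be called as `train((row[0], row[1]) for row in csv_reader)`, and the same `tokenize(userString)` would then produce the query words. Query tokens and dictionary keys then go through identical processing.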
Answer (score: 0)
I think you've gotten yourself tangled up here:
probHam = probHam * ((dictionaryHam[item] + 1) / (sum(dictionaryHam.values()) + len(dictionaryTotal) + 1))
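The quoted line is the ham half of the scoring loop. In the spam half of the question's code, the test `elif item not in dictionaryHam:` checks the ham dictionary, and its body assigns to `probHam` instead of `probSpam`. As a result, a word that appears in ham but not in spam never lowers `probSpam`. A word that appears in neither class overwrites `probHam` with a value derived from `probSpam`. Both effects push the decision toward spam.

The sketch below scores each class only from its own counts. It applies add-one smoothing, $P(w \mid C) = (\mathrm{count}(w, C) + 1) / (N_C + |V|)$, to seen and unseen words alike. The function names `log_score` and `classify` are hypothetical. Two choices differ from the question's code and are assumptions. First, the denominator drops the extra `+ 1`. Second, the products are computed as sums of logarithms, a common way to avoid floating-point underflow on long messages.

```python
import math


def log_score(tokens, word_counts, vocab_size, prior):
    """Return log(prior * product of add-one-smoothed P(word | class)).

    word_counts: dict mapping word -> count within this class only.
    vocab_size: number of distinct words across both classes.
    prior: fraction of training messages in this class (must be > 0).
    """
    total = sum(word_counts.values())
    score = math.log(prior)
    for tok in tokens:
        # Unseen words use count 0, so the same formula covers both cases
        # and no separate "not in dictionary" branch is needed.
        score += math.log((word_counts.get(tok, 0) + 1) / (total + vocab_size))
    return score


def classify(tokens, ham_counts, spam_counts, n_ham, n_spam):
    """Compare the two class scores; each uses only its own dictionary."""
    vocab_size = len(set(ham_counts) | set(spam_counts))
    n = n_ham + n_spam
    ham_score = log_score(tokens, ham_counts, vocab_size, n_ham / n)
    spam_score = log_score(tokens, spam_counts, vocab_size, n_spam / n)
    return "HAM" if ham_score > spam_score else "SPAM"
```

With the question's variables, a call would look like `classify(userStringList, dictionaryHam, dictionarySpam, ham, spam)`. The query words should still be normalized the same way as the training words.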