I recently started a machine learning tutorial. The first lesson is on supervised learning (spam vs. ham), and I began by implementing it.
My implementation:
---------total spam count-------------
word:   hi  free  offers  for  you  and  the  !  ....
count:  5   3     9       4    4    6    8    6
---------total ham count-------------
word:   hi  free  offers  for  you  and  the  !  ....
count:  3   5     3       7    3    4    6    2
mail_1: hi! how are you here are some free offers for you !!!
tokens: hi  how  are  you  here  are  some  free  offers  for  you  !!!
counts: 1   1    2    1    1     2    1     1     1       1    1    4
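A minimal sketch of the tokenizing and counting step for mail_1. The tokenizer is an assumption (the post does not say how tokens were split): it lower-cases the text, keeps runs of letters as words, and treats every `!` as its own token.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into words and single '!' marks.

    Assumption: each '!' is a separate token, so '!!!' counts as three.
    """
    return re.findall(r"[a-z]+|!", text.lower())

mail_1 = "hi! how are you here are some free offers for you !!!"
counts = Counter(tokenize(mail_1))  # occurrences of each token in mail_1
print(counts)
```

With this tokenizer, `!` gets 4 (one from `hi!`, three from `!!!`), and `are` and `you` each get 2.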
s[T] = c_spam(T) / ( c_spam(T) + c_ham(T) )
s[T] = how spammy the word T is
c_spam(T) = how many spam messages contain the word T
c_ham(T) = how many non-spam (ham) messages contain the word T
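The definitions above can be sketched as follows. This assumes the numbers in the two count tables are c_spam(T) and c_ham(T); only the eight listed words are copied, and the `....` columns are left out. `document_counts` and `spamicity` are hypothetical helper names, and `document_counts` counts each token at most once per message, following the "how many messages contain T" definition.

```python
from collections import Counter

# Values copied from the two count tables, assumed to be c_spam(T) and c_ham(T).
C_SPAM = {"hi": 5, "free": 3, "offers": 9, "for": 4,
          "you": 4, "and": 6, "the": 8, "!": 6}
C_HAM = {"hi": 3, "free": 5, "offers": 3, "for": 7,
         "you": 3, "and": 4, "the": 6, "!": 2}

def document_counts(messages):
    """Return c(T) = number of messages containing T at least once.

    `messages` is a list of token lists; a token repeated inside one
    message is counted only once.
    """
    counts = Counter()
    for tokens in messages:
        counts.update(set(tokens))
    return counts

def spamicity(token, c_spam, c_ham):
    """s[T] = c_spam(T) / (c_spam(T) + c_ham(T)); None if T was never seen."""
    spam = c_spam.get(token, 0)
    ham = c_ham.get(token, 0)
    if spam + ham == 0:
        return None  # the formula is 0/0 for a word absent from both tables
    return spam / (spam + ham)

for word in ["offers", "free", "!", "how"]:
    print(word, spamicity(word, C_SPAM, C_HAM))
```

Here `offers` gets 9 / (9 + 3) = 0.75, while `how`, which is in mail_1 but in neither table, gets None, so a word seen in neither training set needs some separate handling.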
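Now I have two questions:
1) Is this implementation correct?
2) Once the machine has produced its result, if I find that a new mail is spam, do I need to update the old spam model?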
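For context on question 2, here is a minimal sketch of what "updating the model" could mean, assuming the model is nothing more than the two count tables and that the new mail's label is known. Under that assumption an update is just incrementing c_spam(T) or c_ham(T) for each distinct token in the new mail, after which s[T] can be recomputed. `add_labeled_message` is a hypothetical name.

```python
from collections import Counter

def add_labeled_message(tokens, is_spam, spam_counts, ham_counts):
    """Increment c_spam(T) or c_ham(T) once for each distinct token in the message."""
    target = spam_counts if is_spam else ham_counts
    target.update(set(tokens))  # set(): one increment per message, not per occurrence

spam_counts = Counter({"free": 3, "offers": 9})
ham_counts = Counter({"free": 5, "offers": 3})
add_labeled_message(["free", "offers", "free"], True, spam_counts, ham_counts)
print(spam_counts["free"], spam_counts["offers"])  # 4 10
```

Keeping raw counts rather than storing s[T] directly is what makes this kind of incremental update cheap: the ratio is derived from the counts whenever it is needed.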