Question

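I think I have implemented most of it correctly. One part confuses me:

The zero-frequency problem: add 1 to the count for every attribute value-class combination (the Laplace estimator) when an attribute value does not occur with every class value.

Here is some of my client code: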

// Classify
string text = "Claim your free Macbook now!";
double posteriorProbSpam = classifier.Classify(text, "spam");
Console.WriteLine("-------------------------");
double posteriorProbHam = classifier.Classify(text, "ham");

Now suppose the word "free" appears somewhere in the training data:

// Training
classifier.Train("ham", "Attention: Collect your Macbook from store.");
// ... many more training examples here ...
classifier.Train("spam", "Free macbook offer expiring.");

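However, the word occurs in my training data only for the "spam" category, not for "ham". So when I compute posteriorProbHam, what should I do when I come across the word "free"?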


Answer 1

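If you think about what it would mean not to add one, it would not make sense: seeing "free" once in ham would make it less likely to see "free" in spam.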
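To make that concrete, here is a minimal sketch of add-one smoothing applied to each class on its own. It is not the asker's classifier: `LaplaceSketch`, `Tokenize`, `CountWords` and `WordProbability` are hypothetical names. It also assumes the common form of the estimator, P(word | class) = (count(word, class) + 1) / (total words in class + vocabulary size), and uses only the two training sentences from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not the classifier from the question.
public static class LaplaceSketch
{
    // Lowercase the text and split on anything that is not a letter or digit.
    public static string[] Tokenize(string text)
    {
        var cleaned = new string(text.ToLowerInvariant()
            .Select(ch => char.IsLetterOrDigit(ch) ? ch : ' ')
            .ToArray());
        return cleaned.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Count how often each word occurs across the documents of one class.
    public static Dictionary<string, int> CountWords(params string[] documents)
    {
        var counts = new Dictionary<string, int>();
        foreach (var word in documents.SelectMany(Tokenize))
        {
            counts.TryGetValue(word, out int current); // current is 0 if the word is new
            counts[word] = current + 1;
        }
        return counts;
    }

    // Assumed estimator: (count(word, class) + 1) / (total words in class + vocabulary size).
    // Only the counts of the class being scored are used; the other class plays no part.
    public static double WordProbability(
        string word, IReadOnlyDictionary<string, int> classCounts, int vocabularySize)
    {
        classCounts.TryGetValue(word, out int count); // count is 0 for a word unseen in this class
        int totalWordsInClass = classCounts.Values.Sum();
        return (count + 1.0) / (totalWordsInClass + vocabularySize);
    }

    public static void Main()
    {
        var ham = CountWords("Attention: Collect your Macbook from store.");
        var spam = CountWords("Free macbook offer expiring.");

        // The vocabulary is every distinct word seen in either class.
        int vocabularySize = ham.Keys.Union(spam.Keys).Count();

        Console.WriteLine(WordProbability("free", ham, vocabularySize));
        Console.WriteLine(WordProbability("free", spam, vocabularySize));
    }
}
```

With these two sentences, ham has 6 words, spam has 4, and the vocabulary has 9 distinct words, so P("free" | ham) = 1/15 and P("free" | spam) = 2/13. The ham estimate is small but not zero, so a single unseen word no longer wipes out the whole product for that class.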

Naive Bayes and the zero-frequency problem
