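I have a question about the Naïve Bayes classification method. I worked through what I thought was a simple example, but I hit a roadblock.
Basically, this is the classification I want to do:
I'd like to be able to take some training data: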
input1 | input2 | input3 | class
1      | 3      | 3      | 1
2      | 1      | 1      | 2
1      | 1      | 1      | 3
3      | 3      | 3      | 1
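and classify the rows into classes 1-3.
As far as I understand, you first compute the prior probabilities. In this case, the classes would be: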
class 1 = P(c_1) = 0.50
class 2 = P(c_2) = 0.25
class 3 = P(c_3) = 0.25
So that makes sense. They all add up to 1, and it's easy to see where the numbers come from.
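For reference, the priors can be reproduced by counting how often each class label appears in the training rows. This is only a minimal Python sketch; `rows` and `class_priors` are names made up for this illustration, not part of any library.

```python
from collections import Counter

# Training rows copied from the table: (input1, input2, input3, class)
rows = [
    (1, 3, 3, 1),
    (2, 1, 1, 2),
    (1, 1, 1, 3),
    (3, 3, 3, 1),
]


def class_priors(rows):
    """Return P(class) as the fraction of rows carrying each class label."""
    counts = Counter(row[-1] for row in rows)  # last column is the class
    total = len(rows)
    return {label: count / total for label, count in counts.items()}


priors = class_priors(rows)
print(priors)  # {1: 0.5, 2: 0.25, 3: 0.25}
```

Class 1 appears in two of the four rows, which is where the 0.50 comes from.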
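Because these values are numeric, I wanted to simplify them into ranges. So I rebuilt my data as:
Anyway, that is how I arrived at that table. Moving on to the Bayes part: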
P(Class 1 | avg_speed_1): 0.5
P(Class 1 | avg_speed_2): 0
P(Class 1 | avg_speed_3): 0
P(Class 2 | avg_speed_1): 0
P(Class 2 | avg_speed_2): 0.25
P(Class 2 | avg_speed_3): 0
P(Class 3 | avg_speed_1): 0
P(Class 3 | avg_speed_2): 0
P(Class 3 | avg_speed_3): 0.25
P(Class 1 | avg_distance_1): 0.5
P(Class 1 | avg_distance_2): 0
P(Class 1 | avg_distance_3): 0
P(Class 2 | avg_distance_1): 0
P(Class 2 | avg_distance_2): 0.25
P(Class 2 | avg_distance_3): 0
P(Class 3 | avg_distance_1): 0
P(Class 3 | avg_distance_2): 0
P(Class 3 | avg_distance_3): 0.25
P(Class 1 | avg_elev_gain_1): 0.5
P(Class 1 | avg_elev_gain_2): 0
P(Class 1 | avg_elev_gain_3): 0
P(Class 2 | avg_elev_gain_1): 0
P(Class 2 | avg_elev_gain_2): 0
P(Class 2 | avg_elev_gain_3): 0
P(Class 3 | avg_elev_gain_1): 0
P(Class 3 | avg_elev_gain_2): 0
P(Class 3 | avg_elev_gain_3): 0.5
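Now this all still makes sense to me, and each class still adds up to 1. However, when I go to calculate the probability of each class, the zeros wreck the calculation.
Take Class 1 as an example: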
P(Class 1 | avg_speed_1) *
P(Class 1 | avg_speed_2) *
P(Class 1 | avg_speed_3) *
P(Class 1 | avg_distance_1) *
P(Class 1 | avg_distance_2) *
P(Class 1 | avg_distance_3) *
P(Class 1 | avg_elev_gain_1) *
P(Class 1 | avg_elev_gain_2) *
P(Class 1 | avg_elev_gain_3) *
P(Class 1) = 0
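I find this always comes out to zero, because so many of the input terms are zero! Where am I going wrong?! Does this mean I don't have enough training data?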
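To make the problem concrete, here is the Class 1 product as a small Python sketch. It assumes the values listed above are used exactly as written; `class1_terms` and `prior_class1` are names I made up for this illustration.

```python
import math

# Class 1 values copied from the list above.
class1_terms = {
    "avg_speed_1": 0.5,
    "avg_speed_2": 0.0,
    "avg_speed_3": 0.0,
    "avg_distance_1": 0.5,
    "avg_distance_2": 0.0,
    "avg_distance_3": 0.0,
    "avg_elev_gain_1": 0.5,
    "avg_elev_gain_2": 0.0,
    "avg_elev_gain_3": 0.0,
}
prior_class1 = 0.5  # P(Class 1)

# Multiply every term together with the prior, as in the product above.
score = prior_class1 * math.prod(class1_terms.values())

# Collect the terms that are exactly zero; any one of them forces score to 0.
zero_terms = [name for name, p in class1_terms.items() if p == 0.0]

print(score)            # 0.0
print(len(zero_terms))  # 6
```

Six of the nine terms are zero, so the product is zero no matter what the other values are.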
That said, is Naïve Bayes even the right approach for this kind of classification?
Any ideas would be greatly appreciated.