You have classification data with classes Y ∈ {+1, -1} and features F_i ∈ {+1, -1} for
i ∈ {1, ..., K}. In an attempt to turbocharge your classifier, you duplicate each feature, so now each example
has 2K features, with F_{K+i} = F_i for i ∈ {1, ..., K}. The following questions compare the original feature set
with the doubled one. You may assume that in the case of ties, class +1 is always chosen. Assume that there
are equal numbers of training examples in each class.
The solution says this leads to overconfidence. But how?
In naive Bayes, we assume that, given the class label, each feature is independent of the others.
Suppose one of the examples has features {1, -1}.
P(y = -1 | x_1 = 1, x_2 = -1) ∝ P(y = -1) × P(x_1 = 1 | y = -1) × P(x_2 = -1 | y = -1)
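For concreteness, here is a minimal sketch in Python of this computation, including the normalization over both classes that turns the product into an actual posterior. The prior and the two conditionals (prior, p_x1_given_y, p_x2_given_y) are made-up numbers for illustration, not values from the problem.

```python
# Naive Bayes posterior for y given x_1 = 1, x_2 = -1.
# All probabilities below are made-up illustrations, not from the problem.
prior = {+1: 0.5, -1: 0.5}            # equal class priors, as the problem states
p_x1_given_y = {+1: 0.7, -1: 0.4}     # P(x_1 = 1 | y) for each class (assumed)
p_x2_given_y = {+1: 0.2, -1: 0.6}     # P(x_2 = -1 | y) for each class (assumed)

# Unnormalized joint P(y) * P(x_1 = 1 | y) * P(x_2 = -1 | y) for each class.
joint = {y: prior[y] * p_x1_given_y[y] * p_x2_given_y[y] for y in (+1, -1)}

# The posterior divides by the evidence P(x_1 = 1, x_2 = -1),
# i.e. the sum of the joints over both classes.
posterior = {y: joint[y] / sum(joint.values()) for y in (+1, -1)}
print(posterior)  # {1: 0.368..., -1: 0.631...}
```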
If we duplicate each feature (so x_3 = x_1 = 1 and x_4 = x_2 = -1), we would write this as:
P(y = -1 | x_1 = 1, x_2 = -1, x_3 = 1, x_4 = -1) ∝
P(y = -1) × P(x_1 = 1 | y = -1) × P(x_2 = -1 | y = -1) × P(x_3 = 1 | y = -1) × P(x_4 = -1 | y = -1)
Each of these probabilities is less than 1, so in the doubled-feature example, wouldn't multiplying more fractions together give a smaller probability (and thus a lower-confidence classification)?
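Extending the sketch above (same made-up numbers) makes it possible to check this directly: with doubled features each per-class product does get smaller, but so does the evidence in the denominator, so the thing to look at is the normalized posterior.

```python
# Same assumed numbers as in the previous sketch, now with each feature
# duplicated (x_3 = x_1 and x_4 = x_2), so each conditional appears twice.
prior = {+1: 0.5, -1: 0.5}
p_x1_given_y = {+1: 0.7, -1: 0.4}
p_x2_given_y = {+1: 0.2, -1: 0.6}

def posterior(n_copies):
    # Unnormalized joint, with each feature's conditional raised to n_copies.
    joint = {y: prior[y] * (p_x1_given_y[y] * p_x2_given_y[y]) ** n_copies
             for y in (+1, -1)}
    z = sum(joint.values())  # evidence: shrinks along with the joints
    return {y: joint[y] / z for y in (+1, -1)}

print(posterior(1))  # original features: {1: 0.368..., -1: 0.631...}
print(posterior(2))  # doubled features:  {1: 0.253..., -1: 0.746...}
```

Under these assumed numbers, both unnormalized products do shrink, but the losing class's product shrinks faster, so after normalization the winning class's posterior moves closer to 1.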