Question

Hopefully a simple question.

Suppose I have a dataset like this:

Classification  attribute-1  attribute-2

Correct         dog          dog 
Correct         dog          dog
Wrong           dog          cat 
Correct         cat          cat
Wrong           cat          dog
Wrong           cat          dog

What is the information gain of attribute-2 compared with attribute-1?

I calculated the entropy of the whole dataset: -(3/6)log2(3/6) - (3/6)log2(3/6) = 1
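That arithmetic can be checked with a small Python sketch. The `entropy` helper is a hypothetical name (not from the question) and takes raw class counts rather than probabilities, so the whole dataset is simply `[3, 3]` (3 Correct, 3 Wrong).

```python
from math import log2

def entropy(counts):
    """Base-2 Shannon entropy of a class distribution given as raw counts (hypothetical helper)."""
    total = sum(counts)
    # Zero counts are skipped: by convention 0 * log2(0) is treated as 0.
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# Whole dataset: 3 Correct rows and 3 Wrong rows.
dataset_entropy = entropy([3, 3])
print(dataset_entropy)  # 1.0
```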

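Then I'm stuck! I assume I also need to calculate the entropies for attribute-1 and attribute-2, and then use all three results in the information gain calculation?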

Any help would be great,

Thanks :)

Answer 1

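First, for each attribute, compute the entropy of the classification within each of that attribute's values, then weight those entropies by the fraction of rows that take each value. The information gain is the entropy of the whole dataset minus this weighted entropy. Here is how it is done.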

For attribute-1 (c = number of Correct rows, w = number of Wrong rows):

attr-1=dog:
info([2c,1w])=entropy(2/3,1/3)

attr-1=cat:
info([1c,2w])=entropy(1/3,2/3)
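Numerically, both subsets have the same entropy, since [2c,1w] and [1c,2w] are mirror images of each other. A minimal Python sketch, assuming a hypothetical `entropy(*probs)` helper that mirrors the `entropy(2/3,1/3)` notation above:

```python
from math import log2

def entropy(*probs):
    """Base-2 entropy of a probability distribution, e.g. entropy(2/3, 1/3) (hypothetical helper)."""
    return -sum(p * log2(p) for p in probs if p > 0)

# attr-1=dog -> [2c,1w]; attr-1=cat -> [1c,2w]
info_dog = entropy(2/3, 1/3)
info_cat = entropy(1/3, 2/3)
print(round(info_dog, 4), round(info_cat, 4))  # 0.9183 0.9183
```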

Weighted information for attribute-1:

info([2c,1w],[1c,2w])=(3/6)*info([2c,1w])+(3/6)*info([1c,2w])
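The weighting can be sketched in Python as follows. `info` and `weighted_info` are illustrative names chosen to match the notation, not a library API; each subset's weight is its row count divided by the total number of rows, here 3/6 for each value of attribute-1.

```python
from math import log2

def info(*counts):
    """Entropy (bits) of one subset given its class counts, e.g. info(2, 1) for [2c,1w]."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def weighted_info(*subsets):
    """Expected information after a split: each subset's entropy weighted by its share of rows."""
    n = sum(sum(s) for s in subsets)
    return sum(sum(s) / n * info(*s) for s in subsets)

# info([2c,1w],[1c,2w]) = (3/6)*info([2c,1w]) + (3/6)*info([1c,2w])
split_info_attr1 = weighted_info((2, 1), (1, 2))
print(round(split_info_attr1, 4))  # 0.9183
```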

Gain for attribute-1:

gain("attr-1")=info([3c,3w])-info([2c,1w],[1c,2w])
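Plugging in the numbers gives the gain directly. This is a sketch with a hypothetical `info(*counts)` helper that takes the class counts of one subset:

```python
from math import log2

def info(*counts):
    """Entropy (bits) of a subset given its class counts (hypothetical helper)."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# gain("attr-1") = info([3c,3w]) - info([2c,1w],[1c,2w])
before = info(3, 3)                                   # entropy of the whole dataset: 1 bit
after = (3 / 6) * info(2, 1) + (3 / 6) * info(1, 2)   # weighted entropy after splitting on attr-1
gain_attr1 = before - after
print(round(gain_attr1, 4))  # 0.0817
```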

You have to do the same for the next attribute (attribute-2).
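Applying the same steps to attribute-2 answers the original question. The sketch below is a generalisation under stated assumptions, not part of the answer above: it encodes the six table rows as tuples, and a hypothetical `info_gain(rows, attr_index)` computes the gain for any attribute column.

```python
from collections import Counter
from math import log2

# The six rows from the question: (Classification, attribute-1, attribute-2)
rows = [
    ("Correct", "dog", "dog"),
    ("Correct", "dog", "dog"),
    ("Wrong",   "dog", "cat"),
    ("Correct", "cat", "cat"),
    ("Wrong",   "cat", "dog"),
    ("Wrong",   "cat", "dog"),
]

def entropy(labels):
    """Base-2 entropy of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())

def info_gain(rows, attr_index):
    """Dataset entropy minus the size-weighted entropy of each attribute-value subset."""
    labels = [r[0] for r in rows]
    remainder = 0.0
    for value in set(r[attr_index] for r in rows):
        subset = [r[0] for r in rows if r[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

gain1 = info_gain(rows, 1)  # attribute-1
gain2 = info_gain(rows, 2)  # attribute-2
print(round(gain1, 4), round(gain2, 4))
```

Attribute-2 splits into dog = [2c,2w] and cat = [1c,1w], each with an entropy of 1 bit, so its weighted entropy is 1 and its gain is 0, while attribute-1 gains about 0.0817 bits. Attribute-1 is therefore the more informative of the two.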
