A=c("f","t","t","f","t","f","f","f","t","f")
B=c("t","t","t","t","t","f","f","f","t","t")
class=c("+","+","+","-","+","-","-","-","-","-")
df=data.frame(A,B,class)
df
A B class
1 f t +
2 t t +
3 t t +
4 f t -
5 t t +
6 f f -
7 f f -
8 f f -
9 t t -
10 f t -
我根据类对属性A或B进行了分区,如下所示:
{A}
[T , F]
/ \
------- -------
[3+,1-] [1+,5-]
{B}
[T , F]
/ \
------- -------
[4+,3-] [0+,3-]
取决于上面的公式,我通过R中的代码计算熵。
1-属性A
t=table(A,class)
t
class
A - +
f 5 1
t 1 3
prop1=t[1,]/sum(t[1,])
prop1
- +
0.8333333 0.1666667
prop2=t[2,]/sum(t[2,])
prop2
- +
0.25 0.75
H1=-(prop1[1]*log2(prop1[1]))-(prop1[2]*log2(prop1[2]))
H1
0.6500224
H2=-(prop2[1]*log2(prop2[1]))-(prop2[2]*log2(prop2[2]))
H2
0.8112781
entropy=(table(A)[1]/length(A))*H1 +(table(A)[2]/length(A))*H2
entropy
0.7145247
2-属性B
t=table(B,class)
t
class
B - +
f 3 0
t 3 4
prop1=t[1,]/sum(t[1,])
prop1
- +
1 0
prop2=t[2,]/sum(t[2,])
prop2
- +
0.4285714 0.5714286
H1=-(prop1[1]*log2(prop1[1]))-(prop1[2]*log2(prop1[2]))
H1
NaN
H2=-(prop2[1]*log2(prop2[1]))-(prop2[2]*log2(prop2[2]))
H2
0.9852281
entropy=(table(B)[1]/length(B))*H1 +(table(B)[2]/length(B))*H2
entropy
NaN
当我计算属性B的熵时,结果给出NaN归因于零(0)(log2(0)是错误的)。在这种情况下如何解决此错误或如何使H1
给我零而不是NaN