import numpy as np

X = np.array([
    # samples 1-73 (class 0)
    7.20E+01, 2.40E+01, 0.00E+00, 9.00E+00, 0.00E+00, 3.00E+00, 0.00E+00, 5.40E+01,
    0.00E+00, 0.00E+00, 0.00E+00, 0.00E+00, 0.00E+00, 0.00E+00, 0.00E+00, 0.00E+00,
    0.00E+00, 0.00E+00, 0.00E+00, 0.00E+00, 0.00E+00, 3.00E+00, 0.00E+00, 0.00E+00,
    0.00E+00, 0.00E+00, 0.00E+00, 0.00E+00, 1.50E+01, 0.00E+00, 0.00E+00, 0.00E+00,
    0.00E+00, 0.00E+00, 0.00E+00, 0.00E+00, 0.00E+00, 0.00E+00, 0.00E+00, 1.11E+02,
    2.70E+01, 0.00E+00, 6.00E+00, 0.00E+00, 0.00E+00, 0.00E+00, 0.00E+00, 0.00E+00,
    0.00E+00, 0.00E+00, 0.00E+00, 0.00E+00, 0.00E+00, 0.00E+00, 0.00E+00, 3.00E+00,
    0.00E+00, 0.00E+00, 1.70E+01, 3.00E+00, 0.00E+00, 0.00E+00, 0.00E+00, 8.00E+00,
    5.20E+01, 1.80E+01, 5.20E+01, 5.20E+01, 5.00E+01, 0.00E+00, 0.00E+00, 0.00E+00,
    0.00E+00,
] + [0.00E+00] * 73)  # samples 74-146 (class 1): all zero
y = np.array([0.00E+00] * 73 + [1.00E+00] * 73)  # first 73 samples: class 0, last 73: class 1
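Here are X (for simplicity I have kept just one feature) and y for 146 samples. The first 73 samples are class 0 and the other 73 are class 1.
Now I want to compute the chi2 score of this feature. sklearn.feature_selection.chi2 gives me 579, whereas scipy.stats.chi2_contingency gives me about 21.8.
Code used: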
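import numpy as np
import scipy.stats

obs = np.array([[0, 19], [73, 54]])  # 2x2 contingency table: rows = feature non-zero / zero, columns = positive / negative class
print(scipy.stats.chi2_contingency(obs, correction=False))  # no Yates continuity correction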
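For reference, with correction=False chi2_contingency computes Pearson's statistic from expected counts of row total × column total / N. The following is a numpy-only sketch of that computation on the same table (not a call into scipy), which also evaluates the 2×2 shortcut formula from the question so the two can be compared:

```python
import numpy as np

obs = np.array([[0, 19], [73, 54]], dtype=float)
n = obs.sum()

# Expected counts under independence: row total * column total / N
expected = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / n
# Pearson's statistic, no continuity correction
stat = ((obs - expected) ** 2 / expected).sum()

# The closed-form 2x2 shortcut quoted in the question
(a, b), (c, d) = obs
shortcut = (a * d - b * c) ** 2 * n / ((a + c) * (b + d) * (a + b) * (c + d))

print(round(stat, 4), round(shortcut, 4))  # 21.8425 21.8425
```

Both routes give the same number for a 2×2 table; the expected counts here are 9.5 and 63.5 per cell.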
The scipy call gives about 21.8, which I think should be the correct answer, because for a 2×2 table the formula is (a*d - b*c)**2 * float(n) / ((a+c)*(b+d)*(a+b)*(c+d))
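Plugging the table's cells (a = 0, b = 19, c = 73, d = 54, n = 146) into that formula:

```latex
$$
\chi^2 = \frac{n\,(ad - bc)^2}{(a+c)(b+d)(a+b)(c+d)}
       = \frac{146\,(0 \cdot 54 - 19 \cdot 73)^2}{73 \cdot 73 \cdot 19 \cdot 127}
       = \frac{146 \cdot 1387^2}{12\,858\,877}
       \approx 21.84
$$
```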
But sklearn gives 579 with this code:
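import sklearn.feature_selection

X_d = X.reshape(-1, 1)  # chi2 expects a 2-D array of shape (n_samples, n_features)
print(sklearn.feature_selection.chi2(X_d, y))  # y stays 1-D, shape (n_samples,); returns (chi2 scores, p-values)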
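For comparison, sklearn documents chi2 as intended for non-negative features such as booleans or frequencies (e.g. term counts). As far as I can tell from its source, it sums the feature values per class (observed) and compares them with the overall total split by class proportions (expected). The sketch below reproduces that idea in plain numpy; the helper name sklearn_style_chi2 is hypothetical, and X is rebuilt from its 19 non-zero entries (all other samples are 0):

```python
import numpy as np

def sklearn_style_chi2(x, y):
    """Hypothetical helper: chi-square score for one non-negative feature,
    treating the feature values as counts that are summed per class."""
    classes = np.unique(y)
    # Observed: total feature "count" that falls in each class
    observed = np.array([x[y == c].sum() for c in classes])
    # Expected: the same total, split according to the class proportions
    class_prob = np.array([(y == c).mean() for c in classes])
    expected = class_prob * x.sum()
    return ((observed - expected) ** 2 / expected).sum()

# The question's X, rebuilt from its non-zero entries (0-based positions)
nonzero_idx = [0, 1, 3, 5, 7, 21, 28, 39, 40, 42, 55, 58, 59, 63, 64, 65, 66, 67, 68]
nonzero_val = [72, 24, 9, 3, 54, 3, 15, 111, 27, 6, 3, 17, 3, 8, 52, 18, 52, 52, 50]
x = np.zeros(146)
x[nonzero_idx] = nonzero_val
y = np.array([0] * 73 + [1] * 73)

score = sklearn_style_chi2(x, y)
print(score)  # 579.0
```

Here observed = [579, 0] and expected = [289.5, 289.5], so the score works out to 289.5 + 289.5 = 579, the magnitudes of the feature values (111, 52, ...) enter directly.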
Why are the chi2 values different in the two cases?
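Edit: how I built the contingency table in the scipy case.
Reference formula: the 2×2 formula above, written with A, B, C, D and N in place of a, b, c, d and n.
In class 0 (the negative class) I get 19 non-zero values, the total number of samples is 146, and the positive class has 0 non-zero values. From this I get A, B, C, D, N = 0, 19, 73, 54, 146, which is what I passed to the scipy.stats function.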
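The table can also be derived directly from X and y by binarizing the feature. This is a sketch assuming "positive" means class 1 and that a sample "has" the feature when its value is non-zero; X is again rebuilt from its 19 non-zero entries. Note that it counts samples, so a value of 111 counts the same as a value of 3:

```python
import numpy as np

# The question's X, rebuilt from its non-zero entries (0-based positions)
nonzero_idx = [0, 1, 3, 5, 7, 21, 28, 39, 40, 42, 55, 58, 59, 63, 64, 65, 66, 67, 68]
nonzero_val = [72, 24, 9, 3, 54, 3, 15, 111, 27, 6, 3, 17, 3, 8, 52, 18, 52, 52, 50]
x = np.zeros(146)
x[nonzero_idx] = nonzero_val
y = np.array([0] * 73 + [1] * 73)

present = x > 0    # does the sample have a non-zero value?
positive = y == 1  # class 1 = positive, class 0 = negative

A = np.sum(present & positive)    # non-zero, positive class
B = np.sum(present & ~positive)   # non-zero, negative class
C = np.sum(~present & positive)   # zero, positive class
D = np.sum(~present & ~positive)  # zero, negative class
N = A + B + C + D

obs = np.array([[A, B], [C, D]])
print(obs.tolist(), N)  # [[0, 19], [73, 54]] 146
```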