How to use mutual_info_classif with discrete data

Time: 2018-03-12 14:51:21

Tags: python scikit-learn

Basically, I am trying to use [this][1] function with the following script to gain some insight into my discrete features:

from sklearn.feature_selection import mutual_info_classif

mutual_info_classif(r1[to_consider].values, r1['Y'].values, discrete_features='True')

But it raises an error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-52-c20422a21616> in <module>()
      1 from sklearn.feature_selection import mutual_info_classif
      2 
----> 3 mutual_info_classif(np.array(r1[to_consider].values), np.array(r1['Y'].values), discrete_features='True')

~\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\feature_selection\mutual_info_.py in mutual_info_classif(X, y, discrete_features, n_neighbors, copy, random_state)
    448     check_classification_targets(y)
    449     return _estimate_mi(X, y, discrete_features, True, n_neighbors,
--> 450                         copy, random_state)

~\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\feature_selection\mutual_info_.py in _estimate_mi(X, y, discrete_features, discrete_target, n_neighbors, copy, random_state)
    259         if discrete_features.dtype != 'bool':
    260             discrete_mask = np.zeros(n_features, dtype=bool)
--> 261             discrete_mask[discrete_features] = True
    262         else:
    263             discrete_mask = discrete_features

IndexError: arrays used as indices must be of integer (or boolean) type

  [1]: http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.mutual_info_classif.html

I am trying to understand why this happens. Below is a preview of my data (hot-encoded):

r1[to_consider].values
array([[ 0,  7,  1,  1,  2,  5],
       [ 1,  0,  1,  0,  0,  5],
       [ 0,  0,  1,  1,  6,  5],
       ...,
       [ 0,  0,  1,  1,  6,  3],
       [ 3, 11,  2,  2, 10,  5],
       [ 0,  0,  1,  1,  9,  0]], dtype=int64)

r1['Y'].values
array([0, 0, 1, ..., 0, 0, 0], dtype=int8)

1 answer:

Answer 0: (score: 0)

I assume your r1 is a DataFrame. Try this instead of passing in the values.

from sklearn.feature_selection import mutual_info_classif

# discrete_features takes the boolean True; the string 'True' is what raises the IndexError above.
mutual_info_classif(r1[to_consider], r1['Y'], discrete_features=True)
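
The traceback points at the discrete_features argument rather than at the data. In the scikit-learn version shown there, anything that is not a plain bool is converted with np.asarray and, unless its dtype is bool, used as an index array. The string 'True' becomes a unicode array, so the indexing step fails. Here is a minimal sketch of the difference. The array X and target y are synthetic stand-ins made up for illustration (they are not the asker's data), and the exact error may differ in other scikit-learn releases.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Synthetic stand-in for the question's data (hypothetical values):
# six integer-coded features and a binary target derived from the first one.
rng = np.random.RandomState(0)
X = rng.randint(0, 12, size=(200, 6))
y = (X[:, 0] > 5).astype(np.int8)

# The string 'True' turns into a unicode array, the boolean True into a bool array.
string_flag_kind = np.asarray('True').dtype.kind  # unicode string kind
bool_flag_kind = np.asarray(True).dtype.kind      # boolean kind

# Passing the boolean marks every column as discrete.
mi = mutual_info_classif(X, y, discrete_features=True, random_state=0)

# A boolean mask (or an array of integer column indices) selects
# which columns are treated as discrete; here all of them are.
mask = np.ones(X.shape[1], dtype=bool)
mi_mask = mutual_info_classif(X, y, discrete_features=mask, random_state=0)

print(mi)
```

Whether X is a DataFrame or a NumPy array should not matter for this particular error; the type of the discrete_features value is what decides which branch is taken.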