Basically, I am trying to use [this][1] function with the following script to get some insight into my discrete features:
from sklearn.feature_selection import mutual_info_classif
mutual_info_classif(r1[to_consider].values, r1['Y'].values, discrete_features='True')
But it raises an error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-52-c20422a21616> in <module>()
1 from sklearn.feature_selection import mutual_info_classif
2
----> 3 mutual_info_classif(np.array(r1[to_consider].values), np.array(r1['Y'].values), discrete_features='True')
~\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\feature_selection\mutual_info_.py in mutual_info_classif(X, y, discrete_features, n_neighbors, copy, random_state)
448 check_classification_targets(y)
449 return _estimate_mi(X, y, discrete_features, True, n_neighbors,
--> 450 copy, random_state)
~\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\feature_selection\mutual_info_.py in _estimate_mi(X, y, discrete_features, discrete_target, n_neighbors, copy, random_state)
259 if discrete_features.dtype != 'bool':
260 discrete_mask = np.zeros(n_features, dtype=bool)
--> 261 discrete_mask[discrete_features] = True
262 else:
263 discrete_mask = discrete_features
IndexError: arrays used as indices must be of integer (or boolean) type
[1]: http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.mutual_info_classif.html
I am trying to understand why this happens. Below is a preview of my data (hot-encoded):
r1[to_consider].values
array([[ 0, 7, 1, 1, 2, 5],
[ 1, 0, 1, 0, 0, 5],
[ 0, 0, 1, 1, 6, 5],
...,
[ 0, 0, 1, 1, 6, 3],
[ 3, 11, 2, 2, 10, 5],
[ 0, 0, 1, 1, 9, 0]], dtype=int64)
and
r1['Y'].values
array([0, 0, 1, ..., 0, 0, 0], dtype=int8)
Answer 0 (score: 0)
I assume your r1 is a DataFrame. Try this instead of passing the values:
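from sklearn.feature_selection import mutual_info_classif
# discrete_features expects the boolean True (or 'auto', or a mask/index array), not the string 'True'
mutual_info_classif(r1[to_consider], r1['Y'], discrete_features=True)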
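The traceback in the question points at the `discrete_features` argument rather than at the input arrays: the failing line is `discrete_mask[discrete_features] = True`, which is reached because the string `'True'` does not have a boolean dtype and so gets treated as an index array. A minimal sketch of this, using made-up data in place of `r1` (the arrays `X` and `y` below are assumptions for illustration, not the asker's data), might look like the following. Note that the exact exception type is likely to depend on the installed scikit-learn version, so the sketch catches several.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Hypothetical stand-in for r1[to_consider].values and r1['Y'].values.
rng = np.random.RandomState(0)
X = rng.randint(0, 5, size=(200, 6))   # six integer-coded discrete features
y = (X[:, 0] > 2).astype(np.int8)      # target that depends on the first feature only

# The string 'True' becomes a unicode array, the boolean True a bool array.
string_kind = np.asarray('True').dtype.kind
bool_kind = np.asarray(True).dtype.kind
print(string_kind, bool_kind)

# Passing the string is rejected; the exception type varies across versions.
try:
    mutual_info_classif(X, y, discrete_features='True')
    string_flag_failed = False
except (IndexError, ValueError, TypeError) as exc:
    string_flag_failed = True
    print("string flag rejected:", type(exc).__name__)

# Passing the boolean marks every column as discrete and returns one score per feature.
mi = mutual_info_classif(X, y, discrete_features=True, random_state=0)
print(mi.shape)
```

If only some columns are discrete, `discrete_features` also accepts a boolean mask or an array of column indices, as described in the documentation linked in the question.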