I'm trying to understand information gain, and I'm using the code from "Fast Information Gain computation":
def information_gain(x, y):

    def _entropy(values):
        counts = np.bincount(values)
        probs = counts[np.nonzero(counts)] / float(len(values))
        return - np.sum(probs * np.log(probs))

    def _information_gain(feature, y):
        feature_set_indices = np.nonzero(feature)[1]
        feature_not_set_indices = [i for i in feature_range if i not in feature_set_indices]
        entropy_x_set = _entropy(y[feature_set_indices])
        entropy_x_not_set = _entropy(y[feature_not_set_indices])

        return entropy_before - (((len(feature_set_indices) / float(feature_size)) * entropy_x_set)
                                 + ((len(feature_not_set_indices) / float(feature_size)) * entropy_x_not_set))

    feature_size = x.shape[0]
    feature_range = range(0, feature_size)
    entropy_before = _entropy(y)
    information_gain_scores = []

    for feature in x.T:
        information_gain_scores.append(_information_gain(feature, y))
    return information_gain_scores, []
It returns this error:
--------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-337-6aeae4409ab7> in <module>()
2 y = np.array([np.array([1])])
3
----> 4 information_gain(x , y)
5
<ipython-input-332-a4588f8e6e4c> in information_gain(x, y)
17 feature_size = x.shape[0]
18 feature_range = range(0, feature_size)
---> 19 entropy_before = _entropy(y)
20 information_gain_scores = []
21
<ipython-input-332-a4588f8e6e4c> in _entropy(values)
2
3 def _entropy(values):
----> 4 counts = np.bincount(values)
5 probs = counts[np.nonzero(counts)] / float(len(values))
6 return - np.sum(probs * np.log(probs))
ValueError: object too deep for desired array
when I call it with:
x = np.array([np.array([1,2,3])])
y = np.array([np.array([1])])
information_gain(x , y)
Have I set the x and y values correctly?
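For reference, here is a minimal reproduction of the first error. It assumes the cause is that `np.bincount` only accepts a 1-D array of non-negative integers, while the `y` above has shape `(1, 1)`. The names `y_2d`, `y_1d` and `raised` are only for illustration.

```python
import numpy as np

# Same construction as the y above: an array containing an array, shape (1, 1)
y_2d = np.array([np.array([1])])

# np.bincount rejects anything that is not 1-D
try:
    np.bincount(y_2d)
    raised = False
except ValueError:
    raised = True

# Flattening to 1-D makes the same labels acceptable to np.bincount
y_1d = y_2d.ravel()
counts = np.bincount(y_1d)  # one slot per integer value from 0 up to max(y_1d)
```

So `_entropy(y)` seems to need `y` as a flat array of integer class labels, one per row of `x`.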
If I change y to a 1-D array:
x = np.array([[1,2,3]])
y = np.array([1])
information_gain(x , y)
I get this error:
--------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-344-165206c62b0a> in <module>()
1 x = np.array([[1,2,3]])
2 y = np.array([1])
----> 3 information_gain(x , y)
<ipython-input-332-a4588f8e6e4c> in information_gain(x, y)
21
22 for feature in x.T:
---> 23 information_gain_scores.append(_information_gain(feature, y))
24 return information_gain_scores, []
<ipython-input-332-a4588f8e6e4c> in _information_gain(feature, y)
7
8 def _information_gain(feature, y):
----> 9 feature_set_indices = np.nonzero(feature)[1]
10 feature_not_set_indices = [i for i in feature_range if i not in feature_set_indices]
11 entropy_x_set = _entropy(y[feature_set_indices])
IndexError: tuple index out of range
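The second error can be narrowed down: for a 1-D array, `np.nonzero` returns a tuple with a single element, so indexing it with `[1]` fails. My guess (an assumption, not something the original code states) is that the function was written for a `scipy.sparse` matrix, where each row of `x.T` is still 2-D. Below is a rough sketch of a variant for plain dense NumPy arrays. The name `information_gain_dense`, the boolean-mask approach and the convention that an empty split has entropy 0 are my own choices, not part of the original code.

```python
import numpy as np

# A 1-D input gives a 1-tuple, which is why [1] is out of range
nonzero_result = np.nonzero(np.array([1, 0, 3]))


def _entropy(values):
    # Same entropy helper as in the original code (natural log)
    counts = np.bincount(values)
    probs = counts[np.nonzero(counts)] / float(len(values))
    return -np.sum(probs * np.log(probs))


def information_gain_dense(x, y):
    """Hypothetical dense variant: x is (n_samples, n_features), y is 1-D ints."""
    n_samples = x.shape[0]
    entropy_before = _entropy(y)
    scores = []
    for feature in x.T:
        set_mask = feature != 0                # samples where the feature is set
        n_set = np.count_nonzero(set_mask)
        n_not_set = n_samples - n_set
        # Assumption: an empty split contributes zero entropy
        h_set = _entropy(y[set_mask]) if n_set else 0.0
        h_not_set = _entropy(y[~set_mask]) if n_not_set else 0.0
        weighted = (n_set / n_samples) * h_set + (n_not_set / n_samples) * h_not_set
        scores.append(entropy_before - weighted)
    return scores


# Four samples, two binary features; feature 0 separates the classes exactly
x = np.array([[1, 0], [1, 0], [0, 1], [0, 0]])
y = np.array([1, 1, 0, 0])
scores = information_gain_dense(x, y)
```

With this layout `y` needs one label per row of `x`, so a single-row `x` with a single label cannot show any gain; several samples per class are needed for the scores to be informative.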