Computing information gain

Time: 2017-12-11 17:29:59

Tags: python machine-learning

To understand information gain, I am using the code from "Fast Information Gain computation":

def information_gain(x, y):

    def _entropy(values):
        counts = np.bincount(values)
        probs = counts[np.nonzero(counts)] / float(len(values))
        return - np.sum(probs * np.log(probs))

    def _information_gain(feature, y):
        feature_set_indices = np.nonzero(feature)[1]
        feature_not_set_indices = [i for i in feature_range if i not in feature_set_indices]
        entropy_x_set = _entropy(y[feature_set_indices])
        entropy_x_not_set = _entropy(y[feature_not_set_indices])

        return entropy_before - (((len(feature_set_indices) / float(feature_size)) * entropy_x_set)
                                 + ((len(feature_not_set_indices) / float(feature_size)) * entropy_x_not_set))

    feature_size = x.shape[0]
    feature_range = range(0, feature_size)
    entropy_before = _entropy(y)
    information_gain_scores = []

    for feature in x.T:
        information_gain_scores.append(_information_gain(feature, y))
    return information_gain_scores, []

It returns the following error:

--------------------------------------------------------------------------
ValueError                               Traceback (most recent call last)
<ipython-input-337-6aeae4409ab7> in <module>()
      2 y = np.array([np.array([1])])
      3 
----> 4 information_gain(x , y)
      5 

<ipython-input-332-a4588f8e6e4c> in information_gain(x, y)
     17     feature_size = x.shape[0]
     18     feature_range = range(0, feature_size)
---> 19     entropy_before = _entropy(y)
     20     information_gain_scores = []
     21 

<ipython-input-332-a4588f8e6e4c> in _entropy(values)
      2 
      3     def _entropy(values):
----> 4         counts = np.bincount(values)
      5         probs = counts[np.nonzero(counts)] / float(len(values))
      6         return - np.sum(probs * np.log(probs))

ValueError: object too deep for desired array
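A minimal sketch of what seems to trigger this ValueError: `np.bincount` accepts only a one-dimensional array of non-negative integers, while `np.array([np.array([1])])` has shape `(1, 1)`. The names `y_nested`, `y_flat`, and `bincount_failed` are illustrative only and do not appear in the original code.

```python
import numpy as np

y_nested = np.array([np.array([1])])  # shape (1, 1): two-dimensional
y_flat = y_nested.ravel()             # shape (1,): one-dimensional view of the same labels

# np.bincount rejects input with more than one dimension.
try:
    np.bincount(y_nested)
    bincount_failed = False
except ValueError:
    bincount_failed = True

# On the flattened labels it counts occurrences of each integer from 0 up to the maximum.
counts = np.bincount(y_flat)
```

Flattening `y` before passing it in is one possible way around the first error; it does not by itself make the rest of the function work on dense arrays.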

when I call it with the following code:

x = np.array([np.array([1,2,3])])
y = np.array([np.array([1])])

information_gain(x , y)

Have I set the x and y values correctly?
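Reading the function as posted, `feature_size = x.shape[0]` is used as the number of samples and the loop over `x.T` visits one feature at a time, so `x` appears to be laid out as samples by features, with `y` holding one label per sample. The sketch below only checks that layout; the variable names are illustrative and this is an interpretation of the code, not something its author states.

```python
import numpy as np

x = np.array([[1, 2, 3]])  # 1 row (sample) with 3 columns (features)
y = np.array([1])          # one integer label per sample

n_samples, n_features = x.shape

# Iterating over the transpose yields one array per feature,
# each holding that feature's value for every sample.
feature_columns = [column for column in x.T]

# Under this reading, y should be one-dimensional with n_samples entries.
shapes_agree = y.shape == (n_samples,)
```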

If I change y to a 1-D array:

x = np.array([[1,2,3]])
y = np.array([1])
information_gain(x , y)

I get this error:

--------------------------------------------------------------------------
IndexError                               Traceback (most recent call last)
<ipython-input-344-165206c62b0a> in <module>()
      1 x = np.array([[1,2,3]])
      2 y = np.array([1])
----> 3 information_gain(x , y)

<ipython-input-332-a4588f8e6e4c> in information_gain(x, y)
     21 
     22     for feature in x.T:
---> 23         information_gain_scores.append(_information_gain(feature, y))
     24     return information_gain_scores, []

<ipython-input-332-a4588f8e6e4c> in _information_gain(feature, y)
      7 
      8     def _information_gain(feature, y):
----> 9         feature_set_indices = np.nonzero(feature)[1]
     10         feature_not_set_indices = [i for i in feature_range if i not in feature_set_indices]
     11         entropy_x_set = _entropy(y[feature_set_indices])

IndexError: tuple index out of range
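This IndexError looks like a consequence of `np.nonzero` returning one index array per dimension: each item of `x.T` is one-dimensional for a dense NumPy array, so the returned tuple has a single entry and `[1]` is out of range. The `[1]` would make sense if every feature were a two-dimensional row, for example a row of a sparse matrix, but that is an assumption about the input the linked code was written for. Below is a hedged sketch of a dense-array variant that uses a boolean mask instead. `information_gain_dense` and `entropy` are hypothetical names, not part of the original code.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (natural log) of a 1-D array of non-negative integer labels."""
    counts = np.bincount(labels)
    probs = counts[counts > 0] / labels.size
    return -np.sum(probs * np.log(probs))

def information_gain_dense(x, y):
    """Information gain of each column of x, treating a feature as 'set' when nonzero."""
    x = np.asarray(x)
    y = np.asarray(y).ravel()          # accept nested label arrays as well
    n_samples = x.shape[0]
    entropy_before = entropy(y)
    scores = []
    for feature in x.T:
        is_set = feature != 0          # boolean mask over samples
        n_set = np.count_nonzero(is_set)
        weighted = 0.0
        if n_set > 0:                  # skip empty partitions
            weighted += n_set / n_samples * entropy(y[is_set])
        if n_set < n_samples:
            weighted += (n_samples - n_set) / n_samples * entropy(y[~is_set])
        scores.append(entropy_before - weighted)
    return scores

# np.nonzero returns one index array per dimension of its input.
one_d = np.nonzero(np.array([1]))      # 1-D input: tuple of length 1
two_d = np.nonzero(np.array([[1]]))    # 2-D input: tuple of length 2

# Toy data: feature 0 separates the two classes, feature 1 does not.
x_toy = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])
y_toy = np.array([0, 0, 1, 1])
toy_scores = information_gain_dense(x_toy, y_toy)
```

With the single-sample input from the question, every gain is zero, because the entropy of a single label is zero before and after any split. An input with several samples and at least two classes is needed to see nonzero scores.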

0 answers:

No answers