Matlab function for entropy in a decision tree

Asked: 2015-04-04 22:03:17

Tags: matlab octave

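Hi, Matlab gurus!

A month ago I started learning MATLAB (after my trial license expired, I switched to Octave). I am writing a function, purely for educational purposes, that computes entropy (for example, in the leaves of a decision tree), and I am stuck. I get the error below: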

>> my_entropy(cell3, false)
f = -0
f =

  -0  -0

f =

  -0  -0   3

error: my_entropy: A(I,J): column index out of bounds; value 2 out of bound 1
error: called from:
error:   C:\big data\octave\my_entropy.m at line 29, column 13

Updated 5.04.15 as suggested by @Daniel

# The main difference between MATLAB bundled entropy function
# and this custom function is that they use a transformation to uint8
# and the bundled entropy() function is used mostly for signal processing
# while I simply use a straightforward solution useful e.g. for learning trees

function f = my_entropy(data, weighted)
  # function accepts only cell arrays;
  # weighted tells whether to return one weighted average entropy
  # or a vector of entropies, one per bucket
  # moreover, I treat vectors as the only representation of "buckets"
  # in other words, vector = bucket (leaf of decision tree)
  if nargin < 2
    weighted = true;
  end;

  rows = @(x) size(x,1);
  cols = @(x) size(x,2);

  if weighted
    f = 0;
  else
    f = [];
  end;

  for r = 1:rows(data)

    for c = 1:cols(data{r}) # in most cases this will be 1:1

      omega = sum(data{r,c});
      epsilon = 0;

      for b = 1:cols(data{r,c})
        epsilon = epsilon + ( (data{r,c}(b) / omega) * (log2(data{r,c}(b) / omega)) );
      end;

      entropy = -epsilon;

      if weighted
        f = f + entropy
      else
        f = [f entropy]
      end;

    end;

  end;

end;

# test cases

cell1 = { [16];[16];[2 2 2 2 2 2 2 2];[12];[16] }
cell2 = { [16],[12];[16],[2];[2 2 2 2 2 2 2 2],[8 8];[12],[8 8];[16],[8 8] }
cell3 = { [16];[16];[2 2 2 2 2 2 2 2];[12];[16] }

For the input

c = { [16];[16];[2 2 2 2 2 2 2 2];[12];[16] }

the result of my_entropy(c, false) should be

[0, 0, 3, 0, 0]

This picture helps to visualize it:

Marbles as data

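A bucket is a MATLAB vector, the whole pallet is a MATLAB cell array, and the numbers are counts of the different kinds of data. So in this picture the middle cell {2,2} has entropy 3, while the other buckets (cells) have entropy 0.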
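As a quick check of those values, here is the entropy worked out by hand for a bucket of eight kinds with two items each, [2 2 2 2 2 2 2 2], and for a single-kind bucket such as [16]. The symbols n_i (count of kind i) and p_i (its relative frequency) are notation introduced here for illustration; they correspond to data{r,c}(b) and data{r,c}(b) / omega in the function above.

```latex
H = -\sum_{i=1}^{k} p_i \log_2 p_i, \qquad p_i = \frac{n_i}{\sum_{j=1}^{k} n_j}

% Bucket [2 2 2 2 2 2 2 2]: k = 8, every p_i = 2/16 = 1/8
H = -8 \cdot \frac{1}{8} \log_2 \frac{1}{8} = -\log_2 \frac{1}{8} = 3

% Bucket [16]: k = 1, p_1 = 16/16 = 1
H = -1 \cdot \log_2 1 = 0
```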

Any suggestions on how to fix it are appreciated. Best regards! :)

1 answer:

Answer 0 (score: 0)

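The error is in for c = 1:cols(cell{r})

You want the number of columns of the cell array, which is cols(cell). What you wrote returns the number of columns of the r-th element of the cell array.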
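The difference can be seen directly on the question's test input. The snippet below is a minimal sketch, not the asker's final code: it uses the variable name data from the updated question, calls size directly instead of the rows/cols helpers, and vectorizes the inner sum. It assumes every count in a bucket is positive, because a zero count would produce 0 * log2(0) = NaN.

```octave
data = { [16]; [16]; [2 2 2 2 2 2 2 2]; [12]; [16] };

n_cell_cols   = size(data, 2);     % 1: the cell array has a single column
n_bucket_cols = size(data{3}, 2);  % 8: the vector stored in row 3 has eight elements

f = [];
for r = 1:size(data, 1)
  for c = 1:size(data, 2)          % bound taken from the cell array itself
    counts = data{r, c};           % one bucket (leaf) as a vector of counts
    p = counts / sum(counts);      % relative frequency of each kind
    f = [f, -sum(p .* log2(p))];   % entropy of this bucket
  end
end
```

With the loop bound taken from the cell array, c never exceeds 1 for this input, so data{r, c} always exists and f is numerically equal to [0 0 3 0 0]. Octave may print the zero entries as -0, as in the question's output, because negating a zero sum gives negative zero.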

You should also avoid variable names that shadow built-in functions, such as cell.