I am trying to rewrite this MATLAB/Octave repo in Python. I came across what appears to be an implementation of an entropy function (see below). After some research I found that I could use scipy's entropy implementation for the Python version. But after reading a bit more about scipy's entropy formula (e.g. S = -sum(pk * log(pk), axis=0)), I doubt that the two are computing the same thing...
Can someone confirm my thinking?
% author: YangSong, 2010.11.16, C230
% file: ys_sampEntropy.m
% code is called from line 101 of algotrading.m
% => entropy180(i)=ys_sampEntropy(kmeans180s1(i,1:180));
% where kmeans180s1 is an array of size 100x181 containing the kmeans
% centroids and the price label at position 181.
function sampEntropy=ys_sampEntropy(xdata)
m=2;
n=length(xdata);
r=0.2*std(xdata); % tolerance threshold for template matching
%r=0.05;
cr=[];
gn=1;
gnmax=m;
while gn<=gnmax
    d=zeros(n-m+1,n-m);  % matrix holding the pairwise template distances
    x2m=zeros(n-m+1,m);  % matrix holding the length-m template vectors
    cr1=zeros(1,n-m+1);  % vector holding the per-template match counts
    k=1;
    for i=1:n-m+1
        for j=1:m
            x2m(i,j)=xdata(i+j-1);
        end
    end
    x2m;
    for i=1:n-m+1
        for j=1:n-m+1
            if i~=j
                d(i,k)=max(abs(x2m(i,:)-x2m(j,:))); % Chebyshev distance between templates i and j
                k=k+1;
            end
        end
        k=1;
    end
    d;
    for i=1:n-m+1
        [k,l]=size(find(d(i,:)<r)); % l = number of distances smaller than r
        cr1(1,i)=l;
    end
    cr1;
    cr1=(1/(n-m))*cr1;
    sum1=0;
    for i=1:n-m+1
        if cr1(i)~=0
            %sum1=sum1+log(cr1(i));
            sum1=sum1+cr1(i);
        end % end if
    end % end for
    cr1=1/(n-m+1)*sum1;
    cr(1,gn)=cr1;
    gn=gn+1;
    m=m+1;
end % end while
cr;
sampEntropy=log(cr(1,1))-log(cr(1,2));
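For reference, here is a minimal example of what I understand scipy's entropy to compute (on a discrete probability vector):

import numpy as np
from scipy.stats import entropy

pk = np.array([0.1, 0.2, 0.3, 0.4])  # a discrete probability distribution
print(entropy(pk))            # -sum(pk * log(pk)), in nats
print(entropy(pk, base=2))    # the same quantity in bits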
Answer 0 (score: 0)
The code is pretty hard to follow, but it is clearly not an implementation of the Shannon entropy of a discrete variable as implemented in scipy. Rather, it vaguely resembles the Kozachenko-Leonenko k-nearest-neighbour estimator of the entropy of a continuous variable (Kozachenko & Leonenko, 1987).
The basic idea of that estimator is to look at the average distance between neighbouring data points. The intuition is that if that distance is large, the dispersion in your data is large and hence the entropy is large. In practice, instead of the nearest-neighbour distance one tends to use the k-nearest-neighbour distance, which makes the estimate more robust.
The code does show some distance computations,
d(i,k)=max(abs(x2m(i,:)-x2m(j,:)));
and some counting of points that are closer than some fixed distance:
[k,l]=size(find(d(i,:)<r));
However, it is pretty clear that this is not exactly the Kozachenko-Leonenko estimator, but rather some butchered version of it.
If you do end up wanting to compute the Kozachenko-Leonenko estimator, I have some code implementing it on my github:
https://github.com/paulbrodersen/entropy_estimators
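For what it's worth, here is a rough 1-D sketch of the k-nearest-neighbour idea (my own illustration with an assumed helper name, not the API of that repository):

import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def knn_entropy_1d(x, k=3):
    # Rough Kozachenko-Leonenko estimate of the differential entropy (in nats) of 1-D data.
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    n = len(x)
    tree = cKDTree(x)
    # query k+1 neighbours: the closest "neighbour" of each point is the point itself
    eps = tree.query(x, k=k + 1)[0][:, -1]
    eps = np.maximum(eps, 1e-12)  # guard against log(0) when there are duplicate points
    return digamma(n) - digamma(k) + np.mean(np.log(2.0 * eps))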
Having seen this mess, though, I am no longer sure the author isn't actually using (or trying to use?) the classic Shannon definition of information for discrete variables, even though the inputs are continuous:
for i=1:n-m+1
    [k,l]=size(find(d(i,:)<r)); % l = number of distances smaller than r
    cr1(1,i)=l;
end
cr1;
cr1=(1/(n-m))*cr1;
The for loop counts the number of data points that are closer than r, and the last line of that snippet then divides the count by some interval to obtain a density.
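In NumPy terms that step would look roughly like this (a hypothetical sketch, with d, r, n and m as in the MATLAB code above):

counts = (d < r).sum(axis=1)   # number of templates within tolerance r of template i
cr1 = counts / (n - m)         # divide by the number of comparisons to get a "density"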
These densities are then summed up as follows:
for i=1:n-m+1
    if cr1(i)~=0
        %sum1=sum1+log(cr1(i));
        sum1=sum1+cr1(i);
    end % end if
end % end for
Then we get to these bits (again!):
cr1=1/(n-m+1)*sum1;
cr(1,gn)=cr1;
and
sampEntropy=log(cr(1,1))-log(cr(1,2));
My brain refuses to believe that the value returned could be anything like your average log(p), but I am no longer 100% sure.
Either way, if you want to compute the entropy of a continuous variable, you should either fit a distribution to your data or use the Kozachenko-Leonenko estimator. And please write better code.
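For the first option, a minimal sketch with scipy (assuming, purely for illustration, that a normal distribution is a reasonable fit to the data):

import numpy as np
from scipy import stats

data = np.random.normal(loc=0.0, scale=1.5, size=1000)  # stand-in for your data
loc, scale = stats.norm.fit(data)                        # fit a distribution...
h = stats.norm(loc, scale).entropy()                     # ...and take its differential entropy (nats)
print(h)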
Answer 1 (score: 0)
import numpy as np

# Entropy
def entropy(Y):
    """
    Also known as Shannon entropy.
    Reference: https://en.wikipedia.org/wiki/Entropy_(information_theory)
    """
    unique, count = np.unique(Y, return_counts=True, axis=0)
    prob = count / len(Y)
    en = np.sum((-1) * prob * np.log2(prob))
    return en

# Joint entropy
def jEntropy(Y, X):
    """
    H(Y, X)
    Reference: https://en.wikipedia.org/wiki/Joint_entropy
    """
    YX = np.c_[Y, X]
    return entropy(YX)

# Conditional entropy
def cEntropy(Y, X):
    """
    Conditional entropy = joint entropy - entropy of X
    H(Y|X) = H(Y, X) - H(X)
    Reference: https://en.wikipedia.org/wiki/Conditional_entropy
    """
    return jEntropy(Y, X) - entropy(X)

# Information gain
def gain(Y, X):
    """
    Information gain, I(Y; X) = H(Y) - H(Y|X)
    Reference: https://en.wikipedia.org/wiki/Information_gain_in_decision_trees#Formal_definition
    """
    return entropy(Y) - cEntropy(Y, X)
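A quick usage example with two small discrete variables (hypothetical data, just to show the calling pattern):

Y = np.array([0, 0, 1, 1, 1, 0])
X = np.array([0, 1, 1, 1, 0, 0])
print(entropy(Y))       # H(Y)
print(jEntropy(Y, X))   # H(Y, X)
print(cEntropy(Y, X))   # H(Y | X)
print(gain(Y, X))       # I(Y; X) = H(Y) - H(Y | X)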
Answer 2 (score: 0)
This is what I use:
import numpy as np

def entropy(data, bins=None):
    if bins is None:
        bins = len(np.unique(data))
    cx = np.histogram(data, bins)[0]
    normalized = cx / float(np.sum(cx))
    normalized = normalized[np.nonzero(normalized)]
    h = -np.sum(normalized * np.log2(normalized))
    return h


def approx_entropy(U, m, r):
    """
    Approximate entropy (ApEn): used to quantify the amount of regularity and the
    unpredictability of fluctuations in time-series data.

    The presence of repetitive patterns of fluctuation in a time series renders it
    more predictable than a time series in which such patterns are absent.
    ApEn reflects the likelihood that similar patterns of observations will not be
    followed by additional similar observations. A time series containing many
    repetitive patterns has a relatively small ApEn; a less predictable process has
    a higher ApEn.

    U: time series
    m: (window) length of compared runs of data
    r: filtering level (tolerance)

    Good values: m = 2 or 3; r = 10%-25% of the standard deviation of the series.
    https://en.wikipedia.org/wiki/Approximate_entropy
    """
    def _maxdist(x_i, x_j):
        return max([abs(ua - va) for ua, va in zip(x_i, x_j)])

    def _phi(m):
        x = [[U[j] for j in range(i, i + m - 1 + 1)] for i in range(N - m + 1)]
        C = [len([1 for x_j in x if _maxdist(x_i, x_j) <= r]) / (N - m + 1.0) for x_i in x]
        return (N - m + 1.0) ** (-1) * sum(np.log(C))

    N = len(U)
    return abs(_phi(m + 1) - _phi(m))
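A quick sanity check (hypothetical values in the spirit of the Wikipedia example): a highly regular series should give an ApEn close to zero, while shuffling the same values should increase it.

U = np.array([85, 80, 89] * 17)
print(approx_entropy(U, 2, 3))                   # very regular series -> small ApEn

rng = np.random.default_rng(0)
print(approx_entropy(rng.permutation(U), 2, 3))  # shuffled values -> typically a larger ApEn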