Question

I am trying to estimate the entropy of random variables (RVs), which involves computing p_X * log(p_X). For example,

import numpy as np
X = np.random.rand(100)   
binX = np.histogram(X, 10)[0] #create histogram with 10 bins
p_X = binX / np.sum(binX)
ent_X = -1 * np.sum(p_X * np.log(p_X))

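Sometimes p_X is zero, in which case the whole term should mathematically be zero. But Python evaluates p_X * np.log(p_X) to NaN, which makes the whole sum NaN. Is there any way (without explicitly checking for NaN) to make p_X * np.log(p_X) evaluate to zero whenever p_X is zero? Any insight or correction is appreciated. Thanks in advance :)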

Answer 1

If you have scipy, use scipy.special.xlogy(p_X, p_X). Not only does it solve your problem, it also has the added benefit of being a bit faster than p_X*np.log(p_X).
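
As a rough sketch of how this could slot into the entropy estimate from the question (the seeded generator and the small `p_demo` array are illustrative assumptions, not part of the original post):

```python
import numpy as np
from scipy.special import xlogy

# Seeded generator so the sketch is reproducible (an assumption for illustration)
rng = np.random.default_rng(0)
X = rng.random(100)
counts = np.histogram(X, bins=10)[0]  # histogram with 10 bins
p_X = counts / counts.sum()

# xlogy(x, y) computes x * log(y) and returns 0 wherever x == 0
ent_X = -np.sum(xlogy(p_X, p_X))

# Hypothetical distribution with an empty bin, to show the zero case
p_demo = np.array([0.5, 0.5, 0.0])
ent_demo = -np.sum(xlogy(p_demo, p_demo))  # equals log(2)
```

Because the zero case is handled inside xlogy, no NaN is ever produced and no explicit check is needed.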

Answer 2

You can use np.ma.log, which masks the zeros, and then use the filled method to fill the masked entries with 0:

np.ma.log(p_X).filled(0)

For example:

np.ma.log(range(5)).filled(0)
# array([0.        , 0.        , 0.69314718, 1.09861229, 1.38629436])

X = np.random.rand(100)   
binX = np.histogram(X, 10)[0] #create histogram with 10 bins
p_X = binX / np.sum(binX)
ent_X = -1 * np.sum(p_X * np.ma.log(p_X).filled(0))

Answer 3

In your case you can use nansum, since adding 0 to a sum is the same as ignoring a NaN:

ent_X = -1 * np.nansum(p_X * np.log(p_X))
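
One thing to be aware of with this approach: np.log(0) and 0 * -inf still emit RuntimeWarnings. A minimal sketch, assuming you want to silence them locally with np.errstate (the `p_X` values here are made up for illustration):

```python
import numpy as np

# Hypothetical probabilities with one empty bin
p_X = np.array([0.25, 0.0, 0.75])

# log(0) raises a "divide" warning and 0 * -inf an "invalid" warning;
# suppress both only inside this block
with np.errstate(divide="ignore", invalid="ignore"):
    terms = p_X * np.log(p_X)  # the middle term is 0 * -inf = NaN

# nansum treats the NaN term as 0
ent_X = -np.nansum(terms)
```

Note that nansum will also silently drop NaNs that come from other sources, such as a NaN already present in p_X, so it may hide genuine problems in the input.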

Handling zero multiplied by NaN
