Question

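In the code below, density=True returns the value of the probability density function at each bin. Now, if I have to compute P(x), can I say that hist contains probabilities? For example, if the mean value of the first bin is 0.5, can I say that the probability at x = 0.5 is hist[0]? I need P(x) in order to compute a KL divergence.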

import numpy as np
x = np.array([0, 0, 0, 0, 0, 3, 3, 2, 2, 2, 1, 1, 1, 1])
hist, bin_edges = np.histogram(x, bins=10, density=True)

Answer 1

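With density=True, NumPy returns a probability density function (call it p). The values of p are densities, not probabilities: in theory, P(X = 0.5) = 0, because probability is defined as the area under the PDF curve. You can read more about it here. So if you want a probability, you have to choose a range and sum the PDF values over that range, each multiplied by the width of its bin.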
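
As a minimal sketch of that conversion, using the data from the question: multiplying each density value by its bin width gives the probability mass of that bin. The names bin_widths, probs and p_first_three are introduced here only for illustration.

```python
import numpy as np

x = np.array([0, 0, 0, 0, 0, 3, 3, 2, 2, 2, 1, 1, 1, 1])
hist, bin_edges = np.histogram(x, bins=10, density=True)

# Density values are "per unit of x"; multiplying by the bin width
# turns each one into the probability of landing in that bin.
bin_widths = np.diff(bin_edges)
probs = hist * bin_widths

# The bin probabilities sum to 1 (up to floating point error),
# while hist[0] itself is larger than 1 here, so it cannot be a probability.
print(probs.sum(), hist[0])

# Probability of a range made of whole bins, here the first three bins
p_first_three = probs[:3].sum()
print(p_first_three)
```

The same bin probabilities can be obtained by calling np.histogram without density=True and dividing the counts by their total.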

For KL, I can share my solution for computing mutual information (which is essentially a KL divergence):

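import numpy as np
from scipy import ndimage

# EPS is not defined in the original snippet; machine epsilon is assumed here
EPS = np.finfo(float).eps

def mutual_information(x, y, sigma=1):
    bins = (256, 256)
    # joint histogram of the paired samples: rows index x bins, columns index y bins
    hist_xy = np.histogram2d(x, y, bins=bins)[0]

    # smooth it in place for better results
    ndimage.gaussian_filter(hist_xy, sigma=sigma, mode='constant', output=hist_xy)

    # normalize to a joint probability table, then compute marginals
    hist_xy = hist_xy + EPS  # prevent log(0)
    hist_xy = hist_xy / np.sum(hist_xy)
    hist_x = np.sum(hist_xy, axis=1)  # sum over y bins
    hist_y = np.sum(hist_xy, axis=0)  # sum over x bins

    # MI = H(X) + H(Y) - H(X, Y)
    mi = (np.sum(hist_xy * np.log(hist_xy))
          - np.sum(hist_x * np.log(hist_x))
          - np.sum(hist_y * np.log(hist_y)))
    return mi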

Edit: KL can be computed like this (note that I have not tested this!):

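def kl(x, y, sigma=1, bins=(256, 256)):
    # Uses np, ndimage and EPS in the same way as mutual_information.
    # Bin both variables over one common range so that the two marginals
    # are defined on the same bins and can be compared.
    lo, hi = min(np.min(x), np.min(y)), max(np.max(x), np.max(y))
    hist_xy = np.histogram2d(x, y, bins=bins, range=[[lo, hi], [lo, hi]])[0]

    # smooth it in place for better results
    ndimage.gaussian_filter(hist_xy, sigma=sigma, mode='constant', output=hist_xy)

    # normalize, then compute marginals
    hist_xy = hist_xy + EPS  # prevent division by 0
    hist_xy = hist_xy / np.sum(hist_xy)
    hist_x = np.sum(hist_xy, axis=1)  # marginal of x
    hist_y = np.sum(hist_xy, axis=0)  # marginal of y

    # KL(P_x || P_y) = sum_i p_x[i] * log(p_x[i] / p_y[i])
    kl_xy = -np.sum(hist_x * np.log(hist_y / hist_x))
    return kl_xy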
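
For the case in the question, where only np.histogram is involved, a KL estimate between two samples can be sketched as follows. This is an illustrative sketch, not the answerer's method: the helper name kl_from_samples, the eps value and the choice of equally spaced shared bin edges are all assumptions, and the two samples do not need to be paired or of equal length.

```python
import numpy as np

def kl_from_samples(a, b, bins=10, eps=1e-10):
    """Estimate KL(P || Q) from samples a ~ P and b ~ Q (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Shared bin edges, so both histograms describe the same intervals
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    edges = np.linspace(lo, hi, bins + 1)

    # Raw counts per bin (no density=True: we want probabilities, not densities)
    p = np.histogram(a, bins=edges)[0].astype(float)
    q = np.histogram(b, bins=edges)[0].astype(float)

    # eps keeps empty bins from producing log(0) or division by zero
    # (an assumed smoothing choice), then normalize to sum to 1
    p = (p + eps) / np.sum(p + eps)
    q = (q + eps) / np.sum(q + eps)

    return np.sum(p * np.log(p / q))

a = np.array([0, 0, 0, 0, 0, 3, 3, 2, 2, 2, 1, 1, 1, 1])
b = np.array([0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3])
print(kl_from_samples(a, b))
```

The result depends on the number of bins and on eps, so it is best treated as a rough estimate rather than an exact divergence.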

Also, for best results, you should compute sigma with some heuristic, for example A rule-of-thumb bandwidth estimator.
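
One possible heuristic is Silverman's rule of thumb, h = 0.9 * min(std, IQR / 1.34) * n^(-1/5). The sketch below is an assumption about how it could be wired in, not part of the original answer: the helper name silverman_sigma_in_bins is hypothetical, and dividing the bandwidth by the bin width is one way to express it in the array-element units that ndimage.gaussian_filter expects for sigma.

```python
import numpy as np

def silverman_sigma_in_bins(samples, n_bins):
    """Silverman's rule-of-thumb bandwidth, expressed in bin units (hypothetical helper).

    Assumes the samples are not all identical, so the bin width is nonzero.
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.size

    # Robust spread: the smaller of the standard deviation and IQR / 1.34
    std = samples.std(ddof=1)
    q75, q25 = np.percentile(samples, [75, 25])
    iqr = q75 - q25
    spread = min(std, iqr / 1.34) if iqr > 0 else std

    # Bandwidth in the units of the data
    h = 0.9 * spread * n ** (-1 / 5)

    # Convert to bins, assuming n_bins equal-width bins over the data range
    bin_width = (samples.max() - samples.min()) / n_bins
    return h / bin_width

rng = np.random.default_rng(0)
data = rng.normal(size=1000)
sigma_bins = silverman_sigma_in_bins(data, n_bins=256)
print(sigma_bins)
```

For a 2D histogram, one value can be computed per variable and passed as a pair, since gaussian_filter accepts a sequence of sigmas with one entry per axis.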
