2D histogram normalized to probabilities

Posted: 2018-06-20 03:31:39

Tags: python matplotlib histogram

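I have a 2D dataset and want to plot a 2D histogram in which each cell represents the probability of a data point falling into it. To get probabilities, I need to normalize the histogram so that it sums to 1. This is the example I took from the np.histogram2d documentation: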

import numpy as np
import matplotlib.pyplot as plt

#create edges of bins
xedges = [0,1,3,5]
yedges = [0,2,3,4,6]

#create random data points
x=np.random.normal(2,1,100)
y=np.random.normal(1,1,100)
H,xedges,yedges = np.histogram2d(x,y,bins=(xedges,yedges))
#setting normed=True (density=True in current NumPy) in histogram2d doesn't seem to do what I need

#histogram2d returns H indexed as H[x_bin, y_bin], but imshow draws the first
#axis vertically, so transpose to put x on the horizontal axis
H=H.T

fig = plt.figure(figsize=(7,3))
plt.imshow(H,interpolation='nearest',origin='lower',extent=[xedges[0], xedges[-1],yedges[0],yedges[-1]])
plt.show()
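The transpose is needed because of how histogram2d indexes its output, not because it swaps anything. A small check, using three hand-picked points (chosen purely for illustration) with the same bin edges as above, shows that rows of H are x bins and columns are y bins:

```python
import numpy as np

# Same (non-uniform) bin edges as in the question
xedges = [0, 1, 3, 5]
yedges = [0, 2, 3, 4, 6]

# Three illustrative points whose bins are easy to work out by hand
x = np.array([0.5, 2.0, 4.0])
y = np.array([5.0, 1.0, 1.0])

H, _, _ = np.histogram2d(x, y, bins=(xedges, yedges))
print(H.shape)    # (3, 4): 3 x bins (rows) by 4 y bins (columns)
print(H[0, 3])    # 1.0: the point (0.5, 5.0) is in x bin 0 and y bin 3

# imshow draws rows along the vertical axis, so H.T puts y on the vertical axis
print(H.T.shape)  # (4, 3)
```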

(Resulting plot: image not included)

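First, np.sum(H) gives a value of around 86. I want each cell to represent the probability of the data falling into that bin, so the cells should sum to 1. Also, how can I add a legend (a color scale) to the imshow plot that maps color intensity to its value?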
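The sum is below 100 because histogram2d only counts samples that fall inside the bin edges; with these edges, anything outside [0, 5] × [0, 6] (for example the many y values below 0) is dropped. The sketch below assumes a seeded generator so the run is reproducible (the exact count will differ from 86). It shows two reasonable normalizations: dividing by H.sum() gives probabilities conditional on the point landing inside the binned region (sums to 1), while dividing by the total sample count gives the fraction of all samples per bin (sums to at most 1).

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for reproducibility (an assumption)
xedges = [0, 1, 3, 5]
yedges = [0, 2, 3, 4, 6]
x = rng.normal(2, 1, 100)
y = rng.normal(1, 1, 100)

H, xedges, yedges = np.histogram2d(x, y, bins=(xedges, yedges))

# Samples outside the outer edges are not counted (the last edge is inclusive)
inside = (x >= 0) & (x <= 5) & (y >= 0) & (y <= 6)
print(H.sum(), inside.sum())  # the two counts agree

P_in = H / H.sum()   # probability per bin, given the point is inside the bins; sums to 1
P_all = H / x.size   # fraction of all 100 samples per bin; sums to H.sum() / 100
print(P_in.sum(), P_all.sum())
```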

Thanks!

1 answer

Answer 0 (score: 0)

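Try using the density parameter (called normed in older NumPy versions, where normed has since been removed). According to the docs, the values in H are then computed as bin_count / sample_count / bin_area, where sample_count is the number of samples that fall inside the bins. So we compute the area of each bin and multiply H by it to get the probability of each bin. plt.colorbar then adds the color scale you asked for.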
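The formula can be checked numerically. This minimal sketch (assuming a seeded generator for reproducibility) compares density=True against raw counts divided by the in-range total and by the bin areas, then confirms that multiplying by the areas recovers per-bin probabilities:

```python
import numpy as np

rng = np.random.default_rng(1)  # seeded only for reproducibility (an assumption)
xedges = np.array([0, 1, 3, 5])
yedges = np.array([0, 2, 3, 4, 6])
x = rng.normal(2, 1, 100)
y = rng.normal(1, 1, 100)

counts, _, _ = np.histogram2d(x, y, bins=(xedges, yedges))
dens, _, _ = np.histogram2d(x, y, bins=(xedges, yedges), density=True)

# Area of each bin: outer product of the x widths and the y heights, shape (3, 4)
areas = np.outer(np.diff(xedges), np.diff(yedges))

# density = bin_count / (number of in-range samples) / bin_area
assert np.allclose(dens, counts / counts.sum() / areas)

# Multiplying by the area turns densities back into per-bin probabilities
prob = dens * areas
print(prob.sum())  # close to 1, up to floating-point rounding
```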

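import numpy as np
import matplotlib.pyplot as plt

# create edges of bins
xedges = [0, 1, 3, 5]
yedges = [0, 2, 3, 4, 6]

x = np.random.normal(2, 1, 100)  # create random data points
y = np.random.normal(1, 1, 100)
# density=True (formerly normed=True) returns bin_count / sample_count / bin_area
H, xedges, yedges = np.histogram2d(x, y, bins=(xedges, yedges), density=True)
# area of every bin: outer product of the bin widths in x and bin heights in y
areas = np.outer(np.diff(xedges), np.diff(yedges))
P = H * areas  # probability per bin; P.sum() is 1 up to rounding

fig = plt.figure(figsize=(7, 3))
# transpose so that x runs horizontally, as in the question
im = plt.imshow(P.T, interpolation='nearest', origin='lower', extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.colorbar(im)  # color scale mapping color intensity to probability
plt.show()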
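One caveat: imshow draws every cell with the same size, so with non-uniform edges like [0, 1, 3, 5] the cells are not drawn at their true widths. pcolormesh accepts the bin edges directly and draws each cell with its real extent. The sketch below is one way to do this, assuming a seeded generator and normalizing by the in-range count; the colorbar label is an illustrative choice.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # seeded only for reproducibility (an assumption)
xedges = np.array([0, 1, 3, 5])
yedges = np.array([0, 2, 3, 4, 6])
x = rng.normal(2, 1, 100)
y = rng.normal(1, 1, 100)

counts, xedges, yedges = np.histogram2d(x, y, bins=(xedges, yedges))
P = counts / counts.sum()  # probability per bin, sums to 1

fig, ax = plt.subplots(figsize=(7, 3))
# pcolormesh takes the edges, so each cell gets its true width and height.
# The color array must be shaped (n_y_bins, n_x_bins), hence the transpose.
mesh = ax.pcolormesh(xedges, yedges, P.T)
fig.colorbar(mesh, ax=ax, label="probability")
ax.set_xlabel("x")
ax.set_ylabel("y")
plt.show()
```

The same P could be passed to imshow when the bins are uniform; pcolormesh is simply the safer choice once bin widths differ.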