How to get bin coordinates from a 2D histogram

Time: 2018-04-12 01:30:07

Tags: python pandas numpy matplotlib histogram

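I have a dataset of xy coordinates that I use to make a scatter plot. I have set up a 2D histogram to create a grid overlaying this plot. I want the coordinates of every bin that contains at least one scatter point. The code below displays the scatter plot and highlights each bin that has any scatter point in it.

Here is an example of what I have so far: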

import random
import matplotlib.pyplot as plt
import numpy as np

x = [random.randrange(1,100,1) for _ in range (10000)]
y = [random.randrange(1,100,1) for _ in range (10000)]

fig, ax = plt.subplots()

ax.set_xlim(0,100)
ax.set_ylim(0,100)

bins = [np.linspace(*ax.get_xlim(), 50),
        np.linspace(*ax.get_ylim(), 50)]

zi, xi, yi = np.histogram2d(x, y, bins=bins)
zi = np.ma.masked_equal(zi, 0)

ax.pcolormesh(xi, yi, zi.T)    
ax.set_xticks(bins[0], minor=True)
ax.set_yticks(bins[1], minor=True)
ax.grid(True, which='minor')

scat = ax.scatter(x, y, s = 1) 

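This displays the scatter points and highlights the bins they fall in. I would like to return the coordinates of each bin that contains a scatter point.

1 answer:

Answer 0: (score: 0)

You can use the x and y bin edges to compute the bin position of each (x, y) pair. Basically, subtract each coordinate from the bin edges along its axis and look for the sign change: the number of edges below the point tells us which bin the point falls in.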

import random
import matplotlib.pyplot as plt
import numpy as np

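def used_bins(x, y, bins):
    # For each point, count the bin edges strictly below it on each axis.
    # For a point not lying exactly on an edge, this count is a 1-based bin
    # index (subtract 1 to get the index used by np.histogram2d).
    bin_idxs = []
    for xelem, yelem in zip(x, y):
        xbin = ((bins[0] - xelem) < 0).sum()
        ybin = ((bins[1] - yelem) < 0).sum()
        bin_idxs.append((xbin, ybin))

    return bin_idxs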


x = [random.randrange(1,100,1) for _ in range (10)]
y = [random.randrange(1,100,1) for _ in range (10)]

fig, ax = plt.subplots()

ax.set_xlim(0,100)
ax.set_ylim(0,100)

bins = [np.linspace(*ax.get_xlim(), 100),
        np.linspace(*ax.get_ylim(), 50)]

zi, xi, yi = np.histogram2d(x, y, bins=bins)
zi = np.ma.masked_equal(zi, 0)

ax.pcolormesh(xi, yi, zi.T)    
ax.set_xticks(bins[0], minor=True)
ax.set_yticks(bins[1], minor=True)
ax.grid(True, which='minor')

scat = ax.scatter(x, y, s = 1) 

# compute the x, y bin index for each x,y element
bin_idxs = used_bins(x, y, bins)

Example output:

[(85, 45), (74, 27), (65, 43), (8, 49), (8, 19), (89, 25), (51, 30), (38, 17), (98, 12), (24, 6)]
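
If what you need is the occupied bins themselves rather than one index per point, a possible alternative is to read them off the counts array that np.histogram2d already returns. The following is only a minimal sketch, not part of the answer above: the helper name occupied_bins, the fixed random seed and the printed format are assumptions made for illustration. It returns 0-based bin indices together with the lower and upper edge of each occupied bin on both axes.

```python
import numpy as np

def occupied_bins(x, y, bins):
    """Return 0-based indices and edge coordinates of every non-empty bin.

    Hypothetical helper for illustration; `bins` is a pair of edge arrays
    as passed to np.histogram2d.
    """
    counts, xedges, yedges = np.histogram2d(x, y, bins=bins)
    # Indices of the bins that contain at least one point
    ix, iy = np.nonzero(counts)
    # One row per occupied bin: (x_low, x_high, y_low, y_high)
    coords = np.column_stack([xedges[ix], xedges[ix + 1],
                              yedges[iy], yedges[iy + 1]])
    return ix, iy, coords

rng = np.random.default_rng(0)  # fixed seed, for a repeatable illustration only
x = rng.integers(1, 100, size=10)
y = rng.integers(1, 100, size=10)
bins = [np.linspace(0, 100, 100), np.linspace(0, 100, 50)]

ix, iy, coords = occupied_bins(x, y, bins)
for x_lo, x_hi, y_lo, y_hi in coords:
    print(f"x: {x_lo:.2f} to {x_hi:.2f}, y: {y_lo:.2f} to {y_hi:.2f}")
```

Because several points can share a bin, this may return fewer rows than there are points. The indices are 0-based, so they can be used directly to index the counts array.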