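I have two-dimensional data and a set of two-dimensional bins generated with scipy.stats.binned_statistic_2d. For each data point, I want the index of the bin it occupies. This is exactly what np.digitize is for, but as far as I can tell it only handles one-dimensional data. This Stack Exchange post seems to have an answer, but it is fully generalized to n dimensions. Is there a more direct solution for two dimensions?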
Answer 0 (score: 5)
You can already get the bin index of each observation from the fourth return value of scipy.stats.binned_statistic_2d:
Returns:
    statistic : (nx, ny) ndarray
        The values of the selected statistic in each two-dimensional bin
    xedges : (nx + 1) ndarray
        The bin edges along the first dimension.
    yedges : (ny + 1) ndarray
        The bin edges along the second dimension.
    binnumber : 1-D ndarray of ints
        This assigns to each observation an integer that represents the bin in which this observation falls. Array has the same length as values.
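As a minimal sketch of reading that fourth return value: the sample points, the placeholder values array, and the edges below are made up for illustration. By default the bin number is a single linearized integer per observation; recent SciPy versions also accept expand_binnumbers=True, which returns one row of indices per dimension. Indices are 1-based, with 0 and n + 1 reserved for points outside the edges.

```python
import numpy as np
from scipy import stats

# Three illustrative points, chosen to sit away from the bin edges
x = np.array([0.1, 0.6, 0.9])
y = np.array([0.2, 0.3, 0.8])
values = np.ones_like(x)           # placeholder values; only the bin numbers matter here
edges = np.array([0.0, 0.5, 1.0])  # two bins per dimension

# Default: one linearized bin number per observation
stat, xedges, yedges, linear = stats.binned_statistic_2d(
    x, y, values, statistic="count", bins=[edges, edges])

# expand_binnumbers=True: one row of bin indices per dimension, shape (2, N)
_, _, _, expanded = stats.binned_statistic_2d(
    x, y, values, statistic="count", bins=[edges, edges],
    expand_binnumbers=True)

# The linearized numbers index a grid padded with one outlier bin on each side,
# so they can be converted back to per-dimension indices with unravel_index
padded_shape = (len(xedges) + 1, len(yedges) + 1)
unraveled = np.array(np.unravel_index(linear, padded_shape))

print(expanded)
print(unraveled)
```

Subtracting 1 from the expanded indices should give zero-based positions into the statistic array for points that fall inside the edges.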
Answer 1 (score: 0)
A simple solution using numpy:
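import numpy as np

# Bin edges for each dimension (x first, then y)
bins = [[0.3, 0.5, 0.7], [0.3, 0.7]]
values = np.random.random((10, 2))
# Digitize each column against its own edges
digitized = []
for i in range(len(bins)):
    digitized.append(np.digitize(values[:, i], bins[i], right=False))
# Stack the per-dimension indices as columns: shape (10, 2), one row per point
digitized = np.column_stack(digitized)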
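If a single bin number per point is more convenient than an index pair, the per-axis results of np.digitize can be combined with np.ravel_multi_index. The following is only a sketch: digitize_2d is a hypothetical helper name, and the sample points are fixed by hand so the result can be checked. Note that np.digitize returns 0 for values below the first edge and len(edges) for values at or above the last one, so each axis has len(edges) + 1 possible indices.

```python
import numpy as np

def digitize_2d(points, x_edges, y_edges):
    """Hypothetical helper: per-axis bin indices for an (N, 2) array of points."""
    ix = np.digitize(points[:, 0], x_edges)
    iy = np.digitize(points[:, 1], y_edges)
    # One row per point, columns are (x index, y index)
    return np.column_stack((ix, iy))

x_edges = [0.3, 0.5, 0.7]
y_edges = [0.3, 0.7]
points = np.array([[0.1, 0.1], [0.4, 0.5], [0.6, 0.9], [0.8, 0.2]])

idx = digitize_2d(points, x_edges, y_edges)

# Number of possible indices per axis, including the two out-of-range slots
shape = (len(x_edges) + 1, len(y_edges) + 1)
# Row-major flat index: ix * shape[1] + iy
flat = np.ravel_multi_index((idx[:, 0], idx[:, 1]), shape)

print(idx)
print(flat)
```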