Question

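I have two-dimensional data and a set of two-dimensional bins generated with scipy.stats.binned_statistic_2d. For each data point, I want the index of the bin it falls into. This is exactly what np.digitize is for, but as far as I can tell it only handles one-dimensional data. This stackexchange question seems to have an answer, but it is fully generalized to n dimensions. Is there a more direct solution for two dimensions?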

Answer 1

You can already get the bin index of each observation from the fourth return value of scipy.stats.binned_statistic_2d:

Returns:  
  statistic : (nx, ny) ndarray
      The values of the selected statistic in each two-dimensional bin
  xedges : (nx + 1) ndarray
      The bin edges along the first dimension.
  yedges : (ny + 1) ndarray
      The bin edges along the second dimension.
  binnumber : 1-D ndarray of ints
      This assigns to each observation an integer that represents the bin
      in which this observation falls. Array has the same length as values.
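
As a minimal sketch of this, assuming a SciPy version that supports the expand_binnumbers keyword: by default binnumber is a single linearized index per observation, while expand_binnumbers=True returns one row of bin numbers per dimension. The sample data and edges below are illustrative only and are not taken from the question.

```python
import numpy as np
from scipy import stats

# Illustrative data: four observations and explicit bin edges per dimension.
x = [0.1, 0.1, 0.1, 0.6]
y = [2.1, 2.6, 2.1, 2.1]
binx = [0.0, 0.5, 1.0]
biny = [2.0, 2.5, 3.0]

# `values` is not referenced for statistic='count', so None is passed.
ret = stats.binned_statistic_2d(
    x, y, None, statistic="count", bins=[binx, biny], expand_binnumbers=True
)

# With expand_binnumbers=True, binnumber has shape (2, N):
# row 0 holds the x bin numbers, row 1 the y bin numbers (both 1-based).
print(ret.binnumber)

# Subtract 1 to get zero-based indices into ret.statistic
# (valid here because every point lies inside the edges).
ix, iy = ret.binnumber - 1
per_point_count = ret.statistic[ix, iy]
print(per_point_count)
```

The bin numbers are 1-based because extra bins are reserved on each side for observations that fall outside the edges, so subtracting 1 is only safe when all points lie inside the binned range.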

Answer 2

A simple solution using numpy:

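import numpy as np

# Bin edges for each dimension (x first, then y).
bins = [[0.3, 0.5, 0.7], [0.3, 0.7]]
values = np.random.random((10, 2))
# Digitize each column against its own edges.
digitized = []
for i in range(len(bins)):
    digitized.append(np.digitize(values[:, i], bins[i], right=False))
# Stack column-wise so that row k holds the (x, y) bin indices of point k.
# (np.concatenate(...).reshape(10, 2) would interleave the x and y indices.)
digitized = np.stack(digitized, axis=1)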
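
If a single bin number per point is preferred over an (x, y) pair, the per-axis indices can be combined afterwards. The following is a sketch under stated assumptions, not part of the original answer: the fixed sample points are made up for illustration, and the flat numbering is row-major over a grid that includes the two out-of-range bins np.digitize produces on each axis.

```python
import numpy as np

# Same edges as in the answer; the sample points are illustrative only.
bins = [np.array([0.3, 0.5, 0.7]), np.array([0.3, 0.7])]
values = np.array([[0.1, 0.1], [0.4, 0.5], [0.6, 0.9], [0.9, 0.2]])

# Per-axis bin indices, one row per point: column 0 is x, column 1 is y.
idx = np.stack(
    [np.digitize(values[:, i], edges) for i, edges in enumerate(bins)], axis=1
)

# np.digitize returns values in 0..len(edges), so each axis has
# len(edges) + 1 possible indices (including both out-of-range bins).
shape = tuple(len(edges) + 1 for edges in bins)

# Collapse each (x, y) index pair into one row-major flat index.
flat = np.ravel_multi_index(tuple(idx.T), shape)
print(idx)
print(flat)
```

np.unravel_index(flat, shape) recovers the per-axis indices if they are needed again.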

Two-dimensional np.digitize
