Question

I have an Nx3 array mm. The function call

c,edg,idx = scipy.stats.binned_statistic_dd(mm,[], statistic='count',bins=(30,20,10),range=((3,5),(2,8),(4,6)))

returns idx, a 1-D array of integers giving the bin that each element of mm falls into, and edg, a list of 3 arrays containing the bin edges.

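What I need is to find the bin edges of a given bin, given its binnumber in idx. For example, given idx = [24, 153, ..., 72], I want to find the edges of, say, bin 153, i.e. where that bin lies in edg. Of course I can find the elements in bin 153 by mm[153], but not the edges.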

For clarity, I posted this Nx3 case. In reality, I am looking for a solution for the NxD case.

Answer 1

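First, get familiar with np.unravel_index. It converts a "flat index" (i.e. the binnumber!) into a tuple of coordinates. You can think of the flat index as an index into arr.ravel(), and the tuple of coordinates as an index into arr. For example, if in the diagram below we treat the numbers 0,1,2,3,4,5 as bin numbers: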

   | 0 | 1 | 2 |
---+---+---+---|
 0 | 0 | 1 | 2 |
 1 | 3 | 4 | 5 |
   +---+---+---|

Then np.unravel_index(4, (2,3))

In [65]: np.unravel_index(4, (2,3))
Out[65]: (1, 1)

equals (1,1), because bin number 4 in an array of shape (2,3) has coordinates (1,1).
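To make the relationship between flat indices and coordinates concrete, here is a small sketch that rebuilds the 2x3 table above. The variable names (arr, flat, coords, back) are only illustrative, and np.ravel_multi_index is shown as the inverse operation.

```python
import numpy as np

shape = (2, 3)
arr = np.arange(6).reshape(shape)   # same layout as the table above

flat = 4
coords = np.unravel_index(flat, shape)   # flat index -> (row, column)

# Indexing arr with the coordinates picks the same element as
# indexing the flattened array with the flat index.
assert arr[coords] == arr.ravel()[flat]

# np.ravel_multi_index goes the other way: coordinates -> flat index.
back = np.ravel_multi_index(coords, shape)
print(coords, back)
```

np.unravel_index also accepts an array of flat indices, which is what makes it convenient for converting a whole binnumber array at once.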

OK. Next, we need to know that internally scipy.stats.binned_statistic_dd adds two extra edges to the given bin edges along each axis to handle outliers:

bin_edges = [np.r_[-np.inf, edge, np.inf] for edge in bin_edges]

So the edge coordinates corresponding to the bin numbers are given by

edge_index = np.unravel_index(binnumber, [len(edge)-1 for edge in bin_edges])

(We use len(edge)-1 because the length of each array axis is one less than the number of edges.)
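The two steps above can be wrapped into a small helper that returns both the lower and the upper corner of a bin for any number of dimensions. This is only a sketch: bin_bounds and demo_edges are hypothetical names, not part of SciPy, and the helper assumes bin_edges is the unpadded list of edge arrays returned by binned_statistic_dd.

```python
import numpy as np

def bin_bounds(binnumber, bin_edges):
    """Return (lower, upper) corners for flat bin number(s).

    bin_edges is the list of 1-D edge arrays as returned by
    binned_statistic_dd (without the -inf/+inf padding).
    """
    # Pad with -inf/+inf to account for the two outlier bins per axis.
    padded = [np.r_[-np.inf, e, np.inf] for e in bin_edges]
    shape = [len(e) - 1 for e in padded]          # bins per axis, outliers included
    idx = np.unravel_index(binnumber, shape)      # one index (array) per axis
    lower = np.stack([e[i] for e, i in zip(padded, idx)], axis=-1)
    upper = np.stack([e[i + 1] for e, i in zip(padded, idx)], axis=-1)
    return lower, upper

# Hand-made edges for a 2-D case: 3 regular bins (+2 outlier bins) per axis.
demo_edges = [np.array([0., 1., 2., 3.]), np.array([0., 10., 20., 30.])]

# Flat bin 7 in a 5x5 grid is (1, 2): x in [0, 1), y in [10, 20).
lower, upper = bin_bounds(7, demo_edges)
print(lower, upper)
```

Passing an array of N bin numbers returns two arrays of shape (N, D), so the same helper covers the NxD case from the question.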

For example:

import itertools as IT
import numpy as np
import scipy.stats as stats

sample = np.array(list(IT.product(np.arange(5)-0.5, 
                                  np.arange(5)*10-5, 
                                  np.arange(5)*100-50)))
bins = [np.arange(4),
        np.arange(4)*10,
        np.arange(4)*100]

statistic, bin_edges, binnumber = stats.binned_statistic_dd(
    sample=sample, values=sample, statistic='count', 
    bins=bins, 
    range=[(0,100)]*3)

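# Pad each axis with -inf/+inf, mirroring the outlier bins SciPy uses internally
bin_edges = [np.r_[-np.inf, edge, np.inf] for edge in bin_edges]
# Convert each flat bin number into one index per dimension
edge_index = np.unravel_index(binnumber, [len(edge)-1 for edge in bin_edges])

# For every sample, look up the lower edge of its bin along each axis
for samp, idx in zip(sample, zip(*edge_index)):
    vert = [edge[i] for i, edge in zip(idx, bin_edges)]
    print('{} goes in bin with left-most corner: {}'.format(samp, vert))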

yields

[ -0.5  -5.  -50. ] goes in bin with left-most corner: [-inf, -inf, -inf]
[ -0.5  -5.   50. ] goes in bin with left-most corner: [-inf, -inf, 0.0]
[  -0.5   -5.   150. ] goes in bin with left-most corner: [-inf, -inf, 100.0]
[  -0.5   -5.   250. ] goes in bin with left-most corner: [-inf, -inf, 200.0]
...
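As an alternative sketch, binned_statistic_dd can return the per-dimension bin indices directly through its expand_binnumbers keyword, which removes the need to call np.unravel_index yourself. The example below assumes your SciPy version supports that keyword. The random 4-D sample and the names lower, upper and inside are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.uniform(-1.0, 4.0, size=(200, 4))   # N x D with D = 4
edges_in = [np.arange(4.0)] * 4                  # edges 0, 1, 2, 3 on every axis

statistic, bin_edges, binnumber = stats.binned_statistic_dd(
    sample, values=np.ones(len(sample)), statistic='count',
    bins=edges_in, expand_binnumbers=True)

# With expand_binnumbers=True, binnumber has shape (D, N):
# one row of per-axis bin indices for each dimension.
padded = [np.r_[-np.inf, e, np.inf] for e in bin_edges]
lower = np.stack([e[i] for e, i in zip(padded, binnumber)], axis=-1)
upper = np.stack([e[i + 1] for e, i in zip(padded, binnumber)], axis=-1)

# Sanity check: every sample lies between the corners of its own bin.
inside = (lower <= sample) & (sample <= upper)
print(binnumber.shape, inside.all())
```

The check uses <= on the upper side because a value exactly on the last edge is counted in the last regular bin rather than in the outlier bin.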

How do I find the bin edges of a given bin number returned by scipy.stats.binned_statistic_dd()?
