Computing histograms along an axis

Date: 2017-05-24 08:02:01

Tags: python performance numpy scipy vectorization

Is there a way to calculate many histograms along an axis of an nD array? The method I currently use iterates over all the other axes with a for loop and computes a numpy.histogram() for each resulting 1D array:

import numpy
import itertools

data = numpy.random.rand(4, 5, 6)

# axis=-1, place `200001` and `[slice(None)]` on any other position to process along other axes
out = numpy.zeros((4, 5, 200001), dtype="int64")
indices = [
    numpy.arange(4), numpy.arange(5), [slice(None)]
]

# Iterate over all axes, calculate histogram for each cell
for idx in itertools.product(*indices):
    out[idx] = numpy.histogram(
        data[idx],
        bins=2 * 100000 + 1,
        range=(-100000 - 0.5, 100000 + 0.5),
    )[0]

out.shape  # (4, 5, 200001)

Needless to say, this is very slow, but I could not find a way to solve this with numpy.histogram or numpy.histogram2d.
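
For reference, the same per-slice loop can be written more compactly with numpy.apply_along_axis; this sketch (not part of the original question, and using a small bin count for brevity) is just as slow, because numpy.histogram still runs once per 1D slice:

import numpy

data = numpy.random.rand(4, 5, 6)

# apply_along_axis calls the function once for every 1D slice along the
# given axis, so this only hides the Python-level loop, it does not remove it
out = numpy.apply_along_axis(
    lambda row: numpy.histogram(row, bins=21, range=(0.0, 1.0))[0],
    -1,
    data,
)
out.shape  # (4, 5, 21)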

2 Answers:

Answer 0 (score: 6)

Here's a vectorized approach that leverages the efficient tools np.searchsorted and np.bincount: searchsorted tells us, from the bin edges, which bin each element belongs in, and bincount does the counting for us.
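
As a minimal illustration of the idea (separate from the answer's implementation, with made-up 1D data):

import numpy as np

bins = np.linspace(0.0, 1.0, 5)                  # 4 equal-width bins over [0, 1]
data = np.array([0.1, 0.3, 0.35, 0.9])

# 'right' puts a value equal to an edge into the bin to its left;
# subtracting 1 converts edge positions to 0-based bin indices.
idx = np.searchsorted(bins, data, 'right') - 1   # [0, 1, 1, 3]

# bincount then counts how many values landed in each bin.
counts = np.bincount(idx, minlength=4)           # [1, 2, 0, 1]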

Implementation -

import numpy as np

def hist_laxis(data, n_bins, range_limits):
    # Setup bins and determine the bin location for each element for the bins
    R = range_limits
    N = data.shape[-1]
    bins = np.linspace(R[0],R[1],n_bins+1)
    data2D = data.reshape(-1,N)
    idx = np.searchsorted(bins, data2D,'right')-1

    # Some elements would be off limits, so get a mask for those
    bad_mask = (idx==-1) | (idx==n_bins)

    # We need to use bincount to get bin based counts. To have unique IDs for
    # each row and not get confused by the ones from other rows, we need to 
    # offset each row by a scale (using row length for this).
    scaled_idx = n_bins*np.arange(data2D.shape[0])[:,None] + idx

    # Set the bad ones to be last possible index+1 : n_bins*data2D.shape[0]
    limit = n_bins*data2D.shape[0]
    scaled_idx[bad_mask] = limit

    # Get the counts and reshape to multi-dim
    counts = np.bincount(scaled_idx.ravel(),minlength=limit+1)[:-1]
    counts.shape = data.shape[:-1] + (n_bins,)
    return counts
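
A quick usage sketch (the shapes here are made up): each of the 3 x 4 cells along the leading axes gets its own 10-bin histogram over the last axis -

data = np.random.randn(3, 4, 1000)
counts = hist_laxis(data, n_bins=10, range_limits=(-3, 3))
counts.shape  # (3, 4, 10)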

Runtime test

Original approach -

import itertools

def org_app(data, n_bins, range_limits):
    R = range_limits
    m,n = data.shape[:2]
    out = np.zeros((m, n, n_bins), dtype="int64")
    indices = [
        np.arange(m), np.arange(n), [slice(None)]
    ]

    # Iterate over all axes, calculate histogram for each cell
    for idx in itertools.product(*indices):
        out[idx] = np.histogram(
            data[idx],
            bins=n_bins,
            range=(R[0], R[1]),
        )[0]
    return out

Timings and verification -

In [2]: data = np.random.randn(4, 5, 6)
   ...: out1 = org_app(data, n_bins=200001, range_limits=(- 2.5, 2.5))
   ...: out2 = hist_laxis(data, n_bins=200001, range_limits=(- 2.5, 2.5))
   ...: print np.allclose(out1, out2)
   ...: 
True

In [3]: %timeit org_app(data, n_bins=200001, range_limits=(- 2.5, 2.5))
10 loops, best of 3: 39.3 ms per loop

In [4]: %timeit hist_laxis(data, n_bins=200001, range_limits=(- 2.5, 2.5))
100 loops, best of 3: 3.17 ms per loop

Since the loopy solution iterates over the first two axes, let's increase their lengths, as that will show how well the vectorized one scales -

In [59]: data = np.random.randn(400, 500, 6)

In [60]: %timeit org_app(data, n_bins=21, range_limits=(- 2.5, 2.5))
1 loops, best of 3: 9.59 s per loop

In [61]: %timeit hist_laxis(data, n_bins=21, range_limits=(- 2.5, 2.5))
10 loops, best of 3: 44.2 ms per loop

In [62]: 9590/44.2          # Speedup number
Out[62]: 216.9683257918552

Answer 1 (score: 1)

The first solution provides a nice short idiom using np.searchsorted, which pays the cost of a binary search per element. But numpy has a fast route in its own source code, actually implemented in Python, that handles equal-width bin edges mathematically: it needs only a vectorized subtraction and multiplication plus a few comparisons instead.
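
The heart of that fast route is plain arithmetic; here is a minimal sketch of the idea on made-up 1D data (ignoring the edge cases that the full code below takes care of):

import numpy as np

lo, hi, bins = -2.5, 2.5, 21
x = np.random.randn(1000)
x = x[(x >= lo) & (x <= hi)]  # keep only in-range values

# With equal-width bins, the bin index is a direct computation rather
# than a search: scale each value's offset from the left edge.
idx = ((x - lo) * (bins / (hi - lo))).astype(np.intp)
idx[idx == bins] -= 1         # values exactly on the right edge go in the last bin

counts = np.bincount(idx, minlength=bins)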

This solution mirrors the numpy source code, including its type resolution, and handles weights as well as complex numbers. Essentially, it is the first solution combined with numpy's histogram fast route, plus some extra type and iteration details.

import numpy as np

# The `range` argument below shadows the builtin, so keep a reference to it
_range = range

def hist_np_laxis(a, bins=10, range=None, weights=None):
    # Initialize empty histogram
    N = a.shape[-1]
    data2D = a.reshape(-1,N)
    limit = bins*data2D.shape[0]
    # gh-10322 means that type resolution rules are dependent on array
    # shapes. To avoid this causing problems, we pick a type now and stick
    # with it throughout.
    bin_type = np.result_type(range[0], range[1], a)
    if np.issubdtype(bin_type, np.integer):
        bin_type = np.result_type(bin_type, float)
    bin_edges = np.linspace(range[0],range[1],bins+1, endpoint=True, dtype=bin_type)
    # Histogram is an integer or a float array depending on the weights.
    if weights is None:
        ntype = np.dtype(np.intp)
    else:
        ntype = weights.dtype
    n = np.zeros(limit, ntype)
    # Pre-compute histogram scaling factor
    norm = bins / (range[1] - range[0])
    # We set a block size, as this allows us to iterate over chunks when
    # computing histograms, to minimize memory usage.
    BLOCK = 65536
    # We iterate over blocks here for two reasons: the first is that for
    # large arrays, it is actually faster (for example for a 10^8 array it
    # is 2x as fast) and it results in a memory footprint 3x lower in the
    # limit of large arrays.
    for i in _range(0, data2D.shape[0], BLOCK):
        tmp_a = data2D[i:i+BLOCK]
        block_size = tmp_a.shape[0]
        if weights is None:
            tmp_w = None
        else:
            tmp_w = weights[i:i + BLOCK]
        # Only include values in the right range
        keep = (tmp_a >= range[0])
        keep &= (tmp_a <= range[1])
        if not np.logical_and.reduce(np.logical_and.reduce(keep)):
            tmp_a = tmp_a[keep]
            if tmp_w is not None:
                tmp_w = tmp_w[keep]
        # This cast ensures no type promotions occur below, which gh-10322
        # make unpredictable. Getting it wrong leads to precision errors
        # like gh-8123.
        tmp_a = tmp_a.astype(bin_edges.dtype, copy=False)

        # Compute the bin indices, and for values that lie exactly on
        # last_edge we need to subtract one
        f_indices = (tmp_a - range[0]) * norm
        indices = f_indices.astype(np.intp)
        indices[indices == bins] -= 1

        # The index computation is not guaranteed to give exactly
        # consistent results within ~1 ULP of the bin edges.
        decrement = tmp_a < bin_edges[indices]
        indices[decrement] -= 1
        # The last bin includes the right edge. The other bins do not.
        increment = ((tmp_a >= bin_edges[indices + 1])
                     & (indices != bins - 1))
        indices[increment] += 1

        # Offset each row's bin indices by a per-row scale so every row gets
        # its own unique ID range, then flatten so one bincount can count all
        # rows at once
        indices = ((bins*np.arange(i, i+block_size)[:,None] * keep)[keep].reshape(indices.shape) + indices).reshape(-1)
        # We now compute the histogram using bincount
        if ntype.kind == 'c':
            n.real += np.bincount(indices, weights=tmp_w.real,
                                  minlength=limit)
            n.imag += np.bincount(indices, weights=tmp_w.imag,
                                  minlength=limit)
        else:
            n += np.bincount(indices, weights=tmp_w,
                             minlength=limit).astype(ntype)
    n.shape = a.shape[:-1] + (bins,)
    return n

data = np.random.randn(4, 5, 6)
out1 = hist_laxis(data, n_bins=200001, range_limits=(- 2.5, 2.5))
out2 = hist_np_laxis(data, bins=200001, range=(- 2.5, 2.5))
print(np.allclose(out1, out2))
True
%timeit hist_np_laxis(data, bins=21, range=(- 2.5, 2.5))
92.1 µs ± 504 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit hist_laxis(data, n_bins=21, range_limits=(- 2.5, 2.5))
55.1 µs ± 3.66 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Though the first solution is faster in the small example and even in a larger one:

data = np.random.randn(400, 500, 6)
%timeit hist_np_laxis(data, bins=21, range=(- 2.5, 2.5))
264 ms ± 2.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit hist_laxis(data, n_bins=21, range_limits=(- 2.5, 2.5))
71.6 ms ± 377 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

It is not always faster, though:

data = np.random.randn(400, 6, 500)

%timeit hist_np_laxis(data, bins=101, range=(- 2.5, 2.5))
71.5 ms ± 128 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit hist_laxis(data, n_bins=101, range_limits=(- 2.5, 2.5))
76.9 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

But the numpy variation is only faster when the last axis is large, and even then the gain is very small. In every other case I tried, the first solution is much faster, regardless of the bin count and the size of the first two dimensions. The single line that dominates the cost, indices = ((bins*np.arange(i, i+block_size)[:,None] * keep)[keep].reshape(indices.shape) + indices).reshape(-1), might still be optimizable, though I have not yet found a faster way.

This also means that the sheer number of O(n) vectorized operations outdoes the O(n log n) cost of the sorted searches.

However, a real use case has a last axis with a lot of data and small leading axes, so the sample shapes that favor the first solution are too contrived to reflect the performance that actually matters.

Adding an axis argument to histogram is tracked as an issue in the numpy repository: https://github.com/numpy/numpy/issues/13166

The xhistogram library also tries to solve this problem: https://xhistogram.readthedocs.io/en/latest/
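
For completeness, here is a sketch of what that looks like with xhistogram's xarray interface, adapted from its README; the exact call signature is an assumption and should be checked against the current docs:

import numpy as np
import xarray as xr
from xhistogram.xarray import histogram  # assumed import path per the docs

# One histogram per `x`, computed over `time` (a named DataArray is required).
da = xr.DataArray(np.random.randn(100, 30), dims=['time', 'x'], name='foo')
bins = np.linspace(-4, 4, 20)

# `dim` names the dimension(s) the histogram reduces over; the result keeps
# `x` and gains a bin dimension in place of `time`.
h = histogram(da, bins=[bins], dim=['time'])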