Python: vectorized cumulative count

Time: 2018-02-08 16:57:40

Tags: arrays numpy vectorization counting cumsum

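I have a numpy array and want to count the occurrences of each value, but cumulatively: each output entry is the number of times that value has already appeared earlier in the array.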

in  = [0, 1, 0, 1, 2, 3, 0, 0, 2, 1, 1, 3, 3, 0, ...]
out = [0, 0, 1, 1, 0, 0, 2, 3, 1, 2, 3, 1, 2, 4, ...]

I was wondering whether it would be best to create a (sparse) matrix with ones at col = i and row = in[i]:
       1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1
       0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0
       0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0
       0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0

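Then we could compute cumsums along the rows and extract the numbers from the positions where the cumsums increase.

However, if we cumsum a sparse matrix, won't it become dense? Is there an efficient way to do this?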
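
For reference, the one-hot idea described above can be written densely as in the sketch below. It is only meant to make the idea concrete: the name cumcount_onehot is hypothetical, np.unique is used to map values to row indices (an assumption, so that arbitrary values work), and the full k-by-n matrix is allocated, so this illustrates the memory concern rather than solving it.

```python
import numpy as np

def cumcount_onehot(a):
    """Cumulative count via a dense one-hot matrix (illustrative sketch only)."""
    a = np.asarray(a)
    n = len(a)

    # Map each distinct value to a row index 0..k-1
    uniques, rows = np.unique(a, return_inverse=True)
    k = len(uniques)
    cols = np.arange(n)

    # One-hot matrix: a single 1 per column, at row = index of a[i]
    onehot = np.zeros((k, n), dtype=int)
    onehot[rows, cols] = 1

    # Running count along each row; read off each column's own row.
    # Subtract 1 so that the first occurrence counts as 0.
    running = onehot.cumsum(axis=1)
    return running[rows, cols] - 1
```

Memory and time grow with k * n here (k distinct values, n elements), which is why an approach that avoids materializing the matrix is preferable for large inputs.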

1 Answer:

Answer 0: (score: 2)

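Here's a vectorized approach using sorting:

import numpy as np

def cumcount(a):
    a = np.asarray(a)

    # Store length of array
    n = len(a)
    if n == 0:
        return np.empty(0, dtype=int)

    # Get sorted indices (use later on too) and store the sorted array.
    # The sort must be stable so that equal values keep their original
    # relative order; the default quicksort does not guarantee that.
    sidx = a.argsort(kind='stable')
    b = a[sidx]

    # Mask of shifts/groups: True where the next sorted value differs
    m = b[1:] != b[:-1]

    # Get indices of those shifts (last position of each group but the final one)
    idx = np.flatnonzero(m)

    # ID array that will store the cumulative nature at the very end:
    # +1 inside a group, and a negative reset at each group start so that
    # the running sum drops back to 0
    id_arr = np.ones(n, dtype=int)
    id_arr[0] = 0
    if idx.size:  # skipped when all values are equal (no group boundary)
        id_arr[idx[1:]+1] = -np.diff(idx)+1
        id_arr[idx[0]+1] = -idx[0]
    c = id_arr.cumsum()

    # Finally re-arrange those cumulative values back to original order
    out = np.empty(n, dtype=int)
    out[sidx] = c
    return out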

Sample run:

In [66]: a
Out[66]: array([0, 1, 0, 1, 2, 3, 0, 0, 2, 1, 1, 3, 3, 0])

In [67]: cumcount(a)
Out[67]: array([0, 0, 1, 1, 0, 0, 2, 3, 1, 2, 3, 1, 2, 4])
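
To sanity-check a vectorized version on other inputs, a plain-Python loop is a handy baseline. The sketch below is not part of the original answer; cumcount_reference is a hypothetical helper name, and it simply tracks how often each value has been seen so far.

```python
from collections import defaultdict

def cumcount_reference(values):
    """Loop-based cumulative count, useful as a correctness baseline."""
    seen = defaultdict(int)  # value -> number of occurrences so far
    out = []
    for v in values:
        out.append(seen[v])  # occurrences strictly before this position
        seen[v] += 1
    return out

sample = [0, 1, 0, 1, 2, 3, 0, 0, 2, 1, 1, 3, 3, 0]
print(cumcount_reference(sample))
# [0, 0, 1, 1, 0, 0, 2, 3, 1, 2, 3, 1, 2, 4]
```

It runs in O(n) but at Python speed, so it is best used on small or randomly generated test arrays rather than as a replacement for the sorting approach.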