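NumPy: sum of running lengths of non-zero values

Time: 2015-04-26 02:27:33

Tags: python arrays performance numpy vectorization

I am looking for a fast vectorized function that returns the running count of consecutive non-zero values. The count should restart at 0 whenever a zero is encountered. The result should have the same shape as the input array.

Given an array like this: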

x = np.array([2.3, 1.2, 4.1 , 0.0, 0.0, 5.3, 0, 1.2, 3.1])

The function should return:

array([1, 2, 3, 0, 0, 1, 0, 1, 2])
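
To make the required behavior concrete, here is a plain-Python reference loop. It is only a sketch for 1-D input, not the fast vectorized version being asked for, and the name running_nonzero_count is hypothetical.

```python
import numpy as np

def running_nonzero_count(x):
    # Hypothetical reference implementation: slow, but easy to check by eye.
    out = np.zeros(x.shape, dtype=int)
    count = 0
    for i, v in enumerate(x):
        # Extend the current run on a non-zero value, reset on a zero.
        count = count + 1 if v != 0 else 0
        out[i] = count
    return out

x = np.array([2.3, 1.2, 4.1, 0.0, 0.0, 5.3, 0, 1.2, 3.1])
expected = running_nonzero_count(x)
```

Any vectorized candidate can be compared against this loop with np.array_equal.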

3 answers:

Answer 0 (score: 4)

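This post lists a vectorized approach that basically consists of two steps:

  1. Initialize a vector of zeros with the same size as the input vector x, and set ones at the positions corresponding to the non-zeros of x.

  2. Next, in that vector, put the negative of each "island's" run length right after that island's ending/stop position. The intention is to apply cumsum later on, which yields sequential numbers for the "islands" and zeros elsewhere.

Here's the implementation -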

    import numpy as np
    
    #Append zeros at the start and end of input array, x
    xa = np.hstack([[0],x,[0]])
    
    # Get an array of ones and zeros, with ones for nonzeros of x and zeros elsewhere
    xa1 =(xa!=0)+0
    
    # Find consecutive differences on xa1
    xadf = np.diff(xa1)
    
    # Find start and stop+1 indices and thus the lengths of "islands" of non-zeros
    starts = np.where(xadf==1)[0]
    stops_p1 = np.where(xadf==-1)[0]
    lens = stops_p1 - starts
    
    # Mark indices where "minus ones" are to be put for applying cumsum
    put_m1 = stops_p1[stops_p1 < x.size]
    
    # Setup vector with ones for nonzero x's, "minus lens" at stops +1 & zeros elsewhere
    vec = xa1[1:-1] # Note: this will change xa1, but it's okay as not needed anymore
    vec[put_m1] = -lens[0:put_m1.size]
    
    # Perform cumsum to get the desired output
    out = vec.cumsum()
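
As an illustration of the two steps, the following sketch traces the intermediate arrays for the array from the question. The variable names mask, steps and inside are chosen here for readability and are not the answer's names; the values in the comments were worked out by hand for this one input.

```python
import numpy as np

x = np.array([2.3, 1.2, 4.1, 0.0, 0.0, 5.3, 0, 1.2, 3.1])

# Step 1: ones where x is non-zero, padded with a zero on each side.
mask = np.hstack([[0], x != 0, [0]]).astype(int)  # [0 1 1 1 0 0 1 0 1 1 0]
steps = np.diff(mask)                             # [1 0 0 -1 0 1 -1 1 0 -1]
starts = np.where(steps == 1)[0]                  # [0 5 7]
stops_p1 = np.where(steps == -1)[0]               # [3 6 9]
lens = stops_p1 - starts                          # [3 1 2]

# Step 2: subtract each island's length right after the island ends.
vec = mask[1:-1].copy()                           # [1 1 1 0 0 1 0 1 1]
inside = stops_p1 < x.size                        # [True True False]
vec[stops_p1[inside]] = -lens[inside]             # [1 1 1 -3 0 1 -1 1 1]
out = vec.cumsum()                                # [1 2 3 0 0 1 0 1 2]
```

The last island runs to the end of x, so its stop position falls outside the array and no correction is needed for it.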
    

Sample run -

    In [116]: x
    Out[116]: array([ 0. ,  2.3,  1.2,  4.1,  0. ,  0. ,  5.3,  0. ,  1.2,  3.1,  0. ])
    
    In [117]: out
    Out[117]: array([0, 1, 2, 3, 0, 0, 1, 0, 1, 2, 0], dtype=int32)
    

Runtime tests -

Here are some runtime tests comparing the proposed approach against the other itertools.groupby based approach -

    In [21]: N = 1000000
        ...: x = np.random.rand(1,N)
        ...: x[x>0.5] = 0.0
        ...: x = x.ravel()
        ...: 
    
    In [19]: %timeit sumrunlen_vectorized(x)
    10 loops, best of 3: 19.9 ms per loop
    
    In [20]: %timeit sumrunlen_loopy(x)
    1 loops, best of 3: 2.86 s per loop
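
The timings call sumrunlen_vectorized and sumrunlen_loopy, whose definitions are not shown in the thread. The sketch below assumes they simply wrap the diff/cumsum code from this answer and the itertools.groupby one-liner from the other answer; the bodies are a reconstruction under that assumption, and the smaller random input is only for a quick agreement check.

```python
from itertools import groupby

import numpy as np

def sumrunlen_vectorized(x):
    # Assumed wrapper around the diff/cumsum steps listed in this answer.
    xa1 = np.hstack([[0], x != 0, [0]]).astype(int)
    xadf = np.diff(xa1)
    starts = np.where(xadf == 1)[0]
    stops_p1 = np.where(xadf == -1)[0]
    lens = stops_p1 - starts
    put_m1 = stops_p1[stops_p1 < x.size]
    vec = xa1[1:-1]
    vec[put_m1] = -lens[:put_m1.size]
    return vec.cumsum()

def sumrunlen_loopy(x):
    # Assumed wrapper around the itertools.groupby based approach.
    return np.hstack([[i if j != 0 else j for i, j in enumerate(g, 1)]
                      for _, g in groupby(x, key=lambda v: v != 0)])

# Quick agreement check on random data with roughly half the entries zeroed.
rng = np.random.default_rng(0)
x = rng.random(10_000)
x[x > 0.5] = 0.0
same = np.array_equal(sumrunlen_vectorized(x), sumrunlen_loopy(x))
```

Timings will differ by machine and NumPy version, so the figures above should be read as relative rather than absolute.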
    

Answer 1 (score: 2)

You can use itertools.groupby and np.hstack:

>>> import numpy as np
>>> x = np.array([2.3, 1.2, 4.1 , 0.0, 0.0, 5.3, 0, 1.2, 3.1])
>>> from itertools import groupby

>>> np.hstack([[i if j!=0 else j for i,j in enumerate(g,1)] for _,g in groupby(x,key=lambda x: x!=0)])
array([ 1.,  2.,  3.,  0.,  0.,  1.,  0.,  1.,  2.])

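We group the array by whether its elements are non-zero, then use a list comprehension with enumerate to replace each non-zero sub-array with its running indices, and finally flatten the list with np.hstack.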
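
To show what the grouping step produces, this sketch splits the one-liner into two stages. The names groups and counts are introduced here for illustration, and the values are converted to plain Python types only to make them easier to inspect.

```python
from itertools import groupby

import numpy as np

x = np.array([2.3, 1.2, 4.1, 0.0, 0.0, 5.3, 0, 1.2, 3.1])

# Stage 1: consecutive elements grouped by "is non-zero".
groups = [(bool(k), [float(v) for v in g])
          for k, g in groupby(x, key=lambda v: v != 0)]
# groups == [(True, [2.3, 1.2, 4.1]), (False, [0.0, 0.0]), (True, [5.3]),
#            (False, [0.0]), (True, [1.2, 3.1])]

# Stage 2: number the members of non-zero groups from 1, keep zeros as 0.
counts = [[i if k else 0 for i, _ in enumerate(vals, 1)] for k, vals in groups]
result = np.hstack(counts)
```

Because groupby runs element by element in Python, this is easy to read but much slower than the cumsum approach on large arrays, as the timings in the other answer suggest.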

Answer 2 (score: 0)

This sub-problem came up in Kick Start 2021 Round A. My solution:

    import numpy as np

    def current_run_len(a):
        a_ = np.hstack([0, a != 0, 0])  # first in starts and last in stops defined
        d = np.diff(a_)
        starts = np.where(d == 1)[0]
        stops = np.where(d == -1)[0]
        a_[stops + 1] = -(stops - starts)  # +1 for behind-last
        return a_[1:-1].cumsum()

In fact, the problem also required a version that counts down within each run. So here is another version with an optional keyword argument; with rev=False it does the same as the function above:

    def current_run_len(a, rev=False):
        a_ = np.hstack([0, a != 0, 0])  # first in starts and last in stops defined
        d = np.diff(a_)
        starts = np.where(d == 1)[0]
        stops = np.where(d == -1)[0]
        if rev:
            a_[starts] = -(stops - starts)
            cs = -a_.cumsum()[:-2]
        else:
            a_[stops + 1] = -(stops - starts)  # +1 for behind-last
            cs = a_.cumsum()[1:-1]
        return cs

    a = np.array([1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1])
    print('a                             = ', a)
    print('current_run_len(a)            = ', current_run_len(a))
    print('current_run_len(a, rev=True)  = ', current_run_len(a, rev=True))

Result:

    a                             =  [1 1 1 1 0 0 0 1 1 0 1 0 0 0 1]
    current_run_len(a)            =  [1 2 3 4 0 0 0 1 2 0 1 0 0 0 1]
    current_run_len(a, rev=True)  =  [4 3 2 1 0 0 0 2 1 0 1 0 0 0 1]

For an array consisting only of 0s and 1s, you can simplify [0, a != 0, 0] to [0, a, 0]. But the version as posted also works for arbitrary non-zero numbers.
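
The difference between padding the raw array, np.hstack([0, a, 0]), and padding the non-zero mask, np.hstack([0, a != 0, 0]), can be checked with a small sketch. The helper run_len_from_padded is hypothetical; it is just the tail of the function above, factored out so both paddings can be compared.

```python
import numpy as np

def run_len_from_padded(padded):
    # Hypothetical helper: the diff/cumsum tail applied to an already padded array.
    d = np.diff(padded)
    starts = np.where(d == 1)[0]
    stops = np.where(d == -1)[0]
    padded[stops + 1] = -(stops - starts)
    return padded[1:-1].cumsum()

binary = np.array([1, 1, 0, 1])
general = np.array([2, 2, 0, 3])

# 0/1 input: padding the raw array and padding the mask give the same counts.
binary_raw = run_len_from_padded(np.hstack([0, binary, 0]))
binary_mask = run_len_from_padded(np.hstack([0, binary != 0, 0]))

# Arbitrary values: the raw differences are no longer +1/-1, so no run
# boundaries are found and only the mask version still counts runs.
general_raw = run_len_from_padded(np.hstack([0, general, 0]))
general_mask = run_len_from_padded(np.hstack([0, general != 0, 0]))
```

The shortcut depends on the differences at run boundaries being exactly +1 and -1, which only holds when every non-zero entry equals 1.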