Question

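I want to compute the cumsum of a numpy array over its non-zero elements only: skip the zeros in the array and apply cumsum to the rest, leaving the zeros in place. Suppose I have an np.array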

a = np.array([1,2,1,2,5,0,9,6,0,2,3,0])

My result should be

[1,3,4,6,11,0,20,26,0,28,31,0]

I tried this

a = np.cumsum(a[a!=0])

But the result is

[1,3,4,6,11,20,26,28,31]

Any ideas?
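For reference, a quick check of why the attempt comes out short: boolean indexing with `a != 0` copies out only the selected elements, so `np.cumsum` never sees the zero positions and the result has 9 entries instead of 12. A minimal sketch using the array from the question:

```python
import numpy as np

a = np.array([1, 2, 1, 2, 5, 0, 9, 6, 0, 2, 3, 0])

mask = a != 0          # boolean array, True where a is non-zero
picked = a[mask]       # boolean indexing keeps only the True positions

print(a.shape, picked.shape)   # (12,) (9,)
print(np.cumsum(picked))       # running sum with the zero positions gone
```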

Answer 1

You need to mask the original array so that only the non-zero elements are overwritten:

In [9]:
a = np.array([1,2,1,2,5,0,9,6,0,2,3,0])
a[a!=0] = np.cumsum(a[a!=0])
a

Out[9]:
array([ 1,  3,  4,  6, 11,  0, 20, 26,  0, 28, 31,  0])
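The masked assignment above overwrites `a` in place. If the original array is still needed, a minimal sketch of the same idea that computes the mask once and writes into a copy (`cumsum_nonzero` is a hypothetical helper name used only for illustration):

```python
import numpy as np

def cumsum_nonzero(a):
    """Running sum over the non-zero entries of a 1-D array; zeros stay zero.

    Hypothetical helper: it works on a copy, so the input is left untouched.
    """
    out = a.copy()
    mask = out != 0                    # compute the mask once
    out[mask] = np.cumsum(out[mask])   # scatter the compressed cumsum back
    return out

a = np.array([1, 2, 1, 2, 5, 0, 9, 6, 0, 2, 3, 0])
print(cumsum_nonzero(a))   # matches the expected result in the question
print(a)                   # still the original values
```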

Another way is to use np.where:

In [93]:
a = np.array([1,2,1,2,5,0,9,6,0,2,3,0])
a = np.where(a!=0,np.cumsum(a),a)
a

Out[93]:
array([ 1,  3,  4,  6, 11,  0, 20, 26,  0, 28, 31,  0])
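The `np.where` version relies on one observation: adding a zero does not change a running total, so at every non-zero position the plain `np.cumsum(a)` already equals the cumsum over the non-zero elements alone; only the zero positions need to be put back. A small sketch that checks this claim on the example array:

```python
import numpy as np

a = np.array([1, 2, 1, 2, 5, 0, 9, 6, 0, 2, 3, 0])
mask = a != 0

full = np.cumsum(a)              # running total over every element
compressed = np.cumsum(a[mask])  # running total over non-zero elements only

# At the non-zero positions the two running totals agree...
assert np.array_equal(full[mask], compressed)

# ...so np.where only has to substitute 0 where a is 0.
result = np.where(mask, full, 0)
print(result)
```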

Timings

In [91]:
%%timeit
a = np.array([1,2,1,2,5,0,9,6,0,2,3,0])
a[a!=0] = np.cumsum(a[a!=0])
a

The slowest run took 4.93 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 12.6 µs per loop

In [94]:    
%%timeit
a = np.array([1,2,1,2,5,0,9,6,0,2,3,0])
a = np.where(a!=0,np.cumsum(a),a)
a

The slowest run took 6.00 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 10.5 µs per loop

The above shows that np.where is slightly faster than the first method.

Answer 2

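In my opinion, the suggestion jotasi made in a comment on the OP is the most idiomatic. Here are some timings, but note that Shawn L.'s answer returns a Python list rather than a NumPy array, so the results are not strictly comparable.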

import numpy as np

def jotasi(a):
  b = np.cumsum(a)
  b[a==0] = 0
  return b

def EdChum(a):
  a[a!=0] = np.cumsum(a[a!=0])  # note: overwrites the caller's array in place
  return a

def ShawnL(a):
  b=np.cumsum(a)
  b = [b[i]  if ((i > 0 and b[i] != b[i-1]) or i==0) else 0 for i in range(len(b))]
  return b

def Ed2(a):
  return np.where(a!=0,np.cumsum(a),a)
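Because `EdChum` overwrites its argument and `ShawnL` returns a list, it is worth confirming that the four functions produce the same values before timing them. A minimal sketch, with the functions copied from above so it runs on its own; the per-call `.copy()` and the `np.asarray` wrapping are additions for this check only:

```python
import numpy as np

def jotasi(a):
    b = np.cumsum(a)
    b[a == 0] = 0
    return b

def EdChum(a):
    a[a != 0] = np.cumsum(a[a != 0])   # writes into the caller's array
    return a

def ShawnL(a):
    b = np.cumsum(a)
    return [b[i] if ((i > 0 and b[i] != b[i - 1]) or i == 0) else 0
            for i in range(len(b))]

def Ed2(a):
    return np.where(a != 0, np.cumsum(a), a)

a = np.array([1, 2, 1, 2, 5, 0, 9, 6, 0, 2, 3, 0])
expected = [1, 3, 4, 6, 11, 0, 20, 26, 0, 28, 31, 0]

for f in (jotasi, EdChum, ShawnL, Ed2):
    # Pass a copy because EdChum modifies its argument in place;
    # np.asarray turns ShawnL's Python list into an array for comparison.
    got = np.asarray(f(a.copy()))
    assert got.tolist() == expected, f.__name__
```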

For testing, I generated a NumPy array of 1E5 integers in [0, 100], so roughly 1% of them are 0. These results are from NumPy 1.9.2 and Python 2.7.12, presented from slowest to fastest:

import timeit
a = np.random.random_integers(0,100,100000)  # deprecated in later NumPy; np.random.randint(0, 101, 100000) draws from the same range

len(a[a==0]) #verify there are some 0's
1003

timeit.timeit("ShawnL(a)", "from __main__ import a,EdChum,ShawnL,jotasi,Ed2", number=250)
11.743098020553589
timeit.timeit("EdChum(a)", "from __main__ import a,EdChum,ShawnL,jotasi,Ed2", number=250)
0.1794271469116211
timeit.timeit("Ed2(a)", "from __main__ import a,EdChum,ShawnL,jotasi,Ed2", number=250)
0.1282949447631836
timeit.timeit("jotasi(a)", "from __main__ import a,EdChum,ShawnL,jotasi,Ed2", number=250)
0.09286999702453613

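I was a bit surprised by how large the difference between jotasi's and Ed Chum's answers is; I guess minimizing the boolean operations makes a noticeable difference. Unsurprisingly, the list comprehension is slow.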
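One caveat with the benchmark above: every call receives the same `a`, so `EdChum` keeps overwriting it between runs. A sketch of a Python 3 rerun that gives each call a fresh copy, assuming NumPy's newer `Generator` API (`np.random.default_rng`) and the same array size and repeat count as above. `ShawnL` is left out because it is far slower, and no numbers are shown since timings depend on the machine and NumPy version:

```python
import timeit
import numpy as np

def jotasi(a):
    b = np.cumsum(a)
    b[a == 0] = 0
    return b

def EdChum(a):
    a[a != 0] = np.cumsum(a[a != 0])
    return a

def Ed2(a):
    return np.where(a != 0, np.cumsum(a), a)

rng = np.random.default_rng(0)
a = rng.integers(0, 101, size=100_000)   # integers in [0, 100], roughly 1% zeros

results = {}
for f in (EdChum, Ed2, jotasi):
    # Copy inside the timed callable so every run sees the same input;
    # this adds the same copy cost to each function.
    results[f.__name__] = timeit.timeit(lambda f=f: f(a.copy()), number=250)

# Print from slowest to fastest, as in the original timings.
for name, seconds in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {seconds:.4f} s")
```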

Answer 3

Trying to simplify it :)

import numpy as np
a = np.array([1,2,1,2,5,0,9,6,0,2,3,0])
b = np.cumsum(a)
[b[i] if ((i > 0 and b[i] != b[i-1]) or i==0) else 0 for i in range(len(b))]
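The condition in this list comprehension, `b[i] != b[i-1]`, is `a[i] != 0` in disguise: for integer input, consecutive cumsum values differ by exactly `a[i]`. The same logic can therefore be written without a Python loop. A sketch using `np.diff` with `prepend=0` (available in NumPy 1.16 and later), where prepending 0 also covers the `i==0` special case:

```python
import numpy as np

a = np.array([1, 2, 1, 2, 5, 0, 9, 6, 0, 2, 3, 0])
b = np.cumsum(a)

# np.diff(b, prepend=0) recovers a: b[0] - 0, b[1] - b[0], ...
steps = np.diff(b, prepend=0)
assert np.array_equal(steps, a)

# Keep b where the running total moved (i.e. a[i] != 0), else 0.
result = np.where(steps != 0, b, 0)
print(result)
```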
