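I have a fairly small one-dimensional NumPy array, of length 100 or so.
I want to count how many times a subarray occurs. Assume every element of the array is either 1 or 0. I want to count the number of runs with at least 3 zeros in a row.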
For np.array([0,0,0,0,1,0,0,1,0,0,0])
I would like to return 2
For np.array([0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0])
I would like to return 2
For np.array([0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0])
I would like to return 5
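I tried converting the array to a string and using string.count(). This works, but I need a faster solution: I have to run this function millions of times per minute.
Currently I am looping over the array, which is slow, but surprisingly still about 4x faster than converting to a string (I know the conversion is slow, and the string operations are slow too...).
Any ideas would be appreciated.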
Regarding the itertools suggestion:
I wrote a small performance check:
import numpy as np
import itertools
import time
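def itertools_solution(full_array):
    # True wherever the element is zero
    trueFalse = full_array == 0
    # Length of every run of consecutive zeros
    count = [ sum( 1 for _ in group ) for key, group in itertools.groupby( trueFalse ) if key ]
    # Keep only the runs with at least 3 zeros
    above = [val for val in count if val >= 3]
    return len(above)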
def looping_solution(full_array):
    total_count = 0
    running_count = 0
    for counter, val in enumerate(full_array):
        if val == 0:
            running_count += 1
            # Count each run exactly once, at the moment it reaches length 3
            if running_count == 3:
                total_count += 1
        else:
            # Any non-zero value ends the current run
            running_count = 0
    return total_count
a = np.array([[0,0,0,0,2,0,0,5,0,0,0,5,5,5],
              [0,0,0,0,0,1,1,0,1,4,0,0,4,4],
              [0,0,1,1,0,0,4,4,4,0,4,0,0,1],
              [3,2,2,3,3,0,0,3,2,6,6,6,0,0],
              [0,1,4,5,0,4,0,0,0,5,0,2,1,0],
              [0,0,3,6,6,6,0,0,0,2,2,3,3,6],
              [2,0,0,2,5,5,5,0,0,0,5,0,0,0],
              [1,3,0,0,1,3,3,6,6,0,0,4,6,0],
              [5,5,5,0,0,2,2,2,5,0,0,0,2,2],
              [6,6,6,0,0,0,6,0,3,3,3,0,0,3],
              [4,4,0,4,4,0,0,1,0,1,1,1,0,0]]).flatten()
# Time 1000 calls of each implementation on the same array
time_start = time.time()
for cnt in range(1000):
    itertools_solution(a)
print('itertools took %f seconds' % (time.time() - time_start))
time_start = time.time()
for cnt in range(1000):
    looping_solution(a)
print('looping took %f seconds' % (time.time() - time_start))
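Results:
itertools took 0.185000 seconds
looping took 0.038001 seconds
So the itertools version works, but unfortunately it does not solve my performance problem: here it is roughly 5x slower than the plain loop.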
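As a side note on the measurement itself, the same comparison could be sketched with the standard-library timeit module, which repeats the measurement and lets you keep the best run. The `benchmark` helper and the random `sample` array are hypothetical names introduced only for this illustration, and the two functions are restated so the sketch is self-contained; treat it as one possible setup, not as the measurement that produced the numbers quoted in this question.

```python
import itertools
import timeit

import numpy as np


def itertools_solution(full_array):
    # Lengths of the runs of zeros, then count those with at least 3 elements.
    runs = (sum(1 for _ in group)
            for is_zero, group in itertools.groupby(full_array == 0) if is_zero)
    return sum(1 for length in runs if length >= 3)


def looping_solution(full_array):
    total_count = 0
    running_count = 0
    for val in full_array:
        if val == 0:
            running_count += 1
            # Count each run once, when it reaches length 3.
            if running_count == 3:
                total_count += 1
        else:
            running_count = 0
    return total_count


def benchmark(array, calls=1000, repeats=5):
    """Hypothetical helper: best total time in seconds for `calls` calls."""
    results = {}
    for func in (itertools_solution, looping_solution):
        timer = timeit.Timer(lambda: func(array))
        # The minimum over several repeats is less sensitive to system noise.
        results[func.__name__] = min(timer.repeat(repeat=repeats, number=calls))
    return results


# Assumed test data: 154 random integers in the range 0..6.
sample = np.random.default_rng(0).integers(0, 7, size=154)

if __name__ == "__main__":
    for name, seconds in benchmark(sample, calls=1000, repeats=3).items():
        print("%s took %f seconds" % (name, seconds))
```

Absolute timings depend on the machine and on the NumPy version, so only the ratio between the two entries is meaningful.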
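Answer 0 (score: 1)
We can pad the array with a 1 on each end, find the positions of all the ones, take the differences between consecutive positions, and count the instances where the difference is greater than 3. Two ones whose positions differ by more than n have at least n zeros between them.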
>>> def zero_ranges(arr, n):
...     return np.where(np.diff(np.where(np.concatenate(([1], arr, [1]))==1)[0])>n)[0].size
...
>>> zero_ranges(np.array([0,0,0,0,1,0,0,1,0,0,0]), 3)
2
>>> zero_ranges(np.array([0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0]), 3)
2
>>> zero_ranges(np.array([0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0]), 3)
5
>>>
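The one-liner above can be unpacked into named steps, which may make it easier to check. The sketch below follows the same idea with one deliberate change, labeled here as an assumption: it treats every non-zero value as a separator (`arr != 0`) instead of testing `== 1`, so that it would also apply to arrays such as the benchmark array in the question, which contains values other than 0 and 1. The name `count_zero_runs` is introduced only for this illustration.

```python
import numpy as np


def count_zero_runs(arr, n=3):
    """Count maximal runs of at least n consecutive zeros in a 1-D array."""
    arr = np.asarray(arr)
    # Assumption: any non-zero value ends a run (generalizes the `== 1` test).
    is_separator = arr != 0
    # Add a sentinel separator at each end so that runs touching the array
    # boundaries are delimited as well.
    padded = np.concatenate(([True], is_separator, [True]))
    # Indices of all separators, sentinels included.
    separator_positions = np.flatnonzero(padded)
    # Two neighbouring separators d positions apart enclose d - 1 zeros.
    gaps = np.diff(separator_positions)
    # "At least n zeros" therefore means a gap strictly greater than n.
    return int(np.count_nonzero(gaps > n))


print(count_zero_runs(np.array([0,0,0,0,1,0,0,1,0,0,0])))  # prints 2
```

For strictly 0/1 input this should give the same counts as `zero_ranges(arr, 3)`. Whether it is fast enough for millions of calls per minute on arrays of length 100 would need to be measured, since the fixed overhead of each NumPy call can dominate at that size.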