Python: Count instances of variable-size sub-arrays in a NumPy array

Posted: 2018-07-15 18:49:42

Tags: python numpy

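I have a smallish 1-D NumPy array of length 100...

I want to find how many times a sub-array occurs. Assume every element of the array is either 1 or 0. I want to count the instances where at least 3 zeros occur in a row.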

For np.array([0,0,0,0,1,0,0,1,0,0,0]) I want to return 2.

For np.array([0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0]) I want to return 2.

For np.array([0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0]) I want to return 5.
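One way to pin down the counting rule is to measure the length of every maximal run of zeros and count the runs of length 3 or more. The sketch below does that in NumPy by taking np.diff of a zero-padded 0/1 mask, where +1 marks the start of a run and -1 marks one past its end. The function name count_zero_runs and its min_len parameter are hypothetical; the check uses the three examples above.

```python
import numpy as np

def count_zero_runs(arr, min_len=3):
    """Count maximal runs of zeros whose length is at least min_len."""
    # 1 where the element is zero, 0 elsewhere, padded with a 0 on each side
    is_zero = np.concatenate(([0], (np.asarray(arr) == 0).astype(np.int8), [0]))
    # +1 at the start of a zero run, -1 one position past its end
    edges = np.diff(is_zero)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    # Each run's length is end - start
    return int(np.count_nonzero(ends - starts >= min_len))

examples = [
    np.array([0,0,0,0,1,0,0,1,0,0,0]),
    np.array([0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0]),
    np.array([0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0]),
]
print([count_zero_runs(x) for x in examples])  # [2, 2, 5]
```

Because the mask is built with `== 0`, any non-zero value (not just 1) ends a run.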

I tried converting to a string and using string.count(). That works, but I need a faster solution: I call this function millions of times per minute.
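The question does not show the string-based code, so the following is only a hypothetical sketch of that idea: map the array to one byte per element with ndarray.tobytes() (skipping the str conversion), split on the non-zero byte, and count the pieces that are long enough. It also shows why a bare str.count('000') would not match the expected outputs: it counts non-overlapping matches, so any run of six or more zeros is counted more than once (the second example, with its run of 14 zeros, would come out as 5 instead of 2). The name count_zero_runs_bytes is hypothetical.

```python
import numpy as np

def count_zero_runs_bytes(arr, min_len=3):
    """Count runs of at least min_len zeros via a byte-string split."""
    # One byte per element: b'\x00' for a zero, b'\x01' for anything else
    raw = (np.asarray(arr) != 0).astype(np.uint8).tobytes()
    # Splitting on b'\x01' leaves exactly the zero runs (some possibly empty)
    return sum(1 for piece in raw.split(b"\x01") if len(piece) >= min_len)

# A plain substring count is not the same thing: it finds non-overlapping
# "000" matches, so a single run of six zeros is counted twice.
print("000000".count("000"))  # 2
print(count_zero_runs_bytes(np.array([0, 0, 0, 0, 0, 0])))  # 1
```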

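Currently I'm looping over the array, which is slow, but surprisingly it is about 4x faster than casting to a string (I know casting is slow, and string operations are slow too...).

Any ideas are appreciated.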

Regarding the itertools suggestion:

I wrote a small performance check:

import numpy as np
import itertools
import time

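def itertools_solution(full_array):
  # True where the element is zero
  trueFalse = full_array == 0
  # Length of every run of consecutive zeros
  count = [ sum( 1 for _ in group ) for key, group in itertools.groupby( trueFalse ) if key ]
  # Keep only the runs with at least 3 zeros
  above = [val for val in count if val >= 3]
  return len(above)

def looping_solution(full_array):
  total_count = 0    # number of runs of >= 3 zeros seen so far
  running_count = 0  # length of the current run of zeros
  for val in full_array:
    if val == 0:
      running_count += 1
      # Count each run exactly once, when it reaches length 3
      if running_count == 3:
        total_count += 1
    else:
      running_count = 0
  return total_count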

a = np.array([[0,0,0,0,2,0,0,5,0,0,0,5,5,5],
[0,0,0,0,0,1,1,0,1,4,0,0,4,4],
[0,0,1,1,0,0,4,4,4,0,4,0,0,1],
[3,2,2,3,3,0,0,3,2,6,6,6,0,0],
[0,1,4,5,0,4,0,0,0,5,0,2,1,0],
[0,0,3,6,6,6,0,0,0,2,2,3,3,6],
[2,0,0,2,5,5,5,0,0,0,5,0,0,0],
[1,3,0,0,1,3,3,6,6,0,0,4,6,0],
[5,5,5,0,0,2,2,2,5,0,0,0,2,2],
[6,6,6,0,0,0,6,0,3,3,3,0,0,3],
[4,4,0,4,4,0,0,1,0,1,1,1,0,0]]).flatten()

time_start = time.time()
for cnt in range(1000):
  itertools_solution(a)
print('itertools took %f seconds' % (time.time() - time_start))
time_start = time.time()
for cnt in range(1000):
  looping_solution(a)
print('looping took %f seconds' % (time.time() - time_start))

Results:

itertools took 0.185000 seconds
looping took 0.038001 seconds

Even so, unfortunately it doesn't solve my performance problem...
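For timings like the ones above, the standard-library timeit module is usually more reliable than subtracting time.time() values, since it uses a high-resolution clock by default. The sketch below times the question's loop against a hypothetical NumPy run-length version (vectorized_solution) on random 0/1 data of the same size as the flattened benchmark array. It first checks that both return the same count and makes no claim about which one wins; that depends on array length and machine.

```python
import timeit
import numpy as np

def looping_solution(full_array):
    # Same logic as the loop in the question: count each run when it reaches 3 zeros
    total_count = 0
    running_count = 0
    for val in full_array:
        if val == 0:
            running_count += 1
            if running_count == 3:
                total_count += 1
        else:
            running_count = 0
    return total_count

def vectorized_solution(full_array, min_len=3):
    # Zero-run lengths from the edges of a padded 0/1 mask (hypothetical helper)
    is_zero = np.concatenate(([0], (full_array == 0).astype(np.int8), [0]))
    edges = np.diff(is_zero)
    lengths = np.flatnonzero(edges == -1) - np.flatnonzero(edges == 1)
    return int(np.count_nonzero(lengths >= min_len))

rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=154)  # hypothetical random 0/1 test data

# Both must agree before their timings are worth comparing
assert looping_solution(a) == vectorized_solution(a)

for name, fn in [("looping", looping_solution), ("vectorized", vectorized_solution)]:
    seconds = timeit.timeit(lambda: fn(a), number=1000)
    print(f"{name} took {seconds:.6f} seconds for 1000 calls")
```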

1 answer:

Answer 0 (score: 1)

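We can find the positions of all the ones, take the differences between consecutive positions, and count the instances where the difference is greater than n (here 3):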
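To see why this works, here is the answer's one-liner unpacked step by step on the first example. Padding with a 1 at each end bounds the leading and trailing zero runs, and a gap of d between two consecutive ones means d - 1 zeros lie between them, so "gap > n" is the same as "at least n zeros". The intermediate variable names are only for illustration.

```python
import numpy as np

arr = np.array([0,0,0,0,1,0,0,1,0,0,0])
n = 3

# Pad with a 1 at each end so leading and trailing zero runs are bounded by ones
padded = np.concatenate(([1], arr, [1]))
# Positions of all ones in the padded array
ones = np.where(padded == 1)[0]
# Gap between consecutive ones; a gap of d means d - 1 zeros lie between them
gaps = np.diff(ones)
# gap > n  <=>  at least n zeros in that run
count = np.where(gaps > n)[0].size

print(padded.tolist())  # [1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
print(ones.tolist())    # [0, 5, 8, 12]
print(gaps.tolist())    # [5, 3, 4]
print(count)            # 2
```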

>>> def zero_ranges(arr, n):
...    return np.where(np.diff(np.where(np.concatenate(([1], arr, [1]))==1)[0])>n)[0].size
...
>>> zero_ranges(np.array([0,0,0,0,1,0,0,1,0,0,0]), 3)
2
>>> zero_ranges(np.array([0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0]), 3)
2
>>> zero_ranges(np.array([0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0]), 3)
5
>>>
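The answer relies on the question's statement that every element is 0 or 1. The benchmark array in the question, however, also contains values from 2 to 6, and with `== 1` those values are treated as if they were zeros. A minimal variant, assuming any non-zero value should end a run, swaps the test for `!= 0`; the name zero_ranges_nonzero is hypothetical.

```python
import numpy as np

def zero_ranges(arr, n):
    # The answer's version: only elements equal to 1 separate zero runs
    return np.where(np.diff(np.where(np.concatenate(([1], arr, [1])) == 1)[0]) > n)[0].size

def zero_ranges_nonzero(arr, n):
    # Hypothetical variant: every non-zero element separates zero runs
    padded = np.concatenate(([1], arr, [1]))
    separators = np.flatnonzero(padded != 0)
    return int(np.count_nonzero(np.diff(separators) > n))

# First row of the benchmark array: zero runs of length 4, 2 and 3
row = np.array([0,0,0,0,2,0,0,5,0,0,0,5,5,5])
print(zero_ranges(row, 3))          # 1: the 2s and 5s are treated like zeros
print(zero_ranges_nonzero(row, 3))  # 2
```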