Counting the number of clusters of non-zero values in Python?

Time: 2016-12-31 21:56:43

Tags: python pandas numpy

My data looks like this:

a=[0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]

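Basically, there is a run of zeros before the non-zero numbers, and I want to count the number of groups of non-zero numbers that are separated by zeros. In the sample data above there are 3 groups of non-zero values, so the code should return 3.

  • The number of zeros between non-zero groups is variable

Is there a good way to do this in Python? (I am also using Pandas and NumPy to help parse the data.)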

5 answers:

Answer 0: (score: 6)

With a as the input array, we can use a vectorized solution -

m = a!=0
out = (m[1:] > m[:-1]).sum() + m[0]

Alternatively, for performance, we can use np.count_nonzero, which counts booleans very efficiently, like so -

out = np.count_nonzero(m[1:] > m[:-1]) + m[0] 

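Basically, we build a mask of the non-zero elements and count its rising edges (a False followed by a True). The first element could itself be non-zero, in which case its group has no rising edge, so we check it separately and add it to the total.

Also, note that if the input a is a list, we need to use m = np.asarray(a)!=0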
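
The steps above can be wrapped into a single function. This is only a minimal sketch: the name count_nonzero_groups is hypothetical (it does not appear in the answer), np.asarray is applied so that lists and arrays are both accepted, and an empty input is assumed to contain zero groups.

```python
import numpy as np

def count_nonzero_groups(a):
    """Count runs of consecutive non-zero values (hypothetical helper name)."""
    m = np.asarray(a) != 0              # boolean mask of non-zero positions
    if m.size == 0:                     # assumption: empty input has no groups
        return 0
    # A rising edge is a False -> True step between neighbouring elements.
    rising = np.count_nonzero(m[1:] > m[:-1])
    # A group that starts at index 0 has no rising edge, so count it via m[0].
    return rising + int(m[0])

sample = [0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]
print(count_nonzero_groups(sample))     # 3
```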

Sample runs for three cases -

In [92]: a  # Case1 :Given sample
Out[92]: 
array([ 0,  0,  0,  0,  0,  0, 10, 15, 16, 12, 11,  9, 10,  0,  0,  0,  0,
        0,  6,  9,  3,  7,  5,  4,  0,  0,  0,  0,  0,  0,  4,  3,  9,  7,
        1])

In [93]: m = a!=0

In [94]: (m[1:] > m[:-1]).sum() + m[0]
Out[94]: 3

In [95]: a[0] = 7  # Case2 :Add a non-zero elem/group at the start

In [96]: m = a!=0

In [97]: (m[1:] > m[:-1]).sum() + m[0]
Out[97]: 4

In [99]: a[-2:] = [0,4] # Case3 :Add a non-zero group at the end

In [100]: m = a!=0

In [101]: (m[1:] > m[:-1]).sum() + m[0]
Out[101]: 5

Answer 1: (score: 4)

You can use itertools.groupby() in a list comprehension expression:

>>> from itertools import groupby

>>> len([is_true for is_true, _ in groupby(a, lambda x: x!=0) if is_true])
3
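
To see why this works, it may help to look at what groupby actually yields. The sketch below uses a shortened, made-up list (not the question's data) and prints each run with its key and length; it then counts the True runs with a generator expression, which avoids building the intermediate list.

```python
from itertools import groupby

a = [0, 0, 10, 15, 0, 0, 6, 9, 3, 0, 4]   # shortened, made-up sample

# groupby collapses consecutive items that share the same key into one run.
runs = [(is_nonzero, len(list(group)))
        for is_nonzero, group in groupby(a, lambda x: x != 0)]
print(runs)
# [(False, 2), (True, 2), (False, 2), (True, 3), (False, 1), (True, 1)]

# Counting the runs whose key is True gives the number of non-zero groups.
count = sum(1 for is_nonzero, _ in groupby(a, lambda x: x != 0) if is_nonzero)
print(count)  # 3
```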

Answer 2: (score: 2)

A simple pure-Python solution: count the changes from 0 to non-zero by keeping track of the previous value (rising-edge detection):

a=[0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]

previous = 0
count = 0
for c in a:
    if previous==0 and c!=0:
        count+=1
    previous = c

print(count)  # 3

Answer 3: (score: 2)

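  • pad the array with a zero on both sides with np.concatenate
  • find where the zeros are with a == 0
  • find the boundaries with np.diff
  • sum up the boundaries found with sum
  • divide by 2, because we will have found twice as many as we want

def nonzero_clusters(a):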
    return int(np.diff(np.concatenate([[0], a, [0]]) == 0).sum() / 2)
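
The steps listed above can be followed one at a time on a small input. This is an illustrative sketch only; the intermediate variable names are made up for the example. It relies on np.diff preserving the boolean type, so each element of the result is True where two neighbours differ.

```python
import numpy as np

a = [0, 1, 2, 0, 1, 2]

padded = np.concatenate([[0], a, [0]])   # [0 0 1 2 0 1 2 0]
is_zero = padded == 0                    # [T T F F T F F T]
edges = np.diff(is_zero)                 # True wherever neighbours differ
n_edges = int(edges.sum())               # every group contributes a start and an end
n_groups = n_edges // 2

print(n_edges, n_groups)                 # 4 2
```

Padding with a zero on both sides guarantees that every group has both a start edge and an end edge, which is why the division by 2 is exact.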

Demo

nonzero_clusters(
    [0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]
)

3

nonzero_clusters([0, 1, 2, 0, 1, 2])

2

nonzero_clusters([0, 1, 2, 0, 1, 2, 0])

2

nonzero_clusters([1, 2, 0, 1, 2, 0, 1, 2])

3

Timing

a = np.random.choice((0, 1), 100000)

Code

import numpy as np
from itertools import groupby

def div(a):
    m = a != 0
    return (m[1:] > m[:-1]).sum() + m[0]

def pir(a):
    return int(np.diff(np.concatenate([[0], a, [0]]) == 0).sum() / 2)

def jean(a):
    previous = 0
    count = 0
    for c in a:
        if previous==0 and c!=0:
            count+=1
        previous = c
    return count

def moin(a):
    return len([is_true for is_true, _ in groupby(a, lambda x: x!=0) if is_true])

def user(a):
    return sum([1 for n in range (len (a) - 1) if not a[n] and a[n + 1]])
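
The benchmark harness itself is not shown in the answer. A minimal sketch of how such a comparison might be run with the standard timeit module follows; the fixed seed, the number of repetitions, and the choice to time only two of the functions are assumptions made for brevity, and the measured numbers will vary by machine.

```python
import timeit
import numpy as np

def div(a):
    m = a != 0
    return (m[1:] > m[:-1]).sum() + m[0]

def jean(a):
    previous = 0
    count = 0
    for c in a:
        if previous == 0 and c != 0:
            count += 1
        previous = c
    return count

rng = np.random.default_rng(0)            # fixed seed (assumption, for repeatability)
a = rng.integers(0, 2, size=100_000)      # random 0/1 array of the same length as above

assert div(a) == jean(a)                  # both approaches should agree before timing

repeats = 3                               # assumption: small repeat count to keep this quick
for func in (div, jean):
    seconds = timeit.timeit(lambda: func(a), number=repeats) / repeats
    print(f"{func.__name__}: {seconds * 1e3:.3f} ms per call")
```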


Answer 4: (score: 1)

sum ([1 for n in range (len (a) - 1) if not a[n] and a[n + 1]])
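
Note that, as written, this expression only counts steps from a zero to a non-zero value, so a group that starts at index 0 is not counted. For the question's data this makes no difference, since it begins with zeros. A minimal sketch of one way to cover that case is to prepend a zero before comparing neighbours; the function name count_groups is hypothetical.

```python
def count_groups(a):
    """Count zero -> non-zero steps, treating the start as if preceded by a 0."""
    padded = [0] + list(a)
    return sum(1 for prev, cur in zip(padded, padded[1:]) if not prev and cur)

a = [1, 2, 0, 1, 2, 0, 1, 2]
print(count_groups(a))  # 3

# The original expression misses the leading group for the same input:
print(sum([1 for n in range(len(a) - 1) if not a[n] and a[n + 1]]))  # 2
```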