Count of consecutive zeros in a list

Time: 2017-08-27 10:33:43

Tags: python list numpy

I have a list consisting of 1s and 0s, for example

[0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0]

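I want to output another list of the same length, where each entry is the number of consecutive 0s that have just gone by, i.e. the output for the example above would be: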

[0, 1, 2, 3, 4, 0, 0, 0, 0, 0, 1, 0]

Note that the first entry of the output list is always 0, and that the last entry of the input list does not matter.

What I have tried so far:

def zero_consecutive(input_list):
    output = [0]
    cons = 0
    for i in input_list[:-1]:
        if i == 0:
            cons += 1
            output.append(cons)
        else:
            cons = 0
            output.append(cons)

    return output

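It works for this example, but there is probably a more efficient way that also covers more edge cases.

7 answers:

Answer 0: (score: 6)

Instead of a function that appends everything to a list, you can write a generator function and then cast its result to a list. In general this is shorter, and in most cases it is even faster (while doing the same thing)!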

def zero_consecutive(input_list):
    yield 0
    cons = 0
    for i in input_list[:-1]:
        if i == 0:
            cons += 1
        else:
            cons = 0
        yield cons

>>> list(zero_consecutive([0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0]))
[0, 1, 2, 3, 4, 0, 0, 0, 0, 0, 1, 0]
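If the input is not a list (for example another generator) or may be empty, the slice input_list[:-1] gets in the way. Here is a minimal sketch of a variant that yields the count before consuming each item, so no slicing is needed. The name zero_consecutive_iter is hypothetical and not part of the answer above:

```python
def zero_consecutive_iter(iterable):
    """Yield, for each item, the number of consecutive zeros seen just before it."""
    cons = 0
    for item in iterable:
        # Emit the count accumulated so far, then update it with the current item.
        yield cons
        cons = cons + 1 if item == 0 else 0


example = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0]
result = list(zero_consecutive_iter(example))
empty_result = list(zero_consecutive_iter(iter([])))
```

For the example list this gives the same output as above. For an empty input it yields nothing, whereas the generator above still yields a single 0.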

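Answer 1: (score: 4)

You said you are really interested in a very fast solution. If performance is critical, you could use a C extension type, for example with Cython.

I am using IPython, so I simply use the Cython magic: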

%load_ext cython

Then let Cython compile this iterator class:

%%cython

cdef class zero_consecutive_cython(object):
    cdef long cons
    cdef object input_list
    cdef int started

    def __init__(self, input_list):
        self.input_list = iter(input_list[:-1])
        self.cons = 0
        self.started = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.started == 0:
            self.started = 1
            return 0
        item = next(self.input_list)
        if item == 0:
            self.cons += 1
        else:
            self.cons = 0
        return self.cons

It is basically the same as the generator function from the other answer, but it is faster:

import numpy as np

def zero_consecutive_numpy(input_list):  # from https://stackoverflow.com/a/45905344/5393381
    a = np.array(input_list)
    idx = np.flatnonzero(a[1:] != a[:-1])+2
    out = np.ones(a.size,dtype=int)   
    out[0] = 0

    if len(idx)==0:
        out = np.arange(a.size)
    elif len(idx)==1:
        out[idx[0]] = -a.size
        np.cumsum(out, out=out)
        out[out<0] = 0
    else:    
        out[idx[0]] = 2-idx[1]
        if len(idx)%2==1:
            out[idx[-1]] = -a.size
            out[idx[2:-1:2]] = 1-idx[3:-1:2] - idx[1:-3:2]
        else:
            out[idx[2::2]] = 1-idx[3::2] - idx[1:-2:2]
        np.cumsum(out, out=out)
        out[out<0] = 0
    return out

def zero_consecutive_python(input_list):  # from https://stackoverflow.com/a/45904440/5393381
    yield 0
    cons = 0
    for i in input_list[:-1]:
        if i == 0:
            cons += 1
        else:
            cons = 0
        yield cons

np.random.seed(0)

for n in [200, 2000, 20000, 100000]:
    print(n)
    a = np.repeat(np.arange(n)%2, np.random.randint(3,8,(n))).tolist()

    %timeit list(zero_consecutive_python(a))
    %timeit list(zero_consecutive_cython(a))
    %timeit zero_consecutive_numpy(a)

This gives me the following results:

200
380 µs ± 13.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)    # python
122 µs ± 1.06 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)   # cython
488 µs ± 7.35 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)    # numpy
2000
3.49 ms ± 26.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)    # python
1.07 ms ± 19.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)   # cython
3.85 ms ± 288 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)     # numpy
20000
42.9 ms ± 3.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)     # python
15 ms ± 778 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)       # cython
33.9 ms ± 670 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)      # numpy
100000
199 ms ± 2.69 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)        # python
77.8 ms ± 507 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)      # cython
173 ms ± 4.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)      # numpy

At least on my computer, this beats the other approaches by a factor of roughly 2-3.
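Outside IPython the %timeit magic is not available. A comparable measurement for the pure-Python generator could be sketched with the standard timeit module. The test data here is built with random instead of numpy, which is an assumption made to keep the sketch dependency-free, so the absolute numbers will not match the ones above:

```python
import random
import timeit


def zero_consecutive_python(input_list):
    yield 0
    cons = 0
    for i in input_list[:-1]:
        if i == 0:
            cons += 1
        else:
            cons = 0
        yield cons


random.seed(0)

# Assumed test data: alternating runs of 0s and 1s, each 3 to 7 items long.
data = []
for k in range(2000):
    data.extend([k % 2] * random.randint(3, 7))

# Best of 5 repeats with 10 calls each, reported per call.
timings = timeit.repeat(lambda: list(zero_consecutive_python(data)), number=10, repeat=5)
best = min(timings) / 10
print(f"{best * 1e3:.3f} ms per call")
```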

Answer 2: (score: 3)

Here is a vectorized solution:

def zero_consecutive_vectorized(input_list):
    a = np.array(input_list)
    idx = np.flatnonzero(a[1:] != a[:-1])+2
    out = np.ones(a.size,dtype=int)   
    out[0] = 0

    if len(idx)==0:
        out = np.arange(a.size)
    elif len(idx)==1:
        out[idx[0]] = -a.size
        np.cumsum(out, out=out)
        out[out<0] = 0
    else:    
        out[idx[0]] = 2-idx[1]
        if len(idx)%2==1:
            out[idx[-1]] = -a.size
            out[idx[2:-1:2]] = 1-idx[3:-1:2] - idx[1:-3:2]
        else:
            out[idx[2::2]] = 1-idx[3::2] - idx[1:-2:2]
        np.cumsum(out, out=out)
        out[out<0] = 0
    return out

Sample runs:

In [493]: a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

In [494]: zero_consecutive_vectorized(a)
Out[494]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

In [495]: a = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

In [496]: zero_consecutive_vectorized(a)
Out[496]: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

In [497]: a = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0]

In [498]: zero_consecutive_vectorized(a)
Out[498]: [0, 1, 2, 3, 4, 0, 0, 0, 0, 0, 1, 0]
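The index arithmetic in zero_consecutive_vectorized is compact but hard to follow. As a rough illustration of the same "count since the last reset" idea, here is a sketch that uses a running maximum instead. This is a different technique, not the one used in the function above, and the name zero_consecutive_cummax is hypothetical:

```python
import numpy as np


def zero_consecutive_cummax(input_list):
    a = np.asarray(input_list)
    n = a.size
    if n == 0:
        return np.zeros(0, dtype=int)
    idx = np.arange(1, n + 1)  # 1-based positions
    # 1-based position of the most recent non-zero entry at or before each index (0 if none).
    last_nonzero = np.maximum.accumulate(np.where(a != 0, idx, 0))
    # Number of zeros in the run ending at each index (0 at non-zero entries).
    run = idx - last_nonzero
    # Shift right by one so that each entry reports the run that just went by.
    out = np.zeros(n, dtype=int)
    out[1:] = run[:-1]
    return out


print(zero_consecutive_cummax([0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0]).tolist())
```

The running maximum carries the position of the last non-zero entry forward, so subtracting it from the current position gives the length of the current zero run without any branching on the number of runs.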

Runtime test

Timings against @MSeifert's solution, which seems very competitive among the loop-based solutions:

In [579]: n = 10000

In [580]: a = np.repeat(np.arange(n)%2, np.random.randint(3,8,(n))).tolist()

In [581]: %timeit list(zero_consecutive(a))
     ...: %timeit zero_consecutive_vectorized(a)
     ...: 
100 loops, best of 3: 2.85 ms per loop
100 loops, best of 3: 1.96 ms per loop

In [582]: n = 60000

In [583]: a = np.repeat(np.arange(n)%2, np.random.randint(3,8,(n))).tolist()

In [584]: %timeit list(zero_consecutive(a))
     ...: %timeit zero_consecutive_vectorized(a)
     ...: 
100 loops, best of 3: 17.2 ms per loop
100 loops, best of 3: 12 ms per loop

Answer 3: (score: 2)

This works:

def zero_consecutive(a):
    y = []
    for i, _ in enumerate(a):
        #prevents a StopIteration error
        if not(1 in a[:i]): y.append(i)
        else:
            index = next(j for j in range(i-1, -1, -1) if a[j])
            y.append(i - index - 1)
    return y

Answer 4: (score: 2)

Here is a way to detect runs of zeros using itertools.groupby:

from itertools import groupby

def zero_consecutive(input_list):
    result = [0]
    for k, values in groupby(input_list[:-1], bool):
        len_values = len(list(values))
        if k:
            result.extend([0] * len_values)
        else:
            result.extend(range(1, len_values + 1))
    return result

>>> zero_consecutive([0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0])
[0, 1, 2, 3, 4, 0, 0, 0, 0, 0, 1, 0]

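The grouping uses bool as the key function, so all non-zero values are treated alike. This means the function also works for lists that contain values other than 0 and 1, for example: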

>>> zero_consecutive([0, 0, 0, 0, 1, 2, 'a', 2, 1000, 0, 1, 0])
[0, 1, 2, 3, 4, 0, 0, 0, 0, 0, 1, 0]
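To make the grouping step more concrete, here is a small sketch that only lists the runs groupby produces for the example input (the variable names are illustrative):

```python
from itertools import groupby

data = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0]

# Each run is reported as (key, length); key is False for a run of zeros
# and True for a run of non-zero values. The last input item is dropped.
runs = [(key, len(list(group))) for key, group in groupby(data[:-1], bool)]
print(runs)
# [(False, 4), (True, 5), (False, 1), (True, 1)]
```

Each False run of length n contributes 1..n to the result, and each True run contributes n zeros.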

Answer 5: (score: 1)

Another solution using numpy and scipy, just for fun:

import numpy as np
from scipy.ndimage import label, shift

a = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0])
a_zeros = a == 0
labels = label(a_zeros)[0]

for l in np.unique(labels):
    a[labels == l] = a_zeros[labels == l].cumsum()

shift(a, 1, output=a)

>>> a
Out[1]:
array([0, 1, 2, 3, 4, 0, 0, 0, 0, 0, 1, 0])

If you want it as a function:

def zero_consecutive(array):
    a = array.copy()
    a_zeros = a == 0
    labels = label(a_zeros)[0]

    for l in np.unique(labels):
        a[labels == l] = a_zeros[labels == l].cumsum()

    shift(a, 1, output=a)
    return a

Edit: improved version

This one performs better.

import numpy as np
from scipy.ndimage import label, labeled_comprehension, shift

def zero_consecutive(array):
    def func(a, idx):
        r[idx] = a.astype(bool).cumsum()
        return True
    r = np.zeros_like(array)
    labels, nlabels = label(array == 0)
    labeled_comprehension(labels, labels, np.arange(1, nlabels + 1), func, int, 0, pass_positions=True)

    return shift(r, 1)

Answer 6: (score: -2)

list(map(int,list(''.join(['0' if elem=='' else ''.join(map(str,list(range(len(elem)+1)))) for elem in str([0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0]).strip('[').strip(']').replace(', ','').split('1')])[0:-1])))

How about this list comprehension?
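One caveat that is easy to check: the expression works on the string form of the list and converts the result back digit by digit, so counts of 10 or more are split into separate digits. Here is a minimal sketch that wraps the same expression in a function (the name zero_consecutive_str is hypothetical) and tries it on a longer run of zeros:

```python
def zero_consecutive_str(input_list):
    # Same steps as the one-liner above, split up for readability.
    s = str(input_list).strip('[').strip(']').replace(', ', '')
    parts = ['0' if elem == '' else ''.join(map(str, range(len(elem) + 1)))
             for elem in s.split('1')]
    return list(map(int, ''.join(parts)[0:-1]))


short = zero_consecutive_str([0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0])
long_run = zero_consecutive_str([0] * 11)

print(short)          # matches the expected output from the question
print(len(long_run))  # longer than the 11 input items
```

So the approach only works for inputs of 0s and 1s in which no run of zeros reaches a two-digit count.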