Count elements matching a pattern in a tuple of tuples

Time: 2014-04-26 12:08:33

Tags: python performance python-2.7 optimization

I have a matrix m and I want to count the number of zeros in it.

m=((2,0,2,2),(4,4,5,4),(0,9,4,8),(2,2,0,0))

My current code is as follows:

def zeroCount(M):
    return [item for row in M for item in row].count(0)
    # list of lists is flattened to form single list, and number of 0 are counted
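The title asks about elements matching a pattern in general, and zero is only one such pattern. A minimal sketch generalizing the count to an arbitrary predicate (the name `count_matching` is hypothetical, not from the question):

```python
def count_matching(matrix, predicate):
    """Count the items in a tuple of tuples for which predicate(item) is true."""
    # Visit every row and every item once, adding 1 for each match;
    # no flattened list is built along the way.
    return sum(1 for row in matrix for item in row if predicate(item))


m = ((2, 0, 2, 2), (4, 4, 5, 4), (0, 9, 4, 8), (2, 2, 0, 0))
print(count_matching(m, lambda x: x == 0))  # 4 (zeros)
print(count_matching(m, lambda x: x > 4))   # 3 (the 5, 9 and 8)
```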

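Is there a faster way to do this? Currently, running the function 20,000 times on a 4-by-4 matrix takes 0.4 s, where each entry of the matrix is as likely to be zero as not.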
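A minimal sketch of how such a measurement could be reproduced with the standard `timeit` module, assuming the 20,000-call count from the question. Absolute numbers depend on the machine and interpreter, so none are shown:

```python
import timeit

m = ((2, 0, 2, 2), (4, 4, 5, 4), (0, 9, 4, 8), (2, 2, 0, 0))


def zeroCount(M):
    # The question's version: flatten into a list, then count the zeros.
    return [item for row in M for item in row].count(0)


# Total seconds for 20,000 calls; keep the best of 5 runs
# to reduce noise from other processes.
best = min(timeit.repeat(lambda: zeroCount(m), number=20000, repeat=5))
print('best of 5: %.4f s for 20,000 calls' % best)
```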

Some possible starting points (though I couldn't get any of them to run faster than my code) are these other questions: counting non-zero elements in numpy array, finding the indices of non-zero elements, counting non-zero elements in iterable

6 answers:

Answer 0 (score: 4)

Fastest so far:

def count_zeros(matrix):
    total = 0
    for row in matrix:
        total += row.count(0)
    return total

For a 2D tuple, you could use a generator expression:

def count_zeros_gen(matrix):
    return sum(row.count(0) for row in matrix)

Timing comparison:

%timeit [item for row in m for item in row].count(0) # OP
1000000 loops, best of 3: 1.15 µs per loop

%timeit len([item for row in m for item in row if item == 0]) # @thefourtheye
1000000 loops, best of 3: 913 ns per loop

%timeit sum(row.count(0) for row in m) 
1000000 loops, best of 3: 1 µs per loop

%timeit count_zeros(m)
1000000 loops, best of 3: 775 ns per loop

For a baseline:

def f(m): pass
%timeit f(m)
10000000 loops, best of 3: 110 ns per loop
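The baseline measures the cost of just calling a Python function. A rough sketch of subtracting it to estimate the time spent on the counting itself; this is an approximation (timer noise makes small differences unreliable), and the helper `per_call_ns` is illustrative:

```python
import timeit

m = ((2, 0, 2, 2), (4, 4, 5, 4), (0, 9, 4, 8), (2, 2, 0, 0))


def f(matrix):
    # Empty function: measures call overhead only.
    pass


def count_zeros(matrix):
    total = 0
    for row in matrix:
        total += row.count(0)
    return total


def per_call_ns(func, number=100000):
    # Best of 3 runs, converted to nanoseconds per call.
    best = min(timeit.repeat(lambda: func(m), number=number, repeat=3))
    return best / number * 1e9


baseline = per_call_ns(f)
work = per_call_ns(count_zeros)
print('call overhead: %.0f ns, count_zeros: %.0f ns, difference: %.0f ns'
      % (baseline, work, work - baseline))
```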

Answer 1 (score: 3)

Here is my answer:

reduce(lambda a, b: a + b, m).count(0)
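`reduce` is a builtin only in Python 2; in Python 3 it lives in `functools` (which Python 2.6+ also provides). A sketch that runs on both, using `operator.add` in place of the lambda; it performs the same tuple concatenation:

```python
import functools
import operator

m = ((2, 0, 2, 2), (4, 4, 5, 4), (0, 9, 4, 8), (2, 2, 0, 0))

# reduce concatenates the rows pairwise: ((r0 + r1) + r2) + r3, giving one
# flat tuple whose .count(0) is then taken. Each + copies the tuple built
# so far, so this falls behind a single pass as the number of rows grows.
flat = functools.reduce(operator.add, m)
print(flat.count(0))  # 4
```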

Timings:

%timeit count_zeros(m)                                        #@J.F. Sebastian
1000000 loops, best of 3: 813 ns per loop

%timeit len([item for row in m for item in row if item == 0]) #@thefourtheye
1000000 loops, best of 3: 974 ns per loop

%timeit reduce(lambda a, b: a + b, m).count(0)                #Mine
1000000 loops, best of 3: 1.02 us per loop

%timeit countzeros(m)                                         #@frostnational
1000000 loops, best of 3: 1.07 us per loop

%timeit sum(row.count(0) for row in m)                        #@J.F. Sebastian
1000000 loops, best of 3: 1.28 us per loop

%timeit [item for row in m for item in row].count(0)          #OP
1000000 loops, best of 3: 1.53 us per loop

@thefourtheye's version is fast because it makes very few function calls.

However, @J.F. Sebastian's version is the fastest in my environment. I don't know why...

Answer 2 (score: 2)

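The problem with your solution is that you have to iterate over the flattened list a second time to get the count, which is O(N). The len function, on the other hand, gets the count of an existing list in O(1). (Building the filtered list is itself still one O(N) pass, so what this saves is the second pass, not the overall complexity.)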

This can be done faster with:

def zeroCount(M):
    return len([item for row in M for item in row if item == 0])
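A quick sketch for checking that this rewrite returns the same count as the original on random 4x4 matrices; the random-matrix settings (seed, value range, roughly half zeros) are assumptions chosen to mirror the question:

```python
import random


def zeroCountOP(M):
    # Original: build the full flattened list, then count the zeros in it.
    return [item for row in M for item in row].count(0)


def zeroCountTFE(M):
    # This answer: build a list of only the zeros, then take its length.
    return len([item for row in M for item in row if item == 0])


random.seed(0)
for _ in range(1000):
    # Random 4x4 tuple of tuples; each entry is 0 about half the time.
    M = tuple(tuple(random.choice((0, random.randint(1, 9))) for _ in range(4))
              for _ in range(4))
    assert zeroCountOP(M) == zeroCountTFE(M)
print('all 1000 random matrices agree')
```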

Answer 3 (score: 2)

Check this out:

from itertools import chain, filterfalse # ifilterfalse for Python 2
def zeroCount(m):
    total = 0
    for x in filterfalse(bool, chain(*m)): 
        total += 1
    return total
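Note that `filterfalse(bool, ...)` keeps every falsy item, not just integer zeros: `0.0`, `False`, `None` and empty strings or containers pass through too. That is harmless for the integer matrix in the question, but a sketch on hypothetical mixed data shows the difference from an explicit `== 0` test (which itself still matches `0.0` and `False`, since both compare equal to 0 in Python):

```python
from itertools import chain, filterfalse  # ifilterfalse on Python 2

mixed = ((0, 1, None), ('', 0.0, 2), (False, [], 3))

# Every falsy item survives filterfalse(bool, ...): 0, None, '', 0.0, False, []
falsy = list(filterfalse(bool, chain(*mixed)))
print(len(falsy))  # 6

# Comparing with == 0 skips None, '' and [], but still matches 0.0 and False
equal_zero = [x for x in chain(*mixed) if x == 0]
print(len(equal_zero))  # 3
```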

Performance test on Python 3.3.3:

from timeit import timeit
from itertools import chain, filterfalse
import functools

m = ((2,0,2,2),(4,4,5,4),(0,9,4,8),(2,2,0,0))

def zeroCountOP():
    return [item for row in m for item in row].count(0)

def zeroCountTFE():
    return len([item for row in m for item in row if item == 0])

def zeroCountJFS():
    return sum(row.count(0) for row in m)

def zeroCountuser2931409():
    # `reduce` is in `functools` in Py3k
    return functools.reduce(lambda a, b: a + b, m).count(0)

def zeroCount():
    total = 0
    for x in filterfalse(bool, chain(*m)): 
        total += 1
    return total

print('Original code     ', timeit(zeroCountOP, number=100000))
print('@J.F.Sebastian    ', timeit(zeroCountJFS, number=100000))
print('@thefourtheye     ', timeit(zeroCountTFE, number=100000))
print('@user2931409      ', timeit(zeroCountuser2931409, number=100000))
print('@frostnational    ', timeit(zeroCount, number=100000))

This gives the following results:

Original code      0.244224319984056
@thefourtheye      0.22169152169497108
@user2931409       0.19247795242092186
@frostnational     0.18846473728790825
@J.F.Sebastian     0.1439318853410907

@J.F.Sebastian's solution is the winner; mine is the runner-up (roughly 30% slower by these numbers).
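Working out the gap from the timings listed above (both numbers are copied from those results):

```python
jfs = 0.1439318853410907     # @J.F.Sebastian
frost = 0.18846473728790825  # @frostnational

# How much longer the runner-up takes than the winner.
print('runner-up takes %.0f%% longer' % ((frost / jfs - 1) * 100))  # 31%
# Equivalently, how much less time the winner needs.
print('winner is %.0f%% faster' % ((1 - jfs / frost) * 100))  # 24%
```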

A combined solution for Python 2 and Python 3:

import sys
import itertools

if sys.version_info < (3, 0, 0):
    filterfalse = getattr(itertools, 'ifilterfalse')
else:
    filterfalse = getattr(itertools, 'filterfalse')


def countzeros(matrix):
    ''' Make a good use of `itertools.filterfalse`
        (`itertools.ifilterfalse` in case of Python 2) to count 
        all 0s in `matrix`. '''
    counter = 0
    for _ in filterfalse(bool, itertools.chain(*matrix)):
        counter += 1
    return counter


if __name__ == '__main__':
    # Benchmark
    from timeit import repeat
    print(repeat('countzeros(((2,0,2,2),(4,4,5,4),(0,9,4,8),(2,2,0,0)))',
                 'from __main__ import countzeros',
                 repeat=10,
                 number=100000))
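The version check above works; a common alternative, sketched here under the assumption that you only need the name `filterfalse` to exist on both versions, is to try the Python 3 import and fall back to the Python 2 name:

```python
import itertools

try:
    from itertools import filterfalse  # Python 3
except ImportError:
    from itertools import ifilterfalse as filterfalse  # Python 2


def countzeros(matrix):
    # Same logic as above: count the falsy (zero) items of the flattened matrix.
    counter = 0
    for _ in filterfalse(bool, itertools.chain(*matrix)):
        counter += 1
    return counter


print(countzeros(((2, 0, 2, 2), (4, 4, 5, 4), (0, 9, 4, 8), (2, 2, 0, 0))))  # 4
```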

Answer 4 (score: 1)

Using numpy:

import numpy

m=((2,0,2,2),(4,4,5,4),(0,9,4,8),(2,2,0,0))
numpy_m = numpy.array(m)
print(numpy.sum(numpy_m == 0))

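How does this work? First, your "matrix" is converted to a numpy array (numpy.array(m)). Then each entry is checked for being zero (numpy_m == 0), which produces a boolean array. Summing this boolean array (True counts as 1, False as 0) gives the number of zero elements in the original array.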
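A small sketch making the intermediate steps visible; `numpy.count_nonzero` is an alternative to `sum` that the answer does not use, and it counts the `True` entries of the mask directly:

```python
import numpy

m = ((2, 0, 2, 2), (4, 4, 5, 4), (0, 9, 4, 8), (2, 2, 0, 0))
numpy_m = numpy.array(m)

# Elementwise comparison produces a boolean array of the same shape.
mask = numpy_m == 0
print(mask.dtype)  # bool
print(mask.shape)  # (4, 4)

# True counts as 1 and False as 0 when summed.
print(numpy.sum(mask))            # 4
print(numpy.count_nonzero(mask))  # 4
```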

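Note that numpy clearly pays off for larger matrices. A 4x4 matrix may be too small to show a large performance difference compared with plain Python code, especially if you start from a Python "matrix" like the one above and have to convert it first.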
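A rough sketch for checking this on your own machine, timing the count with and without the `numpy.array` conversion inside each call. No results are shown, since they vary by machine and NumPy version:

```python
import timeit

import numpy

m = ((2, 0, 2, 2), (4, 4, 5, 4), (0, 9, 4, 8), (2, 2, 0, 0))
arr = numpy.array(m)  # converted once, up front

n = 20000
# Plain Python, summing per-row counts.
t_python = timeit.timeit(lambda: sum(row.count(0) for row in m), number=n)
# NumPy, converting the tuple of tuples on every call.
t_convert = timeit.timeit(lambda: numpy.sum(numpy.array(m) == 0), number=n)
# NumPy, reusing the already converted array.
t_array = timeit.timeit(lambda: numpy.sum(arr == 0), number=n)

print('pure Python     %.4f s' % t_python)
print('numpy + convert %.4f s' % t_convert)
print('numpy, reused   %.4f s' % t_array)
```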

Answer 5 (score: 0)

A NumPy solution is:

import numpy as np

m = ((2,0,2,2),(4,4,5,4),(0,9,4,8),(2,2,0,0))
mm = np.array(m)

def zeroCountSmci():
    return (mm==0).sum() # sums across all axes, by default
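Since `sum()` with no arguments reduces over all axes, passing `axis` gives per-row or per-column counts instead. A sketch using the same `mm` (the function names are hypothetical):

```python
import numpy as np

m = ((2, 0, 2, 2), (4, 4, 5, 4), (0, 9, 4, 8), (2, 2, 0, 0))
mm = np.array(m)


def zeroCountPerRow():
    # Collapse the columns (axis=1): one count per row.
    return (mm == 0).sum(axis=1)


def zeroCountPerColumn():
    # Collapse the rows (axis=0): one count per column.
    return (mm == 0).sum(axis=0)


print(zeroCountPerRow())     # [1 0 1 2]
print(zeroCountPerColumn())  # [1 1 1 1]
```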