Getting the size of a conditional subset of a list

Time: 2018-02-01 10:32:27

Tags: python performance list subset

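Suppose you have a list with an arbitrary number of items, and you want to count the items that satisfy a given condition. I have come up with two reasonable ways to do this, but I am not sure which one is better (more pythonic), or whether there is a better option that does not sacrifice too much readability.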

import numpy.random as nprnd
import timeit

my = nprnd.randint(1000, size=1000000)

def with_len(my_list):
    # Build the filtered list, then take its length
    return len([t for t in my_list if t >= 500])

def with_sum(my_list):
    # Add 1 per matching item without building a list
    return sum(1 for t in my_list if t >= 500)

t1 = timeit.Timer('with_len(my)', 'from __main__ import with_len, my')
t2 = timeit.Timer('with_sum(my)', 'from __main__ import with_sum, my')

print("with len:",t1.timeit(1000)/1000)
print("with sum:",t2.timeit(1000)/1000)

The two approaches perform almost identically. But which one is more pythonic? Or is there a better option?

For the curious, I benchmarked the solutions proposed in the comments and answers. Here are the results:

import numpy as np
import timeit
import functools

my = np.random.randint(1000, size=100000)

def with_len(my_list):
    return len([t for t in my_list if t >= 500])

def with_sum(my_list):
    return sum(1 for t in my_list if t >= 500)

def with_sum_alt(my_list):
    return sum(t >= 500 for t in my_list)

def with_lambda(my_list):
    return functools.reduce(lambda a, b: a + (1 if b >= 500 else 0), my_list, 0)

def with_np(my_list):
    return len(np.where(my_list>=500)[0])

t1 = timeit.Timer('with_len(my)', 'from __main__ import with_len, my')
t2 = timeit.Timer('with_sum(my)', 'from __main__ import with_sum, my')
t3 = timeit.Timer('with_sum_alt(my)', 'from __main__ import with_sum_alt, my')
t4 = timeit.Timer('with_lambda(my)', 'from __main__ import with_lambda, my')
t5 = timeit.Timer('with_np(my)', 'from __main__ import with_np, my')

print("with len:", t1.timeit(1000)/1000)
print("with sum:", t2.timeit(1000)/1000)
print("with sum_alt:", t3.timeit(1000)/1000)
print("with lambda:", t4.timeit(1000)/1000)
print("with np:", t5.timeit(1000)/1000)

Python 2.7

('with len:', 0.02201753337348283)
('with sum:', 0.022727363518455238)
('with sum_alt:', 0.2370256687439941) # <-- very slow!
('with lambda:', 0.026367264818657078)
('with np:', 0.0005811764306089913) # <-- very fast!

Python 3.6

with len: 0.017649643657480736
with sum: 0.0182978007766851
with sum_alt: 0.19659815740239048
with lambda: 0.02691670741400111
with np: 0.000534095418615152
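
One detail of the benchmark above is worth keeping in mind when reading the numbers: `my` is a NumPy array, not a plain Python list, so iterating over it yields NumPy scalars, and `t >= 500` yields NumPy booleans rather than Python `bool` values. That may contribute to the `with_sum_alt` timing, but the effect is not measured here. The sketch below only shows the types involved; the name `as_list` is introduced for illustration.

```python
import numpy as np

my = np.random.randint(1000, size=10)

# Iterating over (or indexing) a NumPy array yields NumPy scalars
first = my[0]
print(type(first))
print(type(first >= 500))

# tolist() converts the array to a plain list of Python ints
as_list = my.tolist()
print(type(as_list[0]))
print(type(as_list[0] >= 500))

# Both forms count the same number of matching items
count_array = sum(t >= 500 for t in my)
count_list = sum(t >= 500 for t in as_list)
print(count_array == count_list)
```

To compare the pure-Python variants on an actual list, the same timing code could be run on `my.tolist()` instead of `my`.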

2 answers:

Answer 0 (score: 3)

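The second one, with_sum, is more pythonic in the sense that it uses less memory: the generator expression is fed straight into sum(), so the full list is never built.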
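
A minimal sketch of how the memory claim could be checked, assuming the standard-library `tracemalloc` module is an acceptable measuring tool. The helper `peak_memory` and the test list `data` are made up for this illustration and are not part of the question.

```python
import tracemalloc

def with_len(my_list):
    return len([t for t in my_list if t >= 500])

def with_sum(my_list):
    return sum(1 for t in my_list if t >= 500)

def peak_memory(func, data):
    """Hypothetical helper: return (result, peak bytes traced) for func(data)."""
    tracemalloc.start()
    result = func(data)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak

# A plain Python list with 100000 items in the range 0..999
data = list(range(1000)) * 100

count_len, peak_len = peak_memory(with_len, data)
count_sum, peak_sum = peak_memory(with_sum, data)

print("with len:", count_len, "items, peak bytes:", peak_len)
print("with sum:", count_sum, "items, peak bytes:", peak_sum)
```

Both functions return the same count. The list comprehension has to hold every matching item at once, while the generator hands items to `sum()` one at a time, so its peak should be much smaller.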

Answer 1 (score: 1)

I agree with @Chris_Rands. But in terms of performance, the numpy approach is faster:

import numpy as np

def with_np(my_list):
    return len(np.where(my_list>=500)[0])
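
As a possible variation on the function above, `np.count_nonzero` counts the `True` entries of the boolean mask directly, without building an index array through `np.where`. This is only a sketch: the name `with_count_nonzero` is made up here, and it was not part of the timings in the question, so no speed claim is made for it.

```python
import numpy as np

def with_np(my_array):
    return len(np.where(my_array >= 500)[0])

def with_count_nonzero(my_array):
    # my_array >= 500 is a boolean array; count its True entries
    return int(np.count_nonzero(my_array >= 500))

my = np.random.randint(1000, size=100000)
print(with_np(my) == with_count_nonzero(my))

# Both functions expect a NumPy array. In Python 3, comparing a plain list
# with an int raises TypeError, so convert the list first.
plain = [100, 500, 999]
small_count = with_count_nonzero(np.asarray(plain))
print(small_count)  # prints 2
```

Note that both numpy variants only pay off when the data is already an array. If it starts as a Python list, the conversion cost belongs in the comparison too.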