Which is faster: checking whether something is in a Python list or not? I.e. membership vs. non-membership

Posted: 2017-10-02 05:06:44

Tags: python performance list membership

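This may be a noob question, or obvious to those who know more computer science than I do. Maybe that's why I couldn't find anything on Google or SO after searching. Maybe I'm not using the right vocabulary.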

The title says it all. If I know that x will be in my_list most of the time, which of the following is faster?

if x in my_list:
    func1(x)
else:
    func2(x)

or

if x not in my_list:
    func2(x)
else:
    func1(x)

Does the size of the list matter, e.g. ten elements versus 10,000 elements? In my particular case my_list consists of strings and integers, but does anyone know whether other considerations apply to more complex types such as dicts?

Thanks.

4 answers:

Answer 0 (score: 4)

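Checking whether an element is in a list and checking whether it is not in a list perform the same operation, x in my_list (not in simply negates the result), so there should not be any difference.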
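One way to see this, as a small sketch: the hypothetical list subclass CountingList below (not part of any library) counts calls to __contains__. Both in and not in trigger exactly one membership test; not in just inverts the answer.

```python
class CountingList(list):
    """Hypothetical list subclass that counts calls to __contains__."""

    def __init__(self, *args):
        super().__init__(*args)
        self.contains_calls = 0

    def __contains__(self, item):
        self.contains_calls += 1
        return super().__contains__(item)


my_list = CountingList([1, 2, "a", "b"])

in_result = 2 in my_list          # one __contains__ call
not_in_result = 2 not in my_list  # one __contains__ call, result negated

print(in_result, not_in_result, my_list.contains_calls)  # True False 2
```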

"Does the size of the list matter?"

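Checking whether an element is in a list is an O(N) operation, which means the size does matter: the time grows roughly in proportion to the length of the list.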

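If you need to do a lot of checks, you might want to look at set. Checking whether an element is in a set is O(1) on average, which means the check time does not grow as the set gets larger.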
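A rough sketch of the difference, assuming a hypothetical list of 10,000 integers and a value that is not present (the worst case for the list, since every element is compared). Absolute timings will vary by machine and Python version.

```python
import timeit

# Hypothetical data: the same 10,000 integers stored in a list and in a set.
items_list = list(range(10_000))
items_set = set(items_list)

missing = -1  # not present, so the list is scanned from start to end

# timeit.timeit accepts a callable; number is how many times to call it.
list_time = timeit.timeit(lambda: missing in items_list, number=1_000)
set_time = timeit.timeit(lambda: missing in items_set, number=1_000)

print(f"list: {list_time:.4f} s, set: {set_time:.6f} s")
```

Building the set is itself O(N), so it only pays off when you check many times against the same collection. Set elements must also be hashable: strings and integers are fine, but dicts cannot be stored in a set.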

Answer 1 (score: 2)

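There should be no noticeable performance difference, so you are better off writing whichever one makes your code more readable. Both are O(n), and the time depends mostly on where the element sits in the list. Also, you should avoid premature optimization: for most use cases it doesn't matter, and if it does, you are usually better off using a different data structure.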
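To illustrate why position matters, here is a sketch using a hypothetical CountingItem class that counts == comparisons. A list membership test stops at the first match, so an element near the front costs one comparison, while a missing element costs one per list entry. The same counts apply to not in.

```python
class CountingItem:
    """Hypothetical wrapper that counts how many == comparisons are made."""

    comparisons = 0  # shared counter across all instances

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        CountingItem.comparisons += 1
        return isinstance(other, CountingItem) and self.value == other.value


def comparisons_for(target, items):
    """Return how many comparisons `target in items` performs."""
    CountingItem.comparisons = 0
    _ = target in items  # result discarded; only the count matters here
    return CountingItem.comparisons


items = [CountingItem(i) for i in range(1_000)]

# Fresh objects are used as targets so Python's identity shortcut
# (an object is always equal to itself) does not skip __eq__.
first = comparisons_for(CountingItem(0), items)     # found at index 0
middle = comparisons_for(CountingItem(499), items)  # found at index 499
missing = comparisons_for(CountingItem(-1), items)  # not found: full scan

print(first, middle, missing)  # 1 500 1000
```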

If you want faster lookups, use dicts; key lookups in them are O(1) on average. For details, see https://wiki.python.org/moin/TimeComplexity
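A short sketch with a hypothetical dict: in on a dict tests its keys through hashing, which is where the O(1) average comes from. Searching the values is still a linear scan.

```python
# Hypothetical lookup table mixing string and integer keys.
prices = {"apple": 3, "banana": 1, 42: "answer"}

key_hit = "apple" in prices       # hash lookup on the keys: O(1) on average
value_as_key = 3 in prices        # False: `in` on a dict checks keys only
value_hit = 3 in prices.values()  # True, but this scans the values linearly

print(key_hit, value_as_key, value_hit)  # True False True
```

If you only need membership and no associated values, a set is the more natural choice.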

Answer 2 (score: 1)

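Python includes a module and function, timeit, that can tell you how long a snippet of code takes to execute. The snippet is easiest to time as a single statement, so rather than timing a compound statement such as an if directly, we can wrap the statements in a function and time the function call.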

Rather than calling timeit.timeit(), it is easier to use a Jupyter notebook and put the %timeit magic at the start of a line.

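This shows that for a long or short list, and for success or failure, the two ways you asked about, checking "x in alist" or "x not in alist", give the same timing within the variability of the measurement.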

import random

# set a seed so results will be repeatable
random.seed(456789)
# a 10K long list of junk with no value greater than 100
my_list = [random.randint(-100, 100) for i in range(10000)] 
def func1(x):
    # included just so we get a function call
    return True
def func2(x):
    # included just so we get a function call
    return False
def way1(x):
    if x in my_list:
        result = func1(x)
    else:
        result = func2(x)
    return result
def way2(x):
    if x not in my_list:
        result = func2(x)
    else:
        result = func1(x)
    return result
%timeit way1(101) # failure with large list

The slowest run took 8.29 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 207 µs per loop
%timeit way1(0) # success with large list

The slowest run took 7.34 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.04 µs per loop
my_list.index(0)

186
%timeit way2(101) # failure with large list

The slowest run took 12.44 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 208 µs per loop
%timeit way2(0) # success with large list

The slowest run took 7.39 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.01 µs per loop
my_list = my_list[:10] # now make it a short list
print(my_list[-1]) # what is the last value

-37
# Run the same stuff again against the smaller list, showing that it is
# much faster but still way1 and way2 have no significant differences
%timeit way1(101) # failure with small list
%timeit way1(-37) # success with small list
%timeit way2(101) # failure with small list
%timeit way2(-37) # success with small list

The slowest run took 18.75 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 417 ns per loop
The slowest run took 13.00 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 403 ns per loop
The slowest run took 5.08 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 427 ns per loop
The slowest run took 4.86 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 386 ns per loop
# run the same again to get an idea of variability between runs so we can
# be sure that way1 and way2 have no significant differences
%timeit way1(101) # failure with small list
%timeit way1(-37) # success with small list
%timeit way2(101) # failure with small list
%timeit way2(-37) # success with small list

The slowest run took 8.57 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 406 ns per loop
The slowest run took 4.79 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 412 ns per loop
The slowest run took 4.90 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 412 ns per loop
The slowest run took 4.56 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 398 ns per loop
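Outside a notebook, a similar comparison can be sketched with timeit.repeat in a plain script. The setup mirrors the session above (same seed and list construction); the number of calls per measurement is an arbitrary choice, and the results will differ from the numbers shown.

```python
import random
import timeit

# Same setup as the session: a seeded 10K-element list with values in [-100, 100].
random.seed(456789)
my_list = [random.randint(-100, 100) for _ in range(10_000)]


def func1(x):
    return True


def func2(x):
    return False


def way1(x):
    if x in my_list:
        return func1(x)
    else:
        return func2(x)


def way2(x):
    if x not in my_list:
        return func2(x)
    else:
        return func1(x)


N = 200  # calls per measurement (arbitrary)
timings = {}
for name, fn in [("way1", way1), ("way2", way2)]:
    # timeit.repeat returns the total time of N calls for each repetition;
    # taking the minimum mirrors the "best of 3" that %timeit reports.
    totals = timeit.repeat(lambda: fn(101), number=N, repeat=3)
    timings[name] = min(totals) / N
    print(f"{name}(101): {timings[name] * 1e6:.1f} µs per call")
```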

Answer 3 (score: 0)

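A desirable property of a software implementation is low coupling. Your implementation should not be shaped by the way the Python interpreter happens to test list membership, because that is a high degree of coupling. The interpreter's implementation could change, and what is faster today might no longer be.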

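What we should care about here is that testing membership in a list takes time linear in the size of the list. If you need faster membership tests, you can use a set.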
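A sketch of keeping that choice decoupled, using a hypothetical MembershipChecker class: calling code only writes x in checker, so the backing structure (here a frozenset, which requires hashable items) can be swapped later without touching the callers.

```python
class MembershipChecker:
    """Hypothetical wrapper that hides how membership is tested."""

    def __init__(self, items):
        # Backing structure is an implementation detail; items must be hashable.
        self._items = frozenset(items)

    def __contains__(self, item):
        return item in self._items


allowed = MembershipChecker(["a", "b", 1, 2])


def handle(x):
    # Callers depend only on the `in` protocol, not on list or set specifics.
    if x in allowed:
        return "func1"
    return "func2"


print(handle("a"), handle(3))  # func1 func2
```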