Question

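I'm trying to compare the complexity of `in` and binarySearch in Python 2. I expected O(1) for `in` and O(log n) for binarySearch. However, the results were unexpected. Is the program timed incorrectly, or is there some other mistake?

Here is the code: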

import time

x = [x for x in range(1000000)]
def Time_in(alist,item):
    t1  = time.time()
    found = item in alist
    t2 = time.time()
    timer = t2 - t1  
    return found, timer

def Time_binarySearch(alist, item):
    first = 0
    last = len(alist)-1
    found = False 
    t1 = time.time()
    while first<=last and not found:
        midpoint = (first + last)//2
        if alist[midpoint] == item:
            found = True
        else:
            if item < alist[midpoint]:
                last = midpoint-1
            else:
                first = midpoint+1
    t2 = time.time()
    timer = t2 - t1
    return found, timer

print "binarySearch: ", Time_binarySearch(x, 600000)
print "in: ", Time_in(x, 600000)

The results were:

Answer 1

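The binary search is so fast that when you try to print the time it took, it just prints 0.0. Using `in`, on the other hand, takes long enough that you can see the small fraction of a second it needed.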
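
One way to get a measurable number for a call this fast is to repeat it many times and divide by the repeat count. The following is a minimal sketch of that idea, not the asker's code: the helper name `average_seconds` and the repeat count are illustrative assumptions, and the search loop is a compact rewrite of the one in the question.

```python
from __future__ import print_function

import timeit


def binary_search(alist, item):
    """Return True if item is in the sorted list alist."""
    first, last = 0, len(alist) - 1
    while first <= last:
        midpoint = (first + last) // 2
        if alist[midpoint] == item:
            return True
        if item < alist[midpoint]:
            last = midpoint - 1
        else:
            first = midpoint + 1
    return False


def average_seconds(func, repeats=10000):
    """Hypothetical helper: average wall-clock time of one call to func."""
    total = timeit.timeit(func, number=repeats)
    return total / repeats


x = list(range(1000000))
per_call = average_seconds(lambda: binary_search(x, 600000))
print("binarySearch: {0:.3e} seconds per call".format(per_call))
```

The lambda adds a small constant overhead to every call, so the figure is an upper estimate rather than an exact cost.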

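The reason `in` takes longer is that this is a list, not a set or a similar data structure. With a set, a membership test is O(1) on average, because a set is a hash table. With a list, every element has to be checked in order until there is a match or the list is exhausted, which is O(n).

Here is some benchmarking code: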
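
As a rough illustration of that sequential check, the sketch below does by hand approximately what `item in alist` does for a list, and counts the comparisons. The function name `linear_contains` is an assumption for illustration; the real implementation is in C and also short-circuits on object identity.

```python
from __future__ import print_function


def linear_contains(alist, item):
    """Scan alist front to back; return (found, comparisons made)."""
    comparisons = 0
    for element in alist:
        comparisons += 1
        if element == item:
            return True, comparisons
    return False, comparisons


x = list(range(1000000))
print(linear_contains(x, 600000))  # (True, 600001)
print(linear_contains(x, -1))      # (False, 1000000)
```

The cost grows with the position of the item, and a miss always walks the whole list.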

from __future__ import print_function

import bisect
import timeit


def binarysearch(alist, item):
    first = 0
    last = len(alist) - 1
    found = False
    while first <= last and not found:
        midpoint = (first + last) // 2
        if alist[midpoint] == item:
            found = True
        else:
            if item < alist[midpoint]:
                last = midpoint - 1
            else:
                first = midpoint + 1
    return found


def bisect_index(alist, item):
    idx = bisect.bisect_left(alist, item)
    if idx != len(alist) and alist[idx] == item:
        found = True
    else:
        found = False
    return found


time_tests = [
    ('    600 in list(range(1000))',
     '600 in alist',
     'alist = list(range(1000))'),
    ('    600 in list(range(10000000))',
     '600 in alist',
     'alist = list(range(10000000))'),

    ('    600 in set(range(1000))',
     '600 in aset',
     'aset = set(range(1000))'),
    ('6000000 in set(range(10000000))',
     '6000000 in aset',
     'aset = set(range(10000000))'),

    ('binarysearch(list(range(1000)), 600)',
     'binarysearch(alist, 600)',
     'from __main__ import binarysearch; alist = list(range(1000))'),
    ('binarysearch(list(range(10000000)), 6000000)',
     'binarysearch(alist, 6000000)',
     'from __main__ import binarysearch; alist = list(range(10000000))'),

    ('bisect_index(list(range(1000)), 600)',
     'bisect_index(alist, 600)',
     'from __main__ import bisect_index; alist = list(range(1000))'),
    ('bisect_index(list(range(10000000)), 6000000)',
     'bisect_index(alist, 6000000)',
     'from __main__ import bisect_index; alist = list(range(10000000))'),
    ]

for display, statement, setup in time_tests:
    result = timeit.timeit(statement, setup, number=1000000)
    print('{0:<45}{1}'.format(display, result))

Results:

# Python 2.7

    600 in list(range(1000))                 5.29039907455
    600 in list(range(10000000))             5.22499394417
    600 in set(range(1000))                  0.0402979850769
6000000 in set(range(10000000))              0.0390179157257
binarysearch(list(range(1000)), 600)         0.961972951889
binarysearch(list(range(10000000)), 6000000) 3.014950037
bisect_index(list(range(1000)), 600)         0.421462059021
bisect_index(list(range(10000000)), 6000000) 0.634694814682

# Python 3.4

    600 in list(range(1000))                 8.578510413994081
    600 in list(range(10000000))             8.578105041990057
    600 in set(range(1000))                  0.04088461003266275
6000000 in set(range(10000000))              0.043901249999180436
binarysearch(list(range(1000)), 600)         1.6799193460028619
binarysearch(list(range(10000000)), 6000000) 6.099467994994484
bisect_index(list(range(1000)), 600)         0.5168328559957445
bisect_index(list(range(10000000)), 6000000) 0.7694612839259207

# PyPy 2.6.0 (Python 2.7.9)

    600 in list(range(1000))                 0.122292041779
    600 in list(range(10000000))             0.00196599960327
    600 in set(range(1000))                  0.101480007172
6000000 in set(range(10000000))              0.00759720802307
binarysearch(list(range(1000)), 600)         0.242530822754
binarysearch(list(range(10000000)), 6000000) 0.189949035645
bisect_index(list(range(1000)), 600)         0.132127046585
bisect_index(list(range(10000000)), 6000000) 0.197204828262

Answer 2

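Why would you expect O(1) when testing whether a list contains an element? If you know nothing about the list (for example, that it is sorted, as it is in your example), then you have to go through every element and compare.

So you get O(N).

Python lists cannot assume anything about what you store in them, so they have to use a naive implementation for `list.__contains__`. If you want a faster test, you can try using a dictionary or a set.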
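
A minimal sketch of that suggestion, assuming the elements are hashable and that many lookups will be done against the same data (the names `x_set` and `x_index` are illustrative):

```python
from __future__ import print_function

x = list(range(1000000))

# One-time O(n) conversion; afterwards each lookup is a hash probe.
x_set = set(x)

# A dict also works when each item needs an associated value (here: its index).
x_index = {value: position for position, value in enumerate(x)}

print(600000 in x_set)      # True
print(-1 in x_set)          # False
print(x_index.get(600000))  # 600000
```

Building the set costs a full pass over the list, so this only pays off when the same collection is queried repeatedly.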

Answer 3

Here are the time complexities of all the list methods in Python:

As can be seen, `x in s` is O(n), which is significantly slower than binarySearch at O(log n).
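
To put numbers on that gap, the sketch below counts how many elements a binary search actually inspects and compares it with the worst-case bound of floor(log2(n)) + 1 probes, which equals `n.bit_length()`. The function name `binary_search_probes` is an illustrative assumption, and the loop is a compact variant of the one in the question.

```python
from __future__ import print_function


def binary_search_probes(alist, item):
    """Binary search on a sorted list; return (found, elements inspected)."""
    first, last = 0, len(alist) - 1
    probes = 0
    while first <= last:
        midpoint = (first + last) // 2
        probes += 1
        if alist[midpoint] == item:
            return True, probes
        if item < alist[midpoint]:
            last = midpoint - 1
        else:
            first = midpoint + 1
    return False, probes


n = 1000000
x = list(range(n))
found, probes = binary_search_probes(x, 600000)
bound = n.bit_length()  # floor(log2(n)) + 1, i.e. 20 for n = 1000000
print("found:", found, "probes:", probes, "worst case:", bound)
```

For a list of one million items, binary search never inspects more than 20 elements, while a linear scan for the same item compares about 600,000.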

binarySearch vs in, Unexpected Results (Python)
