binarySearch vs in, Unexpected Results (Python)

Asked: 2015-08-11 08:22:05

Tags: python time-complexity binary-search

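I am trying to compare the complexity of in and binarySearch in Python 2. I expected O(1) for in and O(log n) for binarySearch. However, the results were unexpected. Is the program timed incorrectly, or is there some other mistake?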

Here is the code:

import time

x = [x for x in range(1000000)]
def Time_in(alist,item):
    t1  = time.time()
    found = item in alist
    t2 = time.time()
    timer = t2 - t1  
    return found, timer

def Time_binarySearch(alist, item):
    first = 0
    last = len(alist)-1
    found = False 
    t1 = time.time()
    while first<=last and not found:
        midpoint = (first + last)//2
        if alist[midpoint] == item:
            found = True
        else:
            if item < alist[midpoint]:
                last = midpoint-1
            else:
                first = midpoint+1
    t2 = time.time()
    timer = t2 - t1
    return found, timer

print "binarySearch: ", Time_binarySearch(x, 600000)
print "in: ", Time_in(x, 600000)

The result (originally posted as a screenshot, which is not reproduced here):

3 Answers:

Answer 0 (score: 3)

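The binary search is so fast that when you try to print the time it took, it just prints 0.0. Using in takes long enough that you can see the small fraction of a second it needs.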
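One way to get a non-zero reading for such a short operation is to repeat it many times and divide, which is what the timeit module does. The following is only a minimal sketch: the function name binary_search and the repeat count of 10000 are choices made for this illustration, not part of the question's code.

```python
import timeit


def binary_search(alist, item):
    """Return True if item is in the sorted list alist."""
    first, last = 0, len(alist) - 1
    while first <= last:
        midpoint = (first + last) // 2
        if alist[midpoint] == item:
            return True
        if item < alist[midpoint]:
            last = midpoint - 1
        else:
            first = midpoint + 1
    return False


data = list(range(1000000))
calls = 10000  # arbitrary repeat count, chosen for illustration

# Time many calls together, then average, so the total is large enough
# for the clock to resolve.
total = timeit.timeit(lambda: binary_search(data, 600000), number=calls)
per_call = total / calls
print("binary search: about {0:.3e} seconds per call".format(per_call))
```

The per-call figure includes the small overhead of calling the lambda, so it is an upper estimate rather than an exact cost.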

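The reason in takes longer is that this is a list, not a set or a similar data structure. For a set, which is hash-based, a membership test is O(1) on average; in a list, every element must be checked in order until there is a match or the list is exhausted.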
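The difference can be made visible by counting equality comparisons instead of measuring time. The sketch below assumes a small helper class, Counted, that exists only for this illustration; it wraps an integer and counts every call to __eq__.

```python
class Counted(object):
    """Hypothetical helper: wraps an int and counts equality comparisons."""
    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.eq_calls += 1
        return self.value == other.value

    def __hash__(self):
        return hash(self.value)


items = [Counted(i) for i in range(1000)]
target = Counted(600)

# List membership: elements are compared one by one from the front.
Counted.eq_calls = 0
found_in_list = target in items
list_comparisons = Counted.eq_calls

# Set membership: the hash picks the bucket, so very few comparisons are needed.
as_set = set(items)
Counted.eq_calls = 0
found_in_set = target in as_set
set_comparisons = Counted.eq_calls

print("list comparisons:", list_comparisons)
print("set comparisons: ", set_comparisons)
```

In the list case the count grows with the position of the match, while in the set case it stays small regardless of how many elements are stored.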

Here is some benchmark code:

from __future__ import print_function

import bisect
import timeit


def binarysearch(alist, item):
    first = 0
    last = len(alist) - 1
    found = False
    while first <= last and not found:
        midpoint = (first + last) // 2
        if alist[midpoint] == item:
            found = True
        else:
            if item < alist[midpoint]:
                last = midpoint - 1
            else:
                first = midpoint + 1
    return found


def bisect_index(alist, item):
    idx = bisect.bisect_left(alist, item)
    if idx != len(alist) and alist[idx] == item:
        found = True
    else:
        found = False
    return found


time_tests = [
    ('    600 in list(range(1000))',
     '600 in alist',
     'alist = list(range(1000))'),
    ('    600 in list(range(10000000))',
     '600 in alist',
     'alist = list(range(10000000))'),

    ('    600 in set(range(1000))',
     '600 in aset',
     'aset = set(range(1000))'),
    ('6000000 in set(range(10000000))',
     '6000000 in aset',
     'aset = set(range(10000000))'),

    ('binarysearch(list(range(1000)), 600)',
     'binarysearch(alist, 600)',
     'from __main__ import binarysearch; alist = list(range(1000))'),
    ('binarysearch(list(range(10000000)), 6000000)',
     'binarysearch(alist, 6000000)',
     'from __main__ import binarysearch; alist = list(range(10000000))'),

    ('bisect_index(list(range(1000)), 600)',
     'bisect_index(alist, 600)',
     'from __main__ import bisect_index; alist = list(range(1000))'),
    ('bisect_index(list(range(10000000)), 6000000)',
     'bisect_index(alist, 6000000)',
     'from __main__ import bisect_index; alist = list(range(10000000))'),
    ]

for display, statement, setup in time_tests:
    result = timeit.timeit(statement, setup, number=1000000)
    print('{0:<45}{1}'.format(display, result))

Results:

# Python 2.7

    600 in list(range(1000))                 5.29039907455
    600 in list(range(10000000))             5.22499394417
    600 in set(range(1000))                  0.0402979850769
6000000 in set(range(10000000))              0.0390179157257
binarysearch(list(range(1000)), 600)         0.961972951889
binarysearch(list(range(10000000)), 6000000) 3.014950037
bisect_index(list(range(1000)), 600)         0.421462059021
bisect_index(list(range(10000000)), 6000000) 0.634694814682

# Python 3.4

    600 in list(range(1000))                 8.578510413994081
    600 in list(range(10000000))             8.578105041990057
    600 in set(range(1000))                  0.04088461003266275
6000000 in set(range(10000000))              0.043901249999180436
binarysearch(list(range(1000)), 600)         1.6799193460028619
binarysearch(list(range(10000000)), 6000000) 6.099467994994484
bisect_index(list(range(1000)), 600)         0.5168328559957445
bisect_index(list(range(10000000)), 6000000) 0.7694612839259207

# PyPy 2.6.0 (Python 2.7.9)

    600 in list(range(1000))                 0.122292041779
    600 in list(range(10000000))             0.00196599960327
    600 in set(range(1000))                  0.101480007172
6000000 in set(range(10000000))              0.00759720802307
binarysearch(list(range(1000)), 600)         0.242530822754
binarysearch(list(range(10000000)), 6000000) 0.189949035645
bisect_index(list(range(1000)), 600)         0.132127046585
bisect_index(list(range(10000000)), 6000000) 0.197204828262

Answer 1 (score: 2)

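Why would you expect O(1) when testing whether a list contains an element? If you know nothing about the list (for example, that it is sorted, as it is in your example), you have to go through every element and compare.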

So you get O(N).

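Python lists cannot assume anything about what you store in them, so they have to use a naive implementation of list.__contains__. If you want a faster test, you can try using a dictionary or a set.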
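As a rough picture of what such a naive scan amounts to, here is a pure-Python sketch. The function linear_contains is a hypothetical stand-in written for this illustration; it is not CPython's actual implementation, which is written in C. The second half shows the suggested alternative of building a set once and querying it repeatedly.

```python
def linear_contains(alist, item):
    """Illustrative equivalent of a naive membership scan over a list."""
    for element in alist:
        # Identity is checked first, then equality.
        if element is item or element == item:
            return True
    return False


numbers = list(range(100000))

# Same answers as the built-in operator, found by walking the list.
assert linear_contains(numbers, 60000) == (60000 in numbers)
assert linear_contains(numbers, -5) == (-5 in numbers)

# Building the set costs one O(N) pass, but each later lookup is O(1) on average.
lookup = set(numbers)
queries = [0, 60000, 99999, 100000, -5]
results = [q in lookup for q in queries]
print(results)
```

The conversion only pays off when the same collection is queried several times; for a single lookup, building the set costs as much as one scan of the list.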

Answer 2 (score: 0)

Here are the time complexities of all list methods in Python (the table was originally posted as an image, which is not reproduced here).

As can be seen, x in s is O(n), which is clearly slower than binarySearch at O(log n).
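To put numbers on O(n) versus O(log n) for the list size used in the question, the sketch below counts loop iterations rather than seconds. The function binary_search_steps is a name invented for this illustration; the bound n.bit_length() equals floor(log2(n)) + 1 for positive n.

```python
def binary_search_steps(alist, item):
    """Binary search over a sorted list that also reports loop iterations."""
    first, last, steps = 0, len(alist) - 1, 0
    while first <= last:
        steps += 1
        midpoint = (first + last) // 2
        if alist[midpoint] == item:
            return True, steps
        if item < alist[midpoint]:
            last = midpoint - 1
        else:
            first = midpoint + 1
    return False, steps


n = 1000000
data = list(range(n))

found, steps = binary_search_steps(data, 600000)
worst_case = n.bit_length()       # upper bound on binary search iterations
linear_steps = data.index(600000) + 1  # elements a front-to-back scan inspects

print("binary search iterations:", steps, "(at most", worst_case, ")")
print("linear scan inspections: ", linear_steps)
```

For a million elements the binary search needs at most 20 iterations, while the linear scan inspects 600001 elements before reaching the value 600000.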