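I am trying to compare the complexity of in and binarySearch in Python 2. I expected O(1) for in and O(log n) for binarySearch. However, the results are surprising. Is the program timed incorrectly, or is there some other mistake?
Here is the code: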
import time

x = [x for x in range(1000000)]

def Time_in(alist, item):
    t1 = time.time()
    found = item in alist
    t2 = time.time()
    timer = t2 - t1
    return found, timer

def Time_binarySearch(alist, item):
    first = 0
    last = len(alist)-1
    found = False
    t1 = time.time()
    while first<=last and not found:
        midpoint = (first + last)//2
        if alist[midpoint] == item:
            found = True
        else:
            if item < alist[midpoint]:
                last = midpoint-1
            else:
                first = midpoint+1
    t2 = time.time()
    timer = t2 - t1
    return found, timer

print "binarySearch: ", Time_binarySearch(x, 600000)
print "in: ", Time_in(x, 600000)
The result is:
Answer 0 (score: 3)
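Binary search is so fast that when you try to print the elapsed time, it just prints 0.0. Using in takes long enough that you can see the small fraction of a second it takes.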
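One way to get a non-zero reading for the binary search is to repeat the call many times and divide by the number of runs. The following is a minimal sketch of that idea, not the original poster's code: `binary_search` is a condensed rewrite of the question's loop, and the repeat count `runs` is a hypothetical value picked only for illustration.

```python
import timeit

def binary_search(alist, item):
    """Return True if item is in the sorted list alist."""
    first, last = 0, len(alist) - 1
    while first <= last:
        midpoint = (first + last) // 2
        if alist[midpoint] == item:
            return True
        if item < alist[midpoint]:
            last = midpoint - 1
        else:
            first = midpoint + 1
    return False

data = list(range(1000000))
runs = 10000  # hypothetical repeat count, chosen only for illustration

# timeit returns the total time for `runs` calls; divide for a per-call average.
total = timeit.timeit(lambda: binary_search(data, 600000), number=runs)
per_call = total / runs
print("binary search: {0:.3e} s per call".format(per_call))
```

Averaging over many runs sidesteps the limited resolution of a single pair of time.time() calls, at the cost of measuring the lambda call overhead as well.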
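The reason in takes longer is that this is a list, not a set or a similar data structure. For a set, a membership test is O(1) on average, whereas in a list every element has to be checked in order until a match is found or the list is exhausted.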
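To make the difference concrete, the sketch below counts how many elements each strategy examines instead of timing them. The helper names `linear_steps` and `binary_steps` are made up for this illustration; the first mimics a front-to-back scan and the second mirrors the question's binary search on a sorted list.

```python
def linear_steps(alist, item):
    """Count the elements examined by a front-to-back scan."""
    steps = 0
    for value in alist:
        steps += 1
        if value == item:
            break
    return steps

def binary_steps(alist, item):
    """Count the midpoints examined by binary search on a sorted list."""
    first, last, steps = 0, len(alist) - 1, 0
    while first <= last:
        midpoint = (first + last) // 2
        steps += 1
        if alist[midpoint] == item:
            break
        if item < alist[midpoint]:
            last = midpoint - 1
        else:
            first = midpoint + 1
    return steps

data = list(range(1000000))
print("linear scan steps: {0}".format(linear_steps(data, 600000)))
print("binary search steps: {0}".format(binary_steps(data, 600000)))
```

For a list of one million sorted integers, the scan has to look at 600001 elements to reach 600000, while the binary search never needs more than 20 probes, since 2**20 exceeds one million.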
Here is some benchmark code:
from __future__ import print_function

import bisect
import timeit


def binarysearch(alist, item):
    first = 0
    last = len(alist) - 1
    found = False

    while first <= last and not found:
        midpoint = (first + last) // 2
        if alist[midpoint] == item:
            found = True
        else:
            if item < alist[midpoint]:
                last = midpoint - 1
            else:
                first = midpoint + 1

    return found


def bisect_index(alist, item):
    idx = bisect.bisect_left(alist, item)
    if idx != len(alist) and alist[idx] == item:
        found = True
    else:
        found = False
    return found


time_tests = [
    (' 600 in list(range(1000))',
     '600 in alist',
     'alist = list(range(1000))'),
    (' 600 in list(range(10000000))',
     '600 in alist',
     'alist = list(range(10000000))'),

    (' 600 in set(range(1000))',
     '600 in aset',
     'aset = set(range(1000))'),
    ('6000000 in set(range(10000000))',
     '6000000 in aset',
     'aset = set(range(10000000))'),

    ('binarysearch(list(range(1000)), 600)',
     'binarysearch(alist, 600)',
     'from __main__ import binarysearch; alist = list(range(1000))'),
    ('binarysearch(list(range(10000000)), 6000000)',
     'binarysearch(alist, 6000000)',
     'from __main__ import binarysearch; alist = list(range(10000000))'),

    ('bisect_index(list(range(1000)), 600)',
     'bisect_index(alist, 600)',
     'from __main__ import bisect_index; alist = list(range(1000))'),
    ('bisect_index(list(range(10000000)), 6000000)',
     'bisect_index(alist, 6000000)',
     'from __main__ import bisect_index; alist = list(range(10000000))'),
]

for display, statement, setup in time_tests:
    result = timeit.timeit(statement, setup, number=1000000)
    print('{0:<45}{1}'.format(display, result))
Results (total seconds for 1,000,000 executions of each statement):
# Python 2.7
600 in list(range(1000))                      5.29039907455
600 in list(range(10000000))                  5.22499394417
600 in set(range(1000))                       0.0402979850769
6000000 in set(range(10000000))               0.0390179157257
binarysearch(list(range(1000)), 600)          0.961972951889
binarysearch(list(range(10000000)), 6000000)  3.014950037
bisect_index(list(range(1000)), 600)          0.421462059021
bisect_index(list(range(10000000)), 6000000)  0.634694814682
# Python 3.4
600 in list(range(1000))                      8.578510413994081
600 in list(range(10000000))                  8.578105041990057
600 in set(range(1000))                       0.04088461003266275
6000000 in set(range(10000000))               0.043901249999180436
binarysearch(list(range(1000)), 600)          1.6799193460028619
binarysearch(list(range(10000000)), 6000000)  6.099467994994484
bisect_index(list(range(1000)), 600)          0.5168328559957445
bisect_index(list(range(10000000)), 6000000)  0.7694612839259207
# PyPy 2.6.0 (Python 2.7.9)
600 in list(range(1000))                      0.122292041779
600 in list(range(10000000))                  0.00196599960327
600 in set(range(1000))                       0.101480007172
6000000 in set(range(10000000))               0.00759720802307
binarysearch(list(range(1000)), 600)          0.242530822754
binarysearch(list(range(10000000)), 6000000)  0.189949035645
bisect_index(list(range(1000)), 600)          0.132127046585
bisect_index(list(range(10000000)), 6000000)  0.197204828262
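Answer 1 (score: 2)
Why would you expect O(1) when testing whether a list contains an element? If you know nothing about the list (for instance, that it is sorted, as it is in your example), then you have to go through every element and compare.
So you get O(N).
Python lists cannot assume anything about what you store in them, so they have to use a naive implementation of list.__contains__.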
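As a rough illustration of what such a naive implementation has to do, here is a pure-Python sketch. The function name `naive_contains` is made up for this example and this is not CPython's actual source; it only follows the documented behavior that membership in a list compares each element by identity or equality.

```python
def naive_contains(alist, item):
    """Rough Python-level equivalent of `item in alist` for a list."""
    for value in alist:
        # Only identity and equality are used: no ordering or hashing is
        # required, which is why the list may hold arbitrary objects.
        if value is item or value == item:
            return True
    return False

# A list may mix types that cannot be ordered against each other or hashed.
mixed = [3, "three", [3], {"n": 3}, None]
print(naive_contains(mixed, [3]))
print(naive_contains(mixed, 4))
```

Because the scan relies on nothing but equality, it works on the mixed list above, where a binary search would have no meaningful ordering to exploit.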
If you want a faster test, you can try using a dictionary or a set.
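A minimal sketch of that suggestion follows, assuming the elements are hashable (integers here). The names `as_set` and `index_of` are hypothetical and chosen only for this example.

```python
data = list(range(1000000))

# One-time O(n) conversion; worthwhile when many lookups follow.
as_set = set(data)

# A dict can also map each value to extra information, here its index.
index_of = {value: i for i, value in enumerate(data)}

print(600000 in as_set)    # hash lookup instead of a scan
print(600000 in index_of)  # `in` on a dict tests its keys
print(index_of[600000])    # position of the value in the original list
print(-1 in as_set)
```

Building the set or dictionary costs a full pass over the list and extra memory, so it tends to pay off only when the same collection is queried repeatedly.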
Answer 2 (score: 0)