我了解哈希表查找的平均情况为O(1)
,但这是否包括计算给定输入的哈希值本身所需的时间?我尝试在Google上寻找答案,阅读了所有需要的文档,但是找不到Python中内部hash()
函数的实现。一些网站指出,计算哈希值需要花费固定的时间,有人说它是O(k)
,其中k
是输入的长度。如果您能帮助我找到正确的答案,我将非常高兴。在此先感谢:)
答案 0 :(得分:1)
我做了一个小测试来检验假设。结果似乎并不取决于输入的长度。
import datetime
x = ['a','aa','aaaaaaaaaaaaaaaaaaaaaaaaaaaa','aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa','aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa']
for i in range(len(x)):
for j in range(len(x)):
print "Checking for: " + x[i] + " " + x[j]
a = datetime.datetime.now()
h = hash((x[i],x[j]))
b = datetime.datetime.now()
c = b - a
print "Time taken : " + str(c.microseconds)
结果
Checking for: a a
Time taken : 87
Checking for: a aa
Time taken : 10
Checking for: a aaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: a aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: a aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aa a
Time taken : 9
Checking for: aa aa
Time taken : 8
Checking for: aa aaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaa a
Time taken : 10
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaa aa
Time taken : 8
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 8
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa a
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 11
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa a
Time taken : 10
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 8
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
答案 1 :(得分:1)
这完全取决于要散列的类型。以下是CPython 2.7.13中的一些简单测试,这不是唯一的选择:
>>> pprint.pprint([(i, timeit.timeit('hash(n)', setup='n="a"*{}'.format(6400*i), number=1)) for i in range(16)])
[(0, 1.9073486328125e-06),
(1, 1.6927719116210938e-05),
(2, 3.314018249511719e-05),
(3, 4.887580871582031e-05),
(4, 6.4849853515625e-05),
(5, 8.106231689453125e-05),
(6, 9.679794311523438e-05),
(7, 0.00011301040649414062),
(8, 0.00012993812561035156),
(9, 0.00014495849609375),
(10, 0.00016188621520996094),
(11, 0.0001780986785888672),
(12, 0.00019288063049316406),
(13, 0.0002090930938720703),
(14, 0.000225067138671875),
(15, 0.00024199485778808594)]
>>> pprint.pprint([(i, timeit.timeit('hash(n)', setup='n="a"*{}'.format(6400*i))) for i in range(16)])
[(0, 0.09920382499694824),
(1, 0.09032988548278809),
(2, 0.09069585800170898),
(3, 0.09006309509277344),
(4, 0.09059309959411621),
(5, 0.09033513069152832),
(6, 0.09037399291992188),
(7, 0.09031510353088379),
(8, 0.09110498428344727),
(9, 0.09074902534484863),
(10, 0.0909719467163086),
(11, 0.09081602096557617),
(12, 0.09112405776977539),
(13, 0.09091711044311523),
(14, 0.09103798866271973),
(15, 0.09085893630981445)]
请注意,如何对新创建的字符串进行哈希处理是O(n),但是每个字符串都在对其哈希进行缓存,因此重复时它会摊分到固定时间(number=1000000
是timeit
的默认值)。
>>> pprint.pprint([(i, timeit.timeit('hash(n)', setup='n=2**{}'.format(64*i))) for i in range(16)])
[(0, 0.09280180931091309),
(1, 0.09100484848022461),
(2, 0.09413909912109375),
(3, 0.09609699249267578),
(4, 0.10647201538085938),
(5, 0.1146399974822998),
(6, 0.12569880485534668),
(7, 0.1291029453277588),
(8, 0.13350296020507812),
(9, 0.1369338035583496),
(10, 0.14037799835205078),
(11, 0.14420413970947266),
(12, 0.1485278606414795),
(13, 0.15162205696105957),
(14, 0.15520405769348145),
(15, 0.15993809700012207)]
long
也是O(n),其中n是数字的宽度,因此是对数。粒度为digit
的粒度,通常为2 ** 30,专门用于较小整数的哈希。
其他对象将具有自己的行为,例如tuples将其内容的哈希时间大致相加。