I have a list of integers, each below one thousand, and a hash function that packs the list into a single, larger integer. Here is the hash function:
def hash_function(lst):
    hsh = 0
    for i, item in enumerate(lst):
        hsh += item * pow(10, i * 3)
    return hsh
Assume lst has about 4-5 items.
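Is comparing two integers more efficient than comparing two lists of smaller integers? Why or why not? I have to compare several hundred thousand hashes.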
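To make the packing scheme concrete, here is a small sketch of what the function produces. It assumes Python 3 and that every item is in the range 0 to 999; the sample lists are made up for illustration.

```python
def hash_function(lst):
    hsh = 0
    for i, item in enumerate(lst):
        hsh += item * pow(10, i * 3)
    return hsh

# Each item lands in its own group of three decimal digits,
# with the first item in the lowest group.
print(hash_function([1, 2, 3]))    # 3002001
print(hash_function([999, 0, 5]))  # 5000999

# Trailing zeros do not change the value, so lists of different
# lengths can map to the same integer.
print(hash_function([7]) == hash_function([7, 0]))  # True
```

So the integer identifies the list uniquely only if the items stay in 0 to 999 and the compared lists have the same length.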
Answer 0 (score: 0)
I put together a quick test to show the difference between the built-in list comparison and your hash function.
import time
import random
import sys


def compareRegular(a, b):
    # Built-in list equality.
    return a == b


def listHash(lst):
    # The hash function from the question.
    hsh = 0
    for i, item in enumerate(lst):
        hsh += item * pow(10, i * 3)
    return hsh


def compareHash(a, b):
    # Hashes both lists on every call, then compares the integers.
    return listHash(a) == listHash(b)


def compareLists(hugeList, comparison):
    # Compare every pair of lists; j is the offset within hugeList[i + 1:].
    output = []
    for i, lstA in enumerate(hugeList[:-1]):
        for j, lstB in enumerate(hugeList[i + 1:]):
            if comparison(lstA, lstB):
                output.append([i, j])
    return output


def genList(minValue, maxValue, numElements):
    # Build 1000 random lists of numElements integers each.
    output = []
    for _ in range(1000):
        smallList = []
        for _ in range(numElements):
            smallList.append(random.randint(minValue, maxValue))
        output.append(smallList)
    return output
random.seed(123)
# Written for Python 3: the original used Python 2 print statements and
# sys.maxint, whose closest Python 3 counterpart is sys.maxsize.
hugeListA = genList(-sys.maxsize - 1, sys.maxsize, 5)
hugeListB = genList(0, 100, 5)

print("Test with huge numbers in our list")
start = time.time()
regularOut = compareLists(hugeListA, compareRegular)
end = time.time()
print("Regular compare takes:", end - start)

# The label is reused unchanged, but this block times the hash comparison.
start = time.time()
hashOut = compareLists(hugeListA, compareHash)
end = time.time()
print("Regular compare takes:", end - start)
print("Are both outputs the same?", regularOut == hashOut)
print()

print("Test with smaller number in our lists")
start = time.time()
regularOut = compareLists(hugeListB, compareRegular)
end = time.time()
print("Regular compare takes:", end - start)

# Again, this second timing is for the hash comparison.
start = time.time()
hashOut = compareLists(hugeListB, compareHash)
end = time.time()
print("Regular compare takes:", end - start)
print("Are both outputs the same?", regularOut == hashOut)
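Output on my computer (in each test, the second "Regular compare takes" line is the timing for the hash comparison; the script reuses the label):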
Test with huge numbers in our list
Regular compare takes: 0.0940001010895
Regular compare takes: 3.38999986649
Are both outputs the same? True
Test with smaller number in our lists
Regular compare takes: 0.0789999961853
Regular compare takes: 3.01400017738
Are both outputs the same? True
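To see where the time goes, the pieces can also be timed separately. The following is only a sketch, assuming Python 3 and the standard timeit module; the sample lists, the repeat count, and the snake_case name list_hash are illustrative choices, not part of the measurements above. It times list equality, hashing on every comparison (as listHash is used in the test), and comparing hashes that were computed once up front.

```python
import timeit


def list_hash(lst):
    # Same packing scheme as listHash: three decimal digits per item.
    hsh = 0
    for i, item in enumerate(lst):
        hsh += item * pow(10, i * 3)
    return hsh


a = [12, 345, 678, 9, 100]
b = [12, 345, 678, 9, 101]  # differs only in the last item
ha, hb = list_hash(a), list_hash(b)  # hashes computed once, up front

n = 100000  # arbitrary repeat count
t_list = timeit.timeit(lambda: a == b, number=n)
t_rehash = timeit.timeit(lambda: list_hash(a) == list_hash(b), number=n)
t_prehashed = timeit.timeit(lambda: ha == hb, number=n)

print("list ==            :", t_list)
print("hash on every call :", t_rehash)
print("precomputed hashes :", t_prehashed)
```

The absolute numbers depend on the machine and Python version, so they are best read relative to each other.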
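The people who develop Python have surely spent a lot of time thinking about things like this. I personally don't know how the built-in list comparison actually works, but I'm fairly sure it isn't executed in the Python interpreter the way your hash function is. Many of Python's built-in functions and types are backed by natively executed C code, and the list comparison function probably falls into that category.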
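Even if you implemented your hash function in a similar way and had it execute natively, it would probably still be slower. You are basically looking at N calls to pow versus N comparisons. Even if the values are variable-size integers, a memcmp certainly won't take longer than loading those same values from memory and performing floating-point arithmetic on them.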
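As a side note on the "N comparisons" figure: in CPython, list equality stops at the first pair of items that differ, and lists of different lengths compare unequal without looking at any items, so N is an upper bound. A small sketch that counts the item comparisons, assuming Python 3 on CPython; the Counted wrapper and count_eq_calls helper are hypothetical names introduced only for this illustration:

```python
class Counted:
    """Hypothetical wrapper that counts how often its items are compared."""
    calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.calls += 1
        return self.value == other.value


def count_eq_calls(x, y):
    """Return how many item comparisons `x == y` triggers."""
    Counted.calls = 0
    x == y
    return Counted.calls


a = [Counted(v) for v in (1, 2, 3, 4, 5)]
same = [Counted(v) for v in (1, 2, 3, 4, 5)]
first_differs = [Counted(v) for v in (9, 2, 3, 4, 5)]

calls_all_equal = count_eq_calls(a, same)
calls_first_differs = count_eq_calls(a, first_differs)
calls_length_differs = count_eq_calls(a, same[:4])

print(calls_all_equal, calls_first_differs, calls_length_differs)  # 5 1 0
```

The hash function, by contrast, always walks the whole list before any comparison can happen.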