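While trying to speed up a program that simulates a tree structure (the "tree" is stored in a dict, with Cartesian (x, y) coordinate pairs as keys), I found that storing the unique addresses in the dictionary as tuples rather than strings made it run considerably faster.
My question is: if Python is optimized for string keys in dictionaries and hashing, why is using tuples so much faster in this example? String keys seem to take about 60% longer to do exactly the same task. Am I overlooking something simple in my example?
I am using this thread as the basis for my question (along with other claims that strings are faster): Is it always faster to use string as key in a dict?
Below is the code I used to test and time the two approaches: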
import time

def writeTuples():
    k = {}
    for x in range(0,500):
        for y in range(0,x):
            k[(x,y)] = "%s,%s"%(x,y)
    return k

def readTuples(k):
    failures = 0
    for x in range(0,500):
        for y in range(0,x):
            if k.get((x,y)) is not None: pass
            else: failures += 1
    return failures

def writeStrings():
    k = {}
    for x in range(0,500):
        for y in range(0,x):
            k["%s,%s"%(x,y)] = "%s,%s"%(x,y)
    return k

def readStrings(k):
    failures = 0
    for x in range(0,500):
        for y in range(0,x):
            if k.get("%s,%s"%(x,y)) is not None: pass
            else: failures += 1
    return failures

# Note: time.clock() was removed in Python 3.8; time.perf_counter() is
# used here so that the timing code runs on current Python versions.
def calcTuples():
    clockTimesWrite = []
    clockTimesRead = []
    failCounter = 0
    trials = 100
    st = time.perf_counter()
    for x in range(0,trials):
        startLoop = time.perf_counter()
        k = writeTuples()
        writeTime = time.perf_counter()
        failCounter += readTuples(k)
        readTime = time.perf_counter()
        clockTimesWrite.append(writeTime-startLoop)
        clockTimesRead.append(readTime-writeTime)
    et = time.perf_counter()
    print("The average time to loop with tuple keys is %f, and had %i total failed records"%((et-st)/trials,failCounter))
    print("The average write time is %f, and average read time is %f"%(sum(clockTimesWrite)/trials,sum(clockTimesRead)/trials))
    return None

def calcStrings():
    clockTimesWrite = []
    clockTimesRead = []
    failCounter = 0
    trials = 100
    st = time.perf_counter()
    for x in range(0,trials):
        startLoop = time.perf_counter()
        k = writeStrings()
        writeTime = time.perf_counter()
        failCounter += readStrings(k)
        readTime = time.perf_counter()
        clockTimesWrite.append(writeTime-startLoop)
        clockTimesRead.append(readTime-writeTime)
    et = time.perf_counter()
    print("The average time to loop with string keys is %f, and had %i total failed records"%((et-st)/trials,failCounter))
    print("The average write time is %f, and average read time is %f"%(sum(clockTimesWrite)/trials,sum(clockTimesRead)/trials))
    return None

calcTuples()
calcStrings()
Thanks!
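Answer 0 (score: 4)
The tests are not fairly weighted (hence the timing discrepancy). You make twice as many calls to format (the % operator) in your writeStrings loop as in your writeTuples loop, and you call it infinitely more often in readStrings, since readTuples never calls it at all. For a fairer test, you need to make sure that:
the two write loops each make the same number of calls to % per inner loop, and
readStrings and readTuples each make either one or zero calls to % per inner loop.
Answer 1 (score: 0)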
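I would say the difference in speed comes from the string formatting of the accessor key.
In writeTuples you have this line:
k[(x,y)] = ...
which creates a new tuple holding the values (x,y) before passing it to the accessor for k.
In writeStrings you have this line:
k["%s,%s"%(x,y)] = ...
This does all the same computation as in writeTuples, but it also has the overhead of parsing the string "%s,%s" (this might be done at compile time, I'm not sure), and it has to build a new string from the numbers (for example "12,15"). I believe that is what increases the running time.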
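To illustrate the point above, here is a minimal sketch that times key construction and dictionary lookup separately. The names d_tuple, d_str, str_key and time_it are made up for this example and do not come from the question; absolute numbers will differ between machines and Python versions.

```python
import timeit

x, y = 12, 15
d_tuple = {(x, y): True}
d_str = {"%s,%s" % (x, y): True}
str_key = "%s,%s" % (x, y)  # string key built once, outside the timed code

def time_it(func, number=200000):
    """Return the total seconds needed to call func `number` times."""
    return timeit.timeit(func, number=number)

timings = {
    # Cost of only building the key.
    "build tuple key": time_it(lambda: (x, y)),
    "build string key": time_it(lambda: "%s,%s" % (x, y)),
    # Cost of building the key and then looking it up.
    "lookup, tuple built each time": time_it(lambda: d_tuple[(x, y)]),
    "lookup, string built each time": time_it(lambda: d_str["%s,%s" % (x, y)]),
    # Cost of the lookup alone, with the string key already built.
    "lookup, prebuilt string": time_it(lambda: d_str[str_key]),
}

for label, seconds in timings.items():
    print("%-32s %.4f s" % (label, seconds))
```

Comparing the "built each time" rows with the "prebuilt" row should show how much of the total is spent on formatting rather than on the dictionary itself.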
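Answer 2 (score: 0)
As others have said, the string formatting is the issue.
Here is a quick version that precomputes all the strings...
On my machine, writing strings is about 27% faster than writing tuples. Writing plus reading is about 22% faster.
I quickly reformatted and simplified your code to use timeit. With slightly different logic, you could also work out the difference between reads and writes.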
import timeit

# Precompute every (tuple key, string key) pair so that no string
# formatting happens inside the timed functions.
samples = []
for x in range(0,360):
    for y in range(0,x):
        i = (x,y)
        samples.append( ( i, "%s,%s"%i) )

def write_tuples():
    k = {}
    for pair in samples:
        k[pair[0]] = True
    return k

def write_strings():
    k = {}
    for pair in samples:
        k[pair[1]] = True
    return k

def read_tuples(k):
    failures = 0
    for pair in samples:
        if k.get(pair[0]) is not None: pass
        else: failures += 1
    return failures

def read_strings(k):
    failures = 0
    for pair in samples:
        if k.get(pair[1]) is not None: pass
        else: failures += 1
    return failures

stmt_t1 = """k = write_tuples()"""
stmt_t2 = """k = write_strings()"""
stmt_t3 = """k = write_tuples()
read_tuples(k)"""
stmt_t4 = """k = write_strings()
read_strings(k)"""

# The same setup string is shared by all four timers.
setup = "from __main__ import samples, read_strings, write_strings, read_tuples, write_tuples"
t1 = timeit.Timer(stmt=stmt_t1, setup=setup)
t2 = timeit.Timer(stmt=stmt_t2, setup=setup)
t3 = timeit.Timer(stmt=stmt_t3, setup=setup)
t4 = timeit.Timer(stmt=stmt_t4, setup=setup)

# print() with a single argument works on both Python 2 and Python 3.
print("write tuples       : %s" % t1.timeit(100))
print("write strings      : %s" % t2.timeit(100))
print("write/read tuples  : %s" % t3.timeit(100))
print("write/read strings : %s" % t4.timeit(100))
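The answer mentions that reads could be timed separately from writes with slightly different logic. One possible sketch, assuming the dictionaries are built once outside the timed call, is shown below. The helper read_all and the names tuple_dict, string_dict, tuple_keys and string_keys are hypothetical and not part of the answer's code.

```python
import timeit

# Same precomputed (tuple, string) pairs as the `samples` list in the answer.
samples = [((x, y), "%s,%s" % (x, y)) for x in range(360) for y in range(x)]

tuple_keys = [t for t, _ in samples]
string_keys = [s for _, s in samples]

# Build the dictionaries up front so that only lookups are timed.
tuple_dict = {t: True for t in tuple_keys}
string_dict = {s: True for s in string_keys}

def read_all(d, keys):
    """Look up every key in d and count how many are missing."""
    failures = 0
    for key in keys:
        if d.get(key) is None:
            failures += 1
    return failures

read_tuples_time = timeit.timeit(lambda: read_all(tuple_dict, tuple_keys), number=20)
read_strings_time = timeit.timeit(lambda: read_all(string_dict, string_keys), number=20)

print("read tuples  : %s" % read_tuples_time)
print("read strings : %s" % read_strings_time)
```

Because both key lists are prebuilt, any remaining difference between the two numbers reflects hashing and comparison of the keys, not string formatting.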
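Answer 3 (score: 0)
I ran your code on a Core i5 1.8GHz machine and got the following results (average time per trial in seconds, tuples vs. strings):
         tuples     strings
loop     0.076752   0.085863
write    0.049446   0.050731
read     0.027299   0.035125
So tuples seem to be the winner, but you are doing the string conversion twice in the write function. Change writeStrings to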
def writeStrings():
    k = {}
    for x in range(0,360):
        for y in range(0,x):
            s = "%s,%s"%(x,y)
            k[s] = s
    return k
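This gives:
         tuples     strings
loop     0.101689   0.092957
write    0.064933   0.044578
read     0.036748   0.048371
The first thing to note is that the results vary quite a lot, so you might want to change trials=100 to something larger; recall that Python's timeit defaults to 1,000,000 executions. With trials=5000 I got:
         tuples     strings
loop     0.081944   0.067829
write    0.052264   0.032866
read     0.029673   0.034957
So the string version is faster overall, but as already pointed out in the other posts, it is not the dictionary lookup that hurts, it is the string conversion.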
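The run-to-run variation noted above can also be reduced by repeating the whole measurement several times and keeping the fastest run, which is the approach the timeit documentation suggests. The sketch below assumes smaller loops than the question (n=100) to keep it quick; best_of and the n parameter are made up for this example.

```python
import timeit

def write_tuples(n=100):
    k = {}
    for x in range(n):
        for y in range(x):
            k[(x, y)] = "%s,%s" % (x, y)
    return k

def write_strings(n=100):
    k = {}
    for x in range(n):
        for y in range(x):
            s = "%s,%s" % (x, y)  # convert once, reuse as key and value
            k[s] = s
    return k

def best_of(func, repeat=5, number=20):
    """Run func `number` times per measurement, `repeat` measurements in
    total, and return the fastest average time per call in seconds."""
    runs = timeit.repeat(func, repeat=repeat, number=number)
    return min(runs) / number

print("write tuples  : %f" % best_of(write_tuples))
print("write strings : %f" % best_of(write_strings))
```

Taking the minimum rather than the mean filters out runs that were slowed down by other activity on the machine.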