I'm currently working through some coding problems on leetcode.com, and I'm stumped by the difference in computation time between two algorithms that produce the same result.
The problem: given a string array words, find the maximum value of length(word[i]) * length(word[j]) where the two words do not share common letters. You may assume that each word contains only lowercase letters. If no such two words exist, return 0.
At minimum, every pair of strings must be compared to compute the answer. I realize a bitwise approach would be faster/more efficient, but I'm choosing to ignore that for this question.
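(For reference, the bitwise approach I'm setting aside can be sketched like this; the helper name is hypothetical and this is not the code being compared below. Each word collapses to a 26-bit mask, and two words share no letters exactly when the AND of their masks is zero.)

```python
def max_product_bitmask(words):
    # One 26-bit mask per word: bit k is set if chr(ord('a') + k) occurs in the word.
    masks = [(len(w), sum(1 << (ord(c) - ord('a')) for c in set(w))) for w in words]
    best = 0
    for i in range(len(masks)):
        len_i, mask_i = masks[i]
        for j in range(i + 1, len(masks)):
            len_j, mask_j = masks[j]
            if mask_i & mask_j == 0:  # no shared letters
                best = max(best, len_i * len_j)
    return best
```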
You can find the list of test-case words I'm using at the bottom of this post.
Algorithm 1:
curr_max=0
for i in range(0,len(words)):
    for j in range(0,len(words)):
        if i<j:
            curr_word=set(words[i])
            other_word=words[j]
            for char in curr_word:
                if char in other_word:
                    break
            else:
                curr_max=max(curr_max,len(words[i])*len(other_word))
print(curr_max)
This algorithm is essentially a nested for loop comparing every possible combination of words; if not for the if i < j test, it would double-count pairs and compare words with themselves.
Algorithm 2:
curr_max = 0
while words:
    curr_word = set(words[0])
    curr_len = len(words[0])
    words = words[1:]
    for word in words:
        for char in curr_word:
            if char in word:
                break
        else:
            curr_max = max(curr_max, curr_len*len(word))
print(curr_max)
This algorithm basically compares the first word in the list against all the others, then removes it from the list and repeats the process. It takes about 0.02 seconds, roughly 4x faster. I'm having a hard time understanding why the difference is so large. Thanks in advance for your help!
The input I used to compare the timings is:
words=["eedfdddadbcc","acdadecebafaebaec","dfde","ececbefe","bbafebbeccbddddbd","eafeffddbbbf","cbd","abadeaddfbcfbadb","ffdcacaebbeaa","fbadcfeede","bdcefdbfec","bbadfccdfebefd","dbefdfabededb","cbbccdecfbbe","abaeeecdbecebafedbfee","fbefbdfc","fffafb","bfadbdefbfedbddff","cbccbfdadbfe","bacaafecbbfaae","fcbffdbefcfbccd","fefaadfaaafdfbdaff","ecabaff","ccbdefcdcfac","bbbfafbffcefbc","edecefa","bcdfbcebabae","aaefecfcbbccfaeeaf","beaabaaeaebef","adfac","acedfdabccebc","efbfbef","bfccadbcbcfcdabfa","ffcaddbcf","dfae","ccadeeebeaabddebcadec","babaa","ebbdbaabddfdddad","fafaddbaebdaa","eeeeddadedfe","effbca","abcddfa","cbadcfeffeaaeecbbfe","ceaabcfaaaefeeadf","acecadddde","ece","dc","bfafdefbbdafacdcfb","fbdcad","dbaaffcdbcbdea","baaee","bebed","beaedceceaa","eacbcfdbcefbaddffcac","acddaedacfeedffad","efebff","efcbf","cdfffffaacfacafb","adacaceea","fceeffededbcfbfaaf","eafaeffcbfde","debadcddbdbabefdbe","ef","eeeeabfbaabddaecb","eeadcdcdaacaabe","ebcbffdefafdcbcebec","eb","adedefbaabfcefbea","ddceabfddaaefcea","ddffb","fdadfac","de","cbcdcbddcdabeb","ccffeeafbbbf","ccba","dab","bbdbeefdbef","cbec","ffcbefdbfdacdbdbf","adfad","ceacdcbfbdbbaebbd","ecfeaefff","ddbbdaefddeebd","eeeee","abdadc","eafecdbdef","aeedaeeaebaaeecdd","dddeebcbdea","bcaadedacb","ebdeadddcafa","ecbdbcbfccbdffaef","fddcffbfffa","accbdcfcedeabeab","cfbbefbddcaeecfbfacc","efffdaacbafeecdad","aaadfa","efeccbabdefaf","defebaddaafdcd","ecebcaacdaccaddcfeee","fdbbfecaffeafaa","bafccdbea","caa","deedefdeccead","bbfbfeaeddacfacea","daaefbbcbcdbfbfdda","aceed","cfeadadadbcff","eaefcdca","cefebdbaafeabdbdeaafd","abec","aeececad","cfeabcbaeaebdbcaada","aac","ebabffeb","fa","cf","dcebedefc","dbaedceecf","ffebaedafccceb","faefbeacaddefbe","eeadbfabfbbbfaeffaeea","affdecaca","ccfdcbdefcdfaddbbeaed","bc","feafaaabaceade","bebdfbbad","eeacaefaddacac","fff","aeddcd","ccffbabbdfc","ecddbcdeecdfbbb","debdbcdcafdcd","cfaebaeddbbd","efdada","becdccaeffeedcadbdd","feedbacc","cbbeebcdad","bfdcbdfdbcceadded","cfdbfdddafadadddcba","bcedeaeeaac","ffdcfccffdfffaebf","afffceaaadbbedfdd","faaeebdfbfefddebed","eedafbddeeaaadcdeaccc","eeceadafa","ebcfaccabea","eebdbbedcaedcbdcfaba","ecfcadaebacbdfdccebe","cbbabdadaee","cfeea","dec","cfedcbaabbaef","aacdabcbf","dfdbacadbebeedcd","bccccfdcdfe","cfcacdbcdccddcadce","dafafeccfaccaadeabbf","eaffaaffefccde","bbfbddccfda","fbdbbbbfbe","eafbcafbdbead","edbcdcefdc","fe","aafdcabce","ddafedceddcdcbfbcafe","dabcafbcfafeeadbbbef","beeaacd","cadeabebdbcfbbdfe","ecfefbfbbfa","fedacafcc","bcdcefecbcebaeeccdbd","fefde","cafba","bdabeaabbdbbbccecebda","dfeaadbeaaeefdfbed","dbaecde","cfbdfffbdeeeeb","fc","decadcacfaabca","cebbdff","badabbddcfed","fcce","fedfadefcf","acfccfbfcda","debfc","bebafeaeffe","ceaefbbcefacbbacb","cebbaeb","cadedfdafecdfb","bfefdfbaceddfcbade","cefeefaeddafbbdcade","faceadcefbffadb","cfbacafae","dfbfadfdccedbcbeaae","dbbccdddaf","ebbcbcebdddcedcfdcfaa","ccedffbcdbaedfaeb","ccfeaceaaaaeee","faade","afaaacaecbffdbadcbcd","cebfbbefbbdabbbffea","cdaadba","bbefdcacaaadbbbdedec","adabfbebdb","fcfefadcbadaacbdcfdbb","adddadebfc","fb","ecfebaacbdabece","dabacfdecfe","eeeecc","eabbe","fcdffababd","aafdbbcfdecbccca","efebaaadfecccecaa","cffefdbf","bcbdd","eaaccdcfdbbbcf"]
Updated the OP to help people assist me with this! I now generate the words with:
from random import choice, randrange
words = [''.join([choice('abcdefghijklmnopqrstuvwxyz') for _ in range(randrange(2, 22))]) for _ in range(250)]
The first algorithm as I'm running it:
import time

t1=time.time()
curr_max=0
for i in range(0,len(words)):
    for j in range(0,i):
        curr_word=set(words[i])
        other_word=words[j]
        for char in curr_word:
            if char in other_word:
                break
        else:
            curr_max=max(curr_max,len(words[i])*len(other_word))
print(curr_max)
t0=time.time()
print(t0-t1)
The results I'm seeing are in the 0.1-second range.
The second algorithm I'm using:
import time

t1=time.time()
curr_max = 0
while words:
    curr_word = set(words[0])
    curr_len = len(words[0])
    words = words[1:]
    for word in words:
        for char in curr_word:
            if char in word:
                break
        else:
            curr_max = max(curr_max, curr_len*len(word))
print(curr_max)
t0=time.time()
print(t0-t1)
The results I'm seeing are 0.04-0.05 seconds. Can anyone reproduce this?
Answer (score: 5)
Both algorithms look like they do the same amount of work; both essentially re-create the itertools.combinations() function.
However, the first method re-creates sets far more often, and executes n ** 2 extra i < j tests (for n words)!
You are creating combinations of len(words) items taken 2 at a time, i.e. n! / (2!(n - 2)!) (see Wikipedia), which is roughly half of n ** 2:
>>> import math
>>> n = 250
>>> math.factorial(n) / (math.factorial(2) * math.factorial(n - 2))
31125.0
>>> n ** 2
62500
>>> n ** 2 - (math.factorial(n) / (math.factorial(2) * math.factorial(n - 2)))
31375.0
So for your specific case, algorithm #1 executes more than twice as many loop iterations as algorithm #2. As n grows, the ratio of n ** 2 to the number of combinations approaches 2, so algorithm #1 always does at least about twice the work.
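(As a quick sanity check of my own, not part of the original answer, the ratio n ** 2 / C(n, 2) can be tabulated directly with math.comb, available in Python 3.8+:)

```python
import math

# Ratio of full-grid iterations (n ** 2) to unordered pairs C(n, 2);
# it approaches 2 from above as n grows.
for n in (10, 250, 10_000):
    ratio = n ** 2 / math.comb(n, 2)
    print(n, ratio)
```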
Next, you create a set from words[0] only once per word in algorithm #2, but in algorithm #1 you do so for every inner-loop iteration:

# algorithm #1
for ...                # current word loop
    for ...            # other word loop
        set(words[i])

# algorithm #2
while ...              # current word loop
    set(words[0])
    for ...            # other word loop
It is these differences that make it slower; creating n-choose-2 sets rather than just n sets probably accounts for most of the performance cost here.
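(To put numbers on that, here is my own instrumentation sketch with a hypothetical helper name, counting how many set() constructions each version performs for n words:)

```python
def count_set_calls(n):
    # Algorithm #1 builds a set for every (i, j) pair it tests: n choose 2 of them.
    alg1 = sum(1 for i in range(n) for j in range(i))
    # Algorithm #2 builds exactly one set per word.
    alg2 = n
    return alg1, alg2

print(count_set_calls(250))  # -> (31125, 250)
```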
For a proper comparison, you should use the timeit module, which repeats the test many times, makes sure to use the most accurate clock for measuring the time taken, and disables the Python garbage collector (so it won't interfere). I've included a random word list; for the destructive algorithm (your #2), I have to clone the list each time, which I compensate for by subtracting the time taken by the same number of bare list copies.
The script I ran:
from random import choice, randrange
from timeit import timeit

def naive_loop(words):
    curr_max=0
    for i in range(0,len(words)):
        for j in range(0,len(words)):
            if i<j:
                curr_word=set(words[i])
                other_word=words[j]
                for char in curr_word:
                    if char in other_word:
                        break
                else:
                    curr_max=max(curr_max,len(words[i])*len(other_word))
    return curr_max

def destructive_loop(words):
    curr_max = 0
    while words:
        curr_word = set(words[0])
        curr_len = len(words[0])
        words = words[1:]
        for word in words:
            for char in curr_word:
                if char in word:
                    break
            else:
                curr_max = max(curr_max, curr_len*len(word))
    return curr_max

def reduced_set_calls_loop(words):
    curr_max=0
    for i in range(0,len(words)):
        curr_word=set(words[i])
        for j in range(0,len(words)):
            if i<j:
                other_word=words[j]
                for char in curr_word:
                    if char in other_word:
                        break
                else:
                    curr_max=max(curr_max,len(words[i])*len(other_word))
    return curr_max

words = [''.join([choice('abcdef') for _ in range(randrange(2, 22))]) for _ in range(250)]
number = 100
print('Naive:', timeit('naive_loop(words)', 'from __main__ import naive_loop, words', number=number))
print('Destructive:',  # don't include time to copy a list
      timeit('destructive_loop(words[:])', 'from __main__ import destructive_loop, words', number=number) -
      timeit('words[:]', 'from __main__ import naive_loop, words', number=number))
print('Reduced set calls:', timeit('reduced_set_calls_loop(words)', 'from __main__ import reduced_set_calls_loop, words', number=number))
And the results:
Naive: 1.8516130640055053
Destructive: 0.3646556100138696
Reduced set calls: 0.5927464940032223
So moving the set() call, as done in reduced_set_calls_loop(), improves the first version considerably. Cutting the number of iterations by replacing the if i < j test with a for j in range(i): loop narrows the gap further:
>>> def reduced_iteration_loop(words):
... curr_max=0
... for i in range(0,len(words)):
... curr_word=set(words[i])
... for j in range(i):
... other_word=words[j]
... for char in curr_word:
... if char in other_word:
... break
... else:
... curr_max=max(curr_max,len(words[i])*len(other_word))
... return curr_max
...
>>> print('Reduced iteration:', timeit('reduced_iteration_loop(words)', 'from __main__ import reduced_iteration_loop, words', number=number))
Reduced iteration: 0.44450017900089733
To my surprise, your destructive loop is faster than using itertools.combinations():
>>> from itertools import combinations
>>> def destructive_loop_empty(words):
... while words:
... curr_word, words = words[0], words[1:]
... for word in words:
... pass
...
>>> def empty_combinations(words):
... for a, b in combinations(words, 2):
... pass
...
>>> timeit('destructive_loop_empty(words[:])', 'from __main__ import destructive_loop_empty, words', number=1000)
0.324253979997593
>>> timeit('empty_combinations(words[:])', 'from __main__ import empty_combinations, words', number=1000)
0.5626872480061138
We can make your algorithm #2 faster still by using set disjointness tests instead of testing each character individually. Since we'll be testing the words repeatedly, it makes sense to pre-build the sets in a dictionary, acting as a cache we can draw from when testing.
Finally, we can make a non-destructive version by also storing the lengths in that dictionary and simply looping over its values (we destroy the dictionary instead):
def nondestructive_loop(words):
    curr_max = 0
    words = {w: (set(w), len(w)) for w in words}
    while words:
        curr_word, curr_word_length = words.popitem()[1]
        for other, other_length in words.values():
            if curr_word.isdisjoint(other):
                curr_max = max(curr_max, curr_word_length * other_length)
    return curr_max
This is the fastest I was able to make it:
>>> print('Nondestructive:', timeit('nondestructive_loop(words)', 'from __main__ import nondestructive_loop, words', number=number))
Nondestructive: 0.2944725830020616
That shaves off another 20%.
So, in summary, iterating directly over a list is faster than generating indices with range() and then indexing into the list. The difference is large enough that it's worth destroying the list (or dictionary)!
It is also why itertools.combinations() is slower here: it uses indices, because it must support combinations of size greater than 2 (which means you can't just remove items from the input sequence).
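(To illustrate that closing point with a toy example of my own; absolute timings will vary by machine, but direct iteration is typically faster than range()-based indexing:)

```python
from timeit import timeit

data = list(range(1000))

def direct(seq):
    # Iterate over the elements themselves.
    total = 0
    for x in seq:
        total += x
    return total

def indexed(seq):
    # Generate indices, then index into the sequence.
    total = 0
    for i in range(len(seq)):
        total += seq[i]
    return total

print('direct :', timeit(lambda: direct(data), number=1000))
print('indexed:', timeit(lambda: indexed(data), number=1000))
```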